Why is the simpler loop slower?

Kel*_*ndy 62 python performance cpython python-internals python-3.11

Called with n = 10**8, the simple loop is consistently much slower than the complex loop for me, and I don't understand why:

def simple(n):
    while n:
        n -= 1

def complex(n):
    while True:
        if not n:
            break
        n -= 1

Typical times, in seconds, are listed in the benchmark output at the end of the question.

Here's the loop part of the bytecode, as shown by dis.dis(simple):

  6     >>    6 LOAD_FAST                0 (n)
              8 LOAD_CONST               1 (1)
             10 BINARY_OP               23 (-=)
             14 STORE_FAST               0 (n)

  5          16 LOAD_FAST                0 (n)
             18 POP_JUMP_BACKWARD_IF_TRUE     7 (to 6)

And for complex:

 10     >>    4 LOAD_FAST                0 (n)
              6 POP_JUMP_FORWARD_IF_TRUE     2 (to 12)

 11           8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

 12     >>   12 LOAD_FAST                0 (n)
             14 LOAD_CONST               2 (1)
             16 BINARY_OP               23 (-=)
             20 STORE_FAST               0 (n)

  9          22 JUMP_BACKWARD           10 (to 4)

So it looks like complex does more work per iteration (two jumps instead of one). Why, then, is it faster?
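A quick way to check the "two jumps versus one" reading on your own interpreter is to list the jump instructions with the standard library's `dis.get_instructions`. This sketch scans the whole function rather than just the loop body, so on 3.11 `simple` also reports the POP_JUMP_FORWARD_IF_FALSE that guards loop entry. Opcode names differ between CPython versions, and `jump_opnames` is a hypothetical helper, not part of `dis`.

```python
import dis


def simple(n):
    while n:
        n -= 1


def complex(n):  # shadows the builtin complex; named as in the question
    while True:
        if not n:
            break
        n -= 1


def jump_opnames(func):
    """Names of every instruction in func whose opname contains 'JUMP'."""
    return [ins.opname for ins in dis.get_instructions(func) if "JUMP" in ins.opname]


for func in (simple, complex):
    print(func.__name__, jump_opnames(func))
```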

This seems to be a Python 3.11 phenomenon; see the comments.

Benchmark script (Try it online!):

simple 4.340795516967773
complex 3.6490490436553955
simple 4.374553918838501
complex 3.639145851135254
simple 4.336690425872803
complex 3.624480724334717
Python: 3.11.4 (main, Sep  9 2023, 15:09:21) [GCC 13.2.1 20230801]
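The script behind "Try it online!" is not reproduced here. A minimal stand-in, assuming it simply alternates the two functions and times each call with `time.perf_counter`, could look like the following; `time_call`, `benchmark` and `main` are hypothetical names, and the original script may well differ.

```python
import sys
from time import perf_counter


def simple(n):
    while n:
        n -= 1


def complex(n):  # shadows the builtin complex; named as in the question
    while True:
        if not n:
            break
        n -= 1


def time_call(func, n):
    """Wall-clock seconds for a single call func(n), via time.perf_counter."""
    start = perf_counter()
    func(n)
    return perf_counter() - start


def benchmark(n=10**8, rounds=3):
    """Alternate simple and complex `rounds` times; return (name, seconds) pairs."""
    results = []
    for _ in range(rounds):
        for func in (simple, complex):
            results.append((func.__name__, time_call(func, n)))
    return results


def main(n=10**8, rounds=3):
    """Print results in the same 'name seconds' shape as the output above."""
    for name, seconds in benchmark(n, rounds):
        print(name, seconds)
    print("Python:", sys.version)
```

Calling `main()` runs the full-size version, which takes several seconds per call at n = 10**8; passing a smaller `n` gives a quick smoke test, though the timings then say little.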

Mec*_*Pig 64

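I checked the interpreter source for these bytecodes (Python 3.11.6) and found that, among the instructions in the disassembled bytecode, only JUMP_BACKWARD seems to execute a warmup function, which triggers specialization in Python 3.11 once it has run enough times: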

PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int throwflag)
{
    /* ... */
        TARGET(JUMP_BACKWARD) {
            _PyCode_Warmup(frame->f_code);
            JUMP_TO_INSTRUCTION(JUMP_BACKWARD_QUICK);
        }
    /* ... */
}

static inline void
_PyCode_Warmup(PyCodeObject *code)
{
    if (code->co_warmup != 0) {
        code->co_warmup++;
        if (code->co_warmup == 0) {
            _PyCode_Quicken(code);
        }
    }
}

Of all the bytecodes, only JUMP_BACKWARD and RESUME call _PyCode_Warmup().
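To make the counting concrete, here is a small Python model of that warmup counter; it is not CPython code. The starting value of -8 is an assumption (what I recall as the warmup delay in the 3.11 headers), while the "count up to zero, then quicken once" logic follows `_PyCode_Warmup` as quoted above. `CodeModel` and `run_call` are hypothetical names; the model charges one warmup step for RESUME per call and one per loop iteration only when the loop closes with JUMP_BACKWARD.

```python
# Toy model of CPython 3.11's per-code-object warmup counter (not real CPython code).
# ASSUMPTION: the counter starts at -8; the increment-until-zero logic mirrors
# _PyCode_Warmup as quoted above.


class CodeModel:
    """Stand-in for a code object, tracking only its warmup state."""

    def __init__(self, name, warmup_delay=8):
        self.name = name
        self.co_warmup = -warmup_delay  # negative while still warming up
        self.quickened = False

    def warmup(self):
        """Mirror _PyCode_Warmup: bump a nonzero counter, quicken when it reaches 0."""
        if self.co_warmup != 0:
            self.co_warmup += 1
            if self.co_warmup == 0:
                self.quickened = True  # stands in for _PyCode_Quicken(code)


def run_call(code, iterations, loop_has_jump_backward):
    """Simulate one call: RESUME warms up once, JUMP_BACKWARD once per iteration."""
    code.warmup()  # RESUME at function entry
    if loop_has_jump_backward:
        for _ in range(iterations):
            code.warmup()  # JUMP_BACKWARD closing each iteration (complex)
    # simple's loop ends in POP_JUMP_BACKWARD_IF_TRUE, which never calls warmup
    return code.quickened


simple_code = CodeModel("simple")
complex_code = CodeModel("complex")
print(run_call(simple_code, 1000, loop_has_jump_backward=False))  # False
print(run_call(complex_code, 1000, loop_has_jump_backward=True))  # True
```

Under this model, calling `run_call(simple_code, ...)` seven more times would flip `simple_code.quickened` to True through RESUME alone; that is only a prediction of the model, not something measured in the question or answer.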


Specialization appears to speed up several of the bytecodes in use, which gives a significant speedup:

void
_PyCode_Quicken(PyCodeObject *code)
{
    /* ... */
            switch (opcode) {
                case EXTENDED_ARG:  /* ... */
                case JUMP_BACKWARD: /* ... */
                case RESUME:        /* ... */
                case LOAD_FAST:     /* ... */
                case STORE_FAST:    /* ... */
                case LOAD_CONST:    /* ... */
            }
    /* ... */
}
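As a rough illustration of several bytecodes being rewritten in one pass, here is a toy rewrite over opcode names. Its three rules are read off the 3.11 `dis(complex, adaptive=True)` output in this answer: RESUME becomes RESUME_QUICK, JUMP_BACKWARD becomes JUMP_BACKWARD_QUICK, and a LOAD_FAST followed by LOAD_CONST becomes the superinstruction LOAD_FAST__LOAD_CONST. The real `_PyCode_Quicken` handles more opcodes, so `toy_quicken` is a hypothetical sketch, not CPython's algorithm.

```python
# Toy model of the quickening rewrite, using only rules visible in the 3.11
# adaptive disassembly of complex (an assumption, not CPython's _PyCode_Quicken).

SIMPLE_RENAMES = {
    "RESUME": "RESUME_QUICK",
    "JUMP_BACKWARD": "JUMP_BACKWARD_QUICK",
}

SUPERINSTRUCTIONS = {
    ("LOAD_FAST", "LOAD_CONST"): "LOAD_FAST__LOAD_CONST",
}


def toy_quicken(opnames):
    """Return a rewritten copy of opnames; the input list is left unchanged."""
    out = list(opnames)
    i = 0
    while i < len(out):
        pair = tuple(out[i:i + 2])
        if pair in SUPERINSTRUCTIONS:
            # The first slot becomes the fused instruction; the second slot
            # keeps its name, as LOAD_CONST still appears in the dis output.
            out[i] = SUPERINSTRUCTIONS[pair]
            i += 2
        else:
            out[i] = SIMPLE_RENAMES.get(out[i], out[i])
            i += 1
    return out
```

Fed complex's 3.11 opcode sequence, it turns JUMP_BACKWARD into JUMP_BACKWARD_QUICK and the loop-body LOAD_FAST into LOAD_FAST__LOAD_CONST, while BINARY_OP stays as it is: BINARY_OP_SUBTRACT_INT comes from a later, runtime specialization step that this toy does not model.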

After one execution, the bytecode of complex's while loop has changed, but simple's has not:

In [_]: %timeit -n 1 -r 1 complex(10 ** 8)
2.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [_]: dis(complex, adaptive=True)
  5           0 RESUME_QUICK             0

  6           2 NOP

  7           4 LOAD_FAST                0 (n)
              6 POP_JUMP_FORWARD_IF_TRUE     2 (to 12)

  8           8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

  9     >>   12 LOAD_FAST__LOAD_CONST     0 (n)
             14 LOAD_CONST               2 (1)
             16 BINARY_OP_SUBTRACT_INT    23 (-=)
             20 STORE_FAST               0 (n)

  6          22 JUMP_BACKWARD_QUICK     10 (to 4)

In [_]: %timeit -n 1 -r 1 simple(10 ** 8)
4.78 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [_]: dis(simple, adaptive=True)
  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (n)
              4 POP_JUMP_FORWARD_IF_FALSE     9 (to 24)

  3     >>    6 LOAD_FAST                0 (n)
              8 LOAD_CONST               1 (1)
             10 BINARY_OP               23 (-=)
             14 STORE_FAST               0 (n)

  2          16 LOAD_FAST                0 (n)
             18 POP_JUMP_BACKWARD_IF_TRUE     7 (to 6)
             20 LOAD_CONST               0 (None)
             22 RETURN_VALUE
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
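The same before/after comparison can be made without IPython: since 3.11, `dis.get_instructions` accepts `adaptive=True`. The sketch below records opcode names before and after a single call and reports the instruction slots whose name changed. `opnames` and `changed_after_one_call` are hypothetical helpers, the default `n` is smaller than the answer's 10 ** 8, and on interpreters older than 3.11 the code falls back to the plain view, so nothing changes there.

```python
import dis
import sys


def simple(n):
    while n:
        n -= 1


def complex(n):  # shadows the builtin complex; named as in the question
    while True:
        if not n:
            break
        n -= 1


def opnames(func, adaptive):
    """Opcode names of func; adaptive=True shows quickened forms (3.11+ only)."""
    if sys.version_info >= (3, 11):
        return [ins.opname for ins in dis.get_instructions(func, adaptive=adaptive)]
    return [ins.opname for ins in dis.get_instructions(func)]


def changed_after_one_call(func, n=10**5):
    """Call func once and return (before, after) pairs for slots whose opname changed."""
    before = opnames(func, adaptive=True)
    func(n)
    after = opnames(func, adaptive=True)
    return [(b, a) for b, a in zip(before, after) if b != a]
```

On 3.11, a fresh `changed_after_one_call(simple)` should come back empty, while `complex` should include a pair such as `('JUMP_BACKWARD', 'JUMP_BACKWARD_QUICK')`, in line with the listings above. Repeating the call on an already-quickened function will likely report little or nothing, and per the comment below, 3.12 behaves differently.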

  • @KellyBundy The bytecode of both functions changed in Python 3.12: both contain JUMP_BACKWARD and trigger specialization after running. In 3.12, simple performs better than complex. (9 upvotes)