Python built-in sum function vs. for loop performance

Asked by Mic*_*hal (score 10) · tags: python, performance, sum

I noticed that Python's built-in sum function is roughly 3 times faster than a for loop when summing a list of 1 000 000 integers:

import timeit

def sum1():
    s = 0
    for i in range(1000000):
        s += i
    return s

def sum2():
    return sum(range(1000000))

print 'For Loop Sum:', timeit.timeit(sum1, number=10)
print 'Built-in Sum:', timeit.timeit(sum2, number=10)

# Prints:
# For Loop Sum: 0.751425027847
# Built-in Sum: 0.266746997833

Why is this? How is sum implemented?

Answer by Mar*_*ers (score 18)

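The speed difference is actually greater than 3 times, but you are slowing down both versions by first creating a huge in-memory list of 1 million integers. Separate that out of the time trials: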

>>> import timeit
>>> def sum1(lst):
...     s = 0
...     for i in lst:
...         s += i
...     return s
... 
>>> def sum2(lst):
...     return sum(lst)
... 
>>> values = range(1000000)
>>> timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)
3.457869052886963
>>> timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)
0.6696369647979736

Now the speed difference has risen to more than 5 times.
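The session above is Python 2, where range() returns a list and print is a statement. A minimal Python 3 port of the same experiment might look like the sketch below, assuming Python 3.6+; it builds the list once with list(range(...)) so that only the summation is timed. The names loop_sum and builtin_sum are illustrative, and absolute timings will depend on your machine and interpreter version.

```python
import timeit


def loop_sum(values):
    # Explicit Python-level loop: every iteration runs interpreted bytecode.
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    # Delegates the whole loop to the C implementation of sum().
    return sum(values)


# Build the list once, outside the timed code, so creation cost is excluded.
values = list(range(1_000_000))

if __name__ == "__main__":
    for f in (loop_sum, builtin_sum):
        seconds = timeit.timeit(lambda: f(values), number=10)
        print(f"{f.__name__}: {seconds:.3f} s for 10 runs")
```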

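A for loop is executed as interpreted Python bytecode; sum() loops entirely in C code. The speed difference between interpreted bytecode and C code is large.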

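Moreover, the C code makes sure not to create new Python objects if it can keep the running sum in C types instead; this works for int and float results.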
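The two effects (looping in C, and accumulating in a C type) can be separated with a third variant that is not part of the original answer: functools.reduce with operator.add iterates in C, but each step goes through the generic int addition and produces a new int object for the running total. The sketch below only sets up the three-way comparison on Python 3; how the timings rank on your machine is something to measure rather than assume.

```python
import functools
import operator
import timeit

values = list(range(1_000_000))


def python_loop(vals):
    # Loop and accumulation both happen in interpreted bytecode.
    total = 0
    for v in vals:
        total += v
    return total


def c_loop_python_objects(vals):
    # reduce() loops in C, but every step calls operator.add, which returns
    # a new int object for each running total outside the small-int cache.
    return functools.reduce(operator.add, vals, 0)


def c_loop_c_accumulator(vals):
    # sum() loops in C and, while the items are exact ints that fit,
    # keeps the running total in a C integer.
    return sum(vals)


if __name__ == "__main__":
    for f in (python_loop, c_loop_python_objects, c_loop_c_accumulator):
        seconds = timeit.timeit(lambda: f(values), number=10)
        print(f"{f.__name__}: {seconds:.3f} s for 10 runs")
```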

Disassembled, the Python version does this:

>>> import dis
>>> def sum1():
...     s = 0
...     for i in range(1000000):
...         s += i
...     return s
... 
>>> dis.dis(sum1)
  2           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              30 (to 39)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               2 (1000000)
             15 CALL_FUNCTION            1
             18 GET_ITER            
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (s)
             28 LOAD_FAST                1 (i)
             31 INPLACE_ADD         
             32 STORE_FAST               0 (s)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK           

  5     >>   39 LOAD_FAST                0 (s)
             42 RETURN_VALUE        
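The listing above comes from Python 2; opcode names differ across versions (SETUP_LOOP was removed in Python 3.8, and Python 3.11 replaced INPLACE_ADD with a generic BINARY_OP). A version-independent way to make the same point is to list the opcodes with dis.get_instructions: the hand-written loop compiles to a FOR_ITER loop in bytecode, while the sum() version is just a single call. The helper name opnames is illustrative.

```python
import dis


def sum1(lst):
    s = 0
    for i in lst:
        s += i
    return s


def sum2(lst):
    return sum(lst)


def opnames(func):
    # Opcode names of the function's compiled bytecode, in order.
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    print("sum1:", opnames(sum1))
    print("sum2:", opnames(sum2))
```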

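Apart from the interpreter loop being slower than C, each INPLACE_ADD creates a new integer object once the running total exceeds 256 (CPython caches the small ints from -5 to 256 as singletons, so only those are reused).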
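The small-int cache can be observed directly, with the caveat that this is a CPython implementation detail and other interpreters may behave differently. The sketch computes the same value twice at run time (so the compiler cannot fold it into one shared constant) and checks whether both results are the very same object; on CPython this is expected to hold for 256 but not for 257. The function name computed_is_shared is illustrative.

```python
def computed_is_shared(n):
    # Compute n twice at run time (not as a compile-time constant) and check
    # whether both results are the same object. CPython-specific behavior.
    base = n - 1
    return (base + 1) is (base + 1)


if __name__ == "__main__":
    for n in (100, 256, 257, 1000):
        print(n, computed_is_shared(n))
```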

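You can see the C implementation (the builtin_sum function in Python/bltinmodule.c) in the Python Mercurial code repository; its comments state this explicitly: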

/* Fast addition by keeping temporary sums in C instead of new Python objects.
   Assumes all inputs are the same type.  If the assumption fails, default
   to the more general routine.
*/
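To make that comment concrete, here is a pure-Python model of the strategy, written for illustration only: the name model_sum and its three-stage structure are assumptions, not CPython code. It mirrors the idea of an int fast path, then a float fast path, then a general fallback that does one generic "+" per item. The real C version keeps the int total in a C integer and the float total in a C double, and it handles details this model omits, such as integer overflow and rejecting str start values.

```python
def model_sum(iterable, start=0):
    """Toy model of the fast paths in CPython's sum(); not the real code."""
    it = iter(iterable)
    result = start

    # Stage 1: int fast path. CPython keeps this running total in a C integer
    # (leaving the fast path on overflow; Python ints never overflow here).
    if type(result) is int:
        for item in it:
            if type(item) in (int, bool):
                result += item
                continue
            # Assumption failed: add this item the general way, then move on.
            result = result + item
            break
        else:
            return result  # every item was handled on the int fast path

    # Stage 2: float fast path. CPython keeps this running total in a C double
    # and also accepts ints here by converting them to float.
    if type(result) is float:
        for item in it:
            if type(item) is float or isinstance(item, int):
                result += item
                continue
            result = result + item
            break
        else:
            return result

    # Stage 3: general path, one generic "+" per remaining item.
    for item in it:
        result = result + item
    return result
```

Note how a single float in an int sequence moves the model from stage 1 to stage 2 without restarting, which matches the "if the assumption fails, default to the more general routine" wording in the comment.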


Answer by DrV*_*DrV (score 5)

As dwanderson suggested, NumPy is one alternative, and a good one if you want to do numerical work. See this benchmark:

import numpy as np

r = range(1000000)       # 12.5 ms
s = sum(r)               # 7.9 ms

ar = np.arange(1000000)  # 0.5 ms
total = np.sum(ar)       # 0.6 ms

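So both creating the array and summing it are much faster with NumPy. This is mostly because a NumPy array is designed for this kind of work: it stores the values in one contiguous buffer of fixed-size machine integers, whereas a list holds pointers to separate Python int objects.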
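The timings above are given as bare comments. A sketch for reproducing the comparison on Python 3 is below, assuming NumPy is installed; the helper per_call is illustrative. The array is created with an explicit int64 dtype so the sum of 0..999999 cannot overflow on platforms where NumPy's default integer is 32 bits (older NumPy on Windows, for example).

```python
import timeit

import numpy as np

N = 1_000_000


def per_call(fn, number=10):
    # Average wall-clock seconds per call of a zero-argument callable.
    return timeit.timeit(fn, number=number) / number


if __name__ == "__main__":
    py_list = list(range(N))
    arr = np.arange(N, dtype=np.int64)
    print("build list    :", per_call(lambda: list(range(N))))
    print("build array   :", per_call(lambda: np.arange(N, dtype=np.int64)))
    print("sum(list)     :", per_call(lambda: sum(py_list)))
    print("np.sum(array) :", per_call(lambda: np.sum(arr)))
```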

However, if you start from a Python list, the NumPy route is slow, because converting a list into a NumPy array is expensive:

r = range(1000000)
ar = np.array(r)         # 102 ms
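When the data starts out as Python objects, the conversion is part of the cost. A sketch for measuring it on Python 3, assuming NumPy is installed: np.array converts an existing list element by element, while np.fromiter is shown as one alternative that fills the array straight from an iterator with a known dtype and length, skipping the intermediate list. The function names are illustrative, and whether fromiter wins on your setup is something to measure, not assume.

```python
import timeit

import numpy as np

N = 1_000_000
py_list = list(range(N))


def via_array():
    # Converts an existing list: each element is a Python int object that
    # must be read and unboxed into the new int64 buffer.
    return np.array(py_list, dtype=np.int64)


def via_fromiter():
    # Fills an int64 array directly from an iterator of known length.
    return np.fromiter(range(N), dtype=np.int64, count=N)


if __name__ == "__main__":
    for fn in (via_array, via_fromiter):
        seconds = timeit.timeit(fn, number=5) / 5
        print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```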