Mic*_*hal 10 python performance sum
I noticed that Python's built-in sum function is about 3 times faster than a for loop when summing a list of 1,000,000 integers:
import timeit

def sum1():
    s = 0
    for i in range(1000000):
        s += i
    return s

def sum2():
    return sum(range(1000000))

print 'For Loop Sum:', timeit.timeit(sum1, number=10)
print 'Built-in Sum:', timeit.timeit(sum2, number=10)
# Prints:
# For Loop Sum: 0.751425027847
# Built-in Sum: 0.266746997833
Why is that? How is sum implemented?
Mar*_*ers 18
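The speed difference is actually greater than 3x, but you slow down both versions by first creating a huge in-memory list of 1 million integers. Separate that step out of the time trials: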
>>> import timeit
>>> def sum1(lst):
...     s = 0
...     for i in lst:
...         s += i
...     return s
...
>>> def sum2(lst):
...     return sum(lst)
...
>>> values = range(1000000)
>>> timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)
3.457869052886963
>>> timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)
0.6696369647979736
Now the speed difference has risen to over 5 times.
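The session above is Python 2, where range() returns a list. For anyone on Python 3, here is a minimal sketch of the same comparison. It assumes Python 3 (print is a function and range() is lazy, so the list is built explicitly). The names sum_loop, sum_builtin and compare are only illustrative, and the timings will differ from the numbers quoted here.

```python
import timeit


def sum_loop(values):
    """Sum with an explicit for loop (runs as interpreted bytecode)."""
    total = 0
    for v in values:
        total += v
    return total


def sum_builtin(values):
    """Sum with the built-in, which does its looping in C."""
    return sum(values)


# Build the list once, outside the timed code, so only the summing is measured.
values = list(range(1000000))


def compare(number=10):
    """Return the elapsed seconds for each version over `number` runs."""
    return {
        func.__name__: timeit.timeit(lambda: func(values), number=number)
        for func in (sum_loop, sum_builtin)
    }


if __name__ == "__main__":
    for name, elapsed in compare().items():
        print(name, elapsed)
```

Passing a callable to timeit.timeit() avoids the 'from __main__ import ...' setup string, at the cost of a small constant call overhead on both sides.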
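A for loop is executed as interpreted Python bytecode. sum() loops entirely in C code. The speed difference between interpreted bytecode and C code is large.
In addition, the C code makes sure not to create new Python objects if it can keep the sum in C types instead; this works for int and float results.
The Python version, disassembled, does this: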
>>> import dis
>>> def sum1():
...     s = 0
...     for i in range(1000000):
...         s += i
...     return s
...
>>> dis.dis(sum1)
  2           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              30 (to 39)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               2 (1000000)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (s)
             28 LOAD_FAST                1 (i)
             31 INPLACE_ADD
             32 STORE_FAST               0 (s)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK

  5     >>   39 LOAD_FAST                0 (s)
             42 RETURN_VALUE
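Apart from the interpreter loop being slower than C, the INPLACE_ADD will create a new integer object on each iteration (for results past 256; CPython caches small int objects, -5 through 256, as singletons).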
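To see the small-int cache at work, here is a rough sketch. It relies on a CPython implementation detail, not a language guarantee, so other interpreters may behave differently. The helper name same_object is made up for this example.

```python
def same_object(n):
    """Return True if two ints built at run time from n are the same object."""
    # Going through str() and int() builds each value at run time, so the
    # compiler cannot merge the two into one shared constant.
    a = int(str(n))
    b = int(str(n))
    return a is b


if __name__ == "__main__":
    # On CPython, values in the cached range should report True and larger
    # values should report False, since each int() call allocates a new object.
    for n in (0, 256, 257, 10**6):
        print(n, same_object(n))
```

Every False here corresponds to a fresh allocation, which is the per-iteration cost the for loop pays once the running total leaves the cached range.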
You can see the C implementation in the Python mercurial code repository, but it explicitly states in the comments:
/* Fast addition by keeping temporary sums in C instead of new Python objects.
Assumes all inputs are the same type. If the assumption fails, default
to the more general routine.
*/
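To make the control flow in that comment concrete, here is a toy model written in Python. It is not the CPython source and it is not faster than a plain loop; it only mirrors the idea of a typed accumulator with a fallback. The name sketch_sum and the use of sys.maxsize as a stand-in for the range of a C long are assumptions made for illustration.

```python
import sys


def sketch_sum(iterable, start=0):
    """Toy model of the fast path: typed accumulator first, general add after."""
    it = iter(iterable)
    result = start
    if type(result) is int:
        # Stand-in for the C-level accumulator described in the comment.
        acc = result
        for item in it:
            fits = type(item) is int and -sys.maxsize - 1 <= acc + item <= sys.maxsize
            if fits:
                acc += item  # fast path: no fallback needed for this item
                continue
            # Assumption failed (other type, or the total no longer fits):
            # fold this item in and switch to the general routine below.
            result = acc + item
            break
        else:
            return acc  # every item stayed on the fast path
    # General routine: ordinary object addition for whatever is left.
    for item in it:
        result = result + item
    return result
```

For example, sketch_sum(range(10)) stays on the fast path the whole way, while sketch_sum([1, 2, 3.5, 4]) leaves it at 3.5 and finishes in the general loop.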
As dwanderson suggested, NumPy is one alternative, and a good one if you want to do some maths. See this benchmark:
import numpy as np

r = range(1000000)       # 12.5 ms
s = sum(r)               # 7.9 ms
ar = np.arange(1000000)  # 0.5 ms
ar_sum = np.sum(ar)      # 0.6 ms (renamed: `as` is a reserved word)
So both creating the sequence and summing it are much faster with NumPy. This is mostly because numpy.array is designed for this kind of work and is much more efficient than a list.
However, if we start from a Python list, NumPy is very slow, because converting the list into a numpy.array is sluggish:
r = range(1000000)
ar = np.array(r) # 102 ms