ppo*_*one 3 python performance python-2.7
以下2个代码片段(A和B)都返回2个字典的交集.
以下两个代码片段都应该在O(n)中运行并输出相同的结果.然而,pythonic的代码片段B似乎运行得更快. 这些代码片段来自Python Cookbook.
代码片段A:
def simpleway():
result = []
for k in to500.keys():
if evens.has_key(k):
result.append(k)
return result
Run Code Online (Sandbox Code Playgroud)
代码片段B:
def pythonicsimpleway():
return [k for k in to500 if k in evens]
Run Code Online (Sandbox Code Playgroud)
一些设置逻辑和用于计时两个函数的函数=>
to500 = {}
for i in range(500): to500[i] = 1
evens = {}
for i in range(0,1000,2): evens[i] = 1
def timeo(fun, n=1000):
def void(): pass
start = time.clock()
for i in range(n): void()
stend = time.clock()
overhead = stend - start
start = time.clock()
for i in range(n): fun()
stend = time.clock()
thetime = stend - start
return fun.__name__, thetime - overhead
Run Code Online (Sandbox Code Playgroud)
Python 2.7.5使用2.3 Ghz Ivy Bridge四核处理器(OS X 10.8.4)
我明白了
>>> timeo(simpleway)
('simpleway', 0.08928500000000028)
>>> timeo(pythonicsimpleway)
('pythonicsimpleway', 0.04579400000000078)
Run Code Online (Sandbox Code Playgroud)
他们并没有做同样的事情; 第一个做了很多工作:
.has_key()和.append()方法,然后调用它们.这需要每次调用都有一个堆栈推送和弹出.列表推导在一次操作中创建python列表对象之前收集C数组中的所有生成元素.
这两个函数确实产生相同的结果,一个是不必要的慢.
如果您想深入了解详细信息,请使用该dis模块查看字节码反汇编:
>>> dis.dis(simpleway)
2 0 BUILD_LIST 0
3 STORE_FAST 0 (result)
3 6 SETUP_LOOP 51 (to 60)
9 LOAD_GLOBAL 0 (to500)
12 LOAD_ATTR 1 (keys)
15 CALL_FUNCTION 0
18 GET_ITER
>> 19 FOR_ITER 37 (to 59)
22 STORE_FAST 1 (k)
4 25 LOAD_GLOBAL 2 (evens)
28 LOAD_ATTR 3 (has_key)
31 LOAD_FAST 1 (k)
34 CALL_FUNCTION 1
37 POP_JUMP_IF_FALSE 19
5 40 LOAD_FAST 0 (result)
43 LOAD_ATTR 4 (append)
46 LOAD_FAST 1 (k)
49 CALL_FUNCTION 1
52 POP_TOP
53 JUMP_ABSOLUTE 19
56 JUMP_ABSOLUTE 19
>> 59 POP_BLOCK
6 >> 60 LOAD_FAST 0 (result)
63 RETURN_VALUE
>>> dis.dis(pythonicsimpleway)
2 0 BUILD_LIST 0
3 LOAD_GLOBAL 0 (to500)
6 GET_ITER
>> 7 FOR_ITER 24 (to 34)
10 STORE_FAST 0 (k)
13 LOAD_FAST 0 (k)
16 LOAD_GLOBAL 1 (evens)
19 COMPARE_OP 6 (in)
22 POP_JUMP_IF_FALSE 7
25 LOAD_FAST 0 (k)
28 LIST_APPEND 2
31 JUMP_ABSOLUTE 7
>> 34 RETURN_VALUE
Run Code Online (Sandbox Code Playgroud)
对于显式for循环,每次迭代的字节码指令的数量要大得多.该simpleway循环必须执行每次迭代11个指令(如果.has_key()为真),对7的名单理解,其中额外的指令大多覆盖LOAD_ATTR和CALL_FUNCTION.
如果要更快地创建第一个版本,请替换.has_key()为in测试,直接在字典上循环并将.append()属性缓存在局部变量中:
def simpleway_optimized():
result = []
append = result.append
for k in to500:
if k in evens:
append(k)
return result
Run Code Online (Sandbox Code Playgroud)
然后使用该timeit模块正确测试时序(重复运行,为您的平台最精确的计时器):
>>> timeit('f()', 'from __main__ import evens, to500, simpleway as f', number=10000)
1.1673870086669922
>>> timeit('f()', 'from __main__ import evens, to500, pythonicsimpleway as f', number=10000)
0.5441269874572754
>>> timeit('f()', 'from __main__ import evens, to500, simpleway_optimized as f', number=10000)
0.6551430225372314
Run Code Online (Sandbox Code Playgroud)
这里simpleway_optimized接近列表理解方法的速度,但后者仍然可以通过一步构建python列表对象来获胜.