Python: is [<generator expression>] at least 3 times faster than list(<generator expression>)?

hal*_*fak 15 python performance profiling

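It seems that wrapping a generator expression in [] (test1) performs substantially better than putting it inside list() (test2). The slowdown is not there when I simply pass a list to list() to make a shallow copy (test3). Why is this?

Evidence: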

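from timeit import Timer

# The setup strings only run when timeit() is called, so the test
# functions can be defined after the Timer objects are created.
t1 = Timer("test1()", "from __main__ import test1")
t2 = Timer("test2()", "from __main__ import test2")
t3 = Timer("test3()", "from __main__ import test3")

x = [34534534, 23423523, 77645645, 345346]

def test1():
    # list comprehension
    [e for e in x]

print(t1.timeit())
# 0.552290201187


def test2():
    # generator expression passed to list()
    list(e for e in x)

print(t2.timeit())
# 2.38739395142

def test3():
    # shallow copy of an existing list
    list(x)

print(t3.timeit())
# 0.515818119049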

Machine: 64-bit AMD, Ubuntu 8.04, Python 2.7 (r27:82500)

Kat*_*iel 34

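Well, my first step was to set up the two tests independently, to make sure this is not a result of, for example, the order in which the functions are defined.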

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "[e for e in x]"
1000000 loops, best of 3: 0.638 usec per loop

>python -mtimeit "x=[34534534, 23423523, 77645645, 345346]" "list(e for e in x)"
1000000 loops, best of 3: 1.72 usec per loop

Sure enough, I can replicate this. OK, the next step is to look at the bytecode to see what is actually going on:

>>> import dis
>>> x=[34534534, 23423523, 77645645, 345346]
>>> dis.dis(lambda: [e for e in x])
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x0000000001F8B330, file "<stdin>", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (x)
              9 GET_ITER
             10 CALL_FUNCTION            1
             13 RETURN_VALUE
>>> dis.dis(lambda: list(e for e in x))
  1           0 LOAD_GLOBAL              0 (list)
              3 LOAD_CONST               0 (<code object <genexpr> at 0x0000000001F8B9B0, file "<stdin>", line 1>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (x)
             12 GET_ITER
             13 CALL_FUNCTION            1
             16 CALL_FUNCTION            1
             19 RETURN_VALUE

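Notice that the first method creates the list directly, whereas the second method creates a genexpr object and passes it to the global list. This is probably where the overhead lies.

Note also that the difference is approximately one microsecond, i.e. utterly trivial.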
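The point about the global list can be checked without reading raw bytecode. The following is only a sketch, assuming CPython; the function names `with_brackets` and `with_list_call` and the `namespace` dict are made up for illustration. It looks at `co_names` (the non-local names a function's bytecode refers to) and then rebinds the name `list` to show that only the `list(...)` form looks that name up at run time.

```python
x = [34534534, 23423523, 77645645, 345346]

def with_brackets():
    # hypothetical helper: the list comprehension form
    return [e for e in x]

def with_list_call():
    # hypothetical helper: generator expression handed to list()
    return list(e for e in x)

# co_names lists the global and attribute names used by the bytecode.
brackets_names = with_brackets.__code__.co_names
list_call_names = with_list_call.__code__.co_names
print(brackets_names)   # expected to mention x but not list
print(list_call_names)  # expected to mention both list and x

# Rebinding the name "list" affects only the call form, because
# the brackets are syntax and never look the name up.
namespace = {"x": [1, 2, 3], "list": tuple}
exec("a = [e for e in x]\nb = list(e for e in x)", namespace)
print(type(namespace["a"]).__name__, type(namespace["b"]).__name__)
```

The name lookup and the extra call are small fixed costs, which fits the observation that the gap matters most for very short inputs.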


Other interesting data

This still holds for non-trivial lists:

>python -mtimeit "x=range(100000)" "[e for e in x]"
100 loops, best of 3: 8.51 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x)"
100 loops, best of 3: 11.8 msec per loop

For less trivial map functions:

>python -mtimeit "x=range(100000)" "[2*e for e in x]"
100 loops, best of 3: 12.8 msec per loop

>python -mtimeit "x=range(100000)" "list(2*e for e in x)"
100 loops, best of 3: 16.8 msec per loop

And (though less strongly) if we filter the list:

>python -mtimeit "x=range(100000)" "[e for e in x if e%2]"
100 loops, best of 3: 14 msec per loop

>python -mtimeit "x=range(100000)" "list(e for e in x if e%2)"
100 loops, best of 3: 16.5 msec per loop
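The three comparisons above can also be run from a single script. This is a rough sketch, not the commands used in the answer: the names `SETUP`, `STATEMENTS`, and `best_time` are invented here, and the loop counts are kept small so that it finishes quickly. Two deliberate differences from the command lines above: the data is built in the setup step (without `-s`, every argument to `python -m timeit` is part of the timed statement), and `list(range(...))` is used so that `x` is a real list on Python 3 as well.

```python
from timeit import repeat

# Built once per timing run, outside the timed statement.
SETUP = "x = list(range(100000))"

# label -> (list comprehension, generator expression passed to list())
STATEMENTS = {
    "copy": ("[e for e in x]", "list(e for e in x)"),
    "map": ("[2*e for e in x]", "list(2*e for e in x)"),
    "filter": ("[e for e in x if e%2]", "list(e for e in x if e%2)"),
}

def best_time(stmt, number=5, rounds=3):
    """Return the best per-loop time in seconds over several rounds."""
    totals = repeat(stmt, setup=SETUP, repeat=rounds, number=number)
    return min(totals) / number

results = {}
for label, (bracket_stmt, list_stmt) in STATEMENTS.items():
    results[label] = (best_time(bracket_stmt), best_time(list_stmt))

for label, (bracket_time, list_time) in results.items():
    print("%-6s  []: %.2f msec   list(): %.2f msec"
          % (label, bracket_time * 1e3, list_time * 1e3))
```

Absolute numbers will differ by machine and Python version, so only the ratio between the two columns is worth comparing.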

  • Nice approach to the problem, and a good explanation (2 upvotes)

Ben*_*mes 9

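list(e for e in x) is not a list comprehension; it is a genexpr object (e for e in x) being created and passed to the list factory function. Presumably the object creation and the method calls create the overhead.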
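To make the "genexpr object" concrete, here is a small illustration (the variable names `gen`, `first`, `rest`, and `again` are arbitrary). The generator expression is an ordinary object that yields one item each time it is resumed, and list() has to drive it through that protocol item by item.

```python
import types

x = [34534534, 23423523, 77645645, 345346]

gen = (e for e in x)       # creates a generator object; nothing is iterated yet
print(isinstance(gen, types.GeneratorType))

first = next(gen)          # each item comes from resuming the generator
rest = list(gen)           # list() pulls the remaining items the same way
again = list(gen)          # the generator is now exhausted

print(first, rest, again)
```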