qww*_*wwq 3 python python-itertools python-2.7
In the Python documentation for itertools, the following "recipe" is given for advancing an iterator n steps:
def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
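For reference, here is a minimal sketch of how the recipe behaves when called. It is written for Python 3 (range instead of xrange), and the n=None default and the variable names are assumptions added for illustration; they are not part of the code quoted above.

```python
import collections
from itertools import islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead; if n is None, consume it entirely."""
    if n is None:
        # A zero-length deque discards every item it is fed.
        collections.deque(iterator, maxlen=0)
    else:
        # The slice [n:n] is empty, but reaching it skips the first n items.
        next(islice(iterator, n, n), None)


numbers = iter(range(10))
consume(numbers, 3)               # skips 0, 1 and 2
first_after_skip = next(numbers)  # 3

letters = iter("abc")
consume(letters, None)            # drains the whole iterator
leftover = list(letters)          # []
```

The caller keeps using the same iterator object afterwards, so consume only makes sense on an iterator, not on a re-iterable container such as a list.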
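I would like to know why this recipe is fundamentally different from something like this (apart from the handling of consuming the entire iterator):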
def other_consume(iterable, n):
    for i in xrange(n):
        next(iterable, None)
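I used timeit to confirm that, as expected, the approach above is much slower. What is it about the recipe that gives it this superior performance? I understand that it uses islice, but looking at islice, it appears to do essentially the same thing as the code above: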
def islice(iterable, *args):
    s = slice(*args)
    it = iter(xrange(s.start or 0, s.stop or sys.maxint, s.step or 1))
    nexti = next(it)
    ### it seems as if this loop yields from the iterable n times via enumerate
    ### how is this different from calling next n times?
    for i, element in enumerate(iterable):
        if i == nexti:
            yield element
            nexti = next(it)
Note: even if, instead of importing islice from itertools, I define it using the Python equivalent from the docs shown above, the recipe is still faster.
Edit: here is the timeit code:
timeit.timeit('a = iter([random() for i in xrange(1000000)]); consume(a, 1000000)', setup="from __main__ import consume,random", number=10)
timeit.timeit('a = iter([random() for i in xrange(1000000)]); other_consume(a, 1000000)', setup="from __main__ import other_consume,random", number=10)
other_consume is about 2.5 times slower on every run.
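The recipe is faster because its key parts (islice, deque) are implemented in C rather than in pure Python. Part of the reason is that a C loop is faster than for i in xrange(n). The other part is that Python function calls (for example to next()) are more expensive than their C equivalents.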
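One way to see that the difference lies in where the loop runs, and not in how much work is done, is to count how often the underlying iterator is advanced. The sketch below is an illustration under assumptions: CountingIterator and both function names are hypothetical helpers, and it uses Python 3 spelling (__next__, range) rather than the Python 2.7 of the question.

```python
from itertools import islice


class CountingIterator:
    """Hypothetical helper: wraps an iterator and counts how often it is advanced."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):  # Python 2.7 would call this method "next"
        self.calls += 1
        return next(self._it)


def consume_with_islice(iterator, n):
    # The skipping loop runs inside islice, which CPython implements in C.
    next(islice(iterator, n, n), None)


def consume_with_loop(iterator, n):
    # The skipping loop runs in Python bytecode, one next() call per step.
    for _ in range(n):
        next(iterator, None)


a = CountingIterator(range(1000))
consume_with_islice(a, 100)

b = CountingIterator(range(1000))
consume_with_loop(b, 100)

calls_via_islice = a.calls
calls_via_loop = b.calls
```

Both counters end up at 100, so the two approaches advance the iterator equally often. What the recipe saves is the per-step interpreter overhead of the Python-level loop and the Python-level next() call.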
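The version of itertools.islice that you copied from the documentation is not correct, and its apparently great performance comes from the fact that a consume function built on it does not consume anything: with islice(iterator, n, n) the inner xrange(n, n) is empty, so the first next(it) raises StopIteration and the generator ends before it ever touches the iterable. (For that reason I am not showing test results for that version below, though it was very fast! :)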
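The following sketch illustrates that behaviour. It is a port of the quoted pure-Python islice to Python 3, not the original code: the name islice_py27_docs is hypothetical, range and sys.maxsize replace xrange and sys.maxint, and the explicit try/except blocks are an assumption needed because Python 3.7+ no longer lets a stray StopIteration silently end a generator.

```python
import itertools
import sys


def islice_py27_docs(iterable, *args):
    """Python 3 port of the pure-Python islice equivalent quoted in the question."""
    s = slice(*args)
    it = iter(range(s.start or 0, s.stop or sys.maxsize, s.step or 1))
    try:
        nexti = next(it)
    except StopIteration:
        # In Python 2.7 the uncaught StopIteration ended the generator here;
        # the explicit return reproduces that on Python 3.7+.
        return
    for i, element in enumerate(iterable):
        if i == nexti:
            yield element
            try:
                nexti = next(it)
            except StopIteration:
                return


source = iter(range(10))
next(islice_py27_docs(source, 3, 3), None)
after_docs_version = next(source)   # 0: nothing was skipped

source = iter(range(10))
next(itertools.islice(source, 3, 3), None)
after_c_version = next(source)      # 3: three items were skipped
```

The pure-Python version returns before its for loop starts, which is why a consume built on it looks fast while doing no work.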
Here are a few different implementations, so we can test which is fastest:
import collections
from itertools import islice

# this is the official recipe
def consume_itertools(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

# your initial version, using a for loop on a range
def consume_qwwqwwq(iterator, n):
    for i in xrange(n):
        next(iterator, None)

# a slightly better version, that only has a single loop:
def consume_blckknght(iterator, n):
    if n <= 0:
        return
    for i, v in enumerate(iterator, start=1):
        if i == n:
            break
Timings on my system (Python 2.7.3 64-bit on Windows 7):
>>> test = 'consume(iter(xrange(100000)), 1000)'
>>> timeit.timeit(test, 'from consume import consume_itertools as consume')
7.623556181657534
>>> timeit.timeit(test, 'from consume import consume_qwwqwwq as consume')
106.8907442334584
>>> timeit.timeit(test, 'from consume import consume_blckknght as consume')
56.81081856366518
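My assessment is that a nearly empty Python loop takes seven to eight times longer to run than the equivalent loop in C. Looping over two sequences at once (as consume_qwwqwwq does by calling next on iterator in addition to the for loop over the xrange) roughly doubles the cost.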
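The ratios behind that assessment can be checked directly from the timings listed above. This is only arithmetic on the three reported numbers; the variable names are chosen here for illustration.

```python
# Timings in seconds, copied from the timeit session above.
timings = {
    "consume_itertools": 7.623556181657534,
    "consume_qwwqwwq": 106.8907442334584,
    "consume_blckknght": 56.81081856366518,
}

# Single Python loop versus the C loop inside islice.
single_loop_vs_c = timings["consume_blckknght"] / timings["consume_itertools"]

# Loop over xrange plus a next() call versus the single enumerate loop.
double_vs_single = timings["consume_qwwqwwq"] / timings["consume_blckknght"]

print(round(single_loop_vs_c, 2))  # 7.45
print(round(double_vs_single, 2))  # 1.88
```

These figures come from one machine and one Python build, so the exact ratios should be expected to vary elsewhere.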