(This question is related to this one and this one, but those pre-walk the generator, which is exactly what I want to avoid)
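I would like to split a generator into chunks. The main requirement is that the chunker must not walk the generator beforehand, because the elements are expensive to compute.
I have tried the following code: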
def head(iterable, max=10):
    for cnt, el in enumerate(iterable):
        yield el
        if cnt >= max:
            break

def chunks(iterable, size=10):
    i = iter(iterable)
    while True:
        yield head(i, size)

# Sample generator: the real data is much more complex, and expensive to compute
els = xrange(7)

for n, chunk in enumerate(chunks(els, 3)):
    for el in chunk:
        print 'Chunk %3d, value %d' % (n, el)
This sort of works:
Chunk 0, value 0
Chunk 0, value 1
Chunk 0, value 2
Chunk 1, value 3
Chunk 1, value 4
Chunk 1, value 5
Chunk 2, value 6
^CTraceback (most recent call last):
  File "xxxx.py", line 15, in <module>
    for el in chunk:
  File "xxxx.py", line 2, in head
    for cnt, el in enumerate(iterable):
KeyboardInterrupt
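Buuuut ... it never stops (I have to press ^C) because of the while True. I would like to stop that loop once the generator has been consumed, but I do not know how to detect that situation. I have tried raising an exception: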
class NoMoreData(Exception):
    pass

def head(iterable, max=10):
    for cnt, el in enumerate(iterable):
        yield el
        if cnt >= max:
            break
    if cnt == 0 : raise NoMoreData()

def chunks(iterable, size=10):
    i = iter(iterable)
    while True:
        try:
            yield head(i, size)
        except NoMoreData:
            break

# Sample generator: the real data is much more complex, and expensive to compute
els = xrange(7)

for n, chunk in enumerate(chunks(els, 2)):
    for el in chunk:
        print 'Chunk %3d, value %d' % (n, el)
But then the exception is only raised in the context of the consumer, which is not what I want (I want to keep the consumer code clean):
Chunk 0, value 0
Chunk 0, value 1
Chunk 0, value 2
Chunk 1, value 3
Chunk 1, value 4
Chunk 1, value 5
Chunk 2, value 6
Traceback (most recent call last):
  File "xxxx.py", line 22, in <module>
    for el in chunk:
  File "xxxx.py", line 9, in head
    if cnt == 0 : raise NoMoreData
__main__.NoMoreData()
How can I detect, inside the chunks function, that the generator is exhausted, without walking it?
tob*_*s_k 63
One way would be to peek at the first element, if any, and then create and return the actual generator.
def head(iterable, max=10):
    first = next(iterable)      # raise exception when depleted
    def head_inner():
        yield first             # yield the extracted first element
        for cnt, el in enumerate(iterable):
            yield el
            if cnt + 1 >= max:  # cnt + 1 to include first
                break
    return head_inner()
Just use this in your chunks generator and catch the StopIteration exception the same way you caught your custom exception.
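A rough sketch of how the two pieces might fit together. The chunks wrapper below is an assumption, not code from the answer, and it is written so that it also runs on Python 3, where a StopIteration escaping a generator body is turned into a RuntimeError and therefore has to be caught explicitly. Note that head, as written, yields up to max + 1 elements per chunk, mirroring the counting in the question's code.

```python
def head(iterator, max=10):
    first = next(iterator)      # raises StopIteration when the iterator is depleted

    def head_inner():
        yield first             # yield the element we peeked at
        for cnt, el in enumerate(iterator):
            yield el
            if cnt + 1 >= max:
                break
    return head_inner()


def chunks(iterable, size=10):
    # Hypothetical wrapper: stop cleanly as soon as head() finds nothing to peek at.
    iterator = iter(iterable)
    while True:
        try:
            chunk = head(iterator, size)
        except StopIteration:
            return
        yield chunk


# Each chunk is consumed before the next one is requested.
result = [list(chunk) for chunk in chunks(range(7), 2)]
```

The consumer loop stays clean: the exception never leaves chunks, and only one element per chunk is fetched ahead of the consumer.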
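Update: Here is another version, using itertools.islice to replace most of the head function, and a for loop. This simple for loop in fact does exactly the same thing as the unwieldy while-try-next-except-break construct in the original code, so the result is much more readable.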
from itertools import islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:    # stops when iterator is depleted
        def chunk():          # construct generator for next chunk
            yield first       # yield element from for loop
            for more in islice(iterator, size - 1):
                yield more    # yield more elements from the iterator
        yield chunk()         # in outer generator, yield next chunk
And we can get even shorter than that, using itertools.chain to replace the inner generator:
from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))
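To check that this version really does not pre-walk the source, one option is to log every element as it is produced. The expensive generator and the log list below are hypothetical helpers introduced only for this illustration; the sketch runs on Python 3.

```python
from itertools import chain, islice


def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))


def expensive(n, log):
    # Hypothetical stand-in for a costly generator: records each element it produces.
    for i in range(n):
        log.append(i)
        yield i


log = []
gen = chunks(expensive(7, log), 3)

first_chunk = next(gen)
pulled_after_peek = list(log)    # only the peeked element has been computed

values = list(first_chunk)
pulled_after_chunk = list(log)   # nothing beyond the first chunk has been computed

rest = [list(chunk) for chunk in gen]
```

Requesting a chunk costs exactly one element; the rest are computed only when the consumer iterates over that chunk. Each chunk has to be consumed before the next one is requested, since all chunks share the same underlying iterator.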
Mos*_*oye 10
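Another way to create groups/chunks without pre-walking the generator is to use itertools.groupby with a key function that uses an itertools.count object. Since the count object is independent of the iterable, the chunks can easily be generated without any knowledge of what the iterable holds.
For every item, groupby calls the key function, which calls next on the count object and computes the group/chunk key by integer-dividing the current count value by the chunk size. Consecutive items with the same key end up in the same chunk.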
from itertools import groupby, count

def chunks(iterable, size=10):
    c = count()
    for _, g in groupby(iterable, lambda _: next(c)//size):
        yield g
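Each group/chunk g yielded by the generator function is an iterator. However, since groupby uses a shared iterator for all groups, the group iterators cannot be stored in a list or any other container; each group iterator should be consumed before the next one is requested.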
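A small sketch of the key computation and of in-order consumption, assuming Python 3. The keys list is only there to show which key each position receives; it is not part of the answer's code.

```python
from itertools import count, groupby


def chunks(iterable, size=10):
    c = count()
    for _, g in groupby(iterable, lambda _: next(c) // size):
        yield g


size = 3

# Keys the counter assigns to seven consecutive items: position // size.
keys = [position // size for position in range(7)]

# Consume every group immediately, before asking groupby for the next one.
result = [list(g) for g in chunks(iter(range(7)), size)]
```

Here the keys come out as 0, 0, 0, 1, 1, 1, 2, so the items are split into chunks of 3, 3 and 1. Collecting the group iterators first and reading them later would not be safe, because advancing groupby invalidates the previous group.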
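This is the fastest solution I could come up with, thanks to using (in CPython) purely C-level builtins. As a result, no Python byte code is needed to produce each chunk (unless the underlying generator is implemented in Python), which is a huge performance benefit. It does walk each chunk before returning it, but it does not pre-walk anything beyond the chunk it is about to return: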
# Py2 only to get generator based map
from future_builtins import map

from itertools import islice, repeat, starmap, takewhile
# operator.truth is *significantly* faster than bool for the case of
# exactly one positional argument
from operator import truth

def chunker(n, iterable):  # n is size of each chunk; last chunk may be smaller
    return takewhile(truth, map(tuple, starmap(islice, repeat((iter(iterable), n)))))
Since that is a bit dense, here is the spread-out version for illustration:
def chunker(n, iterable):
    iterable = iter(iterable)
    while True:
        x = tuple(islice(iterable, n))
        if not x:
            return
        yield x
If chunk numbers are needed, wrap the call to chunker in enumerate.
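For reference, a sketch of the same one-liner on Python 3, where the built-in map is already lazy and the future_builtins import is not needed, together with the enumerate wrapping:

```python
from itertools import islice, repeat, starmap, takewhile
from operator import truth


def chunker(n, iterable):  # n is the size of each chunk; the last chunk may be smaller
    # repeat() hands the same iterator to islice over and over; each call
    # pulls the next n items, and takewhile stops at the first empty tuple.
    return takewhile(truth, map(tuple, starmap(islice, repeat((iter(iterable), n)))))


numbered = list(enumerate(chunker(3, range(7))))

for number, chunk in numbered:
    for el in chunk:
        print('Chunk %3d, value %d' % (number, el))
```

Unlike the lazy chunk iterators in the earlier answers, each chunk here is a fully built tuple, so it can be stored or consumed in any order.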