Using itertools.tee to duplicate a nested iterator (i.e. itertools.groupby)

Question

Dee*_*ace 7 python iterator python-itertools

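I am reading a file (and doing some expensive logic while reading it) that I need to iterate over several times in different functions, so I really want to read and parse the file only once.

The parsing function parses the file and returns an itertools.groupby object.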

def parse_file():
    ...
    return itertools.groupby(lines, key=keyfunc)

I thought about doing the following:

csv_file_content = read_csv_file()

file_content_1, file_content_2 = itertools.tee(csv_file_content, 2)

foo(file_content_1)
bar(file_content_2)

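However, itertools.tee seems to "duplicate" only the outer iterator, while the inner (nested) iterators still refer to the originals (so they are exhausted after iterating over the first iterator returned by itertools.tee).

Standalone MCVE: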

from itertools import groupby, tee

li = [{'name': 'a', 'id': 1},
      {'name': 'a', 'id': 2},
      {'name': 'b', 'id': 3},
      {'name': 'b', 'id': 4},
      {'name': 'c', 'id': 5},
      {'name': 'c', 'id': 6}]

groupby_obj = groupby(li, key=lambda x:x['name'])
tee_obj1, tee_obj2 = tee(groupby_obj, 2)

print(id(tee_obj1))
for group, data in tee_obj1:
    print(group)
    print(id(data))
    for i in data:
        print(i)

print('----')

print(id(tee_obj2))
for group, data in tee_obj2:
    print(group)
    print(id(data))
    for i in data:
        print(i)

Output

2380054450440
a
2380053623136
{'name': 'a', 'id': 1}
{'name': 'a', 'id': 2}
b
2380030915976
{'name': 'b', 'id': 3}
{'name': 'b', 'id': 4}
c
2380054184344
{'name': 'c', 'id': 5}
{'name': 'c', 'id': 6}
----
2380064387336
a
2380053623136  # same ID as above
b
2380030915976  # same ID as above 
c
2380054184344  # same ID as above

How can we efficiently duplicate a nested iterator?

Answer 1

iGi*_*ian 2

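It seems that the grouped object (class 'itertools.groupby') is consumed only once, even when it is wrapped in itertools.tee. Likewise, a parallel assignment of the same grouped object does not work: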

tee_obj1, tee_obj2 = groupby_obj, groupby_obj

What does work is a deep copy of the grouped object:

import copy

tee_obj1, tee_obj2 = copy.deepcopy(groupby_obj), groupby_obj

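If the grouped data fits in memory, one possible alternative to copying the groupby object is to materialize every group into a list once and then hand the resulting list to each consumer. The sketch below reuses the `li` data from the question; `materialize_groups` and `collect_ids` are hypothetical names introduced only for illustration, and `collect_ids` merely stands in for consumers such as `foo` and `bar`.

```python
from itertools import groupby

li = [{'name': 'a', 'id': 1},
      {'name': 'a', 'id': 2},
      {'name': 'b', 'id': 3},
      {'name': 'b', 'id': 4},
      {'name': 'c', 'id': 5},
      {'name': 'c', 'id': 6}]


def materialize_groups(grouped):
    """Hypothetical helper: consume the groupby object exactly once,
    turning each lazy inner group into a plain list."""
    return [(key, list(group)) for key, group in grouped]


def collect_ids(groups):
    """Hypothetical consumer: walks both the outer and the inner level."""
    return {key: [row['id'] for row in rows] for key, rows in groups}


groups = materialize_groups(groupby(li, key=lambda x: x['name']))

# The list of (key, list) pairs can be iterated any number of times.
first_pass = collect_ids(groups)
second_pass = collect_ids(groups)
```

This gives up laziness in exchange for repeatable iteration, so it is only a reasonable trade when holding all parsed rows in memory is acceptable.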

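"It seems that the grouped object (class 'itertools.groupby') is consumed only once, even when it is wrapped in itertools.tee": I don't think that is true, otherwise `a`, `b`, `c` would not be printed the second time. I was hoping for something more elegant than `deepcopy`, but I will accept this answer. Thanks! (2 upvotes)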
