zip() alternative for iterating through two iterables

Aus*_*son 13 python

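I have two large (~100 GB) text files that must be iterated through simultaneously.

zip works well for smaller files, but I found that it actually builds a list of lines from my two files, which means every line is held in memory. I never need to do anything with a line more than once.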

handle1 = open('filea', 'r')
handle2 = open('fileb', 'r')

for i, j in zip(handle1, handle2):
    # do something with i and j
    # write to an output file
    # no need to do anything with i and j after this
    pass

Is there an alternative to zip() that acts as a generator and lets me iterate through these two files without using more than 200 GB of RAM?

Anu*_*yal 22

itertools has a function, izip, that does this:

from itertools import izip
for i, j in izip(handle1, handle2):
    ...

If the files are of different sizes, you may want izip_longest, because izip stops at the end of the shorter file.
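
To illustrate the difference, here is a minimal sketch written for Python 3, where izip no longer exists because the built-in zip is already lazy, and izip_longest is named zip_longest. The io.StringIO objects and their sample lines are hypothetical stand-ins for the two open file handles; list() is used only to display the pairs and would not be used on the real files.

```python
import io
from itertools import zip_longest


def make_handles():
    # Hypothetical stand-ins for open('filea') and open('fileb');
    # the second "file" is one line shorter than the first.
    return io.StringIO("a1\na2\na3\n"), io.StringIO("b1\nb2\n")


handle1, handle2 = make_handles()
# zip stops as soon as the shorter input is exhausted.
short_pairs = list(zip(handle1, handle2))

handle1, handle2 = make_handles()
# zip_longest keeps going and pads the shorter input with fillvalue.
long_pairs = list(zip_longest(handle1, handle2, fillvalue=""))

print(short_pairs)
print(long_pairs)
```

With these inputs, the zip version yields two pairs and silently drops the last line of the longer input, while the zip_longest version yields a third pair whose second element is the empty string.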


Joh*_*ooy 16

You can use izip_longest like this to pad the shorter file with empty lines.

In Python 2.6:

from itertools import izip_longest
with open('filea', 'r') as handle1:
    with open('fileb', 'r') as handle2:
        for i, j in izip_longest(handle1, handle2, fillvalue=""):
            ...

Or in Python 3+:

from itertools import zip_longest
with open('filea', 'r') as handle1, open('fileb', 'r') as handle2:
    for i, j in zip_longest(handle1, handle2, fillvalue=""):
        ...
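
For completeness, here is one way the whole loop from the question might look in Python 3, including the output file. This is only a sketch: the merge_lines helper, the tab-joined output format, and the tiny temporary files are assumptions made for illustration, not something from the question.

```python
import os
import tempfile
from itertools import zip_longest


def merge_lines(path_a, path_b, path_out):
    """Stream two text files in lockstep, writing one combined line per pair."""
    count = 0
    with open(path_a, "r") as handle1, open(path_b, "r") as handle2, \
            open(path_out, "w") as out:
        # zip_longest pulls one line from each file per step, so only the
        # current pair (plus the files' read buffers) is held in memory.
        for i, j in zip_longest(handle1, handle2, fillvalue=""):
            # Hypothetical processing step: join the two lines with a tab.
            out.write(i.rstrip("\n") + "\t" + j.rstrip("\n") + "\n")
            count += 1
    return count


# Small temporary files stand in for the large inputs.
tmpdir = tempfile.mkdtemp()
path_a = os.path.join(tmpdir, "filea")
path_b = os.path.join(tmpdir, "fileb")
path_out = os.path.join(tmpdir, "merged")

with open(path_a, "w") as f:
    f.write("a1\na2\na3\n")
with open(path_b, "w") as f:
    f.write("b1\nb2\n")

n_pairs = merge_lines(path_a, path_b, path_out)

with open(path_out, "r") as f:
    result = f.read()

print(n_pairs)
print(result)
```

Because each pair is written out before the next one is read, memory use should stay roughly constant regardless of file size.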