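I have two large (~100 GB) text files that must be iterated through simultaneously.
zip works well for smaller files, but I found out that it actually builds a list of lines from my two files. This means that every line gets stored in memory. I don't need to do anything with the lines more than once.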
handle1 = open('filea', 'r'); handle2 = open('fileb', 'r')
for i, j in zip(handle1, handle2):
    # do something with i and j
    # write to an output file
    # no need to do anything with i and j after this
    ...
Is there an alternative to zip() that acts as a generator and lets me iterate through these two files without using >200 GB of RAM?
Anu*_*yal 22
from itertools import izip
for i, j in izip(handle1, handle2):
    ...
If your files have different numbers of lines, use izip_longest instead, because izip stops at the end of the shorter file.
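To make the difference concrete, here is a minimal sketch. It assumes Python 3, where the built-in zip is already lazy and izip_longest is named zip_longest. The io.StringIO objects and their contents are made-up stand-ins for the real files, and the results are collected into lists only because the sample data is tiny.

```python
import io
from itertools import zip_longest

# Hypothetical in-memory stand-ins for two files with different line counts.
file_a = io.StringIO("a1\na2\na3\n")
file_b = io.StringIO("b1\nb2\n")

# zip stops as soon as the shorter input runs out, so "a3" is never paired.
short_pairs = list(zip(file_a, file_b))

# Rewind both inputs before iterating over them a second time.
file_a.seek(0)
file_b.seek(0)

# zip_longest keeps going and substitutes fillvalue for the exhausted input.
long_pairs = list(zip_longest(file_a, file_b, fillvalue=""))

print(len(short_pairs), len(long_pairs))
```

With real 100 GB files you would loop over the iterator directly rather than wrapping it in list(), which would pull every line into memory again.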
Joh*_*ooy 16
You can use izip_longest like this to pad the shorter file with empty lines.
In Python 2.6:
from itertools import izip_longest
# Python 2.6 does not allow several context managers in one with statement, so nest them.
with open('filea', 'r') as handle1:
    with open('fileb', 'r') as handle2:
        for i, j in izip_longest(handle1, handle2, fillvalue=""):
            ...
Or in Python 3+:
from itertools import zip_longest
with open('filea', 'r') as handle1, open('fileb', 'r') as handle2:
    for i, j in zip_longest(handle1, handle2, fillvalue=""):
        ...
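For completeness, here is a rough end-to-end sketch of the Python 3 version that also writes an output file, since that is what the question describes. The function name merge_line_by_line, its parameters, and the tab-separated output format are assumptions made for illustration; the question does not say what "do something with i and j" actually is.

```python
from itertools import zip_longest


def merge_line_by_line(path_a, path_b, out_path, sep="\t"):
    """Pair up lines from two files and write one joined line per pair.

    Hypothetical helper: only one line from each input is held in memory
    at a time. Returns the number of pairs written.
    """
    count = 0
    with open(path_a, "r") as handle1, \
         open(path_b, "r") as handle2, \
         open(out_path, "w") as out:
        # zip_longest is lazy, so each iteration reads one line per file.
        for i, j in zip_longest(handle1, handle2, fillvalue=""):
            # Placeholder processing: strip newlines and join with sep.
            out.write(i.rstrip("\n") + sep + j.rstrip("\n") + "\n")
            count += 1
    return count
```

Calling merge_line_by_line('filea', 'fileb', 'merged.txt') would then stream through both inputs once; 'merged.txt' is just an example output name.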