Reading multiple lines from a file in batches

Question

flu*_*y03 3 python io readfile python-2.7

I would like to know whether there is a way to read multiple lines from a file batch by batch. For example:

with open(filename, 'rb') as f:
    for n_lines in f:
        process(n_lines)


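What I want this loop to do is: on each iteration, read the next n lines from the file as one batch.

The file is too large to load all at once, so I want to read it in parts.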

Answer 1

Sha*_*ger 5

itertools.islice and two-arg iter can be used to accomplish this, though the combination looks a little odd:

from itertools import islice

n = 5  # Or whatever chunk size you want
with open(filename, 'rb') as f:
    for n_lines in iter(lambda: tuple(islice(f, n)), ()):
        process(n_lines)


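This keeps islice-ing off n lines at a time (tuple is used to force the whole chunk to actually be read) until f is exhausted, at which point it stops. If the number of lines in the file is not an exact multiple of n, the final chunk will contain fewer than n lines. If you want each chunk as a single string rather than a tuple of lines, change the for loop to: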

    # The b prefixes are ignored on 2.7, and necessary on 3.x since you opened
    # the file in binary mode
    for n_lines in iter(lambda: b''.join(islice(f, n)), b''):
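To make the behaviour described above concrete, here is a minimal self-contained sketch for Python 3. The helper name read_in_batches and the in-memory io.BytesIO sample are assumptions made for illustration and are not part of the original answer; a real file opened with open(filename, 'rb') should behave the same way.

```python
import io
from itertools import islice


def read_in_batches(f, n):
    """Yield tuples of up to n lines from an open file-like object f.

    Hypothetical helper wrapping the islice / two-arg iter idiom.
    """
    # iter(callable, sentinel) keeps calling the lambda until it returns
    # the sentinel. Once f is exhausted, islice yields nothing, so the
    # lambda returns an empty tuple and iteration stops.
    return iter(lambda: tuple(islice(f, n)), ())


# io.BytesIO stands in for a real file opened in binary mode.
sample = io.BytesIO(b"a\nb\nc\nd\ne\nf\ng\n")
batches = list(read_in_batches(sample, 3))
print(batches)
# [(b'a\n', b'b\n', b'c\n'), (b'd\n', b'e\n', b'f\n'), (b'g\n',)]
```

With seven lines and n = 3, the last batch holds only the one remaining line. Only one batch is held in memory at a time when the result is consumed in a for loop instead of being collected with list.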


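Another approach is to use izip_longest, which avoids the lambda function. As before, each loop below replaces the for loop inside the with block: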

from future_builtins import map  # Only on Py2
from itertools import izip_longest  # zip_longest on Py3

    # gets tuples possibly padded with empty strings at end of file
    for n_lines in izip_longest(*[f]*n, fillvalue=b''):

    # Or to combine into a single string:
    for n_lines in map(b''.join, izip_longest(*[f]*n, fillvalue=b'')):
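The padding behaviour of this variant is easy to miss, so here is a small Python 3 sketch of it, using zip_longest (the Python 3 name for izip_longest). The helper name read_padded_batches and the io.BytesIO sample data are assumptions for illustration only.

```python
import io
from itertools import zip_longest  # izip_longest on Python 2


def read_padded_batches(f, n, fillvalue=b""):
    """Yield n-tuples of lines from f, padding the last tuple with fillvalue.

    Hypothetical helper wrapping the zip_longest(*[f]*n) idiom.
    """
    # [f] * n is a list of n references to the same file iterator, so
    # zip_longest pulls n consecutive lines to build each tuple.
    return zip_longest(*[f] * n, fillvalue=fillvalue)


# io.BytesIO stands in for a real file opened in binary mode.
sample = io.BytesIO(b"a\nb\nc\nd\ne\n")
padded = list(read_padded_batches(sample, 3))
print(padded)
# [(b'a\n', b'b\n', b'c\n'), (b'd\n', b'e\n', b'')]

# Strip the padding again if only the real lines are wanted. A blank line
# in the file is b'\n', not b'', so it survives this filter.
trimmed = [tuple(line for line in batch if line) for batch in padded]
print(trimmed)
# [(b'a\n', b'b\n', b'c\n'), (b'd\n', b'e\n')]
```

Unlike the islice version, every tuple here has exactly n elements, so code that processes the batches has to either tolerate the empty-string padding or filter it out as shown.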

