为什么逐行复制文件会极大地影响Python中的复制速度？

Question

为什么逐行复制文件会极大地影响Python中的复制速度？

不久前,我制作了一个类似于此的Python脚本:

with open("somefile.txt", "r") as f, open("otherfile.txt", "a") as w:
    for line in f:
        w.write(line)

Run Code Online (Sandbox Code Playgroud)

当然,这对100mb文件的工作速度非常慢.

但是,我改变了程序来做到这一点

ls = []
with open("somefile.txt", "r") as f, open("otherfile.txt", "a") as w:
    for line in f:
        ls.append(line)
        if len(ls) == 100000:
            w.writelines(ls)
            del ls[:]

Run Code Online (Sandbox Code Playgroud)

并且文件复制速度更快.我的问题是,为什么第二种方法工作得更快,即使程序复制相同数量的行(虽然收集它们并逐个打印)？

Answer 1

Kas*_*mvd -1

这是因为在第一部分中，您必须write在每次迭代中为所有行调用该方法，这使得您的程序需要花费很多时间来运行。writelines()但在第二个代码中，虽然你浪费了更多内存，但它的性能更好，因为你每 100000行调用该方法。

让我们看看这是源代码，这是writelines函数的源代码：

def writelines(self, list_of_data):
    """Write a list (or any iterable) of data bytes to the transport.

    The default implementation concatenates the arguments and
    calls write() on the result.
    """
    if not _PY34:
        # In Python 3.3, bytes.join() doesn't handle memoryview.
        list_of_data = (
            bytes(data) if isinstance(data, memoryview) else data
            for data in list_of_data)
    self.write(b''.join(list_of_data))

Run Code Online (Sandbox Code Playgroud)

正如您所看到的，它连接了所有列表项并调用该write函数一次。

请注意，在这里加入数据需要时间，但比为write每一行调用函数的时间要少。但是由于您在中使用 python 3.4，它一次写入一行而不是加入它们，所以它会比write在中快得多这个案例：

cStringIO.writelines()现在接受任何可迭代的参数并一次写入一行，而不是连接它们并写入一次。对进行了并行更改StringIO.writelines()。节省内存并适合与生成器表达式一起使用。

归档时间：	10 年，6 月前
查看次数：	171 次
最近记录：	10 年，6 月前