Why is readline() less efficient?

Alo*_*kur 0 python

I have read in many places that the best way to read a file is:

with open(filename) as fo:
    for line in fo:
        print(line)

because it holds only one line in memory at a time, which lets us process each line before reading the next.

I believe the same should be true of fo.readline(): it too should hold only one line in memory at a time.

Apart from handling end-of-file and closing the file object automatically, do you see any other advantage of

for line in fo:
    print(line)

over

fo.readline()

unu*_*tbu 6

According to the documentation for file.next:

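In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right.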

for line in fo implicitly calls fo.next(), so

for line in fo:
    ...

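uses the hidden read-ahead buffer, which improves I/O performance. If you use readline(), you do not get the performance benefit of the read-ahead buffer.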
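To make the read-ahead idea concrete, here is a toy model in Python 3. It is not how CPython implements file.next() or readline(); the names CountingRaw, lines_with_readahead and lines_unbuffered are made up for this sketch, and the 8192-byte chunk size is an arbitrary assumption. It only shows that fetching data in large chunks and splitting lines in memory needs far fewer calls into the underlying stream than fetching one byte at a time.

```python
import io


class CountingRaw(io.BytesIO):
    """In-memory byte stream that counts how many times read() is called."""

    def __init__(self, data):
        super().__init__(data)
        self.read_calls = 0

    def read(self, size=-1):
        self.read_calls += 1
        return super().read(size)


def lines_with_readahead(raw, chunk_size=8192):
    """Yield lines while fetching data from `raw` in large chunks."""
    pending = b""
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is a complete line;
        # the remainder stays in the buffer until more data arrives.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line + b"\n"
    if pending:
        yield pending


def lines_unbuffered(raw):
    """Yield lines while fetching data from `raw` one byte at a time."""
    line = bytearray()
    while True:
        byte = raw.read(1)
        if not byte:
            break
        line += byte
        if byte == b"\n":
            yield bytes(line)
            line.clear()
    if line:
        yield bytes(line)


data = b"".join(b"line %d\n" % i for i in range(1000))

buffered = CountingRaw(data)
unbuffered = CountingRaw(data)

# Both strategies produce exactly the same lines...
same_lines = list(lines_with_readahead(buffered)) == list(lines_unbuffered(unbuffered))

# ...but with very different numbers of calls into the stream.
print("same lines:", same_lines)
print("read() calls with read-ahead:", buffered.read_calls)
print("read() calls without:", unbuffered.read_calls)
```

The gap between the two counters is the point of the sketch: the per-call overhead is paid once per chunk instead of once per small read.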


Let's test the above claim on a random (480K) file:

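def using_next(filename):
    # Plain for loop: implicitly calls f.next() (read-ahead buffer).
    with open(filename, 'r') as f:
        for line in f:
            pass

def using_iter_next(filename):
    # Explicit f.next() through iter(); StopIteration ends the loop.
    with open(filename, 'r') as f:
        for line in iter(f.next, ''):
            pass

def using_iter_readline(filename):
    # f.readline() through iter(); stops at the '' returned at EOF.
    with open(filename, 'r') as f:
        for line in iter(f.readline, ''):
            pass

def using_while_readline(filename):
    # Manual loop around f.readline(); stops at the '' returned at EOF.
    with open(filename, 'r') as f:
        while True:
            line = f.readline()
            if not line:
                break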

In [164]: %timeit using_next('data')
1000 loops, best of 3: 320 µs per loop

In [173]: %timeit using_iter_next('data')
1000 loops, best of 3: 531 µs per loop

In [171]: %timeit using_iter_readline('data')
1000 loops, best of 3: 1.91 ms per loop

In [170]: %timeit using_while_readline('data')
100 loops, best of 3: 2.21 ms per loop
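The timings above come from Python 2 and IPython's %timeit. For anyone who wants to repeat the comparison on Python 3, here is a minimal standalone sketch using the standard timeit module. Some assumptions: Python 3 file objects have no next() method, so only the for loop and the readline() loop are compared; the helper make_sample_file and the sample file contents are invented for this example; and the results may differ a lot from the Python 2 numbers because text files in Python 3 go through the io module.

```python
import os
import tempfile
import timeit


def using_for_loop(filename):
    """Count lines by iterating over the file object."""
    count = 0
    with open(filename, "r") as f:
        for line in f:
            count += 1
    return count


def using_while_readline(filename):
    """Count lines by calling readline() until it returns ''."""
    count = 0
    with open(filename, "r") as f:
        while True:
            line = f.readline()
            if not line:
                break
            count += 1
    return count


def make_sample_file(num_lines=10000):
    """Write a temporary text file (hypothetical test data) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(num_lines):
            f.write("sample line number %d\n" % i)
    return path


if __name__ == "__main__":
    path = make_sample_file()
    runs = 20
    try:
        for func in (using_for_loop, using_while_readline):
            seconds = timeit.timeit(lambda: func(path), number=runs)
            print("%-22s %.2f ms per call" % (func.__name__, seconds / runs * 1000))
    finally:
        os.remove(path)
```

Both functions return the line count so that it is easy to check they read the same data before comparing their timings.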