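I need to read a large file line by line. Say the file is larger than 5 GB and I need to read every line, but obviously I don't want to use readlines(), because it would create a very large list in memory.
How would the following code work in this case? Does xreadlines itself read the lines into memory one at a time? Is the generator expression needed?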
f = (line for line in open("log.txt").xreadlines()) # how much is loaded in memory?
f.next()
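For comparison, here is a minimal sketch of the same idea in Python 3, where xreadlines() no longer exists. A file object is already a lazy iterator over its lines, so no wrapping generator expression is needed. The function name iter_lines is only an illustrative choice, not something from the question.

```python
def iter_lines(path):
    """Yield the lines of a text file one at a time.

    Only the current line (plus the I/O buffer) is held in memory,
    so this also works for files far larger than RAM.
    """
    with open(path, "r") as f:
        for line in f:  # the file object iterates lazily, line by line
            yield line.rstrip("\n")
```

Usage would look like `for line in iter_lines("log.txt"): ...`, and `next(iter_lines("log.txt"))` replaces the Python 2 `f.next()` call.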
Also, what can I do to read it in reverse order, like the Linux tail command?
I found:
http://code.google.com/p/pytailer/
and
Both work quite well!
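As a simple alternative to a dedicated library, the last n lines can be kept in a bounded deque. This is only a sketch: it still scans the whole file once from the start, so it saves memory but not time, and the name last_lines is a hypothetical helper rather than part of pytailer.

```python
from collections import deque


def last_lines(path, n):
    """Return the last n lines of a text file using O(n) memory.

    The deque discards older lines as newer ones arrive, so at most
    n lines are held at any moment. The file is still read in full.
    """
    with open(path, "r") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

For very large files where a full scan is too slow, seeking backwards from the end of the file in blocks is the usual approach instead.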
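I'm writing a log file viewer for a web application, and for that I want to paginate through the lines of the log file. The items in the file are line-based, with the newest item at the bottom.
So I need a tail() method that can read n lines from the bottom and supports an offset. This is what I came up with: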
def tail(f, n, offset=0):
    """Reads a n lines from f with an offset of offset lines."""
    avg_line_length = 74
    to_read = n + offset
    while 1:
        try:
            f.seek(-(avg_line_length * to_read), 2)
        except IOError:
            # woops. apparently file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            return lines[-to_read:offset and -offset or None]
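        avg_line_length …
I need to get the value of the previous line in a file and compare it with the current line while iterating over the file. The file is huge, so I can't read it whole or access lines randomly by line number with linecache, because that library function still reads the entire file into memory.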
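Comparing each line with the one before it only requires holding two lines at a time. A minimal sketch follows; with_previous is a hypothetical helper name, and it accepts any iterable of lines, whether that is an open file or some other iterator that produces lines in a different order.

```python
def with_previous(lines):
    """Yield (previous, current) pairs from any iterable of lines.

    Only two lines are kept in memory at once. The first line has no
    predecessor, so the first pair is (line 1, line 2).
    """
    previous = None
    for current in lines:
        if previous is not None:
            yield previous, current
        previous = current
```

With a file this could be used as `for prev, cur in with_previous(open("filename")): ...`, doing the comparison inside the loop body.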
Edit: Sorry, I forgot to mention that I have to read the file backwards.
Edit 2:
I have tried the following:
f = open("filename", "r")
for line in reversed(f.readlines()): # this doesn't work because there are too many lines to read into memory
line = linecache.getline("filename", num_line) # this also doesn't work due to the same problem above.
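One way to read backwards without loading everything is to seek to the end and walk towards the start in fixed-size blocks, splitting each block on newlines. The sketch below assumes "\n" line endings and yields bytes, leaving decoding to the caller. The name reverse_lines and the block_size parameter are illustrative choices, not taken from any library.

```python
import os


def reverse_lines(path, block_size=8192):
    """Yield the lines of a file from last to first, as bytes without the newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return  # empty file: no lines at all
        tail = b""  # start of a line whose beginning lies in an earlier block
        first_block = True
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step) + tail
            lines = chunk.split(b"\n")
            # The first piece may be incomplete; carry it into the next block.
            tail = lines.pop(0)
            if first_block:
                first_block = False
                # A trailing newline at end of file produces one empty piece.
                if lines and lines[-1] == b"":
                    lines.pop()
            for line in reversed(lines):
                yield line
        yield tail  # whatever remains is the first line of the file
```

Memory use is bounded by the block size plus the longest line, so it does not depend on the total file size.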
I have legacy code that declares class TiffFile(file). What is the Python 3 way to write this?
I tried replacing the following Python 2 code:
class TiffFile(file):
    def __init__(self, path):
        file.__init__(self, path, 'r+b')
with this in Python 3:
class TiffFile(RawIOBase):
    def __init__(self, path):
        super(TiffFile, self).__init__(path, 'r+b')
But now I get TypeError: object.__init__() takes no parameters
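The error appears because RawIOBase is an abstract base class that does not open files itself, so the path and mode arguments fall through to object.__init__. One possible port, assuming unbuffered raw access is acceptable for the legacy code, is to subclass io.FileIO, which does accept a path and a mode. This is a sketch rather than the only option; wrapping an object returned by open() instead of inheriting is another.

```python
import io


class TiffFile(io.FileIO):
    """Sketch of a Python 3 port of the legacy file subclass.

    io.FileIO is the raw, unbuffered binary file type, so reads and
    writes go straight to the OS (the Python 2 `file` type buffered).
    """

    def __init__(self, path):
        # 'r+b' opens an existing file for both reading and writing.
        super().__init__(path, 'r+b')
```

Instances support read(), write(), seek() and the with statement, as the old file subclass did.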