小智 133
A correct, efficient answer written as a generator.
import os

def reverse_readline(filename, buf_size=8192):
    """A generator that returns the lines of a file in reverse order"""
    with open(filename) as fh:
        segment = None
        offset = 0
        fh.seek(0, os.SEEK_END)
        file_size = remaining_size = fh.tell()
        while remaining_size > 0:
            offset = min(file_size, offset + buf_size)
            fh.seek(file_size - offset)
            buffer = fh.read(min(remaining_size, buf_size))
            remaining_size -= buf_size
            lines = buffer.split('\n')
            # The first line of the buffer is probably not a complete line so
            # we'll save it and append it to the last line of the next buffer
            # we read
            if segment is not None:
                # If the previous chunk starts right from the beginning of line
                # do not concat the segment to the last line of new chunk.
                # Instead, yield the segment first
                if buffer[-1] != '\n':
                    lines[-1] += segment
                else:
                    yield segment
            segment = lines[0]
            for index in range(len(lines) - 1, 0, -1):
                if lines[index]:
                    yield lines[index]
        # Don't yield None if the file was empty
        if segment is not None:
            yield segment
Mat*_*ner 69
for line in reversed(open("filename").readlines()):
    print line.rstrip()
In Python 3:
for line in reversed(list(open("filename"))):
    print(line.rstrip())
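The one-liners above never close the file explicitly. A minimal sketch of the same idea with a context manager might look like this; the function name `lines_reversed` and its `encoding` parameter are assumptions made for illustration, and it strips only the trailing newline rather than all trailing whitespace:

```python
def lines_reversed(path, encoding="utf-8"):
    """Return the lines of a text file, last line first, without newlines."""
    with open(path, encoding=encoding) as fh:
        # readlines() loads the whole file at once, so this only suits
        # files that fit comfortably in memory.
        return [line.rstrip("\n") for line in reversed(fh.readlines())]
```

Like the original snippets, this reads the entire file before reversing it, so it is convenient for small files only.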
Ber*_*pac 20
How about something like this:
import os

def readlines_reverse(filename):
    with open(filename) as qfile:
        qfile.seek(0, os.SEEK_END)
        position = qfile.tell()
        line = ''
        while position >= 0:
            qfile.seek(position)
            next_char = qfile.read(1)
            if next_char == "\n":
                yield line[::-1]
                line = ''
            else:
                line += next_char
            position -= 1
        yield line[::-1]

if __name__ == '__main__':
    for qline in readlines_reverse(raw_input()):
        print qline
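Since the file is read character by character in reverse order, this works even on very large files, as long as each individual line fits in memory.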
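The same one-position-at-a-time idea can be sketched for Python 3 by reading bytes in binary mode and decoding each line only once it is complete. This is an illustration under stated assumptions, not the answer's own code: the name `readlines_reverse_bytes` and the `encoding` parameter are hypothetical, and the file is assumed to use `\n` line endings in an ASCII-compatible encoding such as UTF-8.

```python
import os

def readlines_reverse_bytes(filename, encoding="utf-8"):
    """Yield the lines of a file last-to-first, reading one byte at a time."""
    with open(filename, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        position = fh.tell()
        line = bytearray()
        while position > 0:
            position -= 1
            fh.seek(position)
            byte = fh.read(1)
            if byte == b"\n":
                # Bytes were collected back to front, so reverse before decoding.
                yield bytes(line[::-1]).decode(encoding)
                line = bytearray()
            else:
                line += byte
        # Whatever is left is the first line of the file.
        yield bytes(line[::-1]).decode(encoding)
```

Only the current line is held in memory, but one seek and one read per byte is slow, which is why the buffered approaches in this thread read larger chunks.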
use*_*751 18
You can also use the Python module file_read_backwards.
After installing it via pip install file_read_backwards (v1.2.1), you can read the entire file backwards (line by line) in a memory-efficient manner:
#!/usr/bin/env python2.7
from file_read_backwards import FileReadBackwards

with FileReadBackwards("/path/to/file", encoding="utf-8") as frb:
    for l in frb:
        print l
It supports the "utf-8", "latin-1", and "ascii" encodings.
Python 3 is also supported. Further documentation can be found at http://file-read-backwards.readthedocs.io/en/latest/readme.html
Aza*_*kov 13
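The accepted answer won't work for large files that do not fit in memory (which is not a rare case).
As others have noted, @srohde's answer looks good, but it has issues of its own. In particular, it does not work for all encodings: take a file with utf-8 encoding and non-ASCII contents (a single multi-byte character is enough), pass buf_size equal to 1, and you get
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 0: invalid start byte
Of course the text may be larger, but buf_size can still be picked so that it leads to an obscure error like the one above.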
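The failure can be reproduced without any file at all. In the sketch below the sample character is an assumption chosen for illustration (any character whose UTF-8 encoding ends in the byte 0xb9 behaves the same way), and `try_decode` is a hypothetical helper:

```python
# Hypothetical sample: Cyrillic "й" (U+0439) is encoded in UTF-8 as 0xd0 0xb9.
text = "\u0439"
data = text.encode("utf-8")

def try_decode(chunk):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError as error:
        return error

# A reader that walks backwards one byte at a time sees the last byte first.
# That byte is a continuation byte, so it cannot be decoded on its own.
last_byte_result = try_decode(data[-1:])

# Decoding works once all bytes of the character are kept together.
whole_result = try_decode(data)
```

This is why a robust solution has to split lines on the byte level and decode only complete lines.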
So, considering all these concerns, I have written separate functions.
First of all, let's define the following utility functions.
ceil_division performs division with ceiling (in contrast to the standard // division with floor; more info can be found in this thread):
def ceil_division(left_number, right_number):
    """
    Divides given numbers with ceiling.
    """
    return -(-left_number // right_number)
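As a quick worked example of ceiling division (the function is repeated so the block runs on its own, and the numbers are illustrative): covering 10 bytes with batches of 4 bytes takes 3 batches, not the 2 that floor division would give.

```python
import math

def ceil_division(left_number, right_number):
    """Divide with ceiling using only integer arithmetic."""
    return -(-left_number // right_number)

# How many 4-byte batches are needed to cover 10 bytes?
batches = ceil_division(10, 4)

# For small numbers this agrees with math.ceil on the float quotient.
same_as_math_ceil = batches == math.ceil(10 / 4)

# For very large integers the float-based version loses precision,
# while the integer-only version stays exact.
big = 10 ** 18 + 1
exact = ceil_division(big, 1)
rounded = math.ceil(big / 1)
```

Staying in integer arithmetic is the reason to prefer the `-(-a // b)` form over `math.ceil(a / b)` when sizes can be large.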
split splits a string by the given separator from the right end, with the ability to keep the separator:
def split(string, separator, keep_separator):
    """
    Splits given string by given separator.
    """
    parts = string.split(separator)
    if keep_separator:
        *parts, last_part = parts
        parts = [part + separator for part in parts]
        if last_part:
            return parts + [last_part]
    return parts
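To make the effect of keep_separator concrete, here is a small sketch that repeats split so it runs on its own and applies it to byte strings; the sample inputs are assumptions chosen for illustration.

```python
def split(string, separator, keep_separator):
    """Split by separator, optionally re-attaching it to each part."""
    parts = string.split(separator)
    if keep_separator:
        *parts, last_part = parts
        parts = [part + separator for part in parts]
        if last_part:
            return parts + [last_part]
    return parts

# The separator is re-attached to every part that was followed by one.
kept = split(b"a\nb\nc", b"\n", keep_separator=True)

# With a trailing separator there is no empty final part when keeping it...
kept_trailing = split(b"a\nb\n", b"\n", keep_separator=True)

# ...but there is one when the separator is dropped.
dropped_trailing = split(b"a\nb\n", b"\n", keep_separator=False)
```

With keep_separator=True the parts concatenate back to the original input, which is convenient when reassembling lines across batch boundaries.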
read_batch_from_end reads a batch from the right end of a binary stream:
def read_batch_from_end(byte_stream, size, end_position):
    """
    Reads batch from the end of given byte stream.
    """
    if end_position > size:
        offset = end_position - size
    else:
        offset = 0
        size = end_position
    byte_stream.seek(offset)
    return byte_stream.read(size)
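A short sketch of how read_batch_from_end walks a stream from its end, using an in-memory io.BytesIO as a stand-in for a real file; the function is repeated so the block is self-contained, and the 10-byte sample stream is an assumption.

```python
import io

def read_batch_from_end(byte_stream, size, end_position):
    """Read up to `size` bytes that end at `end_position`."""
    if end_position > size:
        offset = end_position - size
    else:
        # Fewer than `size` bytes remain: read from the start of the stream.
        offset = 0
        size = end_position
    byte_stream.seek(offset)
    return byte_stream.read(size)

stream = io.BytesIO(b"0123456789")

# Walk backwards in batches of 4 bytes: end positions 10, 6 and 2.
batches = [read_batch_from_end(stream, size=4, end_position=position)
           for position in (10, 6, 2)]
```

The last batch is shorter because only two bytes remain before the start of the stream.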
After that, we can define a function for reading a byte stream in reverse order:
import functools
import itertools
import os
from operator import methodcaller, sub

def reverse_binary_stream(byte_stream, batch_size=None,
                          lines_separator=None,
                          keep_lines_separator=True):
    if lines_separator is None:
        lines_separator = (b'\r', b'\n', b'\r\n')
        lines_splitter = methodcaller(str.splitlines.__name__,
                                      keep_lines_separator)
    else:
        lines_splitter = functools.partial(split,
                                           separator=lines_separator,
                                           keep_separator=keep_lines_separator)
    stream_size = byte_stream.seek(0, os.SEEK_END)
    if batch_size is None:
        batch_size = stream_size or 1
    batches_count = ceil_division(stream_size, batch_size)
    remaining_bytes_indicator = itertools.islice(
            itertools.accumulate(itertools.chain([stream_size],
                                                 itertools.repeat(batch_size)),
                                 sub),
            batches_count)
    try:
        remaining_bytes_count = next(remaining_bytes_indicator)
    except StopIteration:
        return

    def read_batch(position):
        result = read_batch_from_end(byte_stream,
                                     size=batch_size,
                                     end_position=position)
        while result.startswith(lines_separator):
            try:
                position = next(remaining_bytes_indicator)
            except StopIteration:
                break
            result = (read_batch_from_end(byte_stream,
                                          size=batch_size,
                                          end_position=position)
                      + result)
        return result

    batch = read_batch(remaining_bytes_count)
    segment, *lines = lines_splitter(batch)
    yield from lines[::-1]
    for remaining_bytes_count in remaining_bytes_indicator:
        batch = read_batch(remaining_bytes_count)
        lines = lines_splitter(batch)
        if batch.endswith(lines_separator):
            yield segment
        else:
            lines[-1] += segment
        segment, *lines = lines
        yield from lines[::-1]
    yield segment
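The densest part of reverse_binary_stream is the remaining_bytes_indicator expression. The sketch below isolates it to show which end positions it produces; the wrapper name `batch_end_positions` is hypothetical and exists only for this illustration.

```python
import itertools
from operator import sub

def batch_end_positions(stream_size, batch_size):
    """List the end position of every batch, from the end of the stream."""
    batches_count = -(-stream_size // batch_size)  # ceiling division
    # accumulate(..., sub) starts at stream_size and keeps subtracting
    # batch_size; islice cuts the otherwise infinite sequence off after
    # batches_count items.
    return list(itertools.islice(
        itertools.accumulate(
            itertools.chain([stream_size], itertools.repeat(batch_size)),
            sub),
        batches_count))

positions = batch_end_positions(stream_size=10, batch_size=4)
```

Each value is then passed to read_batch_from_end as end_position, and an empty stream produces no positions at all, which is the case the StopIteration handler covers.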
Finally, a function for reversing a text file can be defined as:
import codecs

def reverse_file(file, batch_size=None,
                 lines_separator=None,
                 keep_lines_separator=True):
    encoding = file.encoding
    if lines_separator is not None:
        lines_separator = lines_separator.encode(encoding)
    yield from map(functools.partial(codecs.decode,
                                     encoding=encoding),
                   reverse_binary_stream(
                           file.buffer,
                           batch_size=batch_size,
                           lines_separator=lines_separator,
                           keep_lines_separator=keep_lines_separator))
I generated 4 files using the fsutil command: empty.txt, tiny.txt (1MB), small.txt (10MB), and large.txt (50MB).
I also refactored @srohde's solution to work with a file object instead of a file path.
Note: I used the collections.deque class to exhaust the generators in the test script:
from timeit import Timer

repeats_count = 7
number = 1
create_setup = ('from collections import deque\n'
                'from __main__ import reverse_file, reverse_readline\n'
                'file = open("{}")').format
srohde_solution = ('with file:\n'
                   ' deque(reverse_readline(file,\n'
                   ' buf_size=8192),'
                   ' maxlen=0)')
azat_ibrakov_solution = ('with file:\n'
                         ' deque(reverse_file(file,\n'
                         ' lines_separator="\\n",\n'
                         ' keep_lines_separator=False,\n'
                         ' batch_size=8192), maxlen=0)')
print('reversing empty file by "srohde"',
      min(Timer(srohde_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing empty file by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))
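The timing script drives each generator with deque(..., maxlen=0). A minimal sketch of why that works as a way to exhaust an iterator: a deque with maxlen=0 consumes every item but stores none. The helper names `exhaust` and `countdown` are hypothetical and used only for this illustration.

```python
from collections import deque

def exhaust(iterator):
    """Run an iterator to completion and discard everything it yields."""
    deque(iterator, maxlen=0)

def countdown(n, log):
    """Toy generator that records each value it is about to yield."""
    while n > 0:
        log.append(n)
        yield n
        n -= 1

log = []
exhaust(countdown(3, log))
```

After the call, log shows that the generator body ran for every item even though nothing was kept, so the benchmark measures the full iteration cost without building a list of lines.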
For PyPy 3.5 on Windows 10:
reversing empty file by "srohde" 8.31e-05
reversing empty file by "Azat Ibrakov" 0.00016090000000000028
reversing tiny file (1MB) by "srohde" 0.160081
reversing tiny file (1MB) by "Azat Ibrakov" 0.09594989999999998
reversing small file (10MB) by "srohde" 8.8891863
reversing small file (10MB) by "Azat Ibrakov" 5.323388100000001
reversing large file (50MB) by "srohde" 186.5338368
reversing large file (50MB) by "Azat Ibrakov" 99.07450229999998
For CPython 3.5 on Windows 10:
reversing empty file by "srohde" 3.600000000000001e-05
reversing empty file by "Azat Ibrakov" 4.519999999999958e-05
reversing tiny file (1MB) by "srohde" 0.01965560000000001
reversing tiny file (1MB) by "Azat Ibrakov" 0.019207699999999994
reversing small file (10MB) by "srohde" 3.1341862999999996
reversing small file (10MB) by "Azat Ibrakov" 3.0872588000000007
reversing large file (50MB) by "srohde" 82.01206720000002
reversing large file (50MB) by "Azat Ibrakov" 82.16775059999998
So, as we can see, it performs like the original solution, but it is more general and free of the disadvantages listed above.
I have added this to version 0.3.0 of the lz package (requires Python 3.5+), which has many well-tested functional/iterating utilities.
It supports all standard encodings (except perhaps utf-7, since it is hard for me to define a strategy for generating strings encodable with it).
for line in reversed(open("file").readlines()):
    print line.rstrip()
If you are on Linux, you can use the tac command.
$ tac file
import os
import re
import sys

def filerev(somefile, buffer=0x20000):
    somefile.seek(0, os.SEEK_END)
    size = somefile.tell()
    lines = ['']
    rem = size % buffer
    pos = max(0, (size // buffer - 1) * buffer)
    while pos >= 0:
        somefile.seek(pos, os.SEEK_SET)
        data = somefile.read(rem + buffer) + lines[0]
        rem = 0
        lines = re.findall('[^\n]*\n?', data)
        ix = len(lines) - 2
        while ix > 0:
            yield lines[ix]
            ix -= 1
        pos -= buffer
    else:
        yield lines[0]

with open(sys.argv[1], 'r') as f:
    for line in filerev(f):
        sys.stdout.write(line)