jch*_*chl 81
with open(filename) as f:
    while True:
        c = f.read(1)
        if not c:
            print "End of file"
            break
        print "Read a character:", c
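The snippet above is Python 2. As a rough Python 3 sketch of the same loop (the `read_chars` name and the `encoding` parameter are my own assumptions, not part of the answer), opening the file in text mode makes `read(1)` return one decoded character rather than one byte, which matters for multibyte encodings such as UTF-8:

```python
import os
import tempfile


def read_chars(path, encoding="utf-8"):
    """Yield the contents of `path` one decoded character at a time."""
    # Text mode: read(1) returns one character, not one byte.
    with open(path, encoding=encoding) as f:
        while True:
            c = f.read(1)
            if not c:  # an empty string signals end of file
                break
            yield c


if __name__ == "__main__":
    # Demo on a throwaway file that contains a two-byte UTF-8 character.
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", suffix=".txt", delete=False
    ) as tmp:
        tmp.write("h\u00e9!")
    try:
        for c in read_chars(tmp.name):
            print("Read a character:", c)
        print("End of file")
    finally:
        os.remove(tmp.name)
```

Wrapping the loop in a generator keeps the end-of-file check in one place, so callers can simply write `for c in read_chars(path)`.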
Raj*_*Raj 37
First, open a file:
with open("filename") as fileobj:
    for line in fileobj:
        for ch in line:
            print ch
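If you prefer a single loop, the nested iteration can probably be flattened with `itertools.chain.from_iterable`. This is a minimal Python 3 sketch; `iter_chars` is a hypothetical helper name, and `io.StringIO` merely stands in for an open text file:

```python
import io
from itertools import chain


def iter_chars(fileobj):
    """Flatten an iterable of lines into a single stream of characters."""
    # Iterating a text file yields lines; iterating a line yields characters.
    return chain.from_iterable(fileobj)


if __name__ == "__main__":
    # io.StringIO stands in for a file opened with open(...).
    for ch in iter_chars(io.StringIO("ab\ncd\n")):
        print(repr(ch))
```

Like the nested loops, this still reads the file a line at a time, so a file with one very long line is held in memory as a whole.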
Esc*_*alo 14
I like the accepted answer: it is straightforward and gets the job done. I would also like to offer an alternative implementation:
def chunks(filename, buffer_size=4096):
    """Reads `filename` in chunks of `buffer_size` bytes and yields each chunk
    until no more characters can be read; the last chunk will most likely have
    less than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of `buffer_size` size until exhausting the file
    :rtype: str
    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)


def chars(filename, buffersize=4096):
    """Yields the contents of file `filename` character-by-character. Warning:
    will only work for encodings where one character is encoded as one byte.

    :param str filename: Path to the file
    :param int buffersize: Buffer size for the underlying chunks,
        in bytes (default is 4096)
    :return: Yields the contents of `filename` character-by-character.
    :rtype: char
    """
    for chunk in chunks(filename, buffersize):
        for char in chunk:
            yield char


def main(buffersize, filenames):
    """Reads several files character by character and redirects their contents
    to `/dev/null`.
    """
    for filename in filenames:
        with open("/dev/null", "wb") as fp:
            for char in chars(filename, buffersize):
                fp.write(char)


if __name__ == "__main__":
    # Try reading several files varying the buffer size
    import sys
    buffersize = int(sys.argv[1])
    filenames = sys.argv[2:]
    sys.exit(main(buffersize, filenames))
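The code I suggest is essentially the same idea as the accepted answer: read a given number of bytes from the file. The difference is that it first reads a good-sized chunk of data (4096 is a nice default for x86, but you may want to try 1024 or 8192; any multiple of your page size), and then yields the characters in that chunk one by one.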
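The `chars` docstring warns that the approach only works when one character is one byte. As a hedged Python 3 sketch of how the same chunk-then-yield idea might cope with multibyte encodings, the variant below feeds each chunk through an incremental decoder from the standard `codecs` module. The `encoding` parameter is my own addition, and this is an adaptation rather than the code that was timed in this answer:

```python
import codecs


def chunks(filename, buffer_size=4096):
    """Yield successive byte chunks of at most `buffer_size` bytes."""
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)


def chars(filename, buffer_size=4096, encoding="utf-8"):
    """Yield decoded characters, even when one spans two chunks."""
    # The incremental decoder holds back an incomplete byte sequence
    # until the next chunk supplies the remaining bytes.
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks(filename, buffer_size):
        yield from decoder.decode(chunk)
    # Final flush: raises UnicodeDecodeError if the file ends mid-character.
    yield from decoder.decode(b"", final=True)
```

Because the file is read in binary mode, line endings are passed through unchanged; opening the file in text mode and calling `read(buffer_size)` would be a simpler alternative if newline translation is wanted.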
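The code I present may be faster for larger files. Take, for example, the entire text of War and Peace, by Tolstoy. These are my timing results (Mac Book Pro running OS X 10.7.4; so.py is the name I gave to the code I pasted):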
$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8 3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8 1.31s user 0.01s system 99% cpu 1.318 total
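Now: do not take the buffer size of 4096 as a universal truth; look at the results I get for different sizes (buffer size in bytes vs. wall time in seconds):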
2 2.726
4 1.948
8 1.693
16 1.534
32 1.525
64 1.398
128 1.432
256 1.377
512 1.347
1024 1.442
2048 1.316
4096 1.318
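As you can see, the gains start to show up at much smaller buffer sizes (and my timings are likely very inaccurate); the buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice but, as always, measure first.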
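To measure on your own machine, something along these lines could serve as a starting point. It is a Python 3 sketch with hypothetical helper names (`time_read`, `compare_buffer_sizes`) and synthetic test data; it times only the chunked reads, not the per-character loop, so the numbers are not comparable with the table above:

```python
import os
import tempfile
import time


def time_read(path, buffer_size):
    """Return the seconds taken to read `path` in `buffer_size`-byte chunks."""
    start = time.perf_counter()
    with open(path, "rb") as fp:
        while fp.read(buffer_size):
            pass
    return time.perf_counter() - start


def compare_buffer_sizes(path, sizes=(1, 64, 1024, 4096, 8192)):
    """Map each buffer size to its measured wall time in seconds."""
    return {size: time_read(path, size) for size in sizes}


if __name__ == "__main__":
    # Assumed test data: 1 MiB of zero bytes in a temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (1 << 20))
    try:
        for size, seconds in compare_buffer_sizes(tmp.name).items():
            print(size, round(seconds, 3))
    finally:
        os.remove(tmp.name)
```

A single run per size is noisy; repeating each measurement and taking the minimum usually gives steadier numbers.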
Python itself can help you with this in interactive mode:
>>> help(file.read)
Help on method_descriptor:
read(...)
    read([size]) -> read at most size bytes, returned as a string.
    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
I learned a new idiom for this today while watching Raymond Hettinger's Transforming Code into Beautiful, Idiomatic Python:
import functools

with open(filename) as f:
    f_read_ch = functools.partial(f.read, 1)
    for ch in iter(f_read_ch, ''):
        print 'Read a character:', repr(ch)
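The idiom relies on the two-argument form of `iter`, which keeps calling a zero-argument callable until it returns the sentinel value. A small Python 3 sketch of the same thing follows; `chars_via_sentinel` is a made-up name, and `io.StringIO` stands in for a file opened in text mode:

```python
import io
from functools import partial


def chars_via_sentinel(f):
    """Iterate over a text file object one character at a time."""
    # iter(callable, sentinel) calls f.read(1) until it returns ''.
    return iter(partial(f.read, 1), "")


if __name__ == "__main__":
    for ch in chars_via_sentinel(io.StringIO("hi\n")):
        print("Read a character:", repr(ch))
```

For a file opened in binary mode the sentinel would need to be `b""` instead, since `read` then returns bytes and never compares equal to an empty `str`.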