jch*_*chl 81

with open(filename) as f:
    while True:
        c = f.read(1)  # in text mode, read(1) returns one character
        if not c:      # read() returns '' at end of file
            print("End of file")
            break
        print("Read a character:", c)

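  • Since this reads one byte at a time, won't it fail for non-ASCII encodings? (35 upvotes)
  • The question and answer confuse the concepts of characters and bytes. If the file uses an encoding with a single byte per character, such as ASCII and many others, then yes, reading a single byte-sized chunk reads a single character; otherwise, if the encoding needs more than one byte per character, you are reading only one byte, not one character. (3 upvotes)
  • To David Chouinard's question: this snippet works correctly in Python 3 with UTF-8 encoded files. If, for example, you have a Windows-1250 encoded file, just change the first line to `with open(filename, encoding='Windows-1250') as f:` (3 upvotes)
  • That's right. That's why I often do `result = open(filename).read()` and then read `result` character by character. (2 upvotes)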
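A minimal Python 3 sketch of the distinction these comments draw: in text mode `read(1)` returns one decoded character, while in binary mode (`"rb"`) it returns one byte. The helper names `read_chars` and `read_bytes` and the sample string are illustrative only.

```python
import os
import tempfile


def read_chars(path, encoding="utf-8"):
    """Yield one decoded character at a time (text mode)."""
    with open(path, encoding=encoding) as f:
        while True:
            c = f.read(1)  # text mode: size is counted in characters
            if not c:
                break
            yield c


def read_bytes(path):
    """Yield one raw byte at a time (binary mode)."""
    with open(path, "rb") as f:
        while True:
            b = f.read(1)  # binary mode: size is counted in bytes
            if not b:
                break
            yield b


# Write a short UTF-8 sample that contains multi-byte characters.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("héllo €")
    sample_path = tmp.name

chars = list(read_chars(sample_path))
raw = list(read_bytes(sample_path))
os.remove(sample_path)

print(len(chars), chars)  # 7 characters
print(len(raw))           # 10 bytes: "é" takes 2, "€" takes 3
```

So the accepted answer is character-safe in Python 3 text mode as long as the `encoding` argument matches the file; only binary mode hands you raw bytes.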

Raj*_*Raj 37

First open a file:

with open("filename") as fileobj:
    for line in fileobj:   # iterate line by line
        for ch in line:    # then character by character within each line
            print(ch)

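  • One reason you might read a file one character at a time is that the file is too large to fit in memory. But the answer above assumes that every line fits in memory. (7 upvotes)
  • Since the OP never mentioned reading the whole file one character at a time, this approach is not optimal: the entire file might be on a single line, in which case it takes considerable time to read that whole line before character processing even begins. In such cases it is better to use f.read(1) for partial reads. (2 upvotes)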
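A minimal sketch of the alternative these comments point toward, assuming Python 3 and a file opened in text mode: read fixed-size chunks with `read(size)` instead of iterating by line, then yield characters from each chunk. `iter_chars` and `chunk_size` are hypothetical names, and `io.StringIO` stands in for a real file.

```python
import io


def iter_chars(fileobj, chunk_size=4096):
    """Yield characters from a text-mode file object without relying on lines.

    Each read pulls at most `chunk_size` characters, so memory use stays
    bounded even if the whole file is a single very long line.
    """
    while True:
        chunk = fileobj.read(chunk_size)  # '' signals end of file
        if not chunk:
            return
        yield from chunk  # iterating a str yields one-character strings


# A single "line" with no newline at all.
one_long_line = io.StringIO("x" * 10_000 + "end")
chars = list(iter_chars(one_long_line, chunk_size=1024))
print(len(chars))            # 10003
print("".join(chars[-3:]))   # end
```

This keeps the memory bound of `f.read(1)` while making far fewer read calls.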

Esc*_*alo 14

I like the accepted answer: it is simple and gets the job done. I would also like to offer an alternative implementation:

def chunks(filename, buffer_size=4096):
    """Reads `filename` in chunks of `buffer_size` bytes and yields each chunk
    until no more bytes can be read; the last chunk will most likely have
    fewer than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of `buffer_size` size until exhausting the file
    :rtype: bytes

    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)

def chars(filename, buffer_size=4096):
    """Yields the contents of file `filename` one byte at a time. Warning:
    each byte is one character only for encodings that encode every
    character as a single byte.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size for the underlying chunks,
        in bytes (default is 4096)
    :return: Yields the contents of `filename` byte by byte.
    :rtype: bytes

    """
    for chunk in chunks(filename, buffer_size):
        # Slice instead of iterating: iterating over bytes yields ints in
        # Python 3, which fp.write() in main() would reject.
        for i in range(len(chunk)):
            yield chunk[i:i + 1]

def main(buffersize, filenames):
    """Reads several files character by character and redirects their contents
    to `/dev/null`.

    """
    for filename in filenames:
        with open("/dev/null", "wb") as fp:
            for char in chars(filename, buffersize):
                fp.write(char)

if __name__ == "__main__":
    # Try reading several files varying the buffer size
    import sys
    buffersize = int(sys.argv[1])
    filenames  = sys.argv[2:]
    sys.exit(main(buffersize, filenames))

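The code I suggest is essentially the same as the accepted answer: it reads a given number of bytes from the file. The difference is that it first reads a large block of data (4096 is a good default on x86, but you may want to try 1024 or 8192; any multiple of your page size works) and then yields the characters in that block one by one.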
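As the `chars` docstring warns, yielding bytes only equals yielding characters for single-byte encodings. One way to keep the large binary reads but still yield real characters, sketched here as an assumption rather than part of the answer, is to feed each chunk through an incremental decoder from the standard `codecs` module, which holds back a multi-byte sequence split across a chunk boundary until its remaining bytes arrive. `decoded_chars` is a hypothetical name.

```python
import codecs
import io


def decoded_chars(binary_file, encoding="utf-8", buffer_size=4096):
    """Yield characters from a binary file read in fixed-size byte chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = binary_file.read(buffer_size)
        if not chunk:
            break
        # decode() returns only the characters that are complete so far.
        yield from decoder.decode(chunk)
    # Flush the decoder; raises UnicodeDecodeError on a truncated sequence.
    yield from decoder.decode(b"", final=True)


data = "naïve café €".encode("utf-8")
# A tiny 3-byte buffer splits some multi-byte characters across reads.
chars = list(decoded_chars(io.BytesIO(data), buffer_size=3))
print("".join(chars))  # naïve café €
```

In practice, opening the file in text mode with the right `encoding` does the same buffering and decoding for you; this version just makes the mechanism visible.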

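For larger files, the code I provide may be faster. Take Tolstoy as an example, with the full text of War and Peace. These are my timing results (MacBook Pro running OS X 10.7.4; so.py is the name I gave the code I pasted):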

$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8  3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8  1.31s user 0.01s system 99% cpu 1.318 total

Now, do not take a buffer size of 4096 as a universal truth; look at the results I got for different sizes (buffer size in bytes vs. wall time in seconds):

   2 2.726 
   4 1.948 
   8 1.693 
  16 1.534 
  32 1.525 
  64 1.398 
 128 1.432 
 256 1.377 
 512 1.347 
1024 1.442 
2048 1.316 
4096 1.318 

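As you can see, the gains start showing up early (and my timings may be quite inaccurate); the buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice, but as always, measure first.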
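To follow the "measure first" advice on your own machine, here is a minimal sketch using the standard `timeit` module, assuming Python 3. It reimplements the chunked reader under the hypothetical name `chars_from_chunks` and times full passes over a temporary file of random bytes; the file size and the buffer sizes tried are arbitrary choices, and the numbers it prints will differ from the table above.

```python
import os
import tempfile
import timeit


def chars_from_chunks(path, buffer_size):
    """Yield single bytes from `path`, reading `buffer_size` bytes at a time."""
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(buffer_size)
            if not chunk:
                return
            for i in range(len(chunk)):
                yield chunk[i:i + 1]


def time_buffer_sizes(path, sizes, repeat=3):
    """Return the best wall-clock time (seconds) of one full pass per size."""
    results = {}
    for size in sizes:
        # Consume the whole generator so every byte is actually read.
        timer = timeit.Timer(
            lambda size=size: sum(1 for _ in chars_from_chunks(path, size))
        )
        results[size] = min(timer.repeat(repeat=repeat, number=1))
    return results


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(os.urandom(100_000))
    sample_path = tmp.name

timings = time_buffer_sizes(sample_path, [1, 64, 4096])
os.remove(sample_path)
for size, seconds in timings.items():
    print(f"{size:>5} {seconds:.4f}")
```

Taking the minimum of several repeats is the usual `timeit` practice, since slower runs mostly reflect interference from other processes.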


Mat*_*son 8

Python itself can help you in interactive mode (this session is from Python 2, where `file` is a built-in type):

>>> help(file.read)
Help on method_descriptor:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.

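  • I agree with this, but perhaps it would be better suited as a comment on the question? (4 upvotes)
  • Possibly, but I think all this text would look messy in a comment. (2 upvotes)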
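The same contract holds for text streams in Python 3. Shown here on an in-memory `io.StringIO` so the sketch runs without a file on disk: `read(n)` returns at most `n` characters, `read()` with no argument reads until EOF, and an empty string marks the end of the stream. That empty-string result is what every loop on this page relies on to stop.

```python
import io

# io.StringIO is an in-memory text stream with the same read() API as a
# file opened in text mode.
f = io.StringIO("abcdef")

first = f.read(1)     # at most 1 character: 'a'
next_two = f.read(2)  # 'bc'
rest = f.read()       # size omitted: read until EOF -> 'def'
at_eof = f.read(1)    # empty string signals end of file -> ''

print(first, next_two, rest, repr(at_eof))  # a bc def ''
```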

joa*_*uin 6

Simply:

myfile = open(filename)
one_character = myfile.read(1)  # reads a single character (text mode)
myfile.close()


Mic*_*pat 5

Today I learned a new idiom while watching Raymond Hettinger's talk Transforming Code into Beautiful, Idiomatic Python:

import functools

with open(filename) as f:
    # iter(callable, sentinel) calls f.read(1) until it returns '' (EOF)
    f_read_ch = functools.partial(f.read, 1)
    for ch in iter(f_read_ch, ''):
        print('Read a character:', repr(ch))
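One detail worth noting about this idiom: the sentinel must match what `read` returns at end of file. In text mode that is `''`, but a file opened with `"rb"` returns `b''`, and since `b'' != ''` in Python 3, reusing `''` with a binary file would loop forever. A minimal sketch, using a hypothetical helper `iter_units` and in-memory streams in place of real files:

```python
import functools
import io


def iter_units(fileobj, sentinel):
    """Call fileobj.read(1) repeatedly until it returns `sentinel` (EOF)."""
    return iter(functools.partial(fileobj.read, 1), sentinel)


text_units = list(iter_units(io.StringIO("hi"), ""))   # text mode: EOF is ''
byte_units = list(iter_units(io.BytesIO(b"hi"), b""))  # binary mode: EOF is b''

print(text_units)  # ['h', 'i']
print(byte_units)  # [b'h', b'i']
```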