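I have a file object that may or may not have been opened in universal newline mode. (I can access this mode via file.mode, if that helps.)
I want to process this file with the standard io methods read and seek.
If I open the file in non-universal mode, everything works nicely: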
In [1]: f = open('example', 'r')
In [2]: f.read()
Out[2]: 'Line1\r\nLine2\r\n' # uhoh, this file has carriage returns
In [3]: f.seek(0)
In [4]: f.read(8)
Out[4]: 'Line1\r\nL'
In [5]: f.seek(-8, 1)
In [6]: f.read(8)
Out[6]: 'Line1\r\nL' # as expected, this is the same as before
In [7]: f.close()
However, if I open the file in universal mode, there is a problem:
In [8]: f = open('example', 'rU')
In [9]: f.read()
Out[9]: 'Line1\nLine2\n' # no carriage returns - thanks, 'U'!
In [10]: f.seek(0)
In [11]: f.read(8)
Out[11]: 'Line1\nLi'
In [12]: f.seek(-8, 1)
In [13]: f.read(8)
Out[13]: 'ine1\nLin' # NOT the same output, as what we read as '\n' was *2* bytes
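Python interprets the \r\n as a single \n and returns a string of length 8.
However, creating this string involved reading 9 bytes from the file.
As a result, when trying to reverse the read with seek, we do not get back to where we started!
Is there a way to tell that we consumed a 2-byte newline or, better yet, to disable this behaviour?
The best I can come up with at the moment is to call tell before and after the read and check how much we actually consumed, but that seems very inelegant.
As an aside, this behaviour seems to me to contradict the documentation of read: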
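To make the mismatch concrete, here is a minimal sketch that counts characters versus bytes for the same data. It assumes Python 3 (the sessions above are Python 2) and works on an in-memory copy of the file contents; the names raw, translated, chunk and bytes_consumed are illustrative only.

```python
# Same content as the 'example' file in the sessions above.
raw = b'Line1\r\nLine2\r\n'

# What universal-newline translation hands back to the caller.
translated = raw.decode('ascii').replace('\r\n', '\n')

# The first 8 characters the caller sees.
chunk = translated[:8]

# Bytes consumed from the file to produce them. This assumes every '\n'
# was stored as the two bytes '\r\n', which holds for this particular file.
bytes_consumed = len(chunk) + chunk.count('\n')

print(repr(chunk), len(chunk), bytes_consumed)
```

Seeking back by len(chunk) therefore falls one byte short for every translated newline in the chunk.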
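The tell-based idea can be sketched as follows. This is only an illustration under stated assumptions: it targets Python 3, where text mode translates newlines by default and tell returns an opaque position that is only meant to be passed back to seek. The helper name read_then_rewind and the temporary file are hypothetical, not part of the original code.

```python
import os
import tempfile

def read_then_rewind(f, size):
    """Read up to `size` characters, then restore the previous position."""
    start = f.tell()      # remember where we were before reading
    data = f.read(size)
    f.seek(start)         # absolute seek back; no arithmetic on lengths
    return data

# Demo on a temporary file with '\r\n' line endings.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'Line1\r\nLine2\r\n')
    path = tmp.name

# newline=None (the default) translates '\r\n' to '\n' on read.
with open(path, 'r', encoding='ascii', newline=None) as f:
    first = read_then_rewind(f, 8)
    second = read_then_rewind(f, 8)

os.remove(path)
print(repr(first), repr(second))
```

Because the rewind is an absolute seek to a saved position, it does not depend on how many bytes each newline occupied in the file.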
In [54]: f.read?
Type: builtin_function_or_method
String Form:<built-in method read of file object at 0x1a35f60>
Docstring:
read([size]) -> read at most size bytes, returned as a string.
If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
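As I read it, this says that at most size bytes should be read, not returned.
In particular, I believe the correct semantics for the example above would be: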
In [11]: f.read(8)
Out[11]: 'Line1\nL' # return a string of length *7*
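Am I misunderstanding the documentation?
I have listed a workaround in an answer, although I am by no means satisfied with it.
Given that the underlying problem is the discrepancy between the length of a \n in universal mode and the number of bytes it actually occupies in the file, one way to avoid the error is to read from an intermediate stream in which \n really does occupy one byte: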
from StringIO import StringIO  # Python 2; on Python 3 this is io.StringIO

def wrap_stream(f):
    # if this stream is a file, it's possible to just throw the contents in
    # another stream
    # alternatively, we could implement an io object which used a generator to
    # read lines from f and interpose newlines as required
    return StringIO(f.read())
Regardless of the mode the file was opened in, the new io object returned by wrap_stream presents newlines as \n.
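A short usage sketch of the wrapper, assuming Python 3 and io.StringIO in place of the Python 2 StringIO module. An in-memory io.TextIOWrapper stands in for a file opened with newline translation. Since io.StringIO does not accept nonzero relative seeks, the rewind here is done with an absolute seek computed from tell, which on StringIO is a plain character offset.

```python
import io

def wrap_stream(f):
    # Copy the already-translated text into an in-memory stream, where
    # each '\n' occupies exactly one position.
    return io.StringIO(f.read())

# Stand-in for a file with '\r\n' endings opened with newline translation.
raw = io.BytesIO(b'Line1\r\nLine2\r\n')
text = io.TextIOWrapper(raw, encoding='ascii', newline=None)

s = wrap_stream(text)
first = s.read(8)
s.seek(s.tell() - len(first))   # step back by exactly what was returned
second = s.read(8)

print(repr(first), repr(second))
```

The trade-off is that the whole file is read into memory up front, so this is only practical for files of modest size.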