open() and codecs.open() behave strangely differently in Python 2.7

Question

Kri*_*fer 7 python file-io codec python-2.7 python-unicode

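I have a text file whose first line contains Unicode characters and whose other lines are all ASCII. I try to read the first line into one variable and all the other lines into another. However, when I use the following code: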

# -*- coding: utf-8 -*-
import codecs
import os
filename = '1.txt'
f = codecs.open(filename, 'r', encoding='utf-8')
print f
names_f = f.readline().split(' ')
data_f = f.readlines()
print len(names_f)
print len(data_f)
f.close()
print 'And now for something completely differerent:'
g = open(filename, 'r')
names_g = g.readline().split(' ')
print g
data_g = g.readlines()
print len(names_g)
print len(data_g)
g.close()

I get the following output:

<open file '1.txt', mode 'rb' at 0x01235230>
28
7
And now for something completely differerent:
<open file '1.txt', mode 'r' at 0x017875A0>
28
77

If I don't use readlines(), the whole file is read, not just the first 7 lines, with both codecs.open() and open().

Why does this happen? And why does codecs.open() read the file in binary mode even though I passed the 'r' argument?

Update: here is the original file: http://www1.datafilehost.com/d/0792d687

Answer 1

Mar*_*ers 16

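Because you called .readline() first, the file object returned by codecs.open() has filled a line buffer; the subsequent call to .readlines() returns only the buffered lines.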

If you call .readlines() again, the rest of the lines are returned:

>>> f = codecs.open(filename, 'r', encoding='utf-8')
>>> line = f.readline()
>>> len(f.readlines())
7
>>> len(f.readlines())
71

The workaround is to not mix .readline() and .readlines():

f = codecs.open(filename, 'r', encoding='utf-8')
data_f = f.readlines()
names_f = data_f.pop(0).split(' ')  # take the first line.
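
As a self-contained sketch of that workaround, the helper below makes a single .readlines() call and then splits off the first line. The function name read_names_and_data and the sample file contents are made up for illustration; they are not taken from the question's file.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import codecs
import os
import tempfile


def read_names_and_data(path, encoding='utf-8'):
    """Hypothetical helper: return (fields of the first line, remaining lines)."""
    f = codecs.open(path, 'r', encoding=encoding)
    try:
        lines = f.readlines()  # one call only; .readline() is never mixed in
    finally:
        f.close()
    names = lines.pop(0).split(' ') if lines else []
    return names, lines


# Build a small sample file (assumed layout: one non-ASCII header line,
# followed by 77 ASCII rows).
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(u'имя возраст город\n'.encode('utf-8'))
    for i in range(77):
        out.write(('row %d\n' % i).encode('ascii'))

names, data = read_names_and_data(sample_path)
print(len(names), len(data))
os.remove(sample_path)
```

As in the question's code, the last field of the first line still carries its trailing newline; strip it if that matters.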

This behavior really is a bug; the Python developers are aware of it, see issue 8260.

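Another option is to use io.open() instead of codecs.open(); the io library is what Python 3 uses to implement the built-in open() function, and it is a lot more robust and versatile than the codecs module.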
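
A minimal sketch of the io.open() route, assuming a sample file with the same general layout as in the question (the file contents and variable names here are invented for illustration). With io.open(), a .readline() followed by .readlines() should return all the remaining lines:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile

# Build a small sample file (assumed layout: one non-ASCII header line,
# followed by 77 ASCII rows).
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(u'имя возраст город\n'.encode('utf-8'))
    for i in range(77):
        out.write(('row %d\n' % i).encode('ascii'))

# io.open() decodes to unicode text and keeps readline()/readlines() in sync.
with io.open(sample_path, 'r', encoding='utf-8') as h:
    names_h = h.readline().split(' ')
    data_h = h.readlines()

print(len(names_h), len(data_h))
os.remove(sample_path)
```

The same code runs unchanged on Python 3, where io.open is simply an alias for the built-in open().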
