我有一个巨大的文本文件,我想打开.
我正在以块的形式读取文件,避免了与一次读取过多文件相关的内存问题.
代码段:
def open_delimited(fileName, args):
with open(fileName, args, encoding="UTF16") as infile:
chunksize = 10000
remainder = ''
for chunk in iter(lambda: infile.read(chunksize), ''):
pieces = re.findall(r"(\d+)\s+(\d+_\d+)", remainder + chunk)
for piece in pieces[:-1]:
yield piece
remainder = '{} {} '.format(*pieces[-1])
if remainder:
yield remainder
Run Code Online (Sandbox Code Playgroud)
代码抛出错误UnicodeDecodeError: 'utf16' codec can't decode bytes in position 8190-8191: unexpected end of data.
我试过UTF8并得到了错误UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte.
latin-1并iso-8859-1 …