dje*_*jen 43 python csv unicode utf-8 python-2.x
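The csv module in Python does not work properly when UTF-8/Unicode is involved. I have found snippets in the Python documentation and on other web pages that work for specific cases, but you have to understand which encoding you are dealing with and pick the matching snippet.
How can I read and write both strings and Unicode strings from .csv files in a way that "just works" in Python 2.6? Or is this a limitation of Python 2.6 with no simple solution?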
Max*_*kin 52
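The example code for reading Unicode given at http://docs.python.org/library/csv.html#examples appears to be outdated, since it does not work with Python 2.6 and 2.7.
The UnicodeDictReader below works with utf-8 and probably with other encodings as well, but I have only tested it on utf-8 input.
In short, decode Unicode only after csv.reader has split the csv row into fields.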
import csv


class UnicodeCsvReader(object):
    def __init__(self, f, encoding="utf-8", **kwargs):
        self.csv_reader = csv.reader(f, **kwargs)
        self.encoding = encoding

    def __iter__(self):
        return self

    def next(self):
        # read and split the csv row into fields
        row = self.csv_reader.next()
        # now decode
        return [unicode(cell, self.encoding) for cell in row]

    @property
    def line_num(self):
        return self.csv_reader.line_num


class UnicodeDictReader(csv.DictReader):
    def __init__(self, f, encoding="utf-8", fieldnames=None, **kwds):
        csv.DictReader.__init__(self, f, fieldnames=fieldnames, **kwds)
        # swap in the decoding reader so DictReader sees unicode cells
        self.reader = UnicodeCsvReader(f, encoding=encoding, **kwds)
Usage (the source file is encoded as utf-8):
csv_lines = (
    "???,123",
    "???,456",
)

for row in UnicodeCsvReader(csv_lines):
    for col in row:
        # Python 2 print statement; print(type(col), col) would print a tuple
        print type(col), col
Output:
$ python test.py
<type 'unicode'> ???
<type 'unicode'> 123
<type 'unicode'> ???
<type 'unicode'> 456
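The reader above covers only half of the question. For writing, the same idea can be applied in reverse: encode each cell just before handing it to csv.writer. The following is a minimal sketch and not part of the original answer. The names `write_unicode_rows` and `PY2` are hypothetical, every cell is assumed to be a unicode string, and the encoding is assumed to be ASCII-compatible (such as utf-8). The Python 3 branch is included only so the sketch also runs on newer interpreters, where csv accepts text directly.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2  # hypothetical flag, for illustration only


def write_unicode_rows(path, rows, encoding="utf-8", **kwargs):
    """Write rows of unicode strings to `path` as CSV (hypothetical helper)."""
    if PY2:
        # Python 2: csv.writer expects byte strings, so encode cell by cell.
        with open(path, "wb") as f:
            writer = csv.writer(f, **kwargs)
            for row in rows:
                writer.writerow([cell.encode(encoding) for cell in row])
    else:
        # Python 3: csv.writer accepts text; the file object does the encoding.
        with io.open(path, "w", encoding=encoding, newline="") as f:
            csv.writer(f, **kwargs).writerows(rows)
```

Called as `write_unicode_rows("out.csv", [[u"h\u00e9llo", u"123"]])`, this should produce a utf-8 file that UnicodeCsvReader can read back when the file is opened in 'rb' mode.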
Ser*_*eim 32
A somewhat late answer, but I have used unicodecsv with great success.
its*_*dok 22
The ucsv module provided here looks like a cool, simple drop-in replacement for the csv module that lets you work with utf-8 csv files.
import ucsv as csv

with open('some.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        print row
The documentation already includes a Unicode example, so why look for another one or reinvent the wheel?
import csv


def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]


def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
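The recipe above expects an iterable of lines that are already decoded to unicode. One way to get such lines from a file is io.open, which also makes the approach usable for encodings that are not ASCII-compatible, such as utf-16. The following is a rough sketch under that assumption, not code from the documentation. The helper name `read_rows` is hypothetical, and the Python 3 branch is there only so the sketch runs on newer interpreters as well.

```python
import csv
import io
import sys


def read_rows(path, encoding="utf-8", **kwargs):
    """Yield rows of unicode cells from a CSV file (hypothetical helper)."""
    # newline="" leaves line endings untouched so csv can handle them itself.
    with io.open(path, encoding=encoding, newline="") as f:
        if sys.version_info[0] == 2:
            # Python 2: csv only handles bytes, so re-encode each decoded
            # line as UTF-8 temporarily, then decode cell by cell.
            lines = (line.encode("utf-8") for line in f)
            for row in csv.reader(lines, **kwargs):
                yield [cell.decode("utf-8") for cell in row]
        else:
            # Python 3: csv works on text directly.
            for row in csv.reader(f, **kwargs):
                yield row
```

Because the file is decoded first and only then passed through UTF-8, the source encoding no longer matters to csv itself; `list(read_rows("some.csv", encoding="utf-16"))` would be a typical call.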