Rux*_*ang 63 python csv unicode export python-2.7
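I'm new to Python, and I have a question about reading and writing CSV files with Python. My file contains German, French, and other languages. With my code, the file is read correctly in Python, but when I write it to a new CSV file, the Unicode text turns into strange characters.
The data is as follows:
My code is: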
import csv
f=open('xxx.csv','rb')
reader=csv.reader(f)
wt=open('lll.csv','wb')
writer=csv.writer(wt,quoting=csv.QUOTE_ALL)
# copy every row from the input file to the output file
for row in reader:
    writer.writerow(row)
wt.close()
f.close()
The result is as follows:
Can you tell me what I should do to fix this? Thank you very much!
Oz1*_*123 54
Another option:
Use the code from the unicodecsv package:
https://pypi.python.org/pypi/unicodecsv/
>>> import unicodecsv as csv
>>> from io import BytesIO
>>> f = BytesIO()
>>> w = csv.writer(f, encoding='utf-8')
>>> _ = w.writerow((u'é', u'ñ'))
>>> _ = f.seek(0)
>>> r = csv.reader(f, encoding='utf-8')
>>> next(r) == [u'é', u'ñ']
True
This module is API-compatible with the stdlib csv module.
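For comparison, on Python 3 the standard-library csv module already works with str (Unicode) values, so the same in-memory round trip needs no extra package. A minimal sketch, assuming Python 3, with io.StringIO standing in for the BytesIO buffer above:

```python
import csv
import io

# Python 3: csv reads and writes text streams, so no encoding argument is needed
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(('é', 'ñ'))

buf.seek(0)
r = csv.reader(buf)
row = next(r)
print(row == ['é', 'ñ'])  # True
```

An encoding only comes into play once the text leaves the program, for example when the file is opened with open(path, 'w', newline='', encoding='utf-8').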
daw*_*awg 52
Make sure you encode and decode as needed.
This example round-trips some sample UTF-8 text to a CSV file and back to demonstrate:
# -*- coding: utf-8 -*-
import csv

tests={'German': [u'Straße',u'auslösen',u'zerstören'],
       'French': [u'français',u'américaine',u'épais'],
       'Chinese': [u'???',u'??',u'???']}

with open('/tmp/utf.csv','w') as fout:
    writer=csv.writer(fout)
    writer.writerows([tests.keys()])
    for row in zip(*tests.values()):
        # encode each unicode value to UTF-8 bytes before csv sees it
        row=[s.encode('utf-8') for s in row]
        writer.writerows([row])

with open('/tmp/utf.csv','r') as fin:
    reader=csv.reader(fin)
    for row in reader:
        temp=list(row)
        fmt=u'{:<15}'*len(temp)
        # decode the UTF-8 bytes back to unicode for display
        print fmt.format(*[s.decode('utf-8') for s in temp])
Prints:
German         Chinese        French
Straße         ???            français
auslösen       ??             américaine
zerstören      ???            épais
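The same round trip on Python 3 can hand the encoding and decoding to open() itself. A sketch using only the German and French samples, assuming Python 3.7+ (so the dict keeps its insertion order); writing under tempfile.mkdtemp() is an arbitrary choice standing in for /tmp:

```python
import csv
import os
import tempfile

tests = {'German': ['Straße', 'auslösen', 'zerstören'],
         'French': ['français', 'américaine', 'épais']}

# Arbitrary scratch location; any writable path works
path = os.path.join(tempfile.mkdtemp(), 'utf.csv')

# Encode on the way out: open() turns str into UTF-8 bytes
with open(path, 'w', newline='', encoding='utf-8') as fout:
    writer = csv.writer(fout)
    writer.writerow(list(tests))             # header row: the dict keys
    writer.writerows(zip(*tests.values()))   # one row per sample position

# Decode on the way in with the same encoding
with open(path, newline='', encoding='utf-8') as fin:
    rows = list(csv.reader(fin))

for row in rows:
    print(''.join('{:<15}'.format(s) for s in row))
```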
Mar*_*nen 30
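The end of the csv module documentation has an example showing how to handle Unicode. The classes below are copied directly from that example. Note that the strings read or written are Unicode strings; for example, do not pass byte strings to UnicodeWriter.writerows.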
import csv, codecs, cStringIO

class UTF8Recoder:
    # Iterator that reads a stream in the given encoding and re-encodes it to UTF-8
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)
    def __iter__(self):
        return self
    def next(self):
        return self.reader.next().encode("utf-8")

class UnicodeReader:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        f = UTF8Recoder(f, encoding)
        self.reader = csv.reader(f, dialect=dialect, **kwds)
    def next(self):
        '''next() -> list of unicode
        This function reads the next row and returns it as a list of Unicode strings.
        '''
        row = self.reader.next()
        return [unicode(s, "utf-8") for s in row]
    def __iter__(self):
        return self

class UnicodeWriter:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        # rows are written to an in-memory queue first, then re-encoded to the target stream
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()
    def writerow(self, row):
        '''writerow(list of unicode) -> None
        This function takes a row of Unicode strings and writes it, encoded, to the output.
        '''
        self.writer.writerow([s.encode("utf-8") for s in row])
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        data = self.encoder.encode(data)
        self.stream.write(data)
        self.queue.truncate(0)
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

with open('xxx.csv','rb') as fin, open('lll.csv','wb') as fout:
    reader = UnicodeReader(fin)
    writer = UnicodeWriter(fout,quoting=csv.QUOTE_ALL)
    for line in reader:
        writer.writerow(line)
Input (UTF-8 encoded):
American,???
French,???
German,???
Output:
"American","???"
"French","???"
"German","???"
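On Python 3 the recoder and queue classes are unnecessary, because open() does the encoding and csv works on str directly. A sketch of the same xxx.csv to lll.csv copy under that assumption; copy_csv is a hypothetical helper name, and 'utf-8-sig' mirrors the default used above (it skips a leading BOM when reading and writes one when writing):

```python
import csv


def copy_csv(src_path, dst_path, encoding='utf-8-sig'):
    """Hypothetical helper: copy a CSV file, quoting every field."""
    # newline='' lets the csv module handle line endings itself
    with open(src_path, newline='', encoding=encoding) as fin, \
         open(dst_path, 'w', newline='', encoding=encoding) as fout:
        writer = csv.writer(fout, quoting=csv.QUOTE_ALL)
        for row in csv.reader(fin):
            writer.writerow(row)
```

Usage would be copy_csv('xxx.csv', 'lll.csv'). Writing the BOM mainly helps spreadsheet programs recognise the file as UTF-8; pass encoding='utf-8' to leave it out.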
In Python 2, str is actually bytes. So if you want to write unicode to a CSV file, you must first encode the unicode to str using the utf-8 encoding.
def py2_unicode_to_str(u):
    # the unicode type only exists in Python 2
    assert isinstance(u, unicode)
    return u.encode('utf-8')
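The encode step, and what happens when the resulting bytes are later decoded with the wrong codec, can be checked directly. This sketch is written in Python 3 syntax, where Python 2's unicode corresponds to str and Python 2's str to bytes; latin-1 is chosen only to illustrate a mismatched codec, producing the kind of strange characters described in the question:

```python
text = 'é'
data = text.encode('utf-8')      # text -> UTF-8 bytes
print(data)                      # b'\xc3\xa9'
print(data.decode('utf-8'))      # é   (decoded with the matching codec)
print(data.decode('latin-1'))    # Ã©  (mismatched codec: mojibake)
```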
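Use class csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds):
- Python 2
  - csvfile: open(fp, 'wb') (the Python 2 csv docs require binary mode on platforms where it makes a difference)
  - pass keys and values as bytes encoded with utf-8:
    writer.writerow({py2_unicode_to_str(k): py2_unicode_to_str(v) for k,v in row.items()})
- Python 3
  - csvfile: open(fp, 'w', newline='', encoding='utf-8')
  - pass an ordinary dict of str as row to writer.writerow(row)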
Final code:
import csv
import sys

is_py2 = sys.version_info[0] == 2

def py2_unicode_to_str(u):
    # the unicode type only exists in Python 2
    assert isinstance(u, unicode)
    return u.encode('utf-8')

if is_py2:
    # Python 2: binary mode, keys and values encoded to UTF-8 bytes
    f = open('file.csv', 'wb')
    data = [{u'Python??': u'Python??', u'Python??2': u'Python??2'}]
    # just one more line to handle this
    data = [{py2_unicode_to_str(k): py2_unicode_to_str(v) for k, v in row.items()}
            for row in data]
else:
    # Python 3: text mode with an explicit encoding and newline=''
    f = open('file.csv', 'w', newline='', encoding='utf-8')
    data = [{'Python??': 'Python??', 'Python??2': 'Python??2'}]

with f:
    fields = list(data[0])
    writer = csv.DictWriter(f, fieldnames=fields)
    for row in data:
        writer.writerow(row)
In Python 3, just use str, which is already Unicode.
In Python 2, use unicode to handle text, and use str only when I/O happens.
Views: 99701