Writing a UTF-8 encoded CSV file with Python DictWriter

end*_*ith 51 python csv unicode utf-8

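  1. I have a list of dictionaries containing unicode strings.
  2. csv.DictWriter can write a list of dictionaries to a CSV file.
  3. I want the CSV file to be encoded in UTF-8.
  4. The csv module cannot handle converting unicode strings to UTF-8.
  5. The csv module documentation has an example for converting everything to UTF-8: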

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')
    
  6. It also has a UnicodeWriter class.

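But... how do I make DictWriter work with these? Wouldn't they have to inject themselves into the middle of it, so they can catch the disassembled dicts and encode them before they are written to the file? I don't get it.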

Mar*_*nen 91

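Update: The third-party unicodecsv module implements this 7-year-old answer for you. An example follows this code. There is also a Python 3 solution that needs no third-party module.

Original Python 2 answer

If you are using Python 2.7 or later, use a dict comprehension to remap the dictionary to UTF-8 before passing it to DictWriter: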

# coding: utf-8
import csv
D = {'name':u'马克','pinyin':u'mǎkè'}
f = open('out.csv','wb')
f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
w = csv.DictWriter(f,sorted(D.keys()))
w.writeheader()
w.writerow({k:v.encode('utf8') for k,v in D.items()})
f.close()

You can use this idea to update UnicodeWriter to DictUnicodeWriter:

# coding: utf-8
import csv
import cStringIO
import codecs

class DictUnicodeWriter(object):

    def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, D):
        self.writer.writerow({k:v.encode("utf-8") for k,v in D.items()})
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for D in rows:
            self.writerow(D)

    def writeheader(self):
        self.writer.writeheader()

D1 = {'name':u'马克','pinyin':u'Mǎkè'}
D2 = {'name':u'美国','pinyin':u'Měiguó'}
f = open('out.csv','wb')
f.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
w = DictUnicodeWriter(f,sorted(D1.keys()))
w.writeheader()
w.writerows([D1,D2])
f.close()
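
The BOM that the examples above write by hand should be the same three bytes that the standard `utf-8-sig` codec prepends when encoding and strips when decoding. A minimal sketch to check that; the variable names are illustrative only and do not come from the answer:

```python
import codecs

# The hand-written BOM from the examples above, as bytes.
manual_bom = u'\ufeff'.encode('utf-8')

# Encoding with 'utf-8-sig' prepends the BOM automatically.
with_sig = u'name,pinyin'.encode('utf-8-sig')

# Decoding with 'utf-8-sig' strips a leading BOM if one is present.
round_trip = with_sig.decode('utf-8-sig')

print(manual_bom == codecs.BOM_UTF8)
print(with_sig == codecs.BOM_UTF8 + b'name,pinyin')
print(round_trip)
```

So writing the BOM manually and then encoding with plain UTF-8 should produce the same bytes as encoding with `utf-8-sig`.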

Python 2 unicodecsv example:

# coding: utf-8
import unicodecsv as csv

D = {u'name':u'马克',u'pinyin':u'mǎkè'}

with open('out.csv','wb') as f:
    w = csv.DictWriter(f,fieldnames=sorted(D.keys()),encoding='utf-8-sig')
    w.writeheader()
    w.writerow(D)

Python 3:

In addition, Python 3's built-in csv module supports Unicode natively:

# coding: utf-8
import csv

D = {u'name':u'马克',u'pinyin':u'mǎkè'}

# Use newline='' instead of 'wb' in Python 3.
with open('out.csv','w',encoding='utf-8-sig',newline='') as f:
    w = csv.DictWriter(f,fieldnames=sorted(D.keys()))
    w.writeheader()
    w.writerow(D)
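
To check the Python 3 approach without touching the disk, here is a rough sketch that writes to an in-memory buffer and reads the result back. The helper name `write_dicts_utf8` and the use of `io.BytesIO` in place of a real file are assumptions made for illustration, not part of the answer:

```python
import csv
import io

rows = [
    {"name": "马克", "pinyin": "mǎkè"},
    {"name": "美国", "pinyin": "Měiguó"},
]

def write_dicts_utf8(binary_stream, rows, fieldnames):
    """Hypothetical helper: write dict rows as UTF-8 CSV with a BOM."""
    # newline='' lets the csv module control line endings itself.
    text = io.TextIOWrapper(binary_stream, encoding="utf-8-sig", newline="")
    writer = csv.DictWriter(text, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    text.flush()
    text.detach()  # leave the underlying binary stream open for the caller

buffer = io.BytesIO()
write_dicts_utf8(buffer, rows, ["name", "pinyin"])
raw = buffer.getvalue()

# Read it back; 'utf-8-sig' removes the BOM again.
decoded = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"), newline="")))
print(decoded)
```

No per-value `encode()` call is needed here, because the text layer handles the encoding for the whole stream.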

  • @endolith: On Python 2.6 you can use `dict((k, v.encode('utf-8') if isinstance(v, unicode) else v) for k, v in D.iteritems())` instead of a dict comprehension. (9 upvotes)
  • The `if isinstance(v, unicode)` part is essential! (4 upvotes)
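
The reason the `isinstance` guard matters can be shown with a small sketch. The helper name `encode_text_values` and the `text_type` shim are hypothetical and were added so that the snippet runs on both Python 2 and Python 3. On Python 3 this encoding step is normally unnecessary, since the csv module accepts `str` directly:

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3: every str is already Unicode text

def encode_text_values(row, encoding='utf-8'):
    """Return a copy of row with text values encoded; other values untouched."""
    return dict(
        (k, v.encode(encoding) if isinstance(v, text_type) else v)
        for k, v in row.items()
    )

row = {'a': 123, 'b': None, 'c': u'Dvo\u0159\xe1k'}
encoded = encode_text_values(row)

# Without the guard, a non-text value such as an int has no encode() method.
try:
    (123).encode('utf-8')
    unguarded_error = None
except AttributeError as exc:
    unguarded_error = exc

print(encoded)
print(type(unguarded_error).__name__)
```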

rla*_*nte 39

There is a simple workaround using the wonderful UnicodeCSV module. Once you have it, just change the line

import csv

to

import unicodecsv as csv

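and it automatically starts playing nicely with UTF-8.

Note: Switching to Python 3 also solves this problem (thanks to jamescampbell for the tip). It is something you should do anyway.

  • Finally, omfg - what a nightmare this was until now (6 upvotes)
  • This should be the accepted answer - so simple, and it works like a charm (4 upvotes)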

sam*_*ias 15

You can convert the values to UTF-8 on the fly as you pass the dict to DictWriter.writerow(). For example:

import csv

rows = [
    {'name': u'Anton\xedn Dvo\u0159\xe1k','country': u'\u010cesko'},
    {'name': u'Bj\xf6rk Gu\xf0mundsd\xf3ttir', 'country': u'\xcdsland'},
    {'name': u'S\xf8ren Kierkeg\xe5rd', 'country': u'Danmark'}
    ]

# implement this wrapper on 2.6 or lower if you need to output a header
class DictWriterEx(csv.DictWriter):
    def writeheader(self):
        header = dict(zip(self.fieldnames, self.fieldnames))
        self.writerow(header)

out = open('foo.csv', 'wb')
writer = DictWriterEx(out, fieldnames=['name','country'])
# DictWriter.writeheader() was added in 2.7 (use class above for <= 2.6)
writer.writeheader()
for row in rows:
    writer.writerow(dict((k, v.encode('utf-8')) for k, v in row.iteritems()))
out.close()

Output in foo.csv:

name,country
Antonín Dvořák,Česko
Björk Guðmundsdóttir,Ísland
Søren Kierkegård,Danmark

  • `writer.writerow(dict((k, v.encode('utf-8') if type(v) is unicode else v) for k, v in row.iteritems()))` encodes only the unicode values, because int and list values have no encode method. (6 upvotes)

Dan*_*uev 6

You can use a proxy class that encodes the dict values as needed, like this:

# -*- coding: utf-8 -*- 
import csv
d = {'a':123,'b':456, 'c':u'Non-ASCII: ????????'}

class DictUnicodeProxy(object):
    def __init__(self, d):
        self.d = d
    def __iter__(self):
        return self.d.__iter__()
    def get(self, item, default=None):
        i = self.d.get(item, default)
        if isinstance(i, unicode):
            return i.encode('utf-8')
        return i

with open('some.csv', 'wb') as f:
    writer = csv.DictWriter(f, ['a', 'b', 'c'])
    writer.writerow(DictUnicodeProxy(d))