'ascii' codec can't encode character at position * ord not in range(128)

Question

min*_*cha 10 python unicode encode decode

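There are several threads on Stack Overflow about this, but I couldn't find a working solution to the whole problem.

I collected a large amount of text data with urllib's read function and stored it in pickle files.

Now I want to write this data to a file. While writing, I get errors like this: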

'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)

and a lot of data is being lost.

I suppose the data read from urllib is byte data.

I tried:

   1. text=text.decode('ascii','ignore')
   2. s=filter(lambda x: x in string.printable, s)
   3. text=u''+text
      text=text.decode().encode('utf-8')

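But I still end up with similar errors. Can somebody point out a proper solution? Would stripping with codecs also work? I have no problem if the conflicting bytes are not written to the file as a string, so the loss is acceptable.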

Answer 1

Tha*_*sas 11

You can do this with smart_str from the Django module. Try this:

from django.utils.encoding import smart_str, smart_unicode

text = u'\u2019'
print smart_str(text)

You can install Django by starting a command shell with administrator privileges and running this command:

pip install Django
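
If installing Django only for this one call feels heavy, the same idea can be sketched in a few lines. As far as I understand, smart_str on Python 2 returns a UTF-8 encoded byte string; the helper below, `to_utf8_bytes`, is a hypothetical name and not part of Django, and it only approximates that behavior for the simple cases shown.

```python
def to_utf8_bytes(value, errors="strict"):
    """Return value as UTF-8 bytes (hypothetical helper, not Django's API)."""
    if isinstance(value, bytes):
        # Already a byte string: pass it through untouched.
        return value
    if not isinstance(value, type(u"")):
        # Not text yet: fall back to the object's text representation.
        value = u"%s" % (value,)
    # Text: encode explicitly instead of relying on the implicit ASCII codec.
    return value.encode("utf-8", errors)


encoded = to_utf8_bytes(u"\u2019")
```

Passing `errors="ignore"` or `errors="replace"` is possible, although with UTF-8 as the target there is normally nothing left to drop.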

Answer 2

Mar*_*ers 9

Your data is unicode data. To write it to a file, use .encode():

text = text.encode('ascii', 'ignore')

But that would remove anything that isn't ASCII. Perhaps you want to encode to a more suitable encoding, like UTF-8, instead?
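
To make the difference concrete, here is a minimal sketch comparing the two options and then writing the text to a file as UTF-8. The sample string and the `output.txt` file name are assumptions for illustration; `io.open` is used because it accepts an `encoding` argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Sample text containing the character from the error message (U+2019).
text = u"It\u2019s a test"

# 'ignore' silently drops every character that ASCII cannot represent.
ascii_bytes = text.encode("ascii", "ignore")

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_bytes = text.encode("utf-8")

# Alternatively, let the file object do the encoding on write.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Reading back with the same encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()
```

With the file opened this way, unicode strings can be written directly and no separate .encode() call is needed.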

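You may want to read up on Python and Unicode:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Unicode HOWTO in the Python documentation
Pragmatic Unicode by Ned Batchelder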

Maybe also [Say Hello to Unicode](http://kos.gd/2013/02/say-hello-to-unicode/) (shameless plug :-)) (2 upvotes)
