add*_*ons 12 python unicode encode
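I have a scenario where I call an API and, based on its results, query the database for each record the API returns. The API returns strings, and when I run the database query for the returned items, some of them produce the following error: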
Traceback (most recent call last):
  File "TopLevelCategories.py", line 267, in <module>
    cursor.execute(categoryQuery, {'title': startCategory});
  File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/cursors.py", line 158, in execute
    query = query % db.literal(args)
  File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 265, in literal
    return self.escape(o, self.encoders)
  File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 203, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 3: ordinal not in range(256)
The code segment the error above refers to is:
...
for startCategory in value[0]:
    categoryResults = []
    try:
        categoryRow = ""
        baseCategoryTree[startCategory] = []
        #print categoryQuery % {'title': startCategory};
        cursor.execute(categoryQuery, {'title': startCategory}) #unicode issue
        done = False
        cont...
After some googling, I tried the following at the command line to understand what is going on:
>>> import sys
>>> u'\u2013'.encode('iso-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 0: ordinal not in range(256)
>>> u'\u2013'.encode('cp1252')
'\x96'
>>> '\u2013'.encode('cp1252')
'\\u2013'
>>> u'\u2013'.encode('cp1252')
'\x96'
But I'm not sure what the right fix for this is. I also don't understand the theory behind encode('cp1252'); it would be great to get some explanation of what I tried above.
Ray*_*ger 16
If you need Latin-1 encoding, you have several options for getting rid of the en dash or other code points above 255 (characters not included in Latin-1):
>>> u = u'hello\u2013world'
>>> u.encode('latin-1', 'replace') # replace it with a question mark
'hello?world'
>>> u.encode('latin-1', 'ignore') # ignore it
'helloworld'
Or do your own custom replacement:
>>> u.replace(u'\u2013', '-').encode('latin-1')
'hello-world'
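If several "smart" punctuation characters show up in the API data, the one-off replace above can be generalized with a translation table. This is only a sketch: the table contents and the `to_latin1` helper name are my own assumptions, not something from the question's code, and it is written so that it runs on Python 2.7 as well as Python 3.

```python
# Hypothetical fallback table: code point -> ASCII stand-in.
# Extend or change it to suit the characters your data actually contains.
PUNCTUATION_FALLBACKS = {
    0x2013: u'-',   # en dash
    0x2014: u'-',   # em dash
    0x2018: u"'",   # left single quotation mark
    0x2019: u"'",   # right single quotation mark
    0x201C: u'"',   # left double quotation mark
    0x201D: u'"',   # right double quotation mark
}


def to_latin1(text):
    """Encode a unicode string as Latin-1 bytes.

    Characters listed in PUNCTUATION_FALLBACKS are swapped for their ASCII
    stand-ins first; anything else outside Latin-1 becomes '?'.
    """
    return text.translate(PUNCTUATION_FALLBACKS).encode('latin-1', 'replace')


encoded = to_latin1(u'hello\u2013world')
```

Passing `'replace'` as the error handler means an unexpected character degrades to a question mark instead of raising `UnicodeEncodeError` in the middle of the loop. Drop it if you would rather fail loudly.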
If you aren't required to output Latin-1, then UTF-8 is a common and preferred choice. It is recommended by the W3C and nicely encodes all Unicode code points:
>>> u.encode('utf-8')
'hello\xe2\x80\x93world'
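As for why cp1252 worked where Latin-1 did not: Latin-1 maps code points 0 to 255 straight onto single bytes, so U+2013 has no slot in it. cp1252 reuses the byte range 0x80 to 0x9F for printable characters, and the en dash lives there at 0x96. A small sketch to check this yourself (the `encodable` helper is a name I made up for illustration; the code runs on Python 2.7 and Python 3):

```python
def encodable(text, codec):
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


en_dash = u'\u2013'

# Which of the three codecs can represent the en dash?
results = dict(
    (codec, encodable(en_dash, codec))
    for codec in ('latin-1', 'cp1252', 'utf-8')
)

# UTF-8 loses nothing: decoding the bytes gives back the original text.
utf8_bytes = u'hello\u2013world'.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
```

Here `results` comes out as False for latin-1 and True for cp1252 and utf-8. The traceback shows MySQLdb encoding with the connection's charset, which was Latin-1 in your case. `MySQLdb.connect` accepts `charset` and `use_unicode` arguments, so connecting with `charset='utf8'` is probably the cleaner fix, assuming your tables can store UTF-8. The odd `'\\u2013'` result in your session happens because in Python 2 a plain `'\u2013'` without the `u` prefix is a six-character byte string, not an en dash.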