Swa*_*ale 3 python unicode python-2.7
I am trying to read a UTF-8 encoded XML file in Python, and I do some processing on the lines read from the file, like this:
next_sent_separator_index = doc_content.find(word_value, int(characterOffsetEnd_value) + 1)
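Here doc_content is a line read from the file, and word_value is one of the strings from that same line. Whenever doc_content or word_value contains non-ASCII characters, the line above raises an encoding-related error. So I tried decoding both of them with utf-8 first (instead of the default ascii encoding), as follows: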
next_sent_separator_index = doc_content.decode('utf-8').find(word_value.decode('utf-8'), int(characterOffsetEnd_value) + 1)
But I still get a UnicodeEncodeError, as follows:
Traceback (most recent call last):
  File "snippetRetriver.py", line 402, in <module>
    sentences_list,lemmatised_sentences_list = getSentenceList(form_doc)
  File "snippetRetriver.py", line 201, in getSentenceList
    next_sent_separator_index = doc_content.decode('utf-8').find(word_value.decode('utf-8'), int(characterOffsetEnd_value) + 1)
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 8: ordinal not in range(128)
Can anyone suggest a proper way to avoid this kind of encoding error in Python 2.7?
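One possible reading of the traceback: at least one of the two values is probably already a unicode object. In Python 2, calling .decode('utf-8') on a unicode object first encodes it implicitly with the ascii codec, and that implicit step is what raises UnicodeEncodeError. A minimal sketch of one way around it is to decode only values that are still byte strings, and to read the file as text in the first place. The helper names to_text, read_lines and find_next are hypothetical and not part of the original script; the sketch also assumes the file really is UTF-8 and that characterOffsetEnd_value counts characters rather than bytes.

```python
from __future__ import print_function, unicode_literals

import io


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is still a byte string.

    On Python 2, `bytes` is an alias for `str`, so unicode objects pass
    through untouched and never hit the implicit ascii encode.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def read_lines(path, encoding="utf-8"):
    """Read a file as text; io.open decodes each line while reading."""
    with io.open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


def find_next(doc_content, word_value, character_offset_end):
    """Search for word_value in doc_content after the given offset.

    Both arguments are normalised to text first, so the comparison and
    the offset are in characters, not bytes.
    """
    doc_text = to_text(doc_content)
    word_text = to_text(word_value)
    return doc_text.find(word_text, int(character_offset_end) + 1)


if __name__ == "__main__":
    line = "caf\u00e9 au lait, caf\u00e9 noir"
    # Text input and UTF-8 encoded byte input give the same index.
    print(find_next(line, "caf\u00e9", 0))
    print(find_next(line.encode("utf-8"), "caf\u00e9", "0"))
```

With this approach the original line would become something like find_next(doc_content, word_value, characterOffsetEnd_value), and no explicit .decode() call is needed at the call site. If the lines come from read_lines (or any io.open handle opened with encoding='utf-8'), they are already unicode and to_text simply returns them unchanged.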