UnicodeDecodeError when writing XML to a file (Python 2.7.3)

TWh*_*ite 2 python xml dom python-2.7

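I searched the site but did not find an answer that works for me. My problem is that I am trying to write XML to a file, and when I run the script from the terminal I get: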

Traceback (most recent call last):
File "fetchWiki.py", line 145, in <module>
pageDictionary = qSQL(users_database)
File "fetchWiki.py", line 107, in qSQL
writeXML(listNS)
File "fetchWiki.py", line 139, in writeXML
f1.write(doc.toprettyxml(indent="\t", encoding="utf-8"))       
File "/usr/lib/python2.7/xml/dom/minidom.py", line 57, in toprettyxml
self.writexml(writer, "", indent, newl, encoding)
File "/usr/lib/python2.7/xml/dom/minidom.py", line 1751, in writexml
node.writexml(writer, indent, addindent, newl)
----//---- more lines in here ----//----
self.childNodes[0].writexml(writer, '', '', '')
File "/usr/lib/python2.7/xml/dom/minidom.py", line 1040, in writexml
_write_data(writer, "%s%s%s" % (indent, self.data, newl))
File "/usr/lib/python2.7/xml/dom/minidom.py", line 297, in _write_data
writer.write(data)
File "/usr/lib/python2.7/codecs.py", line 351, in write
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1176: ordinal not in range(128)

It comes from the following code:

doc = Document()

base = doc.createElement('Wiki')
doc.appendChild(base)

for ns_dict in listNamespaces: 
    namespace = doc.createElement('Namespace')
    base.appendChild(namespace)
    namespace.setAttribute('NS', ns_dict)

    for title in listNamespaces[ns_dict]:

        page = doc.createElement('Page')
        try:
            title.encode('utf8')
            page.setAttribute('Title', title)
        except:
            newTitle = title.decode('latin1', 'ignore')
            newTitle.encode('utf8', 'ignore')
            page.setAttribute('Title', newTitle)

        namespace.appendChild(page)
        text = doc.createElement('Content')
        text_content = doc.createTextNode(listNamespaces[ns_dict][title])
        text.appendChild(text_content)
        page.appendChild(text)

    f1  = open('pageText.xml', 'w')
    f1.write(doc.toprettyxml(indent="\t", encoding="utf-8"))       
    f1.close()

The error occurs with or without the 'ignore' argument to encode/decode. Adding

# -*- coding: utf-8 -*- 

did not help.

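I created the Python file using Eclipse and Pydoc, and it runs there without any problems, but it fails when I run it from the terminal.

Any help is greatly appreciated, including links to answers I could not find.

Thanks.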

Mar*_*ers 7

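You should not encode the strings you use for attributes. The minidom library handles that for you when writing.

Your error is caused by mixing byte strings with unicode data, and your encoded byte strings cannot be decoded as ASCII.

If some of your data is encoded and some is already unicode, try to avoid that situation in the first place. If you cannot avoid having to deal with mixed data, do this instead: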
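
To make the failure concrete: when Python 2 has to combine a byte string with a unicode string, it first decodes the byte string with the ASCII codec. The sketch below performs that decode explicitly, so it behaves the same on Python 2 and 3. The sample value `raw` is made up for illustration; it merely starts with the same 0xc2 byte that appears in the traceback.

```python
# Illustrative data: UTF-8 bytes for a no-break space, i.e. '\xc2\xa0'.
raw = u'\u00a0'.encode('utf-8')

# This is the decode Python 2 attempts implicitly when bytes meet unicode.
try:
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode('utf-8')
```

The practical consequence is to decode byte strings once, with their real encoding, before handing them to minidom.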

page = doc.createElement('Page')
if not isinstance(title, unicode):
    title = title.decode('latin1', 'ignore')
page.setAttribute('Title', title)
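
The same check can be wrapped in a small helper and applied to every value before it reaches the DOM. This is only a sketch: `to_text` and `text_type` are hypothetical names, and latin1 is assumed as the source encoding because the code above uses it. It is written to run on both Python 2 and 3.

```python
from xml.dom.minidom import Document

# unicode on Python 2, str on Python 3.
text_type = type(u'')


def to_text(value, encoding='latin1'):
    """Return value unchanged if it is already text, else decode the bytes."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


doc = Document()
page = doc.createElement('Page')
# A latin1-encoded byte string; to_text turns it into text first.
page.setAttribute('Title', to_text(b'caf\xe9'))
doc.appendChild(page)

# minidom does the encoding itself when serialising.
xml_bytes = doc.toxml(encoding='utf-8')
```

The text passed to `createTextNode` would need the same treatment, since the traceback in the question fails while writing a text node.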

Note that you do not need to use doc.toprettyxml(); you can also instruct doc.writexml() to indent the XML for you:

import codecs
with codecs.open('pageText.xml', 'w', encoding='utf8') as f1:
    # addindent is the per-level indentation (what toprettyxml calls indent)
    doc.writexml(f1, addindent='\t', newl='\n')
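
As a quick round-trip check, the sketch below writes a small document containing a non-ASCII attribute through `codecs.open` and parses it back. The document contents and the temporary path are made up for illustration. It passes `addindent`, which is the `writexml()` parameter that controls the indentation added per nesting level.

```python
import codecs
import os
import tempfile
from xml.dom.minidom import Document, parse

# Build a tiny document with a unicode attribute value.
doc = Document()
root = doc.createElement('Wiki')
doc.appendChild(root)
page = doc.createElement('Page')
page.setAttribute('Title', u'caf\u00e9')
root.appendChild(page)

# The codecs writer encodes the unicode output to UTF-8 on the way to disk.
path = os.path.join(tempfile.mkdtemp(), 'pageText.xml')
with codecs.open(path, 'w', encoding='utf8') as f1:
    doc.writexml(f1, addindent='\t', newl='\n')

# Parse the file back and read the attribute to confirm nothing was mangled.
reparsed = parse(path)
title = reparsed.getElementsByTagName('Page')[0].getAttribute('Title')
```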

  • Eclipse changes the default encoding used in the terminal; if this is caused by *printing* to the terminal, see http://wiki.python.org/moin/PrintFails.