Python 2.7 JSON dump UnicodeEncodeError

Question


I have a file in which each line is a JSON object, like this:

{"name": "John", ...}

{...}


I'm trying to create a new file containing the same objects, but with certain properties removed from all of them.

When I do this, I get a UnicodeEncodeError. Strangely, if I instead loop over range(n) (for some number n) and use infile.next(), it works just as I want.

Why is that? How can I make this work by iterating over infile? I tried using dumps() instead of dump(), but that just produced a bunch of empty lines in outfile.

with open(filename, 'r') as infile:
    with open('_{}'.format(filename), 'w') as outfile:
        for comment in infile:
            decodedComment = json.loads(comment)
            for prop in propsToRemove:
                # use pop to avoid exception handling
                decodedComment.pop(prop, None)
            json.dump(decodedComment, outfile, ensure_ascii = False)
            outfile.write('\n')


Here is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f47d' in position 1: ordinal not in range(128)


Thanks for your help!

Answer 1

zpl*_*zzi 14

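The problem you're facing is that the standard file.write() function (which json.dump() calls) cannot write unicode strings that contain non-ASCII characters: in Python 2 it implicitly encodes them with the ASCII codec, and that fails. As the error message shows, your string contains the Unicode character \U0001f47d (which turns out to be the code point for EXTRATERRESTRIAL ALIEN, who knew?), and possibly other non-ASCII characters. To handle these characters, you can either escape them as ASCII (they will appear in the output file as \uXXXX escapes), or use a file writer that can handle unicode.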

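To do the first option, replace your write line with the following, which leaves ensure_ascii at its default of True so that non-ASCII characters are written as escapes:

json.dump(decodedComment, outfile)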

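As a rough illustration of what the escaped form looks like, here is a minimal sketch that compares the two ensure_ascii settings on a single record. The record itself is made up for the example; only json.dumps() and its ensure_ascii parameter come from the standard library.

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical record containing the character from the error message.
record = {u"name": u"\U0001f47d"}

# Default (ensure_ascii=True): every non-ASCII character is escaped, so the
# result is pure ASCII and can be written to a plain file object.
escaped = json.dumps(record)

# ensure_ascii=False: the character is kept as-is, so the output file must
# be able to encode it (for example as UTF-8).
raw = json.dumps(record, ensure_ascii=False)

print(escaped)  # {"name": "\ud83d\udc7d"}
```

The escaped output is still valid JSON, and json.loads() turns the surrogate pair back into the original character when the file is read again.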

The second option is probably closer to what you want; a simple way to do it is to use the codecs module. Import it, and change the second line to:

with codecs.open('_{}'.format(filename), 'w', encoding="utf-8") as outfile:


You will then be able to save the special characters in their original form.
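
Putting the second option together, the whole loop might look something like the sketch below. It is not the exact code from the question: it swaps in io.open() instead of codecs.open() (both accept an encoding argument) so that the same code also runs on Python 3, and the names strip_props, in_path, out_path and props_to_remove are made up for the example.

```python
# -*- coding: utf-8 -*-
import io
import json


def strip_props(in_path, out_path, props_to_remove):
    """Copy a JSON-lines file, dropping the given keys from every object."""
    with io.open(in_path, 'r', encoding='utf-8') as infile, \
            io.open(out_path, 'w', encoding='utf-8') as outfile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines between objects
            obj = json.loads(line)
            for prop in props_to_remove:
                # pop with a default avoids a KeyError for missing keys
                obj.pop(prop, None)
            text = json.dumps(obj, ensure_ascii=False)
            if isinstance(text, bytes):
                # On Python 2, dumps() can return a byte string when the
                # result is pure ASCII; io.open() only accepts unicode.
                text = text.decode('utf-8')
            outfile.write(text + u'\n')
```

Calling strip_props(filename, '_{}'.format(filename), propsToRemove) would then mirror the loop in the question, with the output file encoded as UTF-8.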
