Saving UTF-8 text from json.dumps as UTF-8, not as \u escape sequences

Question

Ber*_*ala 394 python unicode json escaping utf-8

Sample code:

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

The problem: it is not human readable. My (smart) users want to verify or even edit text files containing JSON dumps. (And I would rather not use XML.)

Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)?

This doesn't help:

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

This works, but if any sub-object is a Python unicode object rather than UTF-8, it will dump garbage:

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

Answer 1

Mar*_*ers 531

Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

>>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
>>> print(json_string.decode())
"ברי צקלה"
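For comparison, here is a minimal Python 3 sketch of the two modes side by side. The sample string is the one from the question; the variable names are only illustrative.

```python
import json

text = "ברי צקלה"

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(text)

# ensure_ascii=False keeps the characters as they are; the result is still a str.
readable = json.dumps(text, ensure_ascii=False)

# Encoding to UTF-8 is a separate, explicit step that yields bytes.
utf8_bytes = readable.encode("utf8")

print(escaped)
print(utf8_bytes)
# Both forms describe the same value once parsed.
print(json.loads(escaped) == json.loads(readable))
```

Both strings are valid JSON for the same value; only the on-the-wire representation differs.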

If you are writing to a file, use json.dump() and let the file object do the encoding:

with open('filename', 'w', encoding='utf8') as json_file:
    json.dump("ברי צקלה", json_file, ensure_ascii=False)
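A small round-trip sketch, assuming Python 3, shows what ends up on disk. The temporary directory, the file name data.json and the dictionary are placeholders chosen for illustration.

```python
import json
import os
import tempfile

data = {"name": "ברי צקלה"}

# A temporary directory keeps the sketch self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")

    # Text mode with an explicit encoding: the file object does the encoding.
    with open(path, "w", encoding="utf8") as json_file:
        json.dump(data, json_file, ensure_ascii=False)

    # Read the raw bytes to see what was actually written.
    with open(path, "rb") as raw_file:
        raw = raw_file.read()

    # Reading it back with the same encoding restores the original object.
    with open(path, encoding="utf8") as json_file:
        loaded = json.load(json_file)

print(b"\\u" in raw)   # False: the file contains no \u escapes
print(loaded == data)  # True
```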

Python 2 has a few more caveats. There, use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() to write to that file (in Python 3, the built-in open() is an alias for io.open()):

import io
import json
with io.open('filename', 'w', encoding='utf8') as json_file:
    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)

Note that there is a bug in the Python 2 json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 is:

with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(u"ברי צקלה", ensure_ascii=False)
    # unicode(data) auto-decodes data to unicode if str
    json_file.write(unicode(data))

If you are passing in byte strings encoded to UTF-8 (type str in Python 2), make sure to also set the encoding keyword of json.dumps(). This keyword exists only in Python 2:

>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}

>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
>>> json.loads(s)['1']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> json.loads(s)['2']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה

Note that your second sample is not valid Unicode; you gave it UTF-8 bytes as a unicode literal, which can never work.

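Only when I encode that string to Latin-1 (whose Unicode code points map one-to-one to bytes) and then decode it as UTF-8 do you see the expected output. That has nothing to do with JSON and everything to do with using the wrong input. The result is called mojibake.

If you got that Unicode value from a string literal, it was decoded using the wrong codec. It might be that your terminal is misconfigured, or that your text editor saved your source code using a different codec from the one you told Python to read the file with. Or you got it from a library that applied the wrong codec. None of this has anything to do with the JSON library.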
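The Latin-1 round trip described above can be sketched in Python 3 as follows. This is an illustration of the mechanism only, using the question's sample string; the variable names are made up.

```python
original = "ברי צקלה"

# The correct UTF-8 bytes of the text.
utf8_bytes = original.encode("utf8")

# Decoding those bytes with the wrong codec (Latin-1 here) gives mojibake:
# one bogus character per byte.
mojibake = utf8_bytes.decode("latin1")

# Latin-1 maps code points 0-255 one-to-one to bytes, so the damage can be
# undone by encoding back to Latin-1 and decoding as UTF-8.
repaired = mojibake.encode("latin1").decode("utf8")

print(len(original), len(mojibake))  # 8 15
print(repaired == original)          # True
```

The repair only works because no bytes were lost along the way; a codec that rejects or replaces bytes would not allow it.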

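The "encode"/"decode" round trip seems unnecessary. Just setting `ensure_ascii=False` (as per [this answer](/sf/answers/2840990071/)) seems to be enough. (10 upvotes)
@AdamAL Please read my answer more thoroughly: there is no round trip in this answer apart from the decode call, which is there only to demonstrate that the bytes value really contains UTF-8 encoded data. The second code snippet in my answer writes directly to a file, setting only `ensure_ascii=False`. Note: I strongly recommend against using the `codecs.open()` function; that library predates `io`, and its stream implementations have many unresolved issues. (3 upvotes)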

Answer 2

Trầ*_*iệp 59

Easy as pie

Write to a file

import codecs
import json

with codecs.open('your_file.txt', 'w', encoding='utf-8') as f:
    json.dump({"message":"xin chào việt nam"}, f, ensure_ascii=False)

Print to stdout

import json
print(json.dumps({"message":"xin chào việt nam"}, ensure_ascii=False))

Answer 3

siv*_*ivi 30

Thanks for the original answer here. With Python 3, the following line of code:

print(json.dumps(result_dict, ensure_ascii=False))

was OK. Consider not writing too much text in the code unless it is imperative.

This may be good enough for the Python console. However, to satisfy a server you may need to set the locale as explained here (if it is on Apache 2): Setting LANG and LC_ALL when using mod_wsgi

Basically, install he_IL or whatever language locale you need on Ubuntu. Check whether it is installed:

locale -a

Install it, where XX is your language:

sudo apt-get install language-pack-XX

For example:

sudo apt-get install language-pack-he

Add the following text to /etc/apache2/envvars:

export LANG='he_IL.UTF-8'
export LC_ALL='he_IL.UTF-8'

Then you will hopefully not get Python errors from Apache such as:

print (js)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 41-45: ordinal not in range(128)

Also in Apache, try to make UTF-8 the default encoding, as explained here: How to change the default encoding to UTF-8 for Apache

Do this early, because Apache errors can be painful to debug and you may mistakenly think they come from Python, which may not be the case here.
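Before changing the server configuration, it can help to check what encoding the Python process believes it has. The following is a rough diagnostic sketch, assuming Python 3; the payload is a made-up example.

```python
import json
import locale
import sys

payload = json.dumps({"name": "ברי צקלה"}, ensure_ascii=False)

# What the process believes about its environment.
print("stdout encoding:", sys.stdout.encoding)
print("preferred encoding:", locale.getpreferredencoding(False))

# In an ASCII-only environment, printing the payload fails the same way
# as encoding it by hand does here.
try:
    payload.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ascii cannot represent the payload:", exc.reason)

# UTF-8 has no trouble with it.
utf8_payload = payload.encode("utf-8")
```

If the reported encodings are ASCII-based (for example ANSI_X3.4-1968), the locale settings described above are the likely culprit rather than the json module.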


Answer 4

mon*_*ius 27

Update: This is the wrong answer, but it is still useful to understand why it is wrong. See the comments.

How about unicode-escape?

>>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
>>> json_str = json.dumps(d).decode('unicode-escape').encode('utf8')
>>> print json_str
{"1": "ברי צקלה", "2": "ברי צקלה"}

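`unicode-escape` is not necessary: you could use `json.dumps(d, ensure_ascii=False).encode('utf8')` instead. And there is no guarantee that json uses *exactly* the same rules as the `unicode-escape` codec in Python in *all* cases, i.e., the result may or may not be the same in some corner cases. The downvote is for an unnecessary and possibly wrong conversion. Unrelated: `print json_str` works only in utf8 locales, or if the `PYTHONIOENCODING` envvar specifies utf8 here (print Unicode instead). (9 upvotes)
Another problem: any double quotes in the string values will lose their escaping, so this results in *broken JSON output*. (3 upvotes)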
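The broken-output problem from the last comment is easy to reproduce. The following Python 3 sketch uses a made-up value containing double quotes.

```python
import json

value = 'say "hi"'

# json.dumps escapes the inner quotes, which keeps the document valid.
good = json.dumps(value, ensure_ascii=False)

# Decoding with unicode-escape also turns \" back into a bare quote.
broken = json.dumps(value).encode("ascii").decode("unicode-escape")

print(good)    # "say \"hi\""
print(broken)  # "say "hi""

# The second string can no longer be parsed as JSON.
try:
    json.loads(broken)
    still_valid = True
except json.JSONDecodeError:
    still_valid = False

print(still_valid)  # False
```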

Answer 5

小智 24

Peters' Python 2 solution fails on an edge case:

d = {u'keyword': u'bad credit  \xe7redit cards'}
with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(d, ensure_ascii=False).decode('utf8')
    try:
        json_file.write(data)
    except TypeError:
        # Decode data to Unicode first
        json_file.write(data.decode('utf8'))

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 25: ordinal not in range(128)


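It crashed on the .decode('utf8') part of line 3. I fixed the problem by making the program much simpler, avoiding that step as well as the special casing of ASCII: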

with io.open('filename', 'w', encoding='utf8') as json_file:
  data = json.dumps(d, ensure_ascii=False, encoding='utf8')
  json_file.write(unicode(data))

cat filename
{"keyword": "bad credit  çredit cards"}

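The "edge case" was simply a dumb untested error on my part. Your `unicode(data)` approach is the better option, rather than using exception handling. Note that the `encoding='utf8'` keyword argument has nothing to do with the output that `json.dumps()` produces; it is used for decoding the *`str` input* the function receives. (2 upvotes)
@MartijnPieters: or simpler: `open('filename', 'wb').write(json.dumps(d, ensure_ascii=False).encode('utf8'))` It works whether `dumps` returns an (ASCII-only) str or a unicode object. (2 upvotes)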
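The binary-mode variant from the last comment could look like this in Python 3. The helper name write_json_utf8 and the temporary directory are hypothetical, added only to make the sketch runnable.

```python
import json
import os
import tempfile


def write_json_utf8(obj, path):
    """Serialize obj and write it as UTF-8 bytes (hypothetical helper)."""
    payload = json.dumps(obj, ensure_ascii=False).encode("utf8")
    # Binary mode: the bytes are written exactly as encoded above.
    with open(path, "wb") as fh:
        fh.write(payload)


d = {"keyword": "bad credit  \xe7redit cards"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "filename")
    write_json_utf8(d, path)
    with open(path, "rb") as fh:
        raw = fh.read()

print(raw)
```

Opening the file in binary mode makes the encoding step explicit, so the result does not depend on the platform's default text encoding.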

Answer 6

小智 12

Using unicode-escape to solve the problem

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> json_string.encode('ascii').decode('unicode-escape')
'"ברי צקלה"'

Explanation

>>> s = '漢  χαν  хан'
>>> print('Unicode: ' + s.encode('unicode-escape').decode('utf-8'))

Unicode: \u6f22  \u03c7\u03b1\u03bd  \u0445\u0430\u043d

>>> u = s.encode('unicode-escape').decode('utf-8')
>>> print('Original: ' + u.encode("utf-8").decode('unicode-escape'))

Original: 漢  χαν  хан

Original source: Python3 使用 unicode-escape 处理 unicode 16进制字符串编解码问题 (Python 3: using unicode-escape to handle encoding and decoding of Unicode hex strings)
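One caveat worth knowing about this approach: characters outside the Basic Multilingual Plane are written by json.dumps as surrogate pairs, and unicode-escape does not recombine them. The following Python 3 sketch, using an emoji as a made-up test value, shows the effect.

```python
import json

emoji = "\U0001F600"  # a character outside the Basic Multilingual Plane

# json.dumps writes it as a UTF-16 surrogate pair: two \uXXXX escapes.
escaped = json.dumps(emoji)

# unicode-escape turns each escape into its own code point, so the result
# holds two lone surrogates instead of the single original character.
decoded = escaped.encode("ascii").decode("unicode-escape")

# Lone surrogates cannot be encoded as UTF-8.
try:
    decoded.encode("utf8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# ensure_ascii=False avoids the escapes in the first place.
direct = json.dumps(emoji, ensure_ascii=False)

print(escaped)
print(len(decoded), encodable)
print(direct.encode("utf8"))
```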


Answer 7

Cha*_*rma 8

If you are loading a JSON string from a file and the file contents are Arabic text, then this will work.

Assume a file like arabic.json:

{
 "key1": "لمستخدمين",
 "key2": "إضافة مستخدم"
}

Get the Arabic contents from the arabic.json file:

import json

with open('arabic.json', encoding='utf-8') as f:
    # Deserialise the file contents; the with block closes the file
    json_data = json.load(f)

# JSON formatted string
json_data2 = json.dumps(json_data, ensure_ascii=False)

To use the JSON data in a Django template, follow the steps below:

# If you have to get the JSON index in a Django template file, then simply decode the encoded string.

json.JSONDecoder().decode(json_data2)

Done! Now we can get the results as a JSON index with Arabic values.

Answer 8

Che*_*ney 7

The following is my understanding, after reading the answers above and searching Google.

# coding:utf-8
r"""
@update: 2017-01-09 14:44:39
@explain: str, unicode, bytes in python2to3
    #python2 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
    #1.reload
    #importlib,sys
    #importlib.reload(sys)
    #sys.setdefaultencoding('utf-8') #python3 don't have this attribute.
    #not suggest even in python2 #see:http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
    #2.overwrite /usr/lib/python2.7/sitecustomize.py or (sitecustomize.py and PYTHONPATH=".:$PYTHONPATH" python)
    #too complex
    #3.control by your own (best)
    #==> all string must be unicode like python3 (u'xx'|b'xx'.encode('utf-8')) (unicode 's disappeared in python3)
    #see: http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes

    #how to Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence
    #http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
"""

from __future__ import print_function
import json

a = {"b": u"中文"}  # add u for python2 compatibility
print('%r' % a)
print('%r' % json.dumps(a))
print('%r' % (json.dumps(a).encode('utf8')))
a = {"b": u"中文"}
print('%r' % json.dumps(a, ensure_ascii=False))
print('%r' % (json.dumps(a, ensure_ascii=False).encode('utf8')))
# print(a.encode('utf8')) #AttributeError: 'dict' object has no attribute 'encode'
print('')

# python2:bytes=str; python3:bytes
b = a['b'].encode('utf-8')
print('%r' % b)
print('%r' % b.decode("utf-8"))
print('')

# python2:unicode; python3:str=unicode
c = b.decode('utf-8')
print('%r' % c)
print('%r' % c.encode('utf-8'))
"""
#python2
{'b': u'\u4e2d\u6587'}
'{"b": "\\u4e2d\\u6587"}'
'{"b": "\\u4e2d\\u6587"}'
u'{"b": "\u4e2d\u6587"}'
'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'

'\xe4\xb8\xad\xe6\x96\x87'
u'\u4e2d\u6587'

u'\u4e2d\u6587'
'\xe4\xb8\xad\xe6\x96\x87'

#python3
{'b': '中文'}
'{"b": "\\u4e2d\\u6587"}'
b'{"b": "\\u4e2d\\u6587"}'
'{"b": "中文"}'
b'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'

b'\xe4\xb8\xad\xe6\x96\x87'
'中文'

'中文'
b'\xe4\xb8\xad\xe6\x96\x87'
"""


Answer 9

小智 5

Here is my solution using json.dump():

import codecs
import json
def jsonWrite(p, pyobj, ensure_ascii=False, encoding=SYSTEM_ENCODING, **kwargs):
    with codecs.open(p, 'wb', 'utf_8') as fileobj:
        json.dump(pyobj, fileobj, ensure_ascii=ensure_ascii,encoding=encoding, **kwargs)

where SYSTEM_ENCODING is set (before jsonWrite is defined) to:

import locale
locale.setlocale(locale.LC_ALL, '')
SYSTEM_ENCODING = locale.getlocale()[1]

Answer 10

Yul*_*GUO 5

Use codecs if possible:

import codecs
import json
with codecs.open('file_path', 'a+', 'utf-8') as fp:
    fp.write(json.dumps(res, ensure_ascii=False))

Answer 11

Nik*_*Nik 5

As of Python 3.7, the following code works fine:

from json import dumps
result = {"symbol": "ƒ"}
json_string = dumps(result, sort_keys=True, indent=2, ensure_ascii=False)
print(json_string)

Output:

{
  "symbol": "ƒ"
}

Also in Python 3.6 (just verified). (2 upvotes)

Asked: 12 years, 5 months ago
Viewed: 290706 times
Last active: 6 years, 3 months ago
