String encoding IDNA -> UTF-8 (Python)

Question

use*_*837 2 python string encoding character-encoding

String encodings and formats always confuse me.

This is what I have:

    '\xe0\xb9\x84\xe0\xb8\x97\xe0\xb8\xa2'

which I believe is UTF-8, and

    'xn--o3cw4h'

which should be the same thing IDNA-encoded. However, I can't figure out how to get Python to convert from one to the other.

I was just trying

    a = u'xn--o3cw4h'
    b = a.encode('idna')
    b.decode('utf-8')

but I get the exact same string back ('xn--o3cw4h', although no longer unicode). I'm currently using Python 3.5.

Answer 1

Rob*_*obᵩ 7

To convert from one encoding to another, you must first decode the string to Unicode, then encode it again in the target encoding.

So, for example:

    idna_encoded_bytes = b'xn--o3cw4h'
    # Decode the IDNA bytes to Unicode, then encode the result as UTF-8.
    unicode_string = idna_encoded_bytes.decode('idna')
    utf8_encoded_bytes = unicode_string.encode('utf-8')

    print(repr(idna_encoded_bytes))
    print(repr(utf8_encoded_bytes))
    print(repr(unicode_string))

Python 2 result:

    'xn--o3cw4h'
    '\xe0\xb9\x84\xe0\xb8\x97\xe0\xb8\xa2'
    u'\u0e44\u0e17\u0e22'

As you can see, the first line is the IDNA encoding of ไทย, the second line is the UTF-8 encoding, and the last line is the unencoded sequence of Unicode code points U+0E44, U+0E17 and U+0E22.
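Since the question targets Python 3, here is a hedged sketch of the same three steps there. Python 3 marks byte strings with a `b` prefix and has no `u''` form in its repr, so the printed values look different from the Python 2 output above. The assertions state the values I would expect; `ascii()` is used for the last line only so that the output does not depend on the terminal's encoding.

```python
# Python 3: the same decode-then-encode steps, checked with assertions.
idna_encoded_bytes = b'xn--o3cw4h'
unicode_string = idna_encoded_bytes.decode('idna')    # bytes -> str
utf8_encoded_bytes = unicode_string.encode('utf-8')   # str -> bytes

assert unicode_string == '\u0e44\u0e17\u0e22'
assert utf8_encoded_bytes == b'\xe0\xb9\x84\xe0\xb8\x97\xe0\xb8\xa2'

print(repr(idna_encoded_bytes))
print(repr(utf8_encoded_bytes))
print(ascii(unicode_string))  # escaped form of the three Thai code points
```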

To do the conversion in one step, just chain the operations:

    utf8_encoded_bytes = idna_encoded_bytes.decode('idna').encode('utf8')
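The chain also runs in the other direction, which the question asks about as well. A minimal sketch, assuming Python 3 and that the UTF-8 bytes hold a single valid hostname label (the variable names are only illustrative):

```python
# Reverse direction: UTF-8 bytes -> Unicode str -> IDNA (ASCII-compatible) bytes.
utf8_encoded_bytes = b'\xe0\xb9\x84\xe0\xb8\x97\xe0\xb8\xa2'
idna_encoded_bytes = utf8_encoded_bytes.decode('utf-8').encode('idna')

print(idna_encoded_bytes)  # b'xn--o3cw4h'
```

Note that Python's built-in `idna` codec follows the older IDNA 2003 rules (RFC 3490), so results for some other names may differ from tools that implement IDNA 2008.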

Reply to comment:

> I'm not starting with b'xn--o3cw4h' but with the string 'xn--o3cw4h'. [In Python 3].

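You have an odd duck there. You have apparently encoded data stored in a unicode string. We need to convert it to a bytes object somehow. One simple way is to (confusingly) encode it as ASCII: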

    improperly_encoded_idna = 'xn--o3cw4h'
    # IDNA output is pure ASCII, so encoding as ASCII just turns the str into bytes.
    idna_encoded_bytes = improperly_encoded_idna.encode('ascii')
    unicode_string = idna_encoded_bytes.decode('idna')
    utf8_encoded_bytes = unicode_string.encode('utf-8')

    print(repr(idna_encoded_bytes))
    print(repr(utf8_encoded_bytes))
    print(repr(unicode_string))
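The bytes and str cases can be folded into one small helper. This is only a sketch for Python 3: `idna_to_utf8` is a hypothetical name, not a standard library function, and it assumes any str input is already pure ASCII (otherwise the `'ascii'` step raises `UnicodeEncodeError`). The last assertion also illustrates why the attempt in the question returned its input: encoding an already-ASCII label with `'idna'` leaves the characters unchanged, so nothing is ever decoded from the `xn--` form.

```python
def idna_to_utf8(value):
    """Convert an IDNA-encoded label (bytes or str) to UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        # IDNA-encoded text is plain ASCII, so this only changes the type.
        value = value.encode('ascii')
    return value.decode('idna').encode('utf-8')


expected = b'\xe0\xb9\x84\xe0\xb8\x97\xe0\xb8\xa2'
assert idna_to_utf8('xn--o3cw4h') == expected
assert idna_to_utf8(b'xn--o3cw4h') == expected

# The attempt from the question: an ASCII label passes through unchanged.
assert 'xn--o3cw4h'.encode('idna') == b'xn--o3cw4h'
```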

