How to convert a string to UTF-8 in Python

Question


Bin*_*hen 177 python unicode utf-8 python-2.7

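I have a browser that sends UTF-8 characters to my Python server, but when I retrieve them from the query string, the encoding Python returns is ASCII. How can I convert a plain string to UTF-8?

NOTE: The string passed from the web is already UTF-8 encoded; I just want Python to treat it as UTF-8 instead of ASCII.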

Answer 1

>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)


^ This is the difference between a byte string (plain_string) and a Unicode string.

>>> s = "Hello!"
>>> u = unicode(s, "utf-8")


^ Converting to Unicode while specifying the encoding.
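The session above is Python 2. As a rough Python 3 counterpart (assuming Python 3, where `str` is text and `bytes` is raw data; the names `plain_bytes` and `text` are only illustrative), the same distinction and the same explicit-encoding conversion might look like this:

```python
# Python 3: bytes literals hold raw bytes, str literals hold Unicode text.
plain_bytes = b"Hi!"
text = "Hi!"
print(type(plain_bytes), type(text))  # <class 'bytes'> <class 'str'>

# Decode bytes into text by naming the encoding explicitly.
s = "Hello!".encode("utf-8")  # pretend these bytes arrived from outside
u = str(s, "utf-8")           # equivalent to s.decode("utf-8")
print(u)                      # Hello!
```

In Python 3, `str(some_bytes, "utf-8")` plays the role that `unicode(s, "utf-8")` played in Python 2.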

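None of this works in Python 3: all strings are Unicode, and `unicode()` doesn't exist. (77 upvotes)
I get the following error: `UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 2: invalid start byte`. This is my code: ret = [] for line in csvReader: cline = [] for elm in line: unicodestr = unicode(elm, 'utf-8') cline.append(unicodestr) ret.append(cline) (33 upvotes)
@Tanguy `hisIsAString = u'abcd'.encode('utf-8')` (4 upvotes)
This code only works if the text contains no non-ASCII characters; a single accented character in the string makes it fail. (3 upvotes)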

Answer 2

duh*_*ime 67

If the above doesn't work, you can also tell Python to ignore the parts of the string that can't be decoded as UTF-8:

stringnamehere.decode('utf-8', 'ignore')
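In Python 3 only `bytes` has a `.decode()` method (a `str` is already text), but the `errors` argument works the same way, and `'replace'` is often safer than `'ignore'` because it leaves a visible U+FFFD marker instead of silently dropping data. A small sketch, using a made-up byte string that contains the invalid byte `0xb0` from the error quoted under Answer 1:

```python
# Bytes that are not valid UTF-8: 0xb0 is a continuation byte and
# cannot start a UTF-8 sequence.
raw = b"ab\xb0cd"

try:
    raw.decode("utf-8")  # errors="strict" is the default, so this raises
except UnicodeDecodeError as exc:
    print("strict:", exc.reason, "at position", exc.start)
    # strict: invalid start byte at position 2

print("ignore:", raw.decode("utf-8", "ignore"))    # ignore: abcd
print("replace:", raw.decode("utf-8", "replace"))  # replace: ab�cd
```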


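Got AttributeError: 'str' object has no attribute 'decode' (4 upvotes)
Python picks the system encoding by default. On Windows 10 that is cp1252, which is not the same as UTF-8. I wasted hours on this using codecs.open() in Python 3.8. (4 upvotes)
@saran3h, it sounds like you're using Python 3, in which case Python *should* handle the encoding for you. Have you tried reading the document without specifying an encoding? (2 upvotes)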
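Related to the Windows comment: a minimal Python 3 sketch that avoids depending on the platform's default (locale) encoding by passing `encoding=` explicitly to the built-in `open()` when both writing and reading. The temporary directory and the file name `sample.txt` are arbitrary choices for the example:

```python
import os
import tempfile

text = "Ribeirão Preto"

# Write and read with an explicit encoding instead of the locale default,
# which may be cp1252 on Windows.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == text)  # True

# The raw bytes on disk are UTF-8.
with open(path, "rb") as f:
    print(f.read())  # b'Ribeir\xc3\xa3o Preto'
```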

Answer 3

Anonymous 21

Possibly overkill, but when I work with ASCII and Unicode in the same file, repeated decoding gets tedious, so this is what I use:

def make_unicode(input):
    # Python 2 only: decode byte strings (str) as UTF-8, return unicode unchanged
    if not isinstance(input, unicode):
        input = input.decode('utf-8')
    return input


As written, this no longer works: the `unicode` type doesn't exist in Python 3. (4 upvotes)
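A possible Python 3 adaptation of `make_unicode` (the function name is kept from the answer; treating every `bytes` input as UTF-8 is an assumption, exposed here as a default `encoding` parameter): check for `bytes` instead of `unicode`, and pass `str` through unchanged.

```python
def make_unicode(value, encoding="utf-8"):
    """Return value as str, decoding bytes with `encoding` (UTF-8 assumed)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text in Python 3


print(make_unicode(b"Ribeir\xc3\xa3o"))  # Ribeirão
print(make_unicode("Ribeirão"))          # Ribeirão
```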

Answer 4

Anonymous 14

Add the following line to the top of your .py file:

# -*- coding: utf-8 -*-


This lets you encode strings directly in your script, like this:

utfstr = "????"


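This is not what the OP asked for. But avoid string literals like this anyway: in Python 3 it creates a Unicode string (good), but in Python 2 it creates a byte string (bad). Either add `from __future__ import unicode_literals` at the top or use the `u''` prefix. Don't put non-ASCII characters in `bytes` literals. If you need UTF-8 bytes, you can get them later with `utf8bytes = unicode_text.encode('utf-8')`. (2 upvotes)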
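The comment's advice, sketched for Python 3, where source files are decoded as UTF-8 by default (PEP 3120), so the coding line is optional: keep non-ASCII text in `str` literals and call `.encode('utf-8')` only when you actually need bytes. The sample word is arbitrary:

```python
# In Python 3 this str literal holds the text itself, not its bytes.
unicode_text = "naïve"

# Encode to UTF-8 bytes only at the boundary (file, socket, HTTP body).
utf8bytes = unicode_text.encode("utf-8")

print(len(unicode_text))  # 5 characters
print(len(utf8bytes))     # 6 bytes: 'ï' takes two bytes in UTF-8
print(utf8bytes)          # b'na\xc3\xafve'
```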

Answer 5

cod*_*ape 13

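If I understand you correctly, you have a UTF-8 encoded byte string in your code.

Converting a byte string to a Unicode string is called decoding (Unicode -> byte string is encoding).

You can do this with the unicode function or the decode method. Either: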

unicodestr = unicode(bytestr, encoding)
unicodestr = unicode(bytestr, "utf-8")


Or:

unicodestr = bytestr.decode(encoding)
unicodestr = bytestr.decode("utf-8")
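Both spellings carry over to Python 3 (with `unicode()` replaced by `str()`), and encoding is the inverse operation. A short round-trip sketch, where `bytestr` is a hypothetical stand-in for UTF-8 data received from the browser:

```python
bytestr = b"T\xc3\xa9st\xc3\xa3o"  # UTF-8 bytes, e.g. from a request

# Decoding: bytes -> str. These two lines are equivalent.
unicodestr = str(bytestr, "utf-8")
unicodestr = bytestr.decode("utf-8")
print(unicodestr)  # Téstão

# Encoding: str -> bytes, the inverse operation.
print(unicodestr.encode("utf-8") == bytestr)  # True
```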


Answer 6

Anonymous 9

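# Caution: these bytes are already UTF-8 ('\xc3\xa3' is 'ã'), so decoding
# them as cp1252 produces mojibake; use .decode('utf-8') for UTF-8 input.
city = 'Ribeir\xc3\xa3o Preto'
print city.decode('cp1252').encode('utf-8')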
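Decoding with the wrong codec doesn't necessarily raise; it can silently produce mojibake. A Python 3 sketch contrasting the two codecs on the same bytes (the cp1252-encoded variant is constructed only for illustration):

```python
city = b"Ribeir\xc3\xa3o Preto"  # UTF-8 bytes for "Ribeirão Preto"

print(city.decode("utf-8"))   # Ribeirão Preto  (matching codec)
print(city.decode("cp1252"))  # RibeirÃ£o Preto (wrong codec: mojibake)

# decode("cp1252") is the right call only for bytes that really are cp1252:
cp1252_city = "Ribeirão Preto".encode("cp1252")  # b'Ribeir\xe3o Preto'
print(cp1252_city.decode("cp1252").encode("utf-8") == city)  # True
```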


Answer 7

Anonymous 6

In Python 3.6 there is no built-in unicode() function. Strings are already stored as Unicode by default, so no conversion is needed. Example:

>>> my_str = "\u221a25"
>>> print(my_str)
√25
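To make the point concrete: a Python 3 `str` is a sequence of code points, and UTF-8 bytes only appear once you encode. Using the same `my_str` value as above:

```python
my_str = "\u221a25"  # U+221A SQUARE ROOT followed by "25"

print(my_str)                  # √25
print(len(my_str))             # 3 code points
print(my_str.encode("utf-8"))  # b'\xe2\x88\x9a25' (√ needs 3 bytes)
```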


Answer 8

Joe*_*008 5

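Use ord() and unichr() to translate. Every Unicode character has an associated number, similar to an index, so Python has functions to convert between a character and its number. Below is an example. Hope it helps.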

>>> C = 'ñ'
>>> U = C.decode('utf8')
>>> U
u'\xf1'
>>> ord(U)
241
>>> unichr(241)
u'\xf1'
>>> print unichr(241).encode('utf8')
ñ
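In Python 3, `unichr()` was folded into `chr()`, and a `str` needs no `.decode()` step. The same walk-through as a Python 3 sketch:

```python
c = "ñ"
print(ord(c))             # 241 (code point U+00F1)
print(hex(ord(c)))        # 0xf1
print(chr(241))           # ñ
print(c.encode("utf-8"))  # b'\xc3\xb1' (two bytes in UTF-8)
```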


Answer 9

Anonymous 5

The URL is converted to ASCII; to the Python server it is just a string such as "T%C3%A9st%C3%A3o".

The escapes %C3%A9 and %C3%A3 stand for the UTF-8 bytes "\xc3\xa9" and "\xc3\xa3" (é and ã).

You can decode the URL like this:

import urllib.parse
url = "T%C3%A9st%C3%A3o"
print(urllib.parse.unquote(url))
# Téstão

For more details, see https://www.adamsmith.haus/python/answers/how-to-decode-a-utf-8-url-in-python.
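Since the original question is about query strings, `urllib.parse.parse_qs` may be more convenient than unquoting by hand: in Python 3 it percent-decodes values as UTF-8 by default. The query string and the parameter name `name` are hypothetical; `unquote_to_bytes` is shown for comparison because it returns the raw bytes instead of text:

```python
from urllib.parse import parse_qs, unquote, unquote_to_bytes

query = "name=T%C3%A9st%C3%A3o"  # hypothetical query string from the browser

params = parse_qs(query)  # encoding="utf-8" is the default
print(params)             # {'name': ['Téstão']}

print(unquote("T%C3%A9st%C3%A3o"))           # Téstão
print(unquote_to_bytes("T%C3%A9st%C3%A3o"))  # b'T\xc3\xa9st\xc3\xa3o'
```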


Archived:	15 years, 3 months ago
Views:	539,629
Last active:	6 years, 4 months ago