Replace language-specific characters in Python with English letters

Question


Pho*_*ght 3 python string encoding decoding character-encoding

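Is there any way in Python 3 to replace common language-specific characters with English letters?
For example, I have a function get_city(IP) that returns the name of the city associated with a given IP. It connects to an external database, so I can't change how the value is encoded; I just get it from the database.
I'd like to do something like this: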

city = "České Budějovice"
city = clear_name(city)
print(city)  # should print "Ceske Budejovice"


I'm using Czech here, but in general it should work for any non-Asian language.

Answer 1

aso*_*uin 8

Try unidecode:

# coding=utf-8
# Python 2: the literal is a UTF-8 byte string, so it is decoded to unicode first
from unidecode import unidecode

city = "České Budějovice"
print(unidecode(city.decode('utf-8')))


This prints Ceske Budejovice, as desired (assuming your post has a typo).

Could you state that this solution is for Python 2.x and add one that works in Python 3.x (just remove `decode('utf-8')`)? Thanks! (3 upvotes)
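Following that comment, a minimal Python 3 sketch of the `clear_name` helper from the question might look like this. It assumes the third-party `unidecode` package (`pip install unidecode`); if the package is not installed, it falls back to the standard-library `unicodedata` approach from Answer 2, which only strips combining accents. `clear_name` is the question's hypothetical name, not an existing API.

```python
import unicodedata

try:
    # Third-party package (assumed installed): pip install unidecode
    from unidecode import unidecode
except ImportError:
    # Fallback path below is used when unidecode is not available
    unidecode = None


def clear_name(name: str) -> str:
    """Hypothetical helper: map accented Latin letters to plain ASCII letters."""
    if unidecode is not None:
        # Python 3 str is already Unicode, so no .decode('utf-8') is needed
        return unidecode(name)
    # Fallback: decompose (NFD) and drop the combining marks
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(clear_name("České Budějovice"))  # Ceske Budejovice
```

With `unidecode` present, letters that have no accent to strip (for example `Ł`) are also transliterated; the fallback path leaves them as they are.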

Answer 2

Rom*_*est 5

Use the unicodedata module for this.
To get the desired result, normalize the given string with unicodedata.normalize() and then drop the combining characters detected by unicodedata.combining():


import unicodedata

city = "České Budějovice"
normalized = unicodedata.normalize('NFD', city)
new_city = "".join([c for c in normalized if not unicodedata.combining(c)])

print(new_city)   # Ceske Budejovice


NFD is one of the four Unicode normalization forms (NFC, NFD, NFKC, NFKD):


http://www.unicode.org/reports/tr15/
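To illustrate what NFD does, the sketch below compares it with NFC (the composed form) on a single accented letter, then shows a limitation of the combining-mark approach: letters such as `Ł` have no canonical decomposition, so they pass through unchanged. `strip_accents` is a hypothetical name for the same filtering used in the answer.

```python
import unicodedata

word = "\u00e9"  # precomposed 'é'
nfc = unicodedata.normalize("NFC", word)  # stays one code point: U+00E9
nfd = unicodedata.normalize("NFD", word)  # 'e' + U+0301 COMBINING ACUTE ACCENT
print(len(nfc), len(nfd))  # 1 2
print([hex(ord(c)) for c in nfd])  # ['0x65', '0x301']


def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_accents("Budějovice"))  # Budejovice
# Ł has no canonical decomposition, so only the acute accents are removed
print(strip_accents("Łódź"))  # Łodz
```

For letters like these, a transliteration library such as `unidecode` (Answer 1) is the better fit.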


Archived:	8 years, 11 months ago
Views:	4219
Last recorded:	4 years, 9 months ago