How to replace accents in a column of a pandas dataframe

Question


ema*_*max (score 5) · tags: python string unicode decode pandas

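I have a dataframe, dataSwiss, containing information about Swiss municipalities. I want to replace the accented letters with plain letters.

This is what I'm doing: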

dataSwiss['Municipality'] = dataSwiss['Municipality'].str.encode('utf-8')
dataSwiss['Municipality'] = dataSwiss['Municipality'].str.replace(u"é", "e")


But I get the following error:

----> 2 dataSwiss['Municipality'] = dataSwiss['Municipality'].str.replace(u"é", "e")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

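The error message is consistent with mixing bytes and text: after .str.encode('utf-8') the column holds byte strings, and the ü in Zürich becomes the two bytes 0xC3 0xBC. In Python 2, matching those bytes against the u"é" pattern most likely triggers an implicit ASCII decode, which fails at the first non-ASCII byte. This Python 3 sketch reproduces the same message explicitly (the variable names are illustrative):

```python
# 'ü' is not ASCII; UTF-8 encodes it as the two bytes 0xC3 0xBC
raw = 'Zürich'.encode('utf-8')
print(raw)  # b'Z\xc3\xbcrich'

# Decoding those bytes as ASCII fails at position 1, like the error above
try:
    raw.decode('ascii')
    msg = None
except UnicodeDecodeError as exc:
    msg = str(exc)
print(msg)  # 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

# Decoding with the correct codec recovers the original text
print(raw.decode('utf-8'))  # Zürich
```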

The data looks like:

dataSwiss.Municipality
0               Zürich
1               Zürich
2               Zürich
3               Zürich
4               Zürich
5               Zürich
6               Zürich
7               Zürich


I found a solution:

s = dataSwiss['Municipality']
res = s.str.decode('utf-8')
res = res.str.replace(u"é", "e")

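In Python 3, a pandas column of text already holds Unicode str values, so neither encode nor decode is needed and .str.replace('é', 'e') works directly. A minimal sketch for replacing several accented letters at once, assuming a hand-written mapping, uses Series.str.translate; the sample names and the table are illustrative, not taken from the question's data:

```python
import pandas as pd

# Hypothetical sample standing in for dataSwiss['Municipality']
s = pd.Series(['Zürich', 'Genève', 'Neuchâtel', 'Bern'])

# Illustrative, incomplete table mapping accented letters to plain ones
table = str.maketrans({'é': 'e', 'è': 'e', 'ü': 'u', 'â': 'a'})

# Values are already Unicode str, so translate them directly
res = s.str.translate(table)
print(res.tolist())  # ['Zurich', 'Geneve', 'Neuchatel', 'Bern']
```

Unlike Unicode normalization, this only touches characters listed in the table, so any accented letter missing from it passes through unchanged.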

Answer 1

jpp*_*jpp (score 7)

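Here is one way. First normalize the strings to NFKD, then convert them to byte literals with the ASCII codec (ignoring anything that can't be encoded), and finally decode the bytes back to strings with utf-8.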

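import pandas as pd

s = pd.Series(['hello', 'héllo', 'Zürich', 'Zurich'])

# NFKD splits accented letters into base letter + combining mark;
# encoding to ASCII with errors='ignore' drops the marks; decode turns bytes back into str
res = s.str.normalize('NFKD')\
       .str.encode('ascii', errors='ignore')\
       .str.decode('utf-8')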

print(res)

0     hello
1     hello
2    Zurich
3    Zurich
dtype: object


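pd.Series.str.normalize uses the unicodedata module. According to the documentation:

The normal form KD (NFKD) will apply the compatibility decomposition, i.e. replace all compatibility characters with their equivalents.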
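To see what NFKD does, and what the ASCII-encode step costs, here is a small sketch using the standard unicodedata module. The helper strip_accents and the sample strings are illustrative. It removes only combining marks after decomposition, whereas the encode('ascii', errors='ignore') pipeline also drops any non-ASCII letter that has no decomposition, such as Ø or ß:

```python
import unicodedata

import pandas as pd


def strip_accents(text: str) -> str:
    # NFKD splits e.g. 'ü' into 'u' followed by U+0308 COMBINING DIAERESIS
    decomposed = unicodedata.normalize('NFKD', text)
    # Keep every character that is not a combining mark (other non-ASCII letters survive)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


# Illustrative sample; 'Ø' and 'ß' have no NFKD decomposition
s = pd.Series(['héllo', 'Zürich', 'Øresund', 'Straße'])

marks_removed = s.map(strip_accents)
ascii_only = (
    s.str.normalize('NFKD')
    .str.encode('ascii', errors='ignore')
    .str.decode('utf-8')
)

print(marks_removed.tolist())  # ['hello', 'Zurich', 'Øresund', 'Straße']
print(ascii_only.tolist())     # ['hello', 'Zurich', 'resund', 'Strae']
```

Which behavior is preferable depends on whether the result must be strictly ASCII or should keep letters that have no plain equivalent.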

Archived:	8 years ago
Views:	5717
Last activity:	8 years ago