Mar*_*ius 12 python dataframe pandas
How can I change special characters to ordinary letters? This is my DataFrame:
In [56]: cities
Out[56]:
Table Code Country Year City Value
240 Åland Islands 2014.0 MARIEHAMN 11437.0 1
240 Åland Islands 2010.0 MARIEHAMN 5829.5 1
240 Albania 2011.0 Durrës 113249.0
240 Albania 2011.0 TIRANA 418495.0
240 Albania 2011.0 Durrës 56511.0
I would like it to look like this:
In [56]: cities
Out[56]:
Table Code Country Year City Value
240 Aland Islands 2014.0 MARIEHAMN 11437.0 1
240 Aland Islands 2010.0 MARIEHAMN 5829.5 1
240 Albania 2011.0 Durres 113249.0
240 Albania 2011.0 TIRANA 418495.0
240 Albania 2011.0 Durres 56511.0
EdC*_*ica 26
The pandas method is to use the vectorised str.normalize combined with str.encode and str.decode:
In [60]:
df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
Out[60]:
0 Aland Islands
1 Aland Islands
2 Albania
3 Albania
4 Albania
Name: Country, dtype: object
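For anyone who wants to try the chain without the original data, here is a minimal self-contained sketch. The sample values and the helper name `strip_accents` are assumptions made for illustration, not part of the answer. NFKD normalization splits an accented letter into its base letter plus a combining mark, and encoding to ASCII with `errors='ignore'` then drops the mark.

```python
import pandas as pd

# Hypothetical sample data mirroring two columns from the question
countries = pd.Series(["Åland Islands", "Albania"], name="Country")
cities = pd.Series(["MARIEHAMN", "Durrës"], name="City")


def strip_accents(s):
    """Decompose accented characters (NFKD), then drop the non-ASCII marks."""
    return (
        s.str.normalize("NFKD")
        .str.encode("ascii", errors="ignore")  # Series of bytes
        .str.decode("utf-8")                   # back to text
    )


countries_ascii = strip_accents(countries)
cities_ascii = strip_accents(cities)
print(countries_ascii.tolist())
print(cities_ascii.tolist())
```

Wrapping the chain in a function is only a convenience; the one-line expression from the answer behaves the same way.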
So to do this for all str dtypes:
In [64]:
cols = df.select_dtypes(include=['object']).columns  # the np.object alias was removed in NumPy 1.24
df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
df
Out[64]:
Table Code Country Year City Value
0 240 Aland Islands 2014.0 MARIEHAMN 11437.0 1
1 240 Aland Islands 2010.0 MARIEHAMN 5829.5 1
2 240 Albania 2011.0 Durres 113249.0
3 240 Albania 2011.0 TIRANA 418495.0
4 240 Albania 2011.0 Durres 56511.0
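The column-wise version can be reproduced end to end with a small frame. This is a sketch under stated assumptions: the table below is a shortened, hypothetical reconstruction of the question's data, and the text columns are created with `dtype=object` explicitly so that `select_dtypes(include=["object"])` picks them up regardless of how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical, shortened reconstruction of the question's table.
# dtype=object is set explicitly so the text columns match the selector below.
df = pd.DataFrame({
    "Country": pd.Series(["Åland Islands", "Albania", "Albania"], dtype=object),
    "Year": [2014.0, 2011.0, 2011.0],
    "City": pd.Series(["MARIEHAMN", "Durrës", "TIRANA"], dtype=object),
    "Value": [11437.0, 113249.0, 418495.0],
})

# Select only the text columns; numeric columns are left untouched
cols = df.select_dtypes(include=["object"]).columns
df[cols] = df[cols].apply(
    lambda x: x.str.normalize("NFKD")
    .str.encode("ascii", errors="ignore")
    .str.decode("utf-8")
)
print(df)
```

Restricting the operation to the selected columns matters because the `.str` accessor raises an error on numeric columns such as `Year` and `Value`.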
小智 7
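An example with a pandas Series:
import unidecode

def remove_accents(a):
    # a is expected to be a UTF-8 byte string (Python 2 str);
    # on Python 3, pass the str directly to unidecode.unidecode
    return unidecode.unidecode(a.decode('utf-8'))

df['column'] = df['column'].apply(remove_accents)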
In this case the values are decoded and converted to ASCII.
This is for Python 2.7. To convert to ASCII, you may want to try:
import unicodedata

unicodedata.normalize('NFKD', u"Durrës Åland Islands").encode('ascii', 'ignore')
'Durres Aland Islands'
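On Python 3 the same idea might look like the sketch below. The function name `to_ascii` is hypothetical. Because `encode` returns `bytes` on Python 3, a final `decode` is needed to get a `str` back. The second call illustrates a limitation of this approach: letters that have no Unicode decomposition, such as "ø", are removed rather than mapped to a base letter.

```python
import unicodedata


def to_ascii(text):
    """Strip accents from text; characters with no ASCII base letter are dropped."""
    decomposed = unicodedata.normalize("NFKD", text)
    # encode() yields bytes on Python 3, so decode back to str
    return decomposed.encode("ascii", "ignore").decode("ascii")


accented = to_ascii("Durrës Åland Islands")
no_decomposition = to_ascii("Tromsø")
print(accented)
print(no_decomposition)
```

If dropped letters are a problem for your data, a transliteration library such as unidecode, used in the answer above, may be a better fit.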