How do I remove accents from the values in a column?

Mar*_*ius 12 python dataframe pandas

How can I change special characters to ordinary letters? This is my DataFrame:

In [56]: cities
Out[56]:

Table Code  Country         Year        City        Value       
240         Åland Islands   2014.0      MARIEHAMN   11437.0 1
240         Åland Islands   2010.0      MARIEHAMN   5829.5  1
240         Albania         2011.0      Durrës      113249.0
240         Albania         2011.0      TIRANA      418495.0
240         Albania         2011.0      Durrës      56511.0 

I would like it to look like this:

In [56]: cities
Out[56]:

Table Code  Country         Year        City        Value       
240         Aland Islands   2014.0      MARIEHAMN   11437.0 1
240         Aland Islands   2010.0      MARIEHAMN   5829.5  1
240         Albania         2011.0      Durres      113249.0
240         Albania         2011.0      TIRANA      418495.0
240         Albania         2011.0      Durres      56511.0 

EdC*_*ica 26

The pandas way is to use the vectorised str.normalize combined with str.encode and str.decode:

In [60]:
df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')

Out[60]:
0    Aland Islands
1    Aland Islands
2          Albania
3          Albania
4          Albania
Name: Country, dtype: object
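To see why this chain works, it can help to run the same three steps on a single string with the standard-library `unicodedata` module. The following is only an illustrative sketch; the variable names are made up for the example and are not part of the answer above.

```python
import unicodedata

original = "Durr\u00ebs"  # "Durrës" with a precomposed ë

# NFKD splits an accented letter into its base letter plus a combining mark.
decomposed = unicodedata.normalize("NFKD", original)
print([unicodedata.name(ch) for ch in decomposed[4:6]])
# ['LATIN SMALL LETTER E', 'COMBINING DIAERESIS']

# Encoding to ASCII with errors="ignore" drops the combining mark (and any
# other non-ASCII character), leaving only the base letters as bytes.
ascii_bytes = decomposed.encode("ascii", errors="ignore")

# Decoding turns the bytes back into a str.
result = ascii_bytes.decode("utf-8")
print(result)  # Durres
```

The `.str.normalize`, `.str.encode` and `.str.decode` accessors apply these same steps to every element of a Series.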

So, to do this for all str dtypes:

In [64]:
cols = df.select_dtypes(include=[object]).columns  # np.object was removed in NumPy 1.24; use the builtin object
df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
df

Out[64]:
   Table Code        Country    Year       City      Value
0         240  Aland Islands  2014.0  MARIEHAMN  11437.0 1
1         240  Aland Islands  2010.0  MARIEHAMN  5829.5  1
2         240        Albania  2011.0     Durres   113249.0
3         240        Albania  2011.0     TIRANA   418495.0
4         240        Albania  2011.0     Durres    56511.0
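For anyone who wants to try this end to end, here is a self-contained sketch of the same idea. It assumes a small DataFrame built from a few rows of the question; the helper name `strip_accents` and the use of `pd.api.types.is_string_dtype` to pick the text columns are choices made for this example, not part of the original answer.

```python
import pandas as pd

# A few rows modelled on the question's table (the Value column is omitted).
df = pd.DataFrame({
    "Table Code": [240, 240, 240],
    "Country": ["Åland Islands", "Albania", "Albania"],
    "Year": [2014.0, 2011.0, 2011.0],
    "City": ["MARIEHAMN", "Durrës", "TIRANA"],
})

def strip_accents(col):
    """Hypothetical helper: decompose accents, then drop non-ASCII characters."""
    return (col.str.normalize("NFKD")
               .str.encode("ascii", errors="ignore")
               .str.decode("utf-8"))

# Pick the text columns; numeric columns are left untouched.
text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]
df[text_cols] = df[text_cols].apply(strip_accents)
print(df)
```

Selecting columns with a dtype check rather than a hard-coded list keeps the numeric columns out of the `.str` accessor, which only works on string data.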

  • This should be the accepted answer; it solves the problem correctly. (3 upvotes)

小智 7

Using a pandas Series as an example:

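import unidecode

def remove_accents(a):
    # Python 3 str needs no decoding; only bytes values must be decoded first.
    text = a.decode('utf-8') if isinstance(a, bytes) else a
    return unidecode.unidecode(text)

df['column'] = df['column'].apply(remove_accents)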

In this case the text is decoded and converted to ASCII.


adv*_*512 5

This is for Python 2.7. To convert to ASCII, you may want to try:

import unicodedata

unicodedata.normalize('NFKD', u"Durrës Åland Islands").encode('ascii','ignore')
'Durres Aland Islands'
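On Python 3 the same approach needs one extra step, because `encode` returns `bytes` rather than `str`. A minimal sketch follows, with `to_ascii` as a hypothetical helper name chosen for this example:

```python
import unicodedata

def to_ascii(text):
    """Hypothetical helper: return text with accents stripped, as a str."""
    decomposed = unicodedata.normalize("NFKD", text)
    # encode() returns bytes on Python 3, so decode back to str.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Durrës Åland Islands"))  # Durres Aland Islands

# Caveat: letters with no Unicode decomposition are dropped, not transliterated.
print(to_ascii("Tromsø"))  # Troms
```

If dropping letters such as "ø" is not acceptable, a transliteration library like unidecode (used in another answer here) may be a better fit.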