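I'm trying to convert a column of strings into integer identifiers, but I can't find an elegant way to do it in pandas (or Python). In the example below, I convert "A" (the string column/variable) to numbers through a mapping, but it looks like a dirty hack to me: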
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla'], 'B': 4})
unique = df['A'].unique()
mapping = dict(zip(unique, np.arange(len(unique))))
new_df = df.replace({'A': mapping})
Is there a better, more direct way to achieve this?
How about using factorize?
>>> labels, uniques = df.A.factorize()
>>> df.A = labels
>>> df
   A  B
0  0  4
1  1  4
2  0  4
3  2  4
http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.factorize.html
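If you also need to get back from the integer IDs to the original strings, the second value returned by factorize can serve as the lookup table. Here is a minimal sketch using the question's data; the names `codes`, `encoded`, and `restored` are just illustrative, and `assign` is used so the original frame is left untouched (an assumption about what you want, not something the question requires).

```python
import pandas as pd

df = pd.DataFrame({'A': ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla'], 'B': 4})

# factorize returns integer codes plus the unique values,
# ordered by first appearance in the column
codes, uniques = pd.factorize(df['A'])

# Build a new frame with the encoded column; df itself is not modified
encoded = df.assign(A=codes)

# Map the codes back to the original strings via the uniques index
restored = uniques.take(codes)

print(codes)           # [0 1 0 2]
print(list(restored))  # ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla']
```

Because codes are assigned in order of first appearance, the same string always gets the same ID within one call, but the IDs are not guaranteed to match across different DataFrames unless you reuse the same `uniques`.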