How do I convert categorical data to numerical data?

sto*_*ock 0 python pandas

I have a feature, city, which is categorical data (i.e. strings). Is there a smarter approach than hard-coding the mapping with replace()?

train['city'].unique()
Output: ['city_149', 'city_83', 'city_16', 'city_64', 'city_100', 'city_21',
       'city_114', 'city_103', 'city_97', 'city_160', 'city_65',
       'city_90', 'city_75', 'city_136', 'city_159', 'city_67', 'city_28',
       'city_10', 'city_73', 'city_76', 'city_104', 'city_27', 'city_30',
       'city_61', 'city_99', 'city_41', 'city_142', 'city_9', 'city_116',
       'city_128', 'city_74', 'city_69', 'city_1', 'city_176', 'city_40',
       'city_123', 'city_152', 'city_165', 'city_89', 'city_36', .......]

What I am trying:

train.replace(['city_149', 'city_83', 'city_16', 'city_64', 'city_100', 'city_21',
           'city_114', 'city_103', 'city_97', 'city_160', 'city_65',
           'city_90', 'city_75', 'city_136', 'city_159', 'city_67', 'city_28',
           'city_10', 'city_73', 'city_76', 'city_104', 'city_27', 'city_30',
           'city_61', 'city_99', 'city_41', 'city_142', 'city_9', 'city_116',
           'city_128', 'city_74', 'city_69', 'city_1', 'city_176', 'city_40',
           'city_123', 'city_152', 'city_165', 'city_89', 'city_36', .......], [1,2,3,4,5,6,7,8,9....], inplace=True)

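Is there a better way to convert this column to numerical values? It has 123 unique values, so I would have to hard-code the numbers 1, 2, 3, 4, ... 123 to convert them. Please suggest a better way to do the conversion.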

sac*_*cuL 5

Try pd.factorize():

train['city'] = pd.factorize(train.city)[0]

Or use the categorical dtype:

train['city'] = train['city'].astype('category').cat.codes

For example:

>>> train
       city
0  city_151
1  city_149
2  city_151
3  city_149
4  city_149
5  city_149
6  city_151
7  city_151
8  city_150
9  city_151

With factorize:

train['city'] = pd.factorize(train.city)[0]

>>> train
   city
0     0
1     1
2     0
3     1
4     1
5     1
6     0
7     0
8     2
9     0
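Note that pd.factorize returns the unique labels along with the codes, so the original city names do not have to be lost. A minimal sketch, assuming the same ten-row sample frame as above; the names city_code, codes, uniques and decoded are illustrative only and not part of the original answer:

```python
import pandas as pd

# Sample frame matching the example above
train = pd.DataFrame({"city": ["city_151", "city_149", "city_151", "city_149",
                               "city_149", "city_149", "city_151", "city_151",
                               "city_150", "city_151"]})

# factorize returns a (codes, uniques) pair; codes are assigned
# in order of first appearance in the column
codes, uniques = pd.factorize(train["city"])
train["city_code"] = codes  # hypothetical new column, keeps the original labels

# uniques[i] is the label that was encoded as i,
# so taking uniques at the codes decodes them again
decoded = uniques.take(codes)

# Mapping from integer code back to city label
code_to_city = dict(enumerate(uniques))
print(code_to_city)
```

Writing the codes to a separate column, rather than overwriting city, keeps both representations available while you check the result.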

With astype('category'):

train['city'] = train['city'].astype('category').cat.codes

>>> train
   city
0     2
1     0
2     2
3     0
4     0
5     0
6     2
7     2
8     1
9     2
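The two outputs number the cities differently because factorize assigns codes in order of first appearance, while the categorical dtype assigns them by the sorted order of the categories. The sketch below, again assuming the ten-row sample frame, shows how to read the code-to-label mapping from the categorical and how passing sort=True to pd.factorize should give the same codes as cat.codes. The variable names are illustrative.

```python
import pandas as pd

# Sample frame matching the example above
train = pd.DataFrame({"city": ["city_151", "city_149", "city_151", "city_149",
                               "city_149", "city_149", "city_151", "city_151",
                               "city_150", "city_151"]})

# Convert to the categorical dtype; categories are stored in sorted order
city_cat = train["city"].astype("category")

# The position of each label in .cat.categories is its integer code
code_to_city = dict(enumerate(city_cat.cat.categories))
print(code_to_city)

# With sort=True, factorize sorts the uniques first,
# so its codes line up with the categorical codes
sorted_codes, sorted_uniques = pd.factorize(train["city"], sort=True)
print(list(sorted_codes) == list(city_cat.cat.codes))
```

Either numbering works as an arbitrary integer label; the sorted variant is just easier to reproduce if the row order changes.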