Erk*_*rin 0 python scikit-learn categorical-data
I am trying to preprocess the Adult dataset for classification. I am using scikit-learn to handle the categorical attributes.
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
X[:,0] = labelencoder.fit_transform(X[:,0])
labelencoder.classes_
Output:
array(['Federal-gov', 'Local-gov', 'Private', 'Self-emp-inc',
'Self-emp-not-inc', 'State-gov', 'Without-pay'], dtype=object)
The new contents of X:
X[:3]
array([[5, 'Bachelors', 'Under-Graduate', 'Never-married',
'Adm-clerical', 'Not-in-family', 'White', 'Male',
'United-States', 39.0, 77516.0, 13.0, 2174.0, 0.0, 40.0],
[4, 'Bachelors', 'Under-Graduate', 'Married-civ-spouse',
'Exec-managerial', 'Husband', 'White', 'Male', 'United-States',
50.0, 83311.0, 13.0, 0.0, 0.0, 13.0],
[2, 'HS-grad', 'HS-grad', 'Divorced', 'Handlers-cleaners',
'Not-in-family', 'White', 'Male', 'United-States', 38.0,
215646.0, 9.0, 0.0, 0.0, 40.0]], dtype=object)
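Everything works fine up to this point. But I need to see the original attribute values, so I tried to convert them back as follows: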
original = labelencoder.inverse_transform(X[:,0])
I get this error:
IndexError Traceback (most recent call last)
<ipython-input-78-f8cf404b255a> in <module>
----> 1 original = labelencoder.inverse_transform(X[:,0])
D:\Anaconda\lib\site-packages\sklearn\preprocessing\label.py in inverse_transform(self, y)
281 "y contains previously unseen labels: %s" % str(diff))
282 y = np.asarray(y)
--> 283 return self.classes_[y]
284
285
IndexError: arrays used as indices must be of integer (or boolean) type
The error occurs because your array has dtype object. Even when you extract the first column, its dtype is still object (check X[:,0].dtype). As the traceback shows, inverse_transform indexes classes_ with its input, so it needs an integer array. To use inverse_transform, cast the column to int first:
original = labelencoder.inverse_transform(X[:,0].astype(int))
Output:
array(['a', 'b', 'c'], dtype=object)
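For anyone who wants to reproduce this end to end, here is a minimal sketch. The three-row array is a made-up sample in the style of the Adult data, not the asker's actual X, and the name codes is only illustrative. It shows that the encoded column keeps the object dtype and that casting it to int makes the round trip work.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample: mixed strings and floats, so NumPy must store it as dtype=object.
X = np.array(
    [
        ["State-gov", "Bachelors", 39.0],
        ["Self-emp-not-inc", "Bachelors", 50.0],
        ["Private", "HS-grad", 38.0],
    ],
    dtype=object,
)

labelencoder = LabelEncoder()
X[:, 0] = labelencoder.fit_transform(X[:, 0])

# The integer codes were written back into an object array,
# so the column is still object, not int.
print(X[:, 0].dtype)

# Cast to int before decoding, as suggested above.
codes = X[:, 0].astype(int)
original = labelencoder.inverse_transform(codes)
print(original)
```

If you do not need the codes inside X, another option is to keep the result of fit_transform in its own integer array, which avoids the cast entirely.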