Error when using ExtraTreesClassifier in scikit-learn

Cro*_*opy 3 python numpy scikit-learn

I am trying to use ExtraTreesClassifier from scikit-learn on my data. I have two numpy arrays, X and y. X has shape (10000, 51) and y has shape (10000,). To make sure they are both numpy arrays, I use

import numpy as np

# Convert to float32 arrays; asarray returns the same object if no copy is needed
X = np.array(X, dtype=np.float32)
print(np.asarray(X, dtype=np.float32) is X)
y = np.array(y, dtype=np.float32)
print(np.asarray(y, dtype=np.float32) is y)

Both checks print True. I then define my model as:

clf = ExtraTreesClassifier(n_estimators=10, max_depth=None, min_samples_split=1, random_state=0, n_jobs=-1)

When I try to train the model with

clf = clf.fit(X, y)

I get the following error:

  File "CFD_scikit_learn.py", line 169, in <module>
    clf = Xtra_Trees(my_var)
  File "CFD_scikit_learn.py", line 140, in Xtra_Trees
    clf = clf.fit(X, y)
  File "/user/leuven/308/vsc30879/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 235, in fit
    y, expanded_class_weight = self._validate_y_class_weight(y)
  File "/user/leuven/308/vsc30879/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 421, in _validate_y_class_weight
    check_classification_targets(y)
  File "/user/leuven/308/vsc30879/.local/lib/python2.7/site-packages/sklearn/utils/multiclass.py", line 173, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y)
ValueError: Unknown label type: array([[ 2.09895 ],
       [ 1.658568],
       [ 1.242831],
       ...,
       [ 1.743349],
       [ 1.765763],
       [ 1.824112]])

If anyone knows how to solve this, please let me know.

kwi*_*nks 6

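Classifiers need discrete class labels such as integers. Your y contains continuous floats, which is why scikit-learn reports an unknown label type.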

You need to either turn them into integers (e.g. bin them) or use a regression-type model.
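If the targets really are continuous, the regression route might look roughly like the sketch below. It uses ExtraTreesRegressor, the regression counterpart in sklearn.ensemble. The X and y here are small synthetic stand-ins (an assumption, not the asker's data), and the hyperparameters are only illustrative.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic stand-in data (assumption): 200 samples, 5 features,
# with a continuous float target like the one in the traceback.
rng = np.random.RandomState(0)
X = rng.rand(200, 5).astype(np.float32)
y = (X[:, 0] + 2.0 * X[:, 1]).astype(np.float32)

# A regressor accepts continuous targets, so no label conversion is needed.
reg = ExtraTreesRegressor(n_estimators=10, random_state=0)
reg.fit(X, y)

# Predictions are floats, one per input row.
pred = reg.predict(X[:3])
print(pred.shape)
```

If your real y has shape (n, 1), as the printed array in the error suggests, calling y.ravel() first gives the 1-D shape that scikit-learn estimators generally expect.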

If you think you can split your floats into sensible classes, numpy.digitize might help. Or you could binarize them.
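As a minimal sketch of both options: the values below are the ones visible in the error message, while the bin edges (1.5 and 2.0) and the binarization threshold (1.75) are made-up assumptions that you would replace with cut points meaningful for your data.

```python
import numpy as np

# Targets copied from the traceback, shaped (n, 1) as printed there.
y = np.array([[2.09895], [1.658568], [1.242831],
              [1.743349], [1.765763], [1.824112]])

# Flatten to 1-D before converting to labels.
y_flat = y.ravel()

# Option 1: bin into classes. The edges are hypothetical:
# class 0 is y < 1.5, class 1 is 1.5 <= y < 2.0, class 2 is y >= 2.0.
bin_edges = np.array([1.5, 2.0])
y_classes = np.digitize(y_flat, bin_edges)
print(y_classes)  # [2 1 0 1 1 1]

# Option 2: binarize around a hypothetical threshold.
y_binary = (y_flat > 1.75).astype(int)
print(y_binary)  # [1 0 0 0 1 1]
```

Either integer array can then be passed as y to clf.fit(X, y) in place of the float targets.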