Sklearn: How to feed data to sklearn RandomForestClassifier

Dav*_*ams 4 python random-forest scikit-learn

I have this data:

print(training_data)
print(labels)

# prints

[[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 0,0], [1, 1, 1, 1], [1, 0, 1, 1]]
['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

I am trying to feed it to a RandomForestClassifier from the sklearn Python library.

classifier = RandomForestClassifier(n_estimators=10)
classifier.fit(training_data, labels)

But I get this error:

Traceback (most recent call last):
  File "learn.py", line 52, in <module>
    main()
  File "learn.py", line 48, in main
    classifier = train_classifier()
  File "learn.py", line 33, in train_classifier
    classifier.fit(training_data, labels)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 348, in fit
    y = np.ascontiguousarray(y, dtype=DOUBLE)
  File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_bbcfcf6_20130307-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 419, in ascontiguousarray
    return array(a, dtype, copy=False, order='C', ndmin=1)
ValueError: could not convert string to float: a

My guess is that I am not formatting this data correctly for fit. But I can't see why from the documentation.

This seems like a very basic, simple problem. Does anyone know the answer?

Mat*_*att 7

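Try using LabelEncoder to transform the labels beforehand. The traceback shows fit converting y to a float array, which fails on string labels such as 'a'; encoding them as integers first avoids that conversion error.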
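As a minimal sketch of that suggestion, using the data from the question: the labels are encoded to integers with LabelEncoder before fitting, and predictions are mapped back to the original strings with inverse_transform. The names encoder, encoded_labels, predicted and predicted_labels are illustrative, and random_state=0 is an assumption added only to make the run repeatable; neither appears in the original code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Data copied from the question.
training_data = [[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0],
                 [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1],
                 [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
labels = ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

# Map the string labels to integers (classes are sorted, so 'a' -> 0, 'b' -> 1).
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(labels)

# random_state is an assumption for reproducibility, not part of the question.
classifier = RandomForestClassifier(n_estimators=10, random_state=0)
classifier.fit(training_data, encoded_labels)

# Predict on one sample and convert the integer prediction back to a string label.
predicted = classifier.predict([[1, 0, 1, 1]])
predicted_labels = encoder.inverse_transform(predicted)
print(predicted_labels)
```

Keeping the fitted encoder around matters, since it is what turns integer predictions back into 'a' and 'b'. Newer scikit-learn releases generally accept string class labels in fit directly, so this step may be unnecessary outside the old 0.14 development build shown in the traceback.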