ValueError: Unknown label type: 'continuous' when applying Random Forest

Question

oct*_*ian 1 python dataframe pandas scikit-learn

I have a dataset df_train and some labels df_train_labels.

print(df_train.shape)
print(df_train_labels.shape)

Output:

(1460, 6)
(1460,)

and

print(df_train[0:4])
print(df_train_labels[0:4])

Output:

   OverallQual  GrLivArea  GarageCars  TotalBsmtSF  FullBath  YearBuilt
0            1   0.000000           1            1         1          1
1            1   0.000000           0            1         0          1
2            0   0.693147           0            2         0          2
3            0   1.098612           1            3         1          3
0    2.505338
1    2.493950
2    2.510994
3    2.472277
Name: SalePrice, dtype: float64

I am trying to fit a model to this data:

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=10)
clf = clf.fit(df_train, df_train_labels)

However, the last line fails with the following error:

raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'

I looked here and here, but did not find anything relevant to my problem.

Any idea what might be going wrong?

Answer 1

oct*_*ian 8

RandomForestClassifier doesn't seem to work with float labels, so I switched to RandomForestRegressor instead.
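
As a minimal sketch of that switch: the data below is a synthetic stand-in generated for illustration (an assumption, not the asker's real dataset), with the same column names and float labels in a similar range. Only the estimator class changes relative to the code in the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption; not the real dataset from the question).
rng = np.random.default_rng(0)
df_train = pd.DataFrame(
    rng.integers(0, 4, size=(100, 6)),
    columns=["OverallQual", "GrLivArea", "GarageCars",
             "TotalBsmtSF", "FullBath", "YearBuilt"],
)
df_train_labels = pd.Series(rng.normal(2.5, 0.02, size=100), name="SalePrice")

# A regressor accepts continuous (float) targets, unlike a classifier.
reg = RandomForestRegressor(n_estimators=10, random_state=0)
reg.fit(df_train, df_train_labels)

# Predictions are floats, one per input row.
predictions = reg.predict(df_train[0:4])
print(predictions.shape)
```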

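"Doesn't seem to work" doesn't quite cover the fact that it simply isn't designed for this! You are doing regression, not classification, so you were using the wrong tool. (2 upvotes)
Please first understand the problem you are trying to solve here. You are trying to predict continuous values, which is a regression task. (2 upvotes)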
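
To check which kind of task a label array implies before choosing an estimator, one option is scikit-learn's `type_of_target` helper, which produces the same 'continuous' string that appears in the error message. The sketch below reuses the four label values printed in the question; the integer array is a made-up example for contrast.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# The first four SalePrice labels shown in the question (floats).
continuous_labels = np.array([2.505338, 2.493950, 2.510994, 2.472277])

# Hypothetical integer class labels, for contrast.
discrete_labels = np.array([0, 1, 2, 1])

target_kind_continuous = type_of_target(continuous_labels)
target_kind_discrete = type_of_target(discrete_labels)

print(target_kind_continuous)  # continuous -> use a regressor
print(target_kind_discrete)    # multiclass -> a classifier is appropriate
```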

Archived:	7 years, 11 months ago
Views:	15,074
Last activity:	5 years, 7 months ago