Using cross_val_predict on a test dataset

Question

Kab*_*ard 6 python machine-learning scikit-learn data-science

I am confused about how to use cross_val_predict with a test dataset.

I created a simple random forest model and used cross_val_predict to make predictions:

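import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead.
from sklearn.model_selection import cross_val_predict, KFold

lr = RandomForestClassifier(random_state=1, class_weight="balanced", n_estimators=25, max_depth=6)
# 3 folds matches the old KFold default; random_state only takes effect with shuffle=True.
kf = KFold(n_splits=3, shuffle=True, random_state=1)
predictions = cross_val_predict(lr, train_df[features_columns], train_df["target"], cv=kf)
predictions = pd.Series(predictions)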

I am confused about the next step here: how do I use what was learned above to make predictions on the test dataset?

Answer 1

Jak*_*kub 1

As @DmitryPolonskiy commented, the model must first be trained (with the fit method) before it can be used to predict.

# Train the model (a.k.a. `fit` it to the training data).
lr.fit(train_df[features_columns], train_df["target"])
# Use the fitted model to make predictions on the testing data.
y_pred = lr.predict(test_df[features_columns])
# Compare the predicted y values to the actual y values (fraction correct).
accuracy = (y_pred == test_df["target"]).mean()
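
To illustrate why the explicit fit call is needed, here is a minimal sketch on synthetic data. The arrays X_train, y_train and X_test are hypothetical stand-ins for train_df and test_df, and the name model is mine, not the asker's. It relies on cross_val_predict fitting clones of the estimator, so the object you pass in stays unfitted:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the question's train/test frames (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=1)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

model = RandomForestClassifier(n_estimators=25, max_depth=6, random_state=1)

# Fits one clone of `model` per fold and returns out-of-fold predictions.
oof_pred = cross_val_predict(model, X_train, y_train, cv=3)

# The original estimator was never fitted, so predict() is rejected.
try:
    model.predict(X_test)
    fitted_before = True
except NotFittedError:
    fitted_before = False

# After an explicit fit on the training data, predicting on test data works.
model.fit(X_train, y_train)
test_pred = model.predict(X_test)
```

Here fitted_before ends up False, which is the behaviour the answer is pointing at: cross-validation alone leaves you without a trained model.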

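cross_val_predict is a cross-validation helper for estimating how well the model generalizes: for each training sample it returns a prediction made by a model that was fit on the other folds. It does not fit the estimator you pass in, so it cannot be used on its own to predict on new data. Take a look at the cross-validation page in the sklearn documentation.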
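
The two roles can be put side by side in a rough sketch: cross-validation on the training data gives an accuracy estimate, and a separate fit produces the model used on the test set. The data here is synthetic and the variable names (clf, cv_accuracy, test_accuracy) are illustrative assumptions, not part of the original question:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, train_test_split

# Hypothetical data in place of train_df / test_df.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

clf = RandomForestClassifier(
    random_state=1, class_weight="balanced", n_estimators=25, max_depth=6
)
kf = KFold(n_splits=3, shuffle=True, random_state=1)

# Step 1: estimate accuracy from out-of-fold predictions on the training set.
cv_pred = cross_val_predict(clf, X_train, y_train, cv=kf)
cv_accuracy = accuracy_score(y_train, cv_pred)

# Step 2: fit on all of the training data, then score on the held-out test set.
clf.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test))

print(f"cross-validated accuracy: {cv_accuracy:.3f}")
print(f"test accuracy:            {test_accuracy:.3f}")
```

If the two numbers differ widely, that usually suggests the cross-validation setup or the train/test split deserves a closer look.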
