Kab*_*ard 6 python machine-learning scikit-learn data-science
I'm confused about how to use cross_val_predict with a test data set.
I created a simple random forest model and used cross_val_predict to make predictions:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead.
from sklearn.model_selection import cross_val_predict, KFold
lr = RandomForestClassifier(random_state=1, class_weight="balanced", n_estimators=25, max_depth=6)
# 3 unshuffled folds, matching the default of the old KFold API.
kf = KFold(n_splits=3)
predictions = cross_val_predict(lr, train_df[features_columns], train_df["target"], cv=kf)
predictions = pd.Series(predictions)
I'm confused about the next step here: how do I use what was learned above to make predictions on the test data set?
As @DmitryPolonskiy commented, a model has to be trained (with the fit method) before it can be used to predict.
# Train the model (a.k.a. `fit` training data to it).
lr.fit(train_df[features_columns], train_df["target"])
# Use the model to make predictions based on testing data.
y_pred = lr.predict(test_df[features_columns])
# Compare the predicted y values to actual y values.
accuracy = (y_pred == test_df["target"]).mean()
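cross_val_predict is a cross-validation helper that lets you estimate the model's accuracy: for each training row it returns a prediction made by a copy of the model that was not trained on that row. It does not leave you with a fitted model to apply to test data. Take a look at sklearn's cross-validation page.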
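To make the difference concrete, here is a minimal sketch of both steps side by side. It assumes a recent scikit-learn (sklearn.model_selection) and uses synthetic data from make_classification as a stand-in for train_df and test_df, so the variable names and data are illustrative only, not the asker's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, train_test_split

# Synthetic stand-in for train_df / test_df (an assumption, not the original data).
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

clf = RandomForestClassifier(
    random_state=1, class_weight="balanced", n_estimators=25, max_depth=6
)
kf = KFold(n_splits=3, shuffle=True, random_state=1)

# Step 1: out-of-fold predictions on the training data.
# Each row is predicted by a clone of clf fitted on the other folds.
oof_pred = cross_val_predict(clf, X_train, y_train, cv=kf)
cv_accuracy = accuracy_score(y_train, oof_pred)

# cross_val_predict works on clones, so clf itself is still unfitted here.
is_fitted_before = hasattr(clf, "estimators_")

# Step 2: fit on all training data, then predict the held-out test data.
clf.fit(X_train, y_train)
test_pred = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, test_pred)

print("cross-validated accuracy:", cv_accuracy)
print("test accuracy:", test_accuracy)
```

The cross-validated score is only an estimate of how the model generalizes; the predictions for the test set come from the separate fit and predict calls.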