I have a dataset with 36k rows. I want to randomly select 9k rows from it using pandas. How can I do this?
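A minimal sketch of one way to do this, assuming the data is already loaded into a DataFrame. The frame `df`, its columns, and the seed values below are made up for illustration; `DataFrame.sample` draws rows without replacement by default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 36k-row dataset; replace with your own DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame({"id": np.arange(36_000), "value": rng.normal(size=36_000)})

# Draw 9,000 distinct rows at random (replace=False is the default).
# random_state is optional; it only makes the draw reproducible.
sample = df.sample(n=9_000, random_state=42)

# The same size expressed as a fraction of the frame: 0.25 * 36,000 = 9,000 rows.
sample_frac = df.sample(frac=0.25, random_state=42)

# The rows that were not selected, in case a complementary split is needed.
rest = df.drop(sample.index)

print(len(sample), len(sample_frac), len(rest))
```

Dropping by `sample.index` relies on the index being unique, which holds for the default integer index used here.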
With the function below, I am not getting the estimated number of estimators (n_estimators); instead, I get the following TypeError:
cv() got an unexpected keyword argument 'show_progress'
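Even though the documentation lists that flag, I still get the TypeError. I am following this blog post for parameter tuning. Can anyone point out where I am going wrong? Is there any other way to get the number of estimators as output?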
def modelfit(alg, dtrain, predictors, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):
    if useTrainCV:
        xgb_param = alg.get_xgb_params()
        xgtrain = xgb.DMatrix(dtrain[predictors].values, label=dtrain[target].values, silent=False)
        cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], nfold=cv_folds,
                          metrics='auc', early_stopping_rounds=early_stopping_rounds, show_progress = True)
        alg.set_params(n_estimators=cvresult.shape[0])
    #Fit the algorithm on the data
    alg.fit(dtrain[predictors], dtrain[target],eval_metric='auc')
    #Predict training set:
    dtrain_predictions = alg.predict(dtrain[predictors])
    dtrain_predprob = alg.predict_proba(dtrain[predictors])[:,1]
    #Print model report:
    print "\nModel Report"
    print "Accuracy : %.4g" % metrics.accuracy_score(dtrain[target].values, dtrain_predictions)
    print "AUC Score (Train): %f" % metrics.roc_auc_score(dtrain[target], dtrain_predprob)
    feat_imp = …
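An "unexpected keyword argument" TypeError means the installed function's signature has no parameter with that name, whatever a particular version of the documentation shows. The sketch below checks a signature with the standard library before passing a keyword. `fake_cv` and `supported_kwargs` are hypothetical names invented for this illustration, not part of XGBoost; in the XGBoost releases I am aware of, `xgb.cv` takes `verbose_eval` rather than `show_progress`, but that should be confirmed against the installed version, for example with `inspect.signature(xgb.cv)`.

```python
import inspect


def supported_kwargs(func, **candidates):
    """Return only the keyword arguments that `func` actually accepts."""
    params = inspect.signature(func).parameters
    # A function declaring **kwargs accepts any keyword name.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(candidates)
    return {name: value for name, value in candidates.items() if name in params}


# Hypothetical stand-in for a cv() whose signature has `verbose_eval`
# but no `show_progress`; it is NOT the real xgboost.cv.
def fake_cv(params, dtrain, num_boost_round=10, nfold=3, verbose_eval=None):
    return {"rounds": num_boost_round, "nfold": nfold, "verbose_eval": verbose_eval}


# Passing the unknown keyword directly reproduces the error from the question.
try:
    fake_cv({}, None, show_progress=True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Filtering first keeps only what the installed signature supports.
kwargs = supported_kwargs(fake_cv, show_progress=True, verbose_eval=True, nfold=5)
result = fake_cv({}, None, **kwargs)

print(error_message)
print(kwargs)
```

With the real library, the same check would be `"show_progress" in inspect.signature(xgb.cv).parameters`. If it is false, dropping the flag or switching to whichever verbosity parameter the signature does list should let `cvresult.shape[0]` be computed as intended.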