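My dataframe has two columns. One is a date (df["Start_date"]) and the other is a number of days (df["days"]). I want to subtract the days column from the date column.
I'm trying something like this: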
df["new_date"]=df["Start_date"]-datetime.timedelta(days=df["days"])
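The attempt above fails because `datetime.timedelta(days=...)` takes a single number, not a whole Series. One common approach is to convert the column to timedeltas with `pd.to_timedelta` and let pandas subtract element-wise. This is a minimal sketch. It assumes `Start_date` is already a datetime64 column and `days` holds integers. If `Start_date` holds strings, convert it with `pd.to_datetime` first. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with the two columns from the question.
df = pd.DataFrame({
    "Start_date": pd.to_datetime(["2021-03-10", "2021-01-05", "2020-12-31"]),
    "days": [5, 10, 0],
})

# Convert the integer column to a timedelta Series (unit "D" = days),
# then subtract it from the date column row by row.
df["new_date"] = df["Start_date"] - pd.to_timedelta(df["days"], unit="D")
print(df)
```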
With a random forest, predict() and predict_proba() give different roc_auc_score values.
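As far as I understand, predict_proba() returns probabilities. For binary classification, for example, it returns two probabilities, one for each class. predict() returns the predicted class.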
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Using predict_proba()
rf = RandomForestClassifier(n_estimators=200, random_state=39)
rf.fit(X_train[['Cabin_mapped', 'Sex']], y_train)
# make predictions on train and test set; [:, 1] keeps only the positive-class
# probability, because roc_auc_score expects a 1-D score for a binary target
pred_train = rf.predict_proba(X_train[['Cabin_mapped', 'Sex']])[:, 1]
pred_test = rf.predict_proba(X_test[['Cabin_mapped', 'Sex']].fillna(0))[:, 1]
print('Train set')
print('Random Forests using predict_proba roc-auc: {}'.format(roc_auc_score(y_train, pred_train)))
print('Test set')
print('Random Forests using predict_proba roc-auc: {}'.format(roc_auc_score(y_test, pred_test)))
# Using predict(): same feature columns the model was fitted on
pred_train = rf.predict(X_train[['Cabin_mapped', 'Sex']])
pred_test = rf.predict(X_test[['Cabin_mapped', 'Sex']].fillna(0))
print('Train set')
print('Random Forests using predict roc-auc: {}'.format(roc_auc_score(y_train, pred_train)))
print('Test set')
print('Random Forests using predict roc-auc: {}'.format(roc_auc_score(y_test, pred_test)))
Using predict_proba, roc-auc …
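A likely reason for the gap is that predict() returns hard 0/1 labels. The ROC curve built from them has only one operating point, so the resulting AUC reduces to balanced accuracy, (TPR + TNR) / 2. Probability scores let the ROC curve sweep every threshold. The sketch below illustrates this in plain Python. The helper `auc` is a hypothetical stand-in for `sklearn.metrics.roc_auc_score`, not sklearn's code. It uses the rank interpretation of ROC AUC: the chance that a random positive scores above a random negative, with ties counted as half. The toy labels and probabilities are made up.

```python
from itertools import product


def auc(y_true, y_score):
    """ROC AUC as P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and positive-class probabilities,
# i.e. the kind of values predict_proba(X)[:, 1] returns.
y_true = [0, 0, 0, 1, 1, 1]
proba_pos = [0.1, 0.4, 0.6, 0.35, 0.7, 0.9]
# Roughly what predict() does for a binary forest: pick the more probable class.
hard_pred = [1 if p > 0.5 else 0 for p in proba_pos]

print(auc(y_true, proba_pos))  # uses the full ranking of scores
print(auc(y_true, hard_pred))  # only one threshold: equals balanced accuracy
```

Here the probability-based AUC is 7/9. The hard-label AUC is 2/3, which matches TPR = 2/3 and TNR = 2/3 for the thresholded predictions. For a ranking metric like ROC AUC, scoring with the probabilities is usually what is intended.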
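I'm new to Python. Can this be done with regex? I want to search a string for specific substrings and remove the characters immediately before and after them.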
Example 1
Input:"This is the consignment no 1234578TP43789"
Output:"This is the consignment no TP"
Example 2
Input:"Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890"
Output:"Consignment no TP is on its way on vehicle no MP"
I have a list of these acronyms (MP, TP) that I want to search for in the string.
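Yes, `re.sub` can do this with one pattern built from the acronym list. This is a minimal sketch. It assumes, based only on the two examples, that the characters to strip are digits and that each consignment or vehicle number is a whole word. Under that assumption, words like "MPH" are left untouched. The function name `keep_acronyms` is hypothetical.

```python
import re


def keep_acronyms(text, acronyms):
    """Replace each word of the form <digits><acronym><digits> with just the acronym.

    Assumption: the surrounding characters are digits, and the whole token
    is delimited by word boundaries (spaces, start/end of string, punctuation).
    """
    # Escape each acronym and try longer ones first in the alternation.
    alternation = "|".join(re.escape(a) for a in sorted(acronyms, key=len, reverse=True))
    pattern = re.compile(r"\b\d*(" + alternation + r")\d*\b")
    # Replace the whole match with the captured acronym (group 1).
    return pattern.sub(r"\1", text)


acronyms = ["MP", "TP"]
print(keep_acronyms("This is the consignment no 1234578TP43789", acronyms))
print(keep_acronyms("Consignment no 1234578TP43789 is on its way on vehicle no 3456MP567890", acronyms))
```

If the surrounding characters can also be letters, `\d*` could be widened to `[A-Za-z0-9]*`. However, that also rewrites ordinary words that happen to contain an acronym, such as "HTTP".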