I am using recursive feature elimination with cross-validation (RFECV) as a feature selector for a RandomForestClassifier, as follows.
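from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

X = df[my_features]  # all my features (my_features is a list of column names)
y = df['gold_standard']  # labels
clf = RandomForestClassifier(random_state=42, class_weight="balanced")
# Drop one feature per step; score each subset by 10-fold stratified ROC AUC
rfecv = RFECV(estimator=clf, step=1, cv=StratifiedKFold(10), scoring='roc_auc')
rfecv.fit(X, y)
print("Optimal number of features : %d" % rfecv.n_features_)
features = list(X.columns[rfecv.support_])  # names of the selected columns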
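For anyone who wants to reproduce the selection step without my data, here is a minimal sketch of the same RFECV setup on synthetic data. The dataset from `make_classification`, the `_demo` names, the smaller forest (`n_estimators=20`) and the 3-fold split are assumptions chosen only to keep it fast; they are not my real settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for df[my_features] / df['gold_standard'] (assumption).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Same estimator and scoring as in the question, with a smaller forest for speed.
clf_demo = RandomForestClassifier(
    n_estimators=20, random_state=42, class_weight="balanced"
)
rfecv_demo = RFECV(
    estimator=clf_demo, step=1, cv=StratifiedKFold(3), scoring="roc_auc"
)
rfecv_demo.fit(X_demo, y_demo)

# support_ is a boolean mask over the columns; ranking_ is 1 for kept features.
selected = [i for i, keep in enumerate(rfecv_demo.support_) if keep]
print("Optimal number of features : %d" % rfecv_demo.n_features_)
print("Selected column indices:", selected)
print("Elimination ranking (1 = kept):", list(rfecv_demo.ranking_))
```

With a DataFrame, the same boolean mask can be used to index `X.columns`, which is what the last line of my snippet does.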
I also run GridSearchCV as follows to tune the hyperparameters of the RandomForestClassifier.
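from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X = df[my_features]  # all my features (my_features is a list of column names)
y = df['gold_standard']  # labels
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
rfc = RandomForestClassifier(random_state=42, class_weight='balanced')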
param_grid = {
    'n_estimators': [200, 500],
    'max_features': …
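The grid above is cut off, so the following is only a self-contained sketch of the tuning step and not my actual configuration. The synthetic data, the `_demo` names, the values in `demo_grid` and the 3-fold split are all hypothetical and picked to run quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the real features and labels (assumption).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X_demo, y_demo, random_state=0
)

rfc_demo = RandomForestClassifier(random_state=42, class_weight="balanced")

# Hypothetical grid: these values are illustrative, not the truncated originals.
demo_grid = {
    "n_estimators": [10, 20],
    "max_features": ["sqrt", "log2"],
}

# Cross-validated search on the training split only.
search = GridSearchCV(
    estimator=rfc_demo, param_grid=demo_grid, cv=StratifiedKFold(3), scoring="roc_auc"
)
search.fit(x_train, y_train)

# By default GridSearchCV refits the best model on the whole training split,
# so it can be scored directly on the held-out test split.
test_auc = roc_auc_score(y_test, search.predict_proba(x_test)[:, 1])
print("Best parameters:", search.best_params_)
print("Best CV ROC AUC: %.3f" % search.best_score_)
print("Held-out ROC AUC: %.3f" % test_auc)
```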