I am using recursive feature elimination with cross-validation (RFECV) as a feature selector for a RandomForestClassifier, as follows.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

X = df[my_features]  # all my features (my_features is a list of column names)
y = df['gold_standard']  # labels

clf = RandomForestClassifier(random_state=42, class_weight="balanced")
rfecv = RFECV(estimator=clf, step=1, cv=StratifiedKFold(10), scoring='roc_auc')
rfecv.fit(X, y)

print("Optimal number of features : %d" % rfecv.n_features_)
features = list(X.columns[rfecv.support_])
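With the selector fitted, the retained columns can be pulled into a reduced design matrix for the later tuning step; a minimal sketch, assuming the code above has run (X_selected is a hypothetical name):

X_selected = X[features]  # keep only the columns RFECV retained
print(X_selected.shape)   # (n_samples, rfecv.n_features_)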
I also run GridSearchCV as follows to tune the hyperparameters of the RandomForestClassifier.
from sklearn.model_selection import train_test_split

X = df[my_features]  # all my features (my_features is a list of column names)
y = df['gold_standard']  # labels

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
rfc = RandomForestClassifier(random_state=42, class_weight='balanced')
param_grid = {
    'n_estimators': [200, 500],
    'max_features': …
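The grid above is truncated; for reference, a minimal end-to-end sketch of the tuning step, where every grid value beyond n_estimators is an assumption for illustration:

from sklearn.model_selection import GridSearchCV

# illustrative grid; the 'max_features' and 'max_depth' values are assumptions, not from the original
param_grid = {
    'n_estimators': [200, 500],
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 6, 8],
}
grid = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42, class_weight='balanced'),
    param_grid=param_grid,
    cv=StratifiedKFold(10),
    scoring='roc_auc',
)
grid.fit(x_train, y_train)
print(grid.best_params_, grid.best_score_)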
I am working on a classification problem with imbalanced data, and I previously tried oversampling the training data with the Synthetic Minority Oversampling Technique (SMOTE). This time, however, I think I also need Leave-One-Group-Out (LOGO) cross-validation, because I want to leave one subject out on each CV fold.
I am not sure I can explain it well, but as far as I understand, to use SMOTE with k-fold CV we can loop over the folds and apply SMOTE to each fold's training data, as I saw in code from another post. Below is an example of applying SMOTE within k-fold CV.
from sklearn.model_selection import KFold
from imblearn.over_sampling import SMOTE
from sklearn.metrics import f1_score

kf = KFold(n_splits=5)

for fold, (train_index, test_index) in enumerate(kf.split(X), 1):
    X_train = X[train_index]
    y_train = y[train_index]
    X_test = X[test_index]
    y_test = y[test_index]
    sm = SMOTE()
    # resample only the training fold; the test fold stays untouched
    X_train_oversampled, y_train_oversampled = sm.fit_resample(X_train, y_train)
    model = ...  # classification model example
    # fit on the oversampled training data, then evaluate on the untouched test fold
    model.fit(X_train_oversampled, y_train_oversampled)
    y_pred = model.predict(X_test)
    print(f'For fold {fold}:')
    print(f'Accuracy: {model.score(X_test, y_test)}')
    print(f'f-score: {f1_score(y_test, y_pred)}')
Without SMOTE, I tried the following to do LOGO CV. But by doing this I would be training on a highly imbalanced dataset.
X = X
y = np.array(df.loc[:, df.columns == 'label'])
groups = …
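For what it's worth, the per-fold SMOTE pattern above carries over directly to LOGO; a minimal sketch, assuming X, y, and groups are numpy arrays (y flattened to 1-D) and using a RandomForestClassifier as a stand-in model:

from sklearn.model_selection import LeaveOneGroupOut
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE

logo = LeaveOneGroupOut()

for fold, (train_index, test_index) in enumerate(logo.split(X, y, groups), 1):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
    # oversample inside the fold so the held-out subject never leaks into SMOTE
    X_train_oversampled, y_train_oversampled = SMOTE().fit_resample(X_train, y_train)
    model = RandomForestClassifier(random_state=42)  # stand-in model for illustration
    model.fit(X_train_oversampled, y_train_oversampled)
    y_pred = model.predict(X_test)
    print(f'Fold {fold} (subject {groups[test_index][0]}): f-score = {f1_score(y_test, y_pred)}')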