Posts by MLe*_*ner

How to implement SMOTE in cross-validation and GridSearchCV

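I'm relatively new to Python. Can you help me turn my SMOTE implementation into a proper pipeline? What I want is to apply over-sampling and under-sampling to the training set of each k-fold iteration, so that the model is trained on a balanced data set and evaluated on the imbalanced held-out fold. The problem is that when I do this, I can no longer use the familiar sklearn interface for evaluation and grid search.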

Is it possible to do something similar with model_selection.RandomizedSearchCV? My attempt so far:

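import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv("Imbalanced_data.csv")  # Load the data set
X = df.iloc[:, 0:64].values  # Columns 0-63 are used as features
y = df.iloc[:, 64].values  # Column 64 is used as the target
n_splits = 2
n_measures = 2  # Recall and AUC
kf = StratifiedKFold(n_splits=n_splits)  # Stratified so each fold keeps the original class ratio
clf_rf = RandomForestClassifier(n_estimators=25, random_state=1)
s = (n_splits, n_measures)
scores = np.zeros(s)  # One row per fold, one column per metric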
for train_index, test_index in kf.split(X,y):
   print("TRAIN:", train_index, "TEST:", test_index)
   X_train, X_test = X[train_index], X[test_index]
   y_train, y_test = y[train_index], y[test_index]
   sm = SMOTE(ratio = 'auto',k_neighbors = 5, n_jobs …

python pipeline scikit-learn cross-validation grid-search

8 votes, 1 answer, 4901 views