在我的分类方案中,有几个步骤,包括:
在上述方案中要调整的主要参数是百分位数 (2.) 和 SVC (4.) 的超参数,我想通过网格搜索进行调整。
当前的解决方案构建了一个“部分”管道,包括方案中的第 3 步和第 4 步,并将方案clf = Pipeline([('normal',preprocessing.StandardScaler()),('svc',svm.SVC(class_weight='auto'))])
分解为两部分:
调整特征的百分位数以保持通过第一次网格搜索
skf = StratifiedKFold(y)
for train_ind, test_ind in skf:
X_train, X_test, y_train, y_test = X[train_ind], X[test_ind], y[train_ind], y[test_ind]
# SMOTE synthesizes the training data (we want to keep test data intact)
X_train, y_train = SMOTE(X_train, y_train)
for percentile in percentiles:
# Fisher returns the indices of the selected features specified by the parameter 'percentile'
selected_ind = Fisher(X_train, …Run Code Online (Sandbox Code Playgroud)pipeline machine-learning feature-selection scikit-learn cross-validation