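Given a machine learning model named "m", an RBF SVC, I ran GridSearchCV over gamma values to optimize recall. I want to answer this question: "The grid search should find the model that best optimizes for recall. How much better is the recall of this model than its precision?"
So I ran GridSearchCV: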
grid_values = {'gamma': [0.001, 0.01, 0.05, 0.1, 1, 10, 100]}
grid_m_re = GridSearchCV(m, param_grid = grid_values, scoring = 'recall')
grid_m_re.fit(X_train, y_train)
y_decision_fn_scores_re = grid_m_re.decision_function(X_test)
print('Grid best parameter (max. recall): ', grid_m_re.best_params_)
print('Grid best score (recall): ', grid_m_re.best_score_)
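This tells me that the best model has gamma=0.001, with a recall score of 1.
I want to know how to get the precision of this model so I can see its precision/recall trade-off, because GridSearchCV only exposes attributes for the metric it was optimized for. ( [Doc sklearn.GridSearchCV][1])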
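One way to get both numbers, sketched here under assumptions: pass several metrics to `scoring` and use `refit='recall'` so the search still selects by recall, then read the precision of that same candidate from `cv_results_` at `best_index_`. The synthetic data from `make_classification` stands in for the question's `X_train`/`y_train`, and `SVC(kernel='rbf')` stands in for the model `m`; the test-set comparison at the end is an optional extra.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the question's data (assumption: binary labels)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid_values = {'gamma': [0.001, 0.01, 0.05, 0.1, 1, 10, 100]}
# Score both metrics in one search; refit='recall' keeps recall as the selection criterion
grid = GridSearchCV(SVC(kernel='rbf'), param_grid=grid_values,
                    scoring={'recall': 'recall', 'precision': 'precision'},
                    refit='recall')
grid.fit(X_train, y_train)

best = grid.best_index_
cv_recall = grid.cv_results_['mean_test_recall'][best]
cv_precision = grid.cv_results_['mean_test_precision'][best]
print('best params:', grid.best_params_)
print('CV recall - CV precision:', cv_recall - cv_precision)

# Optionally, evaluate the refitted best model on held-out data
y_pred = grid.predict(X_test)
test_recall = recall_score(y_test, y_pred)
test_precision = precision_score(y_test, y_pred, zero_division=0)
print('test recall - test precision:', test_recall - test_precision)
```

With `refit='recall'`, `best_score_` is the mean CV recall of the chosen candidate, so the two CV numbers above describe the same model.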
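I'm using scikit-learn and trying to tune XGBoost. I'm trying to use nested cross-validation, with a pipeline that rescales the training folds (to avoid data leakage and overfitting), GridSearchCV for parameter tuning, and cross_val_score to get the roc_auc score.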
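The nested setup described above can be sketched as follows: the grid search is the inner loop, and passing the whole `GridSearchCV` object to `cross_val_score` makes it the outer loop, so each outer training split runs its own search. To keep the sketch self-contained it assumes synthetic data and swaps in sklearn's `Pipeline` and `GradientBoostingClassifier` for `imblearn.pipeline.Pipeline` and `XGBClassifier`; the grid is shrunk for speed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=8, random_state=15)

# The scaler is a pipeline step, so it is refit on each training fold only
pipeline = Pipeline([('std_scaling', StandardScaler()),
                     ('algo', GradientBoostingClassifier(n_estimators=50, random_state=15))])
parameters = {'algo__max_depth': [2, 4],
              'algo__learning_rate': [0.05, 0.3]}

inner_cv = RepeatedKFold(n_splits=2, n_repeats=2, random_state=15)  # used for tuning
outer_cv = RepeatedKFold(n_splits=2, n_repeats=2, random_state=16)  # used for evaluation

clf_auc = GridSearchCV(pipeline, param_grid=parameters, cv=inner_cv, scoring='roc_auc')
# Outer loop: every outer training split gets a fresh, independent grid search
nested_scores = cross_val_score(clf_auc, X, y, cv=outer_cv, scoring='roc_auc')
print('nested ROC AUC: %.3f +/- %.3f' % (nested_scores.mean(), nested_scores.std()))
```

Using two separate CV splitters (rather than reusing one object for both loops) keeps the tuning and evaluation splits visibly distinct.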
from imblearn.pipeline import Pipeline
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
# The scaler is a pipeline step, so it is fit on the training folds only
steps = [('std_scaling', StandardScaler()), ('algo', XGBClassifier())]
pipeline = Pipeline(steps)
parameters = {'algo__min_child_weight': [1, 2],
              'algo__subsample': [0.6, 0.9],
              'algo__max_depth': [4, 6],
              'algo__gamma': [0.1, 0.2],
              'algo__learning_rate': [0.05, 0.5, 0.3]}
cv1 = RepeatedKFold(n_splits=2, n_repeats=5, random_state=15)
clf_auc = GridSearchCV(pipeline, cv=cv1, param_grid=parameters, scoring='roc_auc', n_jobs=-1, return_train_score=False)
cv1 = RepeatedKFold(n_splits=2, …
I'm trying to use GridSearchCV with a LightGBM sklearn estimator, but I'm running into problems when building the search.
The code I'm trying to build looks like this:
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV
d_train = lgb.Dataset(X_train, label=y_train)
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'gbdt'
params['objective'] = 'binary'
params['metric'] = 'binary_logloss'
params['sub_feature'] = 0.5
params['num_leaves'] = 10
params['min_data'] = 50
params['max_depth'] = 10
clf = lgb.train(params, d_train, 100)
param_grid = {
    'num_leaves': [10, 31, 127],
    'boosting_type': ['gbdt', 'rf'],
    'learning_rate': [0.1, 0.001, 0.003]
}
gsearch = GridSearchCV(estimator=clf, param_grid=param_grid)
lgb_model = gsearch.fit(X=train, y=y)
But I get the following error:
TypeError: estimator should be an estimator implementing 'fit' method,
<lightgbm.basic.Booster object at 0x0000014C55CA2880> …
I have some classifiers that were created using grid search, and others that were created directly as random forests.
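The plain random forests have type sklearn.ensemble.forest.RandomForestClassifier, while the ones created with grid search have type sklearn.grid_search.RandomizedSearchCV.
I'm trying to check the estimator's type programmatically (to decide whether I need to go through best_estimator_ to get the feature importances), but I can't find a good way to do it.
My first guess was if type(estimator) == 'sklearn.grid_search.RandomizedSearchCV', but that is clearly wrong, since it compares a type object with a string.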
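A minimal sketch of the usual approach: compare against the classes themselves with `isinstance`, not against a string. It assumes a current scikit-learn, where the search classes live in `sklearn.model_selection` (the old `sklearn.grid_search` module has been removed); `get_feature_importances` is a hypothetical helper name and the data is synthetic. Checking `hasattr(estimator, 'best_estimator_')` is a duck-typed alternative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


def get_feature_importances(estimator):
    # Hypothetical helper: search objects wrap the fitted model, so unwrap first
    if isinstance(estimator, (GridSearchCV, RandomizedSearchCV)):
        estimator = estimator.best_estimator_
    return estimator.feature_importances_


X, y = make_classification(n_samples=100, n_features=5, random_state=0)
plain_rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
searched_rf = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    {'max_depth': [2, 4, None]}, n_iter=2, cv=3, random_state=0,
).fit(X, y)

print(get_feature_importances(plain_rf).shape)
print(get_feature_importances(searched_rf).shape)
```

`isinstance` also accepts subclasses, which a direct `type(...) is ...` comparison would not.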
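I'm using GridSearchCV for cross-validation of a linear regression (not a classifier, and not logistic regression).
I'm also using StandardScaler to standardize X.
My DataFrame has 17 features (X) and 5 targets (y) (observations), with about 1150 rows.
I keep getting the error message ValueError: continuous is not supported, and I've run out of ideas.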
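A probable cause, offered as a sketch: `'precision'` and `'recall'` are classification metrics, and sklearn raises `continuous is not supported` when they receive a real-valued target. For a regression model the search needs a regression scorer such as `'r2'` or `'neg_mean_squared_error'`. The question's `MODELS` dict is not shown, so `Ridge` (a regularized linear regression with an `alpha` to tune) and the synthetic data are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=17, noise=10.0, random_state=0)

# Reproduce the error: a classification metric applied to a continuous target
try:
    precision_score(y, y)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# Use regression scorers instead of 'precision' / 'recall'
best = {}
for score in ['r2', 'neg_mean_squared_error']:
    gs = GridSearchCV(make_pipeline(StandardScaler(), Ridge()),
                      {'ridge__alpha': [0.1, 1.0, 10.0]}, cv=5, scoring=score)
    gs.fit(X, y)
    best[score] = (gs.best_params_, gs.best_score_)
    print(score, gs.best_params_, round(gs.best_score_, 3))
```

Putting `StandardScaler` inside the pipeline also means X is standardized within each CV fold rather than once on the full data.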
Here is some code (assume all imports are done correctly):
soilM = pd.read_csv('C:/training.csv', index_col=0)
soilM = getDummiedSoilDepth(soilM)  # transform text values into 0 and 1
soilM = soilM.drop('Depth', axis=1)
soil = soilM.iloc[:, -22:]
X_train, X_test, Ca_train, Ca_test, P_train, P_test, pH_train, pH_test, SOC_train, SOC_test, Sand_train, Sand_test = splitTrainTestAdv(soil)
scores = ['precision', 'recall']
for score in scores:
    for model in MODELS.keys():
        print model, score
        performParameterSelection(model, score, X_test, Ca_test, X_train, Ca_train)
def performParameterSelection(model_name, criteria, X_test, y_test, X_train, y_train):
    model, param_grid = MODELS[model_name]
    gs = GridSearchCV(model, param_grid, n_jobs=1, cv=5, verbose=1, …
In sklearn, a serial pipeline can be defined so that the best combination of hyperparameters is found for all consecutive parts of the pipeline. A serial pipeline can be implemented as follows:
from sklearn.svm import SVC
from sklearn import decomposition, datasets
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
digits = datasets.load_digits()
X_train = digits.data
y_train = digits.target
# Use Principal Component Analysis to reduce dimensionality
# and improve generalization
pca = decomposition.PCA()
# Use an SVC (note: SVC() defaults to an RBF kernel; pass kernel='linear' for a linear SVM)
svm = SVC()
# Combine PCA and SVC to a pipeline
pipe = Pipeline(steps=[('pca', pca), ('svm', svm)])
# Check the training time for the SVC
n_components = [20, 40, 64]
params_grid = {
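    'svm__C': …
I'm a beginner in data analysis. I want to use cross-validated grid search to determine the parameters gamma and C of an SVM with a radial basis function (RBF) kernel. I don't know where to put my data in this code, or which data I should use (the training data or the target data).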
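For the beginner question just above, a minimal sketch of where the data goes: `X` is the feature matrix (one row per sample), `y` is the target you want to predict, and only the training split is passed to `GridSearchCV.fit(X_train, y_train)`; the test split is used once at the end. It assumes an `SVR` with an RBF kernel (matching the SVR imports below), synthetic data from `make_regression` in place of the CSV, and an illustrative grid of C and gamma values. With a DataFrame, `X` would be the feature columns and `y` the target column.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Stand-in for the CSV: X = input features, y = the value to predict
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# RBF kernels are scale-sensitive, so standardize inside the pipeline
pipe = Pipeline([('scale', StandardScaler()), ('svr', SVR(kernel='rbf'))])
param_grid = {'svr__C': [0.1, 1, 10, 100],
              'svr__gamma': [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring='neg_mean_squared_error')

# Only the training data goes into the search: features first, targets second
search.fit(X_train, y_train)
print('best C and gamma:', search.best_params_)

# The held-out test set is used once, after tuning is finished
test_mse = mean_squared_error(y_test, search.predict(X_test))
print('test MSE:', test_mse)
```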
For SVR:
import numpy as np
import pandas as pd
from math import sqrt
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error,explained_variance_score
from TwoStageTrAdaBoostR2 import TwoStageTrAdaBoostR2 # import the two-stage algorithm
from sklearn import preprocessing
from sklearn import svm
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from matplotlib.colors import Normalize
from sklearn.svm import SVC
# Data import (source)
source = pd.read_csv(sourcedata) …
I'm trying to optimize hyperparameters in a scikit-learn pipeline with some custom transformers, but I keep running into errors:
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import GridSearchCV
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
class RollingMeanTransform(BaseEstimator, TransformerMixin):
    def __init__(self, col, window=3):
        self._window = window
        self._col = col
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        df = X.copy()
        df['{}_rolling_mean'.format(self._col)] = df[self._col].shift(1).rolling(self._window).mean().fillna(0.0)
        return df
class TimeEncoding(BaseEstimator, TransformerMixin):
    def __init__(self, col, drop_original=True):
        self._col = col
        self._drop_original = drop_original
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        X = X.copy()
        unique_vals = float(len(X[self._col].unique()))
        X['sin_{}'.format(self._col)] = np.sin(2 * …
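A likely cause of the custom-transformer errors, shown as a sketch: `BaseEstimator.get_params()`, `set_params()` and `clone()` (which `GridSearchCV` calls for every candidate) look up attributes named exactly like the `__init__` arguments, so storing `col` and `window` as `self._col` and `self._window` breaks them. The version below stores them under their own names. The `'sales'` column, the next-step target, `LinearRegression` as the final step, and the window grid are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline


class RollingMeanTransform(BaseEstimator, TransformerMixin):
    def __init__(self, col, window=3):
        # Attribute names match the __init__ arguments (no leading underscore),
        # so get_params/set_params/clone can find them
        self.col = col
        self.window = window

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        df = X.copy()
        df['{}_rolling_mean'.format(self.col)] = (
            df[self.col].shift(1).rolling(self.window).mean().fillna(0.0))
        return df


rng = np.random.RandomState(0)
X = pd.DataFrame({'sales': rng.rand(60).cumsum()})  # hypothetical time series
y = X['sales'].shift(-1).fillna(X['sales'].iloc[-1]).values  # next-step target

pipe = Pipeline([('rolling', RollingMeanTransform(col='sales')),
                 ('model', LinearRegression())])
search = GridSearchCV(pipe, {'rolling__window': [2, 3, 5]},
                      cv=TimeSeriesSplit(n_splits=3))
search.fit(X, y)
print(search.best_params_)
```

The same renaming (`self.col`, `self.drop_original`) would apply to `TimeEncoding`.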