Cyb*_*cop 4 python scikit-learn grid-search
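I'm not sure whether this is the right kind of question for this site, but I'll ask anyway. Please let me know if it isn't allowed.
I used GridSearchCV to tune parameters and find the best accuracy. This is what I did: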
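import numpy as np
# sklearn.grid_search was deprecated in scikit-learn 0.18 and later removed; use model_selection
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Search over minimum split size, tree depth, and split criterion
parameters = {'min_samples_split': np.arange(2, 80), 'max_depth': np.arange(2, 10), 'criterion': ['gini', 'entropy']}
clfr = DecisionTreeClassifier()
grid = GridSearchCV(clfr, parameters, scoring='accuracy', cv=8)
grid.fit(X_train, y_train)
print('The parameters combination that would give best accuracy is : ')
print(grid.best_params_)
print('The best accuracy achieved after parameter tuning via grid search is : ', grid.best_score_)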
This gave me the following result:
The parameters combination that would give best accuracy is :
{'max_depth': 5, 'criterion': 'entropy', 'min_samples_split': 2}
The best accuracy achieved after parameter tuning via grid search is : 0.8147086914995224
Now I want to use these parameters when calling a function that visualizes the decision tree.
The function looks like this:
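from IPython.display import Image  # assumes a Jupyter/IPython environment
from sklearn.tree import export_graphviz
import pydotplus

def visualize_decision_tree(decision_tree, feature, target):
    # Export the fitted tree as Graphviz DOT text, then render it to a PNG
    dot_data = export_graphviz(decision_tree, out_file=None,
                               feature_names=feature,
                               class_names=target,
                               filled=True, rounded=True,
                               special_characters=True)
    graph = pydotplus.graph_from_dot_data(dot_data)
    return Image(graph.create_png())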
Now I am trying to call the function with the best parameters found by GridSearchCV, like this:
dtBestScore = DecisionTreeClassifier(parameters = grid.best_params_)
dtBestScore = dtBestScore.fit(X=dfWithTrainFeatures, y= dfWithTestFeature)
visualize_decision_tree(dtBestScore, list(dfCopy.columns.delete(0).values), 'survived')
I get an error on the first line of that code:
TypeError: __init__() got an unexpected keyword argument 'parameters'
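Is there a way to use the best parameters found by the grid search automatically, instead of reading the results and setting each parameter's value by hand?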
Try Python kwargs. Note the trailing underscore in best_params_:
DecisionTreeClassifier(**grid.best_params_)
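To see what the `**` unpacking does, here is a minimal, library-free sketch. `make_tree` is a hypothetical stand-in for a constructor such as DecisionTreeClassifier, not part of scikit-learn, and the dictionary simply reuses the values printed in the question.

```python
def make_tree(max_depth=None, criterion='gini', min_samples_split=2):
    """Hypothetical stand-in for an estimator constructor with keyword parameters."""
    return {'max_depth': max_depth, 'criterion': criterion,
            'min_samples_split': min_samples_split}

best_params = {'max_depth': 5, 'criterion': 'entropy', 'min_samples_split': 2}

# ** unpacks the dict so that each key becomes a keyword argument, i.e.
# make_tree(max_depth=5, criterion='entropy', min_samples_split=2)
unpacked = make_tree(**best_params)

# Passing the whole dict under a keyword the function does not define fails
# with the same kind of TypeError as in the question.
try:
    make_tree(parameters=best_params)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

The constructor in the question has no argument called `parameters`, which is why the dict has to be unpacked rather than passed as a single keyword.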
See http://pythontips.com/2013/08/04/args-and-kwargs-in-python-explained for more on kwargs.
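For completeness, a small end-to-end sketch of the same idea. The iris dataset and the reduced parameter grid are assumptions chosen so the example runs quickly; they stand in for the asker's own X_train, y_train and grid. It also shows `best_estimator_`, which GridSearchCV exposes when `refit=True` (the default), as a possible alternative to rebuilding the tree by hand.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris stands in for the asker's training data.
X, y = load_iris(return_X_y=True)

# Assumption: a deliberately small grid so the sketch runs quickly.
parameters = {'max_depth': [2, 3, 4], 'criterion': ['gini', 'entropy']}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), parameters,
                    scoring='accuracy', cv=5)
grid.fit(X, y)

# Option 1: build a fresh tree from the best parameters and fit it yourself.
dt_best = DecisionTreeClassifier(random_state=0, **grid.best_params_)
dt_best.fit(X, y)

# Option 2: with the default refit=True, GridSearchCV has already refit the
# best configuration on all the data passed to fit().
dt_refit = grid.best_estimator_
```

Either `dt_best` or `dt_refit` could then be passed as the first argument of a function like visualize_decision_tree.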