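This is my first question here, so I hope I'm doing it right.
I'm working on the Titanic dataset that is popular on Kaggle. If you want to check, the tutorial is A Data Science Framework: To Achieve 99% Accuracy.
Section 5.2 of it teaches how to grid search and tune hyperparameters. Before getting to my question, let me share the relevant code.
This is tuning the model with GridSearchCV: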
cv_split = model_selection.ShuffleSplit(n_splits = 10, test_size = .3, train_size = .6, random_state = 0)
#cv_split = model_selection.KFold(n_splits=10, shuffle=False, random_state=None)
param_grid = {'criterion': ['gini', 'entropy'],
              'splitter': ['best', 'random'],  # splitting methodology; two supported strategies - default is best
              'max_depth': [2,4,6,8,10,None],  # max depth tree can grow; default is None
              'min_samples_split': [2,5,10,.03,.05],  # minimum subset size BEFORE new split (fraction is % of total); default is 2
              'min_samples_leaf': [1,5,10,.03,.05],  # minimum …
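For context, here is a minimal sketch of how a split and grid like the ones above are usually passed to GridSearchCV. Several things in it are assumptions and not part of my code above: the data is synthetic (from make_classification, not the Titanic features), the estimator is a DecisionTreeClassifier (guessed from the parameter names), the name small_grid and its reduced set of values are only there to keep the run short, and scoring='accuracy' is an illustrative choice.

```python
from sklearn import model_selection, tree
from sklearn.datasets import make_classification

# Synthetic stand-in data (assumption: NOT the Titanic dataset)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Same splitter as in the question: 10 random splits, 60% train / 30% test,
# so 10% of the rows are deliberately left out of each split
cv_split = model_selection.ShuffleSplit(n_splits=10, test_size=.3, train_size=.6, random_state=0)

# Reduced, hypothetical grid (assumption) so the search finishes quickly
small_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [2, 4, None],
    'min_samples_split': [2, 10, .05],  # ints are counts, floats are fractions of the samples
}

# Assumption: a decision tree is the estimator being tuned
tune_model = model_selection.GridSearchCV(
    tree.DecisionTreeClassifier(random_state=0),
    param_grid=small_grid,
    scoring='accuracy',
    cv=cv_split,
)
tune_model.fit(X, y)

print(tune_model.best_params_)  # best parameter combination found
print(tune_model.best_score_)   # mean cross-validated accuracy of that combination
```

With this grid the search evaluates 2 x 3 x 3 = 18 parameter combinations, each on the 10 splits, which is why the full grid above takes noticeably longer.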