sw0*_*7sw 11 python scikit-learn cross-validation
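I am trying to implement a cross-validation scheme on grouped data. I would like to use the GroupKFold method, but I keep getting an error. What am I doing wrong? The code (slightly different from the one I actually use: I have different data, so I use a larger n_splits, but everything else is the same):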
from sklearn import metrics
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.grid_search import GridSearchCV
from xgboost import XGBRegressor
#generate data
x=np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13])
y= np.array([1,2,3,4,5,6,7,1,2,3,4,5,6,7])
group=np.array([1,0,1,1,2,2,2,1,1,1,2,0,0,2])
#grid search
gkf = GroupKFold( n_splits=3).split(x,y,group)
subsample = np.arange(0.3,0.5,0.1)
param_grid = dict( subsample=subsample)
rgr_xgb = XGBRegressor(n_estimators=50)
grid_search = GridSearchCV(rgr_xgb, param_grid, cv=gkf, n_jobs=-1)
result = grid_search.fit(x, y)
The error:
Traceback (most recent call last):
File "<ipython-input-143-11d785056a08>", line 8, in <module>
result = grid_search.fit(x, y)
File "/home/student/anaconda/lib/python3.5/site-packages/sklearn/grid_search.py", line 813, in fit
return self._fit(X, y, ParameterGrid(self.param_grid))
File "/home/student/anaconda/lib/python3.5/site-packages/sklearn/grid_search.py", line 566, in _fit
n_folds = len(cv)
TypeError: object of type 'generator' has no len()
Changing the line
gkf = GroupKFold( n_splits=3).split(x,y,group)
to
gkf = GroupKFold( n_splits=3)
does not work either. The error message is then:
'GroupKFold' object is not iterable
Mos*_*oye 23
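The split function of GroupKFold yields the train and test indices one pair at a time, so what it returns is a generator. The traceback shows GridSearchCV calling len(cv), which a generator does not support. You should call list on the value returned by split to collect all the pairs in a list, so that the length can be computed: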
gkf = list(GroupKFold( n_splits=3).split(x,y,group))
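As a minimal sketch of why the list call helps, the snippet below checks that the value returned by split has no length until it is materialized, and that every fold keeps each group entirely on one side of the split. It reuses the data from the question. The reshape to a 2-D array, the Ridge estimator and its alpha grid are assumptions made only to keep the example free of xgboost. The last part shows an alternative that I believe works in current scikit-learn versions, where GridSearchCV is imported from sklearn.model_selection rather than sklearn.grid_search: pass the GroupKFold object itself as cv and hand the groups to fit.

```python
import numpy as np
from sklearn.linear_model import Ridge  # stand-in estimator (assumption, not from the question)
from sklearn.model_selection import GridSearchCV, GroupKFold

# Data from the question; estimators expect a 2-D feature array, hence the reshape
x = np.arange(14)
y = np.array([1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7])
group = np.array([1, 0, 1, 1, 2, 2, 2, 1, 1, 1, 2, 0, 0, 2])
X = x.reshape(-1, 1)

# split() returns a generator, which has no length
splits = GroupKFold(n_splits=3).split(X, y, group)
has_len = hasattr(splits, "__len__")

# Materializing the generator gives a list of (train_idx, test_idx) pairs
folds = list(splits)
n_folds = len(folds)

# No group should appear in both the train and the test part of a fold
groups_are_separated = all(
    set(group[train_idx]).isdisjoint(group[test_idx])
    for train_idx, test_idx in folds
)

# Alternative: give the splitter object to the model_selection GridSearchCV
# and pass the groups through fit()
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0]}, cv=GroupKFold(n_splits=3))
search.fit(X, y, groups=group)
```

With the list approach the folds are fixed once and can be reused across several searches; with the second approach the splitter is evaluated inside fit, so the groups must be passed there.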