Fitting a model with KFold splits returns "not in index"

Question

Cle*_*Ros 0 python pandas cross-validation

I have a DataFrame like this:

    Col1    Col2    
10   1        6         
11   3        8        
12   9        4        
13   7        2
14   4        3
15   2        9
16   6        7
17   8        1
18   5        5

I want to use KFold cross-validation to fit my model and make predictions.

for train_index, test_index in kf.split(X, y):
    model.fit(X[train_index], y[train_index])
    y_pred = model.predict(X[test_index])

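This code raises the following error:

'[1 2 4 7] not in index'

I can see that after KFold.split(), train_index and test_index do not use the DataFrame's actual index labels.

So I cannot fit my model.

Does anyone have an idea?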

Answer 1

Sta*_*ean 6

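As far as I can tell, your DataFrame's index starts at 10 rather than 0 and, as you noted, the splits from sklearn use 0-based positions. One solution is to reset the DataFrame's index so that labels and positions coincide. This helps label-based lookups such as y[train_index] on a Series; X[train_index] on a DataFrame selects columns, so row selection there still needs .iloc, as in the next solution. To reset the index: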

df = df.reset_index(drop=True)
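
To make the mismatch between positions and labels concrete, here is a minimal sketch that rebuilds the DataFrame from the question. The choice of n_splits=3 is an assumption for illustration and is not taken from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Same data as in the question; the index labels run from 10 to 18.
df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)

kf = KFold(n_splits=3)  # n_splits=3 is an arbitrary choice for this sketch

# split() yields integer positions in [0, n_samples), never index labels.
positions = np.concatenate([test for _, test in kf.split(df)])

# After reset_index(drop=True) the labels coincide with those positions.
df_reset = df.reset_index(drop=True)
```

Every value in positions is below 10, so none of them exists as a label in the original index, while all of them are valid labels in df_reset.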

Another solution is to use .iloc on the DataFrame, which would look like this (assuming y is an array; if it is a DataFrame or Series, you need .iloc there as well):

for train_index, test_index in kf.split(X, y):
    # .iloc selects rows by position, matching the indices from split()
    model.fit(X.iloc[train_index], y[train_index])
    y_pred = model.predict(X.iloc[test_index])
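
A fuller, self-contained sketch of the .iloc approach might look like the following. The estimator (LinearRegression), the choice of Col1 as feature and Col2 as target, and the KFold settings are all assumptions made for illustration; the original post does not say which model or columns were used.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)
X = df[["Col1"]]  # assumption: Col1 is the feature
y = df["Col2"]    # assumption: Col2 is the target (a Series, so it needs .iloc too)

kf = KFold(n_splits=3, shuffle=True, random_state=0)
model = LinearRegression()  # stand-in estimator, not from the original post

fold_predictions = []
for train_index, test_index in kf.split(X):
    # Positional selection works regardless of what the index labels are.
    model.fit(X.iloc[train_index], y.iloc[train_index])
    y_pred = model.predict(X.iloc[test_index])
    # Re-attach the original labels so predictions line up with df.
    fold_predictions.append(pd.Series(y_pred, index=X.index[test_index]))

predictions = pd.concat(fold_predictions).sort_index()
```

Because each row lands in exactly one test fold, predictions ends up with one out-of-fold value per original label.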

A third solution is to convert the DataFrame to an array:

for train_index, test_index in kf.split(X, y):
    # .values is a plain NumPy array, so integer arrays index rows by position
    model.fit(X.values[train_index], y[train_index])
    y_pred = model.predict(X.values[test_index])
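
The array-based variant can be sketched end to end as below. It uses .to_numpy(), which the pandas documentation recommends over .values, and it converts once before the loop rather than on every iteration. The estimator, the feature/target columns, and the per-fold mean squared error are assumptions added for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)
# Convert once; NumPy arrays carry no index labels, only positions.
X_arr = df[["Col1"]].to_numpy()  # assumption: Col1 is the feature
y_arr = df["Col2"].to_numpy()    # assumption: Col2 is the target

kf = KFold(n_splits=3)
model = LinearRegression()  # stand-in estimator, not from the original post

fold_mse = []
for train_index, test_index in kf.split(X_arr):
    model.fit(X_arr[train_index], y_arr[train_index])
    y_pred = model.predict(X_arr[test_index])
    # Mean squared error on the held-out fold.
    fold_mse.append(float(np.mean((y_pred - y_arr[test_index]) ** 2)))
```

The trade-off is that the index labels are lost, so this fits best when the predictions do not need to be matched back to the original rows.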

Edit: I can even see a fourth solution, which may be what you want. You can simply use df.index.values[train_index] to get the array of index labels in the training set.
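
As a rough sketch of this fourth option, the positions from split() can be mapped to index labels and then used with .loc. The DataFrame is rebuilt from the question, and n_splits=3 is an assumption for illustration.

```python
import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)

kf = KFold(n_splits=3)  # no shuffling, so folds are contiguous blocks

test_label_folds = []
selections_match = []
for train_index, test_index in kf.split(df):
    # Map positions to the DataFrame's own index labels.
    train_labels = df.index.values[train_index]
    test_labels = df.index.values[test_index]
    test_label_folds.append(test_labels)
    # Label-based .loc now selects the same rows as positional .iloc.
    selections_match.append(df.loc[train_labels].equals(df.iloc[train_index]))
```

With the labels in hand, df.loc[train_labels] and df.loc[test_labels] can be passed to the model in place of the positional selections.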

Archived:	7 years ago
Views:	2,476
Last activity:	7 years ago