pandas: "Passing list-likes to .loc or [] with any missing labels is no longer supported" on data returned by train_test_split

Question

Dan*_*y W 10 python numpy pandas scikit-learn

For some reason, train_test_split triggers this error even though the lengths are the same and the indexes look identical.

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split

data = {'col1': [30.5, 45, 1, 99, 6, 5, 4, 2, 5, 7, 7, 3],
        'col2': [99.5, 98, 95, 90, 1, 5, 6, 7, 4, 4, 3, 3],
        'col3': [23, 23.6, 3, 90, 1, 9, 60, 9, 7, 2, 2, 1]}
df = pd.DataFrame(data)

train, test = train_test_split(df, test_size=0.10)
X = train[['col1', 'col2']]
y2 = train['col3']

X = np.array(X)

kf = KFold(n_splits=3, shuffle=True)
for train_index, test_index in kf.split(X):
    # this is the line that triggers the error
    X_train, y_train = X[train_index], y[train_index]


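y is a pandas Series (the same length as X). X is a DataFrame of roughly 20 numeric columns that has been converted to a NumPy array.

For some reason, train_test_split triggers the error even though the lengths are the same.

If I don't call train_test_split, it works fine.

The last line is the one that triggers the error, because of indexing with the NumPy array this way: y[train_ind]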

Answer 1

tal*_*can 11

I tried to reproduce your situation.

I created the following DataFrame:

    col1  col2  col3
0      1     2     1
1      3     4     0
2      5     6     1
3      7     8     0
4      9    10     1
5     11    12     0
6     13    14     1
7     15    16     0
8     17    18     1
9     19    20     0
10    21    22     1
11    23    24     0
12    25    26     1
13    27    28     0
14    29    30     1


I used col1 and col2 for X and col3 for y. After that, I converted X to a NumPy array as shown below. The only difference is that I used shuffle in KFold.

X = df[['col1', 'col2']]
y = df['col3']
X = np.array(X)
kf = KFold(n_splits=3, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]


It works fine. Please compare my code with yours, and let me know if I have missed something.
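A likely reason the version above works while the question's does not: with the default index, row labels and row positions are the same numbers, so y[train_index] finds every label. After a shuffled split the rows keep their original labels, so some positions no longer exist as labels. Here is a minimal sketch of that difference using pandas and NumPy only. The data and the names y_default, y_split and positions are made up for illustration, and the split is simulated with .loc rather than produced by train_test_split.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a Series with the default index 0..4.
y_default = pd.Series([10, 20, 30, 40, 50])

# Simulate what a shuffled split leaves behind: rows 3, 0 and 4 are kept
# with their original labels, so labels 1 and 2 no longer exist.
y_split = y_default.loc[[3, 0, 4]]

# Positional indices, like the train_index arrays that KFold.split yields.
positions = np.array([0, 1])

# Default index: labels and positions coincide, so [] finds both rows.
default_result = y_default[positions].tolist()

# After the split, [] looks up labels 0 and 1, and label 1 is missing.
try:
    y_split[positions]
    raised = False
except KeyError:
    raised = True

# .iloc selects by position, whatever the labels are.
iloc_result = y_split.iloc[positions].tolist()
```

Note that the failure depends on which rows the split happens to drop. If every position also exists as a label, [] does not raise, but it selects rows by label rather than by position, which is probably not what was intended.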

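Update

I assume y2 is y. In that case y is still a Series, and train_test_split has shuffled its rows while keeping their original index labels, so y[train_index] looks the positions up as labels and some of them are missing. You need to use .iloc, which selects by position. The following code works.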

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split

data = {'col1': [30.5, 45, 1, 99, 6, 5, 4, 2, 5, 7, 7, 3],
        'col2': [99.5, 98, 95, 90, 1, 5, 6, 7, 4, 4, 3, 3],
        'col3': [23, 23.6, 3, 90, 1, 9, 60, 9, 7, 2, 2, 1]}
df = pd.DataFrame(data)
train, test = train_test_split(df, test_size=0.10)

X = train[['col1', 'col2']]
y = train['col3']

X = np.array(X)

kf = KFold(n_splits=3, shuffle=True)
for train_index, test_index in kf.split(X):
    # .iloc indexes the Series by position, matching the NumPy array X
    X_train, y_train = X[train_index], y.iloc[train_index]

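As an alternative sketch (not part of the original answer), y can be converted to a NumPy array as well, so that X and y are both indexed purely by position and no index labels are involved. The random_state values and the X_val, y_val and fold_sizes names are my own additions, there only to make the run repeatable and easy to check.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split

data = {'col1': [30.5, 45, 1, 99, 6, 5, 4, 2, 5, 7, 7, 3],
        'col2': [99.5, 98, 95, 90, 1, 5, 6, 7, 4, 4, 3, 3],
        'col3': [23, 23.6, 3, 90, 1, 9, 60, 9, 7, 2, 2, 1]}
df = pd.DataFrame(data)

# random_state is an assumption added for reproducibility.
train, test = train_test_split(df, test_size=0.10, random_state=0)

# Convert both features and target to NumPy arrays, dropping the index.
X = train[['col1', 'col2']].to_numpy()
y = train['col3'].to_numpy()

kf = KFold(n_splits=3, shuffle=True, random_state=0)
fold_sizes = []
for train_index, test_index in kf.split(X):
    # Plain positional indexing now works for both arrays.
    X_train, y_train = X[train_index], y[train_index]
    X_val, y_val = X[test_index], y[test_index]
    fold_sizes.append((len(y_train), len(y_val)))
```

If the Series is still needed later, train['col3'].reset_index(drop=True) is another option: it renumbers the labels from 0 so that labels and positions agree again.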
