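I want to split my dataset into train and test sets while making sure the class labels have the same distribution in both. To do that I used the stratify option, but the following call raises an error: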
X_full_train, X_full_test, Y_full_train, Y_full_test = train_test_split(X_values_full, Y_values, test_size = 0.33, random_state = 42, stratify = True)
Error message:
TypeError Traceback (most recent call last)
in
19
20
---> 21 X_full_train, X_full_test, Y_full_train, Y_full_test = train_test_split(X_values_full, Y_values, test_size = 0.33, random_state = 42, stratify = True)
22
23
~/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
2150 random_state=random_state)
2151
-> 2152 train, test = next(cv.split(X=arrays[0], y=stratify))
2153
2154 return list(chain.from_iterable((_safe_indexing(a, train),
~/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_split.py in split(self, X, y, groups)
1744 to an integer.
1745 """
-> 1746 …
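The traceback shows the value of `stratify` being handed to `cv.split` as `y`. The `stratify` parameter of `train_test_split` is documented as array-like (default `None`), not as a boolean switch, so passing the label array itself is the likely fix. The sketch below assumes that `Y_values` holds the class labels; the data is a hypothetical stand-in, since the original arrays are not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (the real arrays are not shown in the question):
# 100 samples, 3 features, imbalanced binary labels (70 zeros, 30 ones).
rng = np.random.RandomState(0)
X_values_full = rng.rand(100, 3)
Y_values = np.array([0] * 70 + [1] * 30)

# Pass the label array itself to stratify, rather than True,
# so that the class proportions are preserved in both splits.
X_full_train, X_full_test, Y_full_train, Y_full_test = train_test_split(
    X_values_full,
    Y_values,
    test_size=0.33,
    random_state=42,
    stratify=Y_values,
)

# Fraction of class 1 in each split; these should be close to each other.
train_ratio = Y_full_train.mean()
test_ratio = Y_full_test.mean()
print(train_ratio, test_ratio)
```

If the labels live in a DataFrame column instead, that column can be passed to `stratify` in the same way.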