Par*_*kla 1 python machine-learning scikit-learn

I am building an ML project on a COVID-19 dataset and I get an error when I run the following code:
from sklearn.model_selection import StratifiedShuffleSplit
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(covid, covid['Death Ratio']):
    strat_train_set = covid.loc[train_index]
    strat_test_set = covid.loc[test_index]
I have tried many ways to fix it, but none of them worked. This is the traceback:
ValueError                                Traceback (most recent call last)
<ipython-input-31-42056912ab46> in <module>
      1 from sklearn.model_selection import StratifiedShuffleSplit
      2 split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
----> 3 for train_index, test_index in split.split(covid, covid['Death Ratio']):
      4     strat_train_set = covid.loc[train_index]
      5     strat_test_set = covid.loc[test_index]

c:\users\hp\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
   1385         """
   1386         X, y, groups = indexable(X, y, groups)
-> 1387         for train, test in self._iter_indices(X, y, groups):
   1388             yield train, test
   1389

c:\users\hp\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_split.py in _iter_indices(self, X, y, groups)
   1713         class_counts = np.bincount(y_indices)
   1714         if np.min(class_counts) < 2:
-> 1715             raise ValueError("The least populated class in y has only 1"
   1716                              " member, which is too few. The minimum"
   1717                              " number of groups for any class cannot"

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

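You cannot perform a stratified split on covid['Death Ratio'] because some values in this column occur fewer than 2 times (that is, only once). StratifiedShuffleSplit treats every distinct value of y as a class, and each class needs at least 2 members so that it can be represented in both the training and the test set.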
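If you want to stratify on this column, you can discretize it first. Otherwise, you can stratify the split on another variable. Personally, I would not stratify on this column at all and would use a plain ShuffleSplit instead.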
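The discretization option could look roughly like the sketch below. It is only an illustration under assumptions: the `covid` DataFrame here is synthetic stand-in data, the `Confirmed` column and the choice of 5 quantile bins via `pd.qcut` are hypothetical, and the right number of bins depends on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in for the asker's DataFrame (the real data is not shown).
rng = np.random.default_rng(0)
covid = pd.DataFrame({
    "Confirmed": rng.integers(100, 10000, size=200),
    "Death Ratio": rng.random(200) * 10,
})

# Bin the continuous column into quantile-based classes so that
# every class has many members instead of a single one.
death_ratio_bin = pd.qcut(covid["Death Ratio"], q=5, labels=False, duplicates="drop")

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(covid, death_ratio_bin):
    # split() yields positional indices, so .iloc is the safe accessor
    strat_train_set = covid.iloc[train_index]
    strat_test_set = covid.iloc[test_index]
```

The binned series is used only as the stratification target; the training and test sets still contain the original, continuous 'Death Ratio' values.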
Edit:
If you want to perform several splits (for example, 5), use:
from sklearn.model_selection import ShuffleSplit
splits = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
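A ShuffleSplit object only describes the splits; the index arrays come from iterating over its `split()` method. A minimal sketch of that loop, assuming a small synthetic `covid` DataFrame in place of the real one:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-in data with 50 rows.
covid = pd.DataFrame({"Death Ratio": np.linspace(0.0, 5.0, 50)})

splits = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

fold_sizes = []
for train_index, test_index in splits.split(covid):
    # Each iteration is an independent random 80/20 split of the rows.
    train_set = covid.iloc[train_index]
    test_set = covid.iloc[test_index]
    fold_sizes.append((len(train_set), len(test_set)))
```

No target column is passed to `split()`, so no per-class minimum applies and the ValueError from the question cannot occur.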
If you want a single split, you can use:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
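The X and y in the snippet above are not defined in the question. One way to obtain them, sketched here with synthetic data and a hypothetical `Confirmed` feature column, is to treat 'Death Ratio' as the target and the remaining columns as features:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's DataFrame.
covid = pd.DataFrame({
    "Confirmed": np.arange(50),
    "Death Ratio": np.linspace(0.0, 5.0, 50),
})

# Assumption: 'Death Ratio' is the target and everything else is a feature.
X = covid.drop(columns=["Death Ratio"])
y = covid["Death Ratio"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Because the `stratify` argument is left at its default of None, this is a plain shuffled split and is not affected by how often each 'Death Ratio' value occurs.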