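The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2. What should I do?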

Par*_*kla 1 python machine-learning scikit-learn

I am building an ML project on a COVID-19 dataset and I get an error from the following code:

from sklearn.model_selection import StratifiedShuffleSplit
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(covid, covid['Death Ratio']):
    strat_train_set = covid.loc[train_index]
    strat_test_set = covid.loc[test_index]

I have tried many ways to fix it, but without success.

ValueError                                Traceback (most recent call last)
<ipython-input-31-42056912ab46> in <module>
      1 from sklearn.model_selection import StratifiedShuffleSplit
      2 split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
----> 3 for train_index, test_index in split.split(covid, covid['Death Ratio']):
      4     strat_train_set = covid.loc[train_index]
      5     strat_test_set = covid.loc[test_index]

c:\users\hp\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
   1385         """
   1386         X, y, groups = indexable(X, y, groups)
-> 1387         for train, test in self._iter_indices(X, y, groups):
   1388             yield train, test
   1389 

c:\users\hp\appdata\local\programs\python\python37\lib\site-packages\sklearn\model_selection\_split.py in _iter_indices(self, X, y, groups)
   1713         class_counts = np.bincount(y_indices)
   1714         if np.min(class_counts) < 2:
-> 1715             raise ValueError("The least populated class in y has only 1"
   1716                              " member, which is too few. The minimum"
   1717                              " number of groups for any class cannot"

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

This is the error.

Ant*_*uis 5

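You cannot perform a stratified split on covid['Death Ratio'] because some values in this column occur only once. Stratification treats every distinct value as its own class, and each class needs at least 2 members so that it can appear in both the training set and the test set.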
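One way to confirm this diagnosis is to count how often each distinct value occurs. The sketch below uses a small made-up DataFrame as a stand-in for the asker's data; only the column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: a continuous ratio column
covid = pd.DataFrame({"Death Ratio": [0.012, 0.034, 0.034, 0.051, 0.078]})

# Count how many rows share each distinct value of the column
counts = covid["Death Ratio"].value_counts()

# Values that occur only once cannot be stratified across train and test
singletons = counts[counts < 2]
print(f"{len(singletons)} of {len(counts)} distinct values occur only once")
```

With a continuous column such as a ratio, most values are likely to be unique, which is why the check in StratifiedShuffleSplit fails.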

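If you want to stratify on this column, you can discretize it into bins first. Otherwise, you can stratify the split on a different column. Personally, I would not stratify on this column at all and would use a plain ShuffleSplit instead.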
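The discretization option could look roughly like the sketch below. It assumes quantile binning with pandas.qcut. The random data, the ratio_bin column name, and the choice of 4 bins are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in data: 100 rows with a continuous ratio
rng = np.random.default_rng(42)
covid = pd.DataFrame({"Death Ratio": rng.uniform(0.0, 0.1, size=100)})

# Discretize into 4 quantile bins (assumed bin count) so every bin has many rows
covid["ratio_bin"] = pd.qcut(covid["Death Ratio"], q=4, labels=False)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(covid, covid["ratio_bin"]):
    # split() yields positional indices, so .iloc is used rather than .loc
    strat_train_set = covid.iloc[train_index]
    strat_test_set = covid.iloc[test_index]
```

The number of bins is a judgment call: too many bins can again leave a bin with a single row, which would raise the same ValueError.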

Edit

If you want to perform several splits (for example, 5), use:

from sklearn.model_selection import ShuffleSplit
splits = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
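The ShuffleSplit object only defines the splitting strategy; the indices come from iterating over its split() method. A minimal sketch, assuming a made-up 50-row DataFrame in place of the real dataset:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-in data: 50 rows
covid = pd.DataFrame({"Death Ratio": np.linspace(0.0, 0.1, 50)})

splits = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

fold_sizes = []
for train_index, test_index in splits.split(covid):
    # Indices are positional, so select rows with .iloc
    train_set = covid.iloc[train_index]
    test_set = covid.iloc[test_index]
    fold_sizes.append((len(train_set), len(test_set)))

print(fold_sizes)
```

Each of the 5 iterations yields an independent random 80/20 partition, with no constraint on class membership.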

If you want to perform a single split, you can use:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)