Unable to use Stratified-K-Fold on a multi-label classifier

Question

Sai*_*van 6 scikit-learn cross-validation deep-learning keras

The following code performs KFold validation, but I am unable to train the model because it throws this error:

ValueError: Error when checking target: expected dense_14 to have shape (7,) but got array with shape (1,)

My target variable has 7 classes. I am using LabelEncoder to encode the classes into numbers.

After seeing this error, I switched to MultiLabelBinarizer to encode the classes, and got the following error:

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.

Here is the code for the KFold validation:

import numpy as np
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle=True)
scores = np.zeros(10)
for index, (train_indices, val_indices) in enumerate(skf.split(X, y)):
    print("Training on fold " + str(index + 1) + "/10...")
    # Select the training and validation rows for this fold
    xtrain, xval = X[train_indices], X[val_indices]
    ytrain, yval = y[train_indices], y[val_indices]
    # Build a fresh model for every fold
    model = load_model()  # defined above

    scores[index] = train_model(model, xtrain, ytrain, xval, yval)
print(scores)
print(scores.mean())

I don't know what to do. I want to use Stratified K-Fold on my model. Please help me.

Answer 1

pan*_*ijk 10

MultiLabelBinarizer returns, for each sample, a vector whose length equals your number of classes.

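If you look at how StratifiedKFold splits your dataset, you will see that it only accepts a one-dimensional target variable, whereas you are trying to pass a target variable with dimensions [n_samples, n_classes].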
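The difference between the two target shapes can be reproduced with a small sketch. The arrays below are made-up placeholder data, not the asker's dataset, and the exact wording of the error message may vary between scikit-learn versions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.zeros((8, 2))                         # placeholder features
y_1d = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # one class id per sample
y_2d = np.array([[1, 0, 1], [0, 1, 0]] * 4)  # indicator matrix, shape (8, 3)

skf = StratifiedKFold(n_splits=2)

# A 1-D multiclass target splits without complaint.
n_folds = len(list(skf.split(X, y_1d)))

# A 2-D indicator target is rejected once the split generator is consumed.
try:
    list(skf.split(X, y_2d))
    message = None
except ValueError as err:
    message = str(err)

print(n_folds)
print(message)
```

Note that `split` is a generator, so the error only appears when the folds are actually iterated.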

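Stratified splitting basically preserves your class distribution across folds. If you think about it, that does not make much sense for a multi-label classification problem, where each sample can belong to several classes at once.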

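If you want to preserve the distribution of the different class combinations in your target variable, then the answer here explains two ways in which you can define your own stratified split function.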

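Update:

The logic is something like this:

Assume you have n classes and your target variable is a combination of these n classes. You will have (2^n) - 1 combinations (not counting the all-zeros case). You can now create a new target variable that treats each combination as a new label.

For example, if n=3, you will have 7 unique combinations: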

 1. [1, 0, 0]
 2. [0, 1, 0]
 3. [0, 0, 1]
 4. [1, 1, 0]
 5. [1, 0, 1]
 6. [0, 1, 1]
 7. [1, 1, 1]
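
As a quick check of the count, the combinations listed above can be enumerated with the standard library. This is only an illustrative sketch; the variable names are placeholders.

```python
from itertools import product

n = 3
# Every 0/1 vector of length n, excluding the all-zeros vector.
combos = [list(c) for c in product([0, 1], repeat=n) if any(c)]

print(len(combos))  # equals (2 ** n) - 1
for combo in combos:
    print(combo)
```

The order produced by `product` differs from the listing above, but the set of combinations is the same.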

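Map all your labels to this new target variable. You can now treat your problem as simple multi-class classification instead of multi-label classification.

Now you can use StratifiedKFold directly, with y_new as your target. Once the splits are done, you can map the labels back.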
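
A minimal sketch of the whole procedure, under the assumption that every label combination occurs at least `n_splits` times (scikit-learn warns otherwise). The toy target and the placeholder features are made up for illustration; "mapping back" here simply means indexing the original indicator matrix with the fold indices.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder

# Toy multi-label target: 3 distinct label combinations, 4 samples each.
y = np.array([[1, 0, 0]] * 4 + [[0, 1, 1]] * 4 + [[1, 1, 1]] * 4)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# One string key per row, then one integer class per distinct combination.
keys = ["".join(map(str, row)) for row in y]
y_new = LabelEncoder().fit_transform(keys)

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
fold_counts = []
for train_idx, val_idx in skf.split(X, y_new):
    # Stratify on y_new, but keep the original indicator rows for training.
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # Count how often each combination appears in the validation fold.
    fold_counts.append(np.bincount(y_new[val_idx], minlength=3).tolist())

print(fold_counts)
```

Each validation fold ends up with the same number of samples from every combination, which is the property stratification is meant to preserve.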

Code sample:

import numpy as np

# Build a reproducible random multi-label target: 10 samples, 7 classes
np.random.seed(1)
y = np.random.randint(0, 2, (10, 7))
# Drop any row that has no label at all
y = y[y.sum(axis=1) != 0]

Output:

array([[1, 1, 0, 0, 1, 1, 1],
       [1, 1, 0, 0, 1, 0, 1],
       [1, 0, 0, 1, 0, 0, 0],
       [1, 0, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 1, 1, 1],
       [1, 1, 0, 0, 0, 1, 1],
       [1, 1, 1, 1, 0, 1, 1],
       [0, 0, 1, 0, 0, 1, 1],
       [1, 0, 1, 0, 0, 1, 1],
       [0, 1, 1, 1, 1, 0, 0]])

Label encode your class vectors:

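from sklearn.preprocessing import LabelEncoder

def get_new_labels(y):
    # Turn each indicator row into a string key such as '1100111',
    # then give every distinct key its own integer label
    y_new = LabelEncoder().fit_transform([''.join(map(str, row)) for row in y])
    return y_new

y_new = get_new_labels(y)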

Output:

array([7, 6, 3, 3, 2, 5, 8, 0, 4, 1])
