Label Smoothing in PyTorch

Question

Jar*_*sen 24 python machine-learning pytorch transfer-learning

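I am using transfer learning with ResNet-18 to build a classification model for the Stanford Cars dataset. I would like to implement label smoothing to penalize overconfident predictions and improve generalization.

TensorFlow has a simple keyword argument for this in CrossEntropyLoss. Has anyone built a similar function for PyTorch that I could plug and play with?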

Answer 1

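The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets, which are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming overconfident, and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition.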
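To make the "weighted average" concrete, here is a minimal sketch in plain Python. The helper name smooth_targets and the value epsilon=0.1 are assumptions chosen for illustration; they do not come from the answer.

```python
def smooth_targets(label, num_classes, epsilon):
    """Blend a one-hot target with the uniform distribution over classes."""
    uniform = 1.0 / num_classes
    return [
        (1.0 - epsilon) * (1.0 if k == label else 0.0) + epsilon * uniform
        for k in range(num_classes)
    ]

# Class 2 out of 5 classes: the true class keeps most of the mass
# (about 0.92) and every other class receives about 0.02.
soft = smooth_targets(label=2, num_classes=5, epsilon=0.1)
print(soft)
```

The entries still sum to 1, so the result is a valid probability distribution that can be used as a cross-entropy target.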

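Label smoothing is already implemented in TensorFlow's cross-entropy loss functions, BinaryCrossentropy and CategoricalCrossentropy. At the time of writing, however, PyTorch had no official implementation of label smoothing. There is an active discussion about it, and hopefully an official implementation will be provided. Here is the discussion thread: Issue #7455.

Here we collect some of the best available implementations of label smoothing (LS) from PyTorch practitioners. There are many ways to implement LS. See the specific discussions on this, one here and another here. We present two distinct approaches with two versions of each, four implementations in total.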
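For readers on a newer PyTorch: as far as I know, releases from 1.10 onward accept a label_smoothing argument on nn.CrossEntropyLoss. The sketch below assumes such a version is installed and compares the built-in loss against a manual computation; the tensor shapes, seed, and value 0.1 are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 5)             # 4 samples, 5 classes (illustrative)
targets = torch.tensor([2, 1, 0, 4])
eps = 0.1

# Built-in label smoothing (assumes PyTorch >= 1.10).
criterion = nn.CrossEntropyLoss(label_smoothing=eps)
loss = criterion(logits, targets)

# Manual reference: (1 - eps) * NLL + eps * mean negative log-probability.
log_probs = torch.log_softmax(logits, dim=-1)
nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
smooth = -log_probs.mean(dim=-1)
manual = ((1 - eps) * nll + eps * smooth).mean()

print(loss.item(), manual.item())      # the two values should agree
```

On older versions the argument is not available, which is the situation the implementations in this answer address.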

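Option 1: CrossEntropyLossWithProbs

With this approach, the loss accepts a one-hot target vector, and the user must smooth the target vector manually. This can be done inside a with torch.no_grad() scope, which temporarily disables gradient tracking so that the targets are built with requires_grad set to false.

Devin Yang: Source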
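A minimal sketch of the manual smoothing step, assuming the same convention as the implementations in this option (1 - smoothing on the true class, smoothing / (K - 1) on every other class). The helper name smooth_one_hot is hypothetical.

```python
import torch
import torch.nn.functional as F

def smooth_one_hot(labels, num_classes, smoothing):
    """Build smoothed targets; no_grad keeps them out of the autograd graph."""
    with torch.no_grad():
        dist = torch.full((labels.size(0), num_classes),
                          smoothing / (num_classes - 1),
                          device=labels.device)
        dist.scatter_(1, labels.unsqueeze(1), 1.0 - smoothing)
    return dist

logits = torch.randn(3, 5, requires_grad=True)
labels = torch.tensor([2, 1, 0])

soft = smooth_one_hot(labels, num_classes=5, smoothing=0.1)
# Cross entropy against the soft targets, averaged over the batch.
loss = torch.sum(-soft * F.log_softmax(logits, dim=-1), dim=-1).mean()
loss.backward()

print(soft.requires_grad)       # False: the targets carry no gradient
print(logits.grad is not None)  # True: gradients still reach the logits
```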

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight=None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)

        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))
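In addition, we added an assertion check on self.smoothing and added loss weighting support to this implementation.

Shital Shah: Source

Shital has already posted an answer here. This implementation is similar to Devin Yang's implementation above. We reproduce his code here with slightly condensed syntax.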

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets:torch.Tensor, n_classes:int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                                  .fill_(smoothing /(n_classes-1)) \
                                  .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
        if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1

        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))
Check

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

if __name__=="__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.4178)
tensor(1.4178)
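The printed value can be cross-checked by hand without PyTorch. The sketch below recomputes the Option 1 loss on the same three predictions in plain Python; the helper names log_softmax and option1_loss are hypothetical. Working it through, 1.4178 is the value obtained with smoothing=0.3.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def option1_loss(logits, labels, smoothing):
    """Mean cross entropy against targets with 1 - smoothing on the true
    class and smoothing / (K - 1) on every other class."""
    num_classes = len(logits[0])
    off_value = smoothing / (num_classes - 1)
    per_sample = []
    for row, label in zip(logits, labels):
        loss = 0.0
        for k, lp in enumerate(log_softmax(row)):
            q = 1.0 - smoothing if k == label else off_value
            loss -= q * lp
        per_sample.append(loss)
    return sum(per_sample) / len(per_sample)

predict = [[0, 0.2, 0.7, 0.1, 0],
           [0, 0.9, 0.2, 0.2, 1],
           [1, 0.2, 0.7, 0.9, 1]]
labels = [2, 1, 0]
print(round(option1_loss(predict, labels, smoothing=0.3), 4))  # 1.4178
```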

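Option 2: LabelSmoothingCrossEntropyLoss

With this approach, the loss accepts the plain target vector. The user does not smooth the targets manually; the module itself takes care of the label smoothing. This lets us implement label smoothing in terms of F.nll_loss.

(a). Wangleiofficial: Source - (AFAIK), original poster

(b). Datasaurus: Source - added weighting support

In addition, we condensed the code slightly to make it more concise.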
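The idea behind this option can be sketched as follows: the loss is a linear combination of the ordinary NLL and the mean negative log-probability over all classes, which equals cross entropy against targets that spread eps / K over every class (including the true one). Note that this differs slightly from the eps / (K - 1) convention used in Option 1. The seed, shapes, and eps=0.3 below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)           # 3 samples, 5 classes (illustrative)
target = torch.tensor([2, 1, 0])
eps = 0.3
num_classes = logits.size(-1)

log_probs = F.log_softmax(logits, dim=-1)

# Route A: combine F.nll_loss with the uniform term, no explicit soft targets.
nll = F.nll_loss(log_probs, target, reduction="mean")
uniform_term = -log_probs.mean(dim=-1).mean()
loss_a = (1 - eps) * nll + eps * uniform_term

# Route B: build the soft targets explicitly and take the cross entropy.
one_hot = F.one_hot(target, num_classes).float()
soft = (1 - eps) * one_hot + eps / num_classes
loss_b = -(soft * log_probs).sum(dim=-1).mean()

print(loss_a.item(), loss_b.item())  # the two routes should agree
```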

class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1,
                 reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction
        self.weight = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
        if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)

NVIDIA/DeepLearningExamples: Source

class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing.
    """
    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.
        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()
Check

if __name__=="__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.3883)
tensor(1.3883)

Answer 2

Shi*_*hah 9

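I was looking for an option that derives from _Loss, like the other loss classes in PyTorch, and respects basic parameters such as reduction. Unfortunately, I could not find a straightforward replacement, so I ended up writing my own. I have not fully tested this yet, though: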

import torch
from torch.nn.modules.loss import _WeightedLoss
import torch.nn.functional as F

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    @staticmethod
    def _smooth_one_hot(targets:torch.Tensor, n_classes:int, smoothing=0.0):
        assert 0 <= smoothing < 1
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                    device=targets.device) \
                .fill_(smoothing /(n_classes-1)) \
                .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
        return targets

    def forward(self, inputs, targets):
        targets = SmoothCrossEntropyLoss._smooth_one_hot(targets, inputs.size(-1),
            self.smoothing)
        lsm = F.log_softmax(inputs, -1)

        if self.weight is not None:
            lsm = lsm * self.weight.unsqueeze(0)

        loss = -(targets * lsm).sum(-1)

        if self.reduction == 'sum':
            loss = loss.sum()
        elif self.reduction == 'mean':
            loss = loss.mean()

        return loss


Other options:

utils.pytorch implementation
Deep matching implementation

Answer 3

Jin*_*ich 6

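None that I know of.

Here are two examples of PyTorch implementations:

The LabelSmoothingLoss module in the OpenNMT framework for machine translation
attention-is-all-you-need-pytorch, a re-implementation of Google's "Attention Is All You Need" paper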
