Using Softmax Activation function after calculating loss from BCEWithLogitsLoss (Binary Cross Entropy + Sigmoid activation)

Question

Des*_*wal 2 neural-network deep-learning recurrent-neural-network pytorch

I am going through a binary classification tutorial using PyTorch. The last layer of the network is torch.nn.Linear() with just one neuron (makes sense), which gives us a single output value per sample: pred = network(input_batch)

After that, the chosen loss function is loss_fn = BCEWithLogitsLoss() (which is more numerically stable than applying the softmax first and then calculating the loss). It will apply the Softmax function to the output of the last layer to give us a probability, and then calculate the binary cross-entropy to minimize the loss.

loss=loss_fn(pred,true)

My concern is that after all this, the author used torch.round(torch.sigmoid(pred)).

Why would that be? I mean, I know it'll get the prediction probabilities in the range [0, 1] and then round off the values with a default threshold of 0.5.

Isn't it better to use the sigmoid once after the last layer within the network, rather than using a softmax and a sigmoid at two different places, given it's a binary classification?

Wouldn't it be better to just

out = self.linear(batch_tensor)
return self.sigmoid(out)

and then calculate the BCE loss and use argmax() for checking accuracy?

I am just curious: would that be a valid strategy?

Answer 1

Mic*_*ngo 6

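It seems you are thinking of binary classification as a multi-class classification with two classes, but that is not quite correct when using the binary cross-entropy approach. Let's start by clarifying the goal of binary classification before looking at any implementation details.

Technically, there are two classes, 0 and 1, but instead of considering them as two separate classes, you can see them as opposites of each other. For example, say you want to classify whether a StackOverflow answer was helpful or not. The two classes would be "helpful" and "not helpful". Naturally, you would simply ask "Was the answer helpful?". The negative aspect is left off, and if the answer is no, you can deduce that it was "not helpful". (Remember, it's a binary case; there is no middle ground.)

Therefore, your model only needs to predict a single class. To avoid confusion with the actual two classes, that can be expressed as: the model predicts the probability that the positive case occurs. In the context of the previous example: what is the probability that the StackOverflow answer was helpful?

Sigmoid gives you values in the range [0, 1], which are the probabilities. Now you need to decide when the model is confident enough for the result to count as positive, by defining a threshold. To make it balanced, the threshold is 0.5: as long as the probability is greater than 0.5 the prediction is positive (class 1: "helpful"), otherwise it is negative (class 0: "not helpful"). That is achieved by rounding (i.e. torch.round(torch.sigmoid(pred))).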
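As a rough illustration of the thresholding step, here is a minimal sketch. The logit values are made up for the example and are not from the tutorial in question.

```python
import torch

# Hypothetical raw model outputs (logits) for a batch of four samples,
# shaped [batch_size, 1] like the output of a single-neuron linear layer.
logits = torch.tensor([[-2.0], [-0.1], [0.3], [4.0]])

probs = torch.sigmoid(logits)  # probability of the positive class
preds = torch.round(probs)     # 1.0 where prob > 0.5, otherwise 0.0

# Because sigmoid(0) == 0.5, thresholding the logits at 0 gives the same labels.
preds_from_logits = (logits > 0).float()

print(probs.flatten())
print(preds.flatten())
```

Rounding is just a convenient way to apply the 0.5 threshold; a different threshold would need an explicit comparison such as probs > 0.7.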

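"After that, the chosen loss function is loss_fn = BCEWithLogitsLoss() (which is more numerically stable than applying the softmax first and then calculating the loss). It will apply the Softmax function to the output of the last layer to give us a probability."

"Isn't it better to use the sigmoid once after the last layer within the network, rather than using a softmax and a sigmoid at two different places, given it's a binary classification?"

BCEWithLogitsLoss applies Sigmoid, not Softmax; there is no Softmax involved at all. From the nn.BCEWithLogitsLoss documentation:

"This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability."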
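To make the stability claim concrete, here is a plain-Python sketch that does not use PyTorch at all. The function names naive_bce and stable_bce are invented for this example, and the stable formula (max(x, 0) - x*y + log(1 + exp(-|x|))) is one common algebraically equivalent rewrite of sigmoid followed by cross-entropy; it is meant to show the idea, not to reproduce PyTorch's actual implementation.

```python
import math

def naive_bce(logit, target):
    """Apply sigmoid first, then cross-entropy on the resulting probability."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def stable_bce(logit, target):
    """Compute the same loss directly from the logit, never forming log(0)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits the two versions agree.
print(naive_bce(2.0, 1.0), stable_bce(2.0, 1.0))

# For a confident but wrong prediction, the sigmoid saturates to exactly 1.0
# in floating point, so the naive version ends up evaluating log(0).
try:
    naive_bce(40.0, 0.0)
except ValueError as err:
    print("naive version failed:", err)

print(stable_bce(40.0, 0.0))
```

The failure in the naive version comes purely from floating-point saturation of the sigmoid, which is the kind of problem that combining the two operations avoids.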

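By not applying Sigmoid in the model, you get a more numerically stable version of the binary cross-entropy, but that means you have to apply the Sigmoid manually if you want to make an actual prediction outside of training.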
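A minimal sketch of that arrangement might look like the following. The class name BinaryClassifier, the feature size of 8, and the random inputs and targets are all assumptions made for illustration; only nn.Linear, nn.BCEWithLogitsLoss, torch.sigmoid and torch.round come from the discussion above.

```python
import torch
from torch import nn

class BinaryClassifier(nn.Module):
    """Hypothetical model whose forward pass returns raw logits (no Sigmoid)."""

    def __init__(self, in_features):
        super().__init__()
        self.linear = nn.Linear(in_features, 1)

    def forward(self, x):
        return self.linear(x)

torch.manual_seed(0)
model = BinaryClassifier(in_features=8)
loss_fn = nn.BCEWithLogitsLoss()

inputs = torch.randn(4, 8)                               # made-up batch
targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])     # made-up labels

# Training: pass the logits straight to the loss, which applies Sigmoid internally.
logits = model(inputs)
loss = loss_fn(logits, targets)

# Prediction outside of training: apply Sigmoid manually, then threshold.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(inputs))
    preds = torch.round(probs)
```

Putting Sigmoid inside the model and training with nn.BCELoss also works, but it gives up the more stable combined computation described in the documentation quote.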

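"[...] and use argmax() for checking accuracy?"

Again, you're thinking of the multi-class scenario. You only have a single output class, i.e. the output has size [batch_size, 1]. Taking the argmax of that will always give you 0, because that is the only available class.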
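A quick check with made-up logits shows the effect. The values below are illustrative only.

```python
import torch

# Hypothetical logits with shape [batch_size, 1]: one output per sample.
logits = torch.tensor([[-3.0], [0.5], [2.0]])

# argmax over the class dimension can only return index 0,
# because that dimension has a single entry.
argmax_preds = torch.argmax(logits, dim=1)

# Thresholding the sigmoid output gives the intended binary labels.
rounded_preds = torch.round(torch.sigmoid(logits)).squeeze(1)

print(argmax_preds)   # every entry is 0
print(rounded_preds)
```

argmax would only be meaningful if the last layer had two outputs, one per class, which is the multi-class formulation rather than the single-output one used here.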
