How to add L1 or L2 regularization to weights in PyTorch

Question


zez*_*ezo (score 4) · tags: optimization, regression, machine-learning, pytorch

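In TensorFlow, we can add L1 or L2 regularization in a Sequential model, but I can't find an equivalent in PyTorch. How can we add regularization to the weights in a PyTorch network definition such as the following: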

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        # How to add L1 regularization after a certain hidden layer?
        # OR how to add L2 regularization after a certain hidden layer?
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))      # activation function for hidden layer
        x = self.predict(x)             # linear output
        return x

net = Net(n_feature=1, n_hidden=10, n_output=1)     # define the network
# print(net)  # net architecture
optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss


Answer 1

jod*_*dag (score 5)

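In general, L2 regularization is handled in PyTorch through the weight_decay parameter of the optimizers (you can also give different layers different weight_decay values by using parameter groups). However, this mechanism does not support L1 regularization unless you extend an existing optimizer or write a custom one.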
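A minimal sketch of the per-layer weight_decay point, assuming the Net class from the question, a toy dataset, and an arbitrary coefficient wd (the names net_a, net_b, wd and the data are hypothetical). It gives the hidden layer its own parameter group with weight decay and the output layer none, then checks that one SGD step with weight_decay=wd gives the same parameters as one step where (wd / 2) * sum(p**2) over the hidden parameters is added to the loss.

```python
import copy

import torch
import torch.nn.functional as F


class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super().__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))
        return self.predict(x)


torch.manual_seed(0)
x = torch.linspace(-1, 1, 100).unsqueeze(1)  # hypothetical toy inputs, shape (100, 1)
y = x.pow(2)                                 # hypothetical toy targets
loss_func = torch.nn.MSELoss()
wd = 1e-2  # arbitrary example coefficient

net_a = Net(1, 10, 1)
net_b = copy.deepcopy(net_a)  # identical starting weights for the comparison

# (a) L2 on the hidden layer only, via per-group weight_decay
opt_a = torch.optim.SGD(
    [
        {"params": net_a.hidden.parameters(), "weight_decay": wd},
        {"params": net_a.predict.parameters(), "weight_decay": 0.0},
    ],
    lr=0.2,  # default lr shared by both groups
)
opt_a.zero_grad()
loss_func(net_a(x), y).backward()
opt_a.step()

# (b) the same update written as an explicit penalty on the hidden layer
opt_b = torch.optim.SGD(net_b.parameters(), lr=0.2)
penalty = (wd / 2) * sum(p.pow(2).sum() for p in net_b.hidden.parameters())
opt_b.zero_grad()
(loss_func(net_b(x), y) + penalty).backward()
opt_b.step()

# After one step, both networks should hold (numerically) the same parameters
same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(net_a.parameters(), net_b.parameters())
)
print(same)  # True
```

The factor 1/2 appears because SGD adds weight_decay * p to the gradient, and that is the gradient of (weight_decay / 2) * p**2.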

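According to the TensorFlow documentation, they use a reduce_sum(abs(x)) penalty for L1 regularization and a reduce_sum(square(x)) penalty for L2 regularization. Probably the simplest way to get the same effect is to add these penalty terms directly to the loss that is used to compute gradients during training.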
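Written out, the penalized objective and the resulting plain SGD update look as follows. This is a sketch with symbols introduced here for illustration: w_i ranges over the penalized parameters, λ1 and λ2 play the role of l1_weight and l2_weight in this answer's snippet, g_i is the gradient of the unpenalized loss, and η is the learning rate.

```latex
\begin{aligned}
\mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{MSE}}
  + \lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2 \\
\frac{\partial \mathcal{L}_{\text{total}}}{\partial w_i} &= g_i
  + \lambda_1 \operatorname{sign}(w_i) + 2\lambda_2 w_i,
  \qquad g_i = \frac{\partial \mathcal{L}_{\text{MSE}}}{\partial w_i} \\
w_i &\leftarrow w_i - \eta \left( g_i + \lambda_1 \operatorname{sign}(w_i) + 2\lambda_2 w_i \right)
\end{aligned}
```

With λ1 = 0, the last line is the same update that torch.optim.SGD performs with weight_decay = 2λ2, because PyTorch adds weight_decay · w_i to each gradient; an explicit L2 term and the weight_decay argument therefore differ only by that factor of 2 in the coefficient. The L1 term contributes λ1 · sign(w_i) (autograd uses a gradient of 0 for abs at exactly 0), and no built-in optimizer argument produces it, so it has to be added to the loss.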

# set l1_weight and l2_weight to non-zero values to enable penalties

# inside the training loop (given input x and target y)
...
pred = net(x)
loss = loss_func(pred, y)

# compute penalty only for net.hidden parameters (its weight and bias)
l1_penalty = l1_weight * sum([p.abs().sum() for p in net.hidden.parameters()])
l2_penalty = l2_weight * sum([(p**2).sum() for p in net.hidden.parameters()])
loss_with_penalty = loss + l1_penalty + l2_penalty

optimizer.zero_grad()
loss_with_penalty.backward()
optimizer.step()

# The pre-penalty loss is the one we ultimately care about
print('loss:', loss.item())

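A self-contained sketch of the same approach, assuming the question's Net and a hypothetical toy dataset (noisy y = x**2). Unlike the snippet above, it penalizes only net.hidden.weight rather than all of net.hidden.parameters(), since biases are often left unregularized; that choice, the train helper, and the coefficient values are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super().__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))
        return self.predict(x)


def train(l1_weight=0.0, l2_weight=0.0, steps=200, seed=0):
    """Hypothetical helper: fit noisy y = x**2, penalizing net.hidden.weight only."""
    torch.manual_seed(seed)  # same data and initial weights for every call
    x = torch.linspace(-1, 1, 100).unsqueeze(1)   # shape (100, 1)
    y = x.pow(2) + 0.1 * torch.randn(x.size())

    net = Net(n_feature=1, n_hidden=10, n_output=1)
    optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
    loss_func = torch.nn.MSELoss()
    w = net.hidden.weight  # the weight matrix; the bias is not penalized

    for _ in range(steps):
        loss = loss_func(net(x), y)
        l1_penalty = l1_weight * w.abs().sum()
        l2_penalty = l2_weight * w.pow(2).sum()
        loss_with_penalty = loss + l1_penalty + l2_penalty

        optimizer.zero_grad()
        loss_with_penalty.backward()
        optimizer.step()
    # report the pre-penalty MSE from the last step, as in the answer
    return net, loss.item()


net_plain, mse_plain = train()
net_l1, mse_l1 = train(l1_weight=0.1)  # deliberately large so the effect is visible
for name, net, mse in [("no penalty", net_plain, mse_plain), ("L1", net_l1, mse_l1)]:
    print(f"{name}: mse={mse:.4f}, sum|W|={net.hidden.weight.abs().sum().item():.3f}")
```

Both runs start from the same seed, so any difference in sum|W| comes from the penalty; the L1 run should end with a much smaller hidden weight norm, typically at the cost of a higher MSE.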

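The most intuitive explanation I can think of is that regularization is a machine-learning technique that penalizes the model when its parameters stray "too far" from some predefined state (for L1 and L2 regularization, that state is all-zero weights). In a sense, this limits the model's complexity and helps avoid problems such as overfitting. (3 upvotes)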

Archived: 5 years ago
Views: 10970
Last activity: 5 years ago