How can I add an L1 regularizer to the activations in PyTorch?

Bul*_*ull 15 python pytorch

(PyTorch beginner here)

I would like to add an L1 regularizer to the activation output of a ReLU. More generally, how does one add a regularizer only to a particular layer of the network?

This post may be related: Adding L1/L2 regularization in PyTorch? But either it is not related, or I do not understand the answer:

It refers to an L2 regularizer applied in the optimization step, which is a different thing. In other words, if the total desired loss is

crossentropy + lambda1*L1(layer1) + lambda2*L1(layer2) + ...

I believe the parameter supplied to torch.optim.Adagrad applies only to the cross-entropy loss. Or perhaps it applies to all the parameters (weights) of the whole network. In any case, it does not seem to allow applying a single regularizer to the activations of a single layer, and it does not provide an L1 loss.
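
For context, here is a minimal sketch of what I believe that post refers to (untested; the layer sizes are just for illustration): the weight_decay argument adds an L2 penalty over all parameters handed to the optimizer, with no way to single out one layer's activations.

import torch

model = torch.nn.Linear(128, 2)
# weight_decay adds an L2 penalty over *all* parameters passed to the
# optimizer; it cannot target the activations of a particular layer.
optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-2, weight_decay=1e-4)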

Another related topic is nn.modules.loss, which includes L1Loss(). From the documentation, I still do not understand how it is meant to be used.
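
My best guess (untested, and not something the docs spell out for this use case) is that L1Loss measures the mean absolute difference between an input and a target, so an activation penalty might be expressed as the distance to a zero tensor:

import torch

l1_loss = torch.nn.L1Loss()
activations = torch.rand(4, 32)
# Mean absolute activation: |activations - 0| averaged over all elements.
penalty = l1_loss(activations, torch.zeros_like(activations))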

Finally, there is the module https://github.com/pytorch/pytorch/blob/master/torch/legacy/nn/L1Penalty.py which seems closest to the goal, but it is called "legacy". Why is that?

Sas*_*thy 12

Here is how you can do it:

  • In your module's forward, return both the final output and the outputs of the layers you want to apply L1 regularization to.
  • The loss variable will then be the sum of the cross-entropy loss between output and target and the L1 penalties.

Here is some sample code:

import torch
from torch.autograd import Variable
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out, layer1_out, layer2_out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# usually following code is looped over all batches 
# but let's just do a dummy batch for brevity

inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())

optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

all_linear1_params = torch.cat([x.view(-1) for x in model.linear1.parameters()])
all_linear2_params = torch.cat([x.view(-1) for x in model.linear2.parameters()])
l1_regularization = lambda1 * torch.norm(all_linear1_params, 1)
l2_regularization = lambda2 * torch.norm(all_linear2_params, 2)

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()

  • This regularizes the weights; you should be regularizing the returned layer outputs (i.e. the activations) instead. That is why you returned them in the first place! The regularization terms should look like: `l1_regularization = lambda1 * torch.norm(layer1_out, 1)` `l2_regularization = lambda2 * torch.norm(layer2_out, 2)` (see the sketch after these comments). (5 upvotes)
  • The answer seems to have an error. For `norm(all_linear2_params, 2)`: torch returns the **square root** of the sum of squares, i.e. the expression should be raised to the power of 2 for L2 regularization. (3 upvotes)
  • Doesn't this regularize the layer's weights? I think the original poster wanted to regularize the layer's outputs rather than the weights. How do I regularize (sparsify) only the activations in PyTorch? (2 upvotes)
  • Why do you need to return layer1_out and layer2_out from forward if these variables are not used? (2 upvotes)
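
A minimal sketch of the fix suggested in the comments, reusing the MLP and the variables defined in the answer above (the penalties are applied to the returned activations, and the L2 norm is squared):

outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)

# Penalize the returned activations, not the weights.
l1_regularization = lambda1 * torch.norm(layer1_out, 1)
# Squared, so it is a true L2 (sum-of-squares) penalty.
l2_regularization = lambda2 * torch.norm(layer2_out, 2) ** 2

loss = cross_entropy_loss + l1_regularization + l2_regularization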

ndr*_*nen 11

All (other, current) responses are incorrect in some way. This one comes closest, because it suggests summing the norms of the outputs, which is correct, but the code sums the norms of the weights, which is incorrect.

The correct way is not to modify the network code, but to capture the outputs via a forward hook, as with the OutputHook class here. From there, summing the norms of the outputs is straightforward, but care must be taken to clear the captured outputs every iteration.

import torch


class OutputHook(list):
    """ Hook to capture module outputs.
    """
    def __call__(self, module, input, output):
        self.append(output)


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)
        # Instantiate ReLU, so a hook can be registered to capture its output.
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        layer1_out = self.relu(self.linear1(x))
        layer2_out = self.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out


batch_size = 4
l1_lambda = 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
# Register hook to capture the ReLU outputs. Non-trivial networks will often
# require hooks to be applied more judiciously.
output_hook = OutputHook()
model.relu.register_forward_hook(output_hook)

inputs = torch.rand(batch_size, 128)
targets = torch.ones(batch_size).long()

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = torch.nn.functional.cross_entropy(outputs, targets)

# Compute the L1 penalty over the ReLU outputs captured by the hook.
l1_penalty = 0.
for output in output_hook:
    l1_penalty += torch.norm(output, 1)
l1_penalty *= l1_lambda

loss = cross_entropy_loss + l1_penalty
loss.backward()
optimizer.step()
output_hook.clear()
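
One follow-up worth noting (an addition, not part of the answer's code above): register_forward_hook returns a removable handle, so the hook can be detached once the penalty is no longer needed.

handle = model.relu.register_forward_hook(output_hook)
# ... train with the L1 penalty ...
handle.remove()  # detach the hook when the penalty is no longer wanted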

  • This should be the accepted answer. (2 upvotes)

小智 7

@Sasank Chilamkurthy Regularization should be applied to each layer's weight parameters in the model, not to each layer's output. See below:

import torch
from torch.autograd import Variable
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())
l1_regularization, l2_regularization = torch.tensor(0.), torch.tensor(0.)  # float, so the norms below can be accumulated in place

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)
for param in model.parameters():
    l1_regularization += torch.norm(param, 1)
    l2_regularization += torch.norm(param, 2) ** 2  # squared, for a true L2 (sum-of-squares) penalty

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()