Yil*_*ang 5 nlp deep-learning lstm pytorch loss-function
I'm training an LSTM model in PyTorch with a batch size of 256, using NLLLoss() as the loss function. The loss function is having a problem with the shape of the data.
The log-softmax output of the forward pass has shape torch.Size([256, 4, 1181]),
where 256 is the batch size, 4 is the sequence length, and 1181 is the vocabulary size.
The target has shape torch.Size([256, 4]),
where 256 is the batch size and 4 is the output sequence length.
When I tested earlier with a batch size of 1, the model worked fine, but with a larger batch size it breaks. I read that NLLLoss() can take class indices as the target instead of a one-hot encoded target.
Am I misunderstanding it? Or did I not format the shape of the target correctly?
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class LSTM(nn.Module):
    def __init__(self, embed_size=100, hidden_size=100, vocab_size=1181, embedding_matrix=...):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        # Pretrained word embeddings, frozen during training
        self.word_embeddings = nn.Embedding(vocab_size, embed_size)
        self.word_embeddings.load_state_dict({'weight': torch.Tensor(embedding_matrix)})
        self.word_embeddings.weight.requires_grad = False
        # batch_first=True: forward() feeds (batch, seq, feature); the default expects (seq, batch, feature)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.hidden2out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):
        batch_size, num_steps = tokens.shape
        embeds = self.word_embeddings(tokens)
        lstm_out, _ = self.lstm(embeds.view(batch_size, num_steps, -1))
        out_space = self.hidden2out(lstm_out.view(batch_size, num_steps, -1))
        # Normalize over the vocabulary (last dim): (batch, seq, vocab) log-probabilities
        out_scores = F.log_softmax(out_space, dim=-1)
        return out_scores

model = LSTM(self.config.embed_size, self.config.hidden_size, self.config.vocab_size, self.embedding_matrix)
loss_function = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=self.config.lr)
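A quick shape check of the forward pass. This is a sketch under two assumptions that are not in the original code: the LSTM is built with batch_first=True (nn.LSTM otherwise reads its input as (seq, batch, feature)), and log_softmax is taken over the last (vocabulary) dimension. The sizes are small hypothetical stand-ins, and the pretrained embedding matrix is replaced by a randomly initialized nn.Embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical small sizes; the question uses batch 256, 4 steps, 1181 words
batch_size, num_steps, vocab_size, embed_size, hidden_size = 3, 4, 10, 6, 8

# Random embeddings stand in for the pretrained embedding_matrix (assumption)
embed = nn.Embedding(vocab_size, embed_size)
# Assumption: batch_first=True so the LSTM reads (batch, seq, feature)
lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
hidden2out = nn.Linear(hidden_size, vocab_size)

tokens = torch.randint(0, vocab_size, (batch_size, num_steps))
lstm_out, _ = lstm(embed(tokens))  # (batch, seq, hidden)
# Assumption: normalize over the vocabulary, i.e. the last dimension
out_scores = F.log_softmax(hidden2out(lstm_out), dim=-1)

print(out_scores.shape)  # torch.Size([3, 4, 10])
# Each (batch, step) position is now a distribution over the vocabulary
row_sums = out_scores.exp().sum(dim=-1)
```

The output layout is (N, d, C), which is exactly the layout the loss below complains about.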
Error:
~/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
1846 if target.size()[1:] != input.size()[2:]:
1847 raise ValueError('Expected target size {}, got {}'.format(
-> 1848 out_size, target.size()))
1849 input = input.contiguous().view(n, c, 1, -1)
1850 target = target.contiguous().view(n, 1, -1)
ValueError: Expected target size (256, 554), got torch.Size([256, 4])
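The mismatch can be reproduced with random tensors. The sketch below uses small hypothetical sizes (N, d, C) = (2, 4, 5) in place of (256, 4, 1181): with the class scores in the last dimension, NLLLoss treats dimension 1 as the class dimension and rejects the (N, d) target. Depending on the PyTorch version, the error surfaces as a ValueError (as in the traceback above) or a RuntimeError, so the sketch catches both.

```python
import torch
import torch.nn as nn

# Small hypothetical sizes standing in for (256, 4, 1181)
N, d, C = 2, 4, 5

# Log-probabilities laid out as (N, d, C): class scores in the LAST dimension
x = torch.randn(N, d, C).log_softmax(dim=-1)
# One class index per (batch, step) position
y = torch.randint(0, C, (N, d))

loss_function = nn.NLLLoss()
try:
    loss_function(x, y)
    raised = False
except (ValueError, RuntimeError) as err:
    # NLLLoss reads dim 1 (size d=4) as the class dimension, so it expects
    # a target of shape (N, 5) and rejects the (2, 4) target
    raised = True
    print(err)
```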
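The input to your loss function has shape (N, d, C) = (256, 4, 1181) and your target has shape (N, d) = (256, 4). However, according to the documentation for NLLLoss, the input should have shape (N, C, d) for a target of shape (N, d).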
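For reference, with the default reduction='mean', no class weights, and no ignored targets, the value NLLLoss computes for an input x of shape (N, C, d) and a target y of shape (N, d) can be written as below (indices are 0-based; y_{n,k} selects a position along the class dimension, which is dimension 1):

$$
\ell(x, y) = -\frac{1}{N d} \sum_{n=0}^{N-1} \sum_{k=0}^{d-1} x_{n,\, y_{n,k},\, k}
$$

Each (n, k) position contributes one independent term, so the loss is a plain mean over N·d predictions.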
Assuming x is your network output and y is the target, you can compute the loss by transposing the misplaced dimensions of x as follows:
loss = loss_function(x.transpose(1, 2), y)
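A minimal check of the transpose fix with random data in the question's shapes. The manual computation picks out the log-probability of each target class with torch.gather and averages it, which is what NLLLoss does with the default reduction='mean' and no class weights.

```python
import torch
import torch.nn as nn

N, d, C = 256, 4, 1181  # shapes from the question

torch.manual_seed(0)
x = torch.randn(N, d, C).log_softmax(dim=-1)  # network output, (N, d, C)
y = torch.randint(0, C, (N, d))                # targets, (N, d)

loss_function = nn.NLLLoss()
# Move the class dimension to position 1: (N, d, C) -> (N, C, d)
loss = loss_function(x.transpose(1, 2), y)

# Manual check: log-probability of each target class, negated and averaged
manual = -x.gather(dim=-1, index=y.unsqueeze(-1)).squeeze(-1).mean()
print(loss.item(), manual.item())
```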
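Alternatively, since NLLLoss (with the default reduction='mean') simply averages over all the individual predictions, you can reshape x and y to (N*d, C) and (N*d). This gives the same result without creating a temporary copy of the tensor.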
loss = loss_function(x.reshape(N*d, C), y.reshape(N*d))
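A sketch comparing the two approaches on random data in the question's shapes. The traceback above shows nll_loss calling input.contiguous() on a 3-D input, which has to copy a transposed tensor; reshaping a contiguous x instead returns a view over the same storage, which the pointer comparison below confirms.

```python
import torch
import torch.nn as nn

N, d, C = 256, 4, 1181  # shapes from the question

x = torch.randn(N, d, C).log_softmax(dim=-1)  # (N, d, C)
y = torch.randint(0, C, (N, d))                # (N, d)
loss_function = nn.NLLLoss()

loss_transpose = loss_function(x.transpose(1, 2), y)
# Flatten batch and time into one axis of N*d independent predictions
loss_reshape = loss_function(x.reshape(N * d, C), y.reshape(N * d))

print(loss_transpose.item(), loss_reshape.item())
# x is contiguous here, so reshape returns a view rather than a copy
print(x.reshape(N * d, C).data_ptr() == x.data_ptr())  # True
```

The two losses can differ only by floating-point rounding, since both average the same N*d terms.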