python nlp word-embedding pytorch glove
When using GloVe embeddings in an NLP task, some words in the dataset may not exist in GloVe, so we instantiate random weights for these unknown words.
Is it possible to freeze the weights obtained from GloVe and train only the newly instantiated weights?
I only know that we can set model.embedding.weight.requires_grad = False,
but that makes the new words untrainable as well.
Or is there a better way to capture the semantics of these words?
One approach is to use two separate embeddings: one for the pretrained vectors and one for the vectors still to be trained.
The GloVe embedding should be frozen, while tokens without a pretrained representation are taken from the trainable layer.
If you format your data so that tokens with a pretrained representation get indices in a smaller range than tokens without a GloVe representation, this becomes straightforward. Say the pretrained indices are in [0, 300] and the unrepresented ones are in [301, 500]. I would go along these lines:
import numpy as np
import torch


class YourNetwork(torch.nn.Module):
    def __init__(self, glove_embeddings: np.ndarray, how_many_tokens_not_present: int):
        super().__init__()
        # Frozen GloVe rows (from_pretrained freezes the weights by default)
        self.pretrained_embedding = torch.nn.Embedding.from_pretrained(
            torch.from_numpy(glove_embeddings).float()
        )
        # Trainable rows for tokens that have no GloVe representation
        self.trainable_embedding = torch.nn.Embedding(
            how_many_tokens_not_present, glove_embeddings.shape[1]
        )
        # Rest of your network setup

    def forward(self, batch):
        # Tokens in the batch without a pretrained representation must have indices
        # BIGGER than the pretrained ones; adjust your data-creation function accordingly.
        mask = batch >= self.pretrained_embedding.num_embeddings
        # You may want to optimize this; you could probably get away without the copy,
        # though I'm not currently sure how.
        pretrained_batch = batch.clone()
        pretrained_batch[mask] = 0
        embedded_batch = self.pretrained_embedding(pretrained_batch)
        # Every token without a representation has to be shifted into the trainable range
        batch = batch - self.pretrained_embedding.num_embeddings
        # Zero out the ones which already have a pretrained embedding
        batch[~mask] = 0
        non_pretrained_embedded_batch = self.trainable_embedding(batch)
        # Finally, replace the placeholder embeddings of tokens without a GloVe vector
        # with their trainable embeddings.
        embedded_batch[mask] = non_pretrained_embedded_batch[mask]
        # Rest of your code
        ...
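For illustration, here is a hypothetical usage sketch (the toy GloVe matrix and the count of 200 missing tokens are made up) that builds on the class above and shows that from_pretrained leaves the GloVe rows frozen while the extra rows stay trainable:

import numpy as np
import torch

# Hypothetical numbers: 301 pretrained rows (indices 0-300) and 200 extra tokens
glove = np.random.rand(301, 100).astype(np.float32)  # stand-in for real GloVe vectors
net = YourNetwork(glove, how_many_tokens_not_present=200)

print(net.pretrained_embedding.weight.requires_grad)  # False - frozen by from_pretrained
print(net.trainable_embedding.weight.requires_grad)   # True  - updated during training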
The second approach is to zero out the gradients for specified tokens. This one is a bit tricky, but I think it is quite concise and easy to implement: if you obtain the indices of the tokens that do have a GloVe representation, you can explicitly zero their gradients after backpropagation, so those rows will never be updated while the new rows keep training.
import torch

embedding = torch.nn.Embedding(10, 3)
X = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])

values = embedding(X)
loss = values.mean()  # use whatever loss you want
loss.backward()

# Let's say these indices in your embedding are pretrained (have a GloVe representation)
indices = torch.LongTensor([2, 4, 5])

print("Before zeroing out gradient")
print(embedding.weight.grad)

# Zero the gradient rows of the pretrained tokens so they are not updated
embedding.weight.grad[indices] = 0

print("After zeroing out gradient")
print(embedding.weight.grad)
And the output of the second approach:
Before zeroing out gradient
tensor([[0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0833, 0.0833, 0.0833],
        [0.0417, 0.0417, 0.0417],
        [0.0833, 0.0833, 0.0833],
        [0.0417, 0.0417, 0.0417],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417]])
After zeroing out gradient
tensor([[0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417]])
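As a possible refinement (not part of the original answer), the same zeroing can be automated with a gradient hook on the embedding weight, so you do not have to clear the rows by hand after every loss.backward(). A minimal sketch, again assuming rows 2, 4 and 5 are the pretrained ones:

import torch

embedding = torch.nn.Embedding(10, 3)
pretrained_indices = torch.LongTensor([2, 4, 5])  # rows that already have GloVe vectors

def zero_pretrained_grad(grad):
    # The hook runs on every backward pass; the returned tensor replaces the gradient
    grad = grad.clone()
    grad[pretrained_indices] = 0
    return grad

embedding.weight.register_hook(zero_pretrained_grad)

loss = embedding(torch.LongTensor([[1, 2, 4, 5]])).mean()
loss.backward()
print(embedding.weight.grad)  # rows 2, 4 and 5 are already zero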