Tags: python, vectorization, neural-network, deep-learning, pytorch
I have built an RNN language model with attention, where I create a context vector for each element of the input by attending over all previous hidden states (in one direction only).
The most straightforward solution, in my view, is to use a for-loop over the RNN output, so that each context vector is computed one after another.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNN_LM(nn.Module):
    def __init__(self, hidden_size, vocab_size, embedding_dim=None, droprate=0.5):
        super().__init__()
        if not embedding_dim:
            embedding_dim = hidden_size
        self.embedding_matrix = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size, batch_first=False)
        self.attn = nn.Linear(hidden_size, hidden_size)
        self.vocab_dist = nn.Linear(hidden_size, vocab_size)
        self.dropout = nn.Dropout(droprate)

    def forward(self, x):
        x = self.dropout(self.embedding_matrix(x.view(-1, 1)))
        x, states = self.lstm(x)
        x = x.squeeze()
        content_vectors = [x[0].view(1, -1)]
        # for-loop over hidden states and attention
        for i in range(1, x.size(0)):
            prev_states = x[:i]
            current_state = x[i].view(1, -1)
            attn_prod = torch.mm(self.attn(current_state), prev_states.t())
            attn_weights = F.softmax(attn_prod, dim=1)
            context = torch.mm(attn_weights, prev_states)
            content_vectors.append(context)
        return self.vocab_dist(self.dropout(torch.cat(content_vectors)))
Note: the forward method here is only used for training.
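For reference, here is a minimal usage sketch (my own addition, with made-up sizes, relying on the RNN_LM class defined above) showing how the module might be driven during training with a next-token cross-entropy loss:

import torch
import torch.nn.functional as F

# hypothetical sizes, purely illustrative
model = RNN_LM(hidden_size=64, vocab_size=100)
tokens = torch.randint(0, 100, (10,))   # one sequence of 10 token ids
logits = model(tokens)                  # shape (10, vocab_size)
# shift by one so that position i predicts token i + 1
loss = F.cross_entropy(logits[:-1], tokens[1:])
loss.backward()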
However, this solution is not very efficient: computing the context vectors one after another does not parallelize well. But since the context vectors do not depend on each other, I wonder whether there is a non-sequential way to compute them.
So: is there a way to compute the context vectors without a for-loop, so that more of the computation can be parallelized?
OK, for clarity: I assume we only really care about vectorizing the for-loop. What is the shape of x? Assuming x is 2-dimensional, I have the following code, where v1 executes your loop and v2 is a vectorized version:
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(3, 6)

def v1():
    for i in range(1, x.size(0)):
        prev = x[:i]
        curr = x[i].view(1, -1)
        prod = torch.mm(curr, prev.t())
        attn = prod  # same shape
        context = torch.mm(attn, prev)
        print(context)

def v2():
    # we're going to unroll the loop by vectorizing over the new,
    # 0-th dimension of `x`. We repeat it as many times as there
    # are iterations in the for-loop
    repeated = x.unsqueeze(0).repeat(x.size(0), 1, 1)

    # we're looking to build a `prevs` tensor such that
    # prevs[i, x, y] == prev[x, y] at the i-th iteration of the loop in v1,
    # up to the 0-padding necessary to make them all the same size.
    # We need to build a higher-dimensional equivalent of torch.triu
    xs = torch.arange(x.size(0)).reshape(1, -1, 1)
    zs = torch.arange(x.size(0)).reshape(-1, 1, 1)
    prevs = torch.where(zs < xs, torch.tensor(0.), repeated)

    # this is an equivalent of the above iteration starting at 1
    prevs = prevs[:-1]
    currs = x[1:]

    # a batched matrix multiplication
    prod = torch.matmul(currs, prevs.transpose(1, 2))
    attn = prod  # same shape
    context = torch.matmul(attn, prevs)

    # equivalent of a higher-dimensional torch.diagonal
    contexts = torch.einsum('iij->ij', context)
    print(contexts)

print(x)
print('\n------ v1 -------\n')
v1()
print('\n------ v2 -------\n')
v2()
This vectorizes your loop, though with some caveats. First, I assumed x is 2-dimensional. Second, I skipped taking the softmax, claiming that it doesn't change the size of its input and therefore doesn't affect vectorization. That is true, but unfortunately the softmax of a 0-padded vector v is not equal to the 0-padded softmax of the unpadded v. This can be fixed with renormalization, though. Please let me know whether my assumptions are correct and whether this is a good enough starting point for your work.
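To make the renormalization point concrete, here is a sketch of one possible fix (my own illustration, not part of the answer above): mask the padded positions with -inf before the softmax so that they receive exactly zero weight, which lets the whole attention step collapse into two matrix multiplications. Like v1/v2, this omits the question's self.attn linear projection for brevity:

import torch
import torch.nn.functional as F

def masked_softmax(logits, mask):
    # mask is True where a position may be attended to; -inf logits get
    # exactly zero weight, so the remaining weights renormalize to sum to 1
    return F.softmax(logits.masked_fill(~mask, float('-inf')), dim=-1)

torch.manual_seed(0)
x = torch.randn(3, 6)
n = x.size(0)

# at step i (i = 1..n-1) only the previous positions j < i are valid
mask = torch.arange(n).unsqueeze(0) < torch.arange(1, n).unsqueeze(1)  # (n-1, n)

scores = torch.mm(x[1:], x.t())      # (n-1, n) raw attention scores
attn = masked_softmax(scores, mask)  # padded positions get weight 0
contexts = torch.mm(attn, x)         # (n-1, d) context vectors
print(contexts)

Each row of mask has at least one valid position, so the softmax is well defined, and the result matches the per-step softmax(x[i] @ x[:i].t()) @ x[:i] computed by the question's loop.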