Tags: machine-learning, neural-network, deep-learning, lstm, pytorch
Hi, I have a question about how to collect the correct outputs from a Bi-LSTM module.
Suppose I feed a length-10 sequence into a single-layer LSTM module with 100 hidden units:
lstm = nn.LSTM(5, 100, 1, bidirectional=True)
The `output` tensor has the following shape:
[10 (seq_length), 1 (batch), 200 (num_directions * hidden_size)]
# or according to the doc, can be viewed as
[10 (seq_length), 1 (batch), 2 (num_directions), 100 (hidden_size)]
If I want to get the third (1-indexed) token's output in both directions (two 100-dim vectors), how do I do that correctly?
I know output[2, 0] will give me a 200-dim vector. Does this 200-dim vector represent the output of the third input in both directions?
What bothers me is that when feeding in reverse, the third (1-indexed) output vector is computed from the 8th (1-indexed) input, right?
Does pytorch automatically handle this and group the outputs by direction?
Thanks!
I know output[2, 0] will give me a 200-dim vector. Does this 200-dim vector represent the output of the third input in both directions?
The answer is yes.
The `output` tensor of the LSTM module is the concatenation of the forward LSTM output and the backward LSTM output at each position of the input sequence. The `h_n` tensor, on the other hand, is the last-timestep output: for the forward LSTM that is the output at the last token, but for the backward LSTM it is the output at the first token.
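Concretely, for the exact setup in the question, here is a minimal sketch (with a random input, so the numbers are illustrative only) of extracting the two 100-dim vectors for the third token:

```python
import torch
import torch.nn as nn

# same setup as the question: input_size=5, hidden_size=100, 1 layer, bidirectional
lstm = nn.LSTM(5, 100, 1, bidirectional=True)
x = torch.randn(10, 1, 5)            # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)         # output: (10, 1, 200)

# view as (seq_len, batch, num_directions, hidden_size) to separate directions
out = output.view(10, 1, 2, 100)
fwd_3rd = out[2, 0, 0]   # forward output for the 3rd token (1-indexed)
bwd_3rd = out[2, 0, 1]   # backward output for the 3rd token

# the 200-dim vector output[2, 0] is just their concatenation
assert torch.equal(output[2, 0, :100], fwd_3rd)
assert torch.equal(output[2, 0, 100:], bwd_3rd)
```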
In [1]: import torch
...: lstm = torch.nn.LSTM(input_size=5, hidden_size=3, bidirectional=True)
...: seq_len, batch, input_size, num_directions = 3, 1, 5, 2
...: in_data = torch.randint(10, (seq_len, batch, input_size)).float()
...: output, (h_n, c_n) = lstm(in_data)
...:
In [2]: # output of shape (seq_len, batch, num_directions * hidden_size)
...:
...: print(output)
...:
tensor([[[ 0.0379, 0.0169, 0.2539, 0.2547, 0.0456, -0.1274]],
[[ 0.7753, 0.0862, -0.0001, 0.3897, 0.0688, -0.0002]],
[[ 0.7120, 0.2965, -0.3405, 0.0946, 0.0360, -0.0519]]],
grad_fn=<CatBackward>)
In [3]: # h_n of shape (num_layers * num_directions, batch, hidden_size)
...:
...: print(h_n)
...:
tensor([[[ 0.7120, 0.2965, -0.3405]],
[[ 0.2547, 0.0456, -0.1274]]], grad_fn=<ViewBackward>)
In [4]: output = output.view(seq_len, batch, num_directions, lstm.hidden_size)
...: print(output[-1, 0, 0]) # forward LSTM output of last token
...: print(output[0, 0, 1]) # backward LSTM output of first token
...:
tensor([ 0.7120, 0.2965, -0.3405], grad_fn=<SelectBackward>)
tensor([ 0.2547, 0.0456, -0.1274], grad_fn=<SelectBackward>)
In [5]: h_n = h_n.view(lstm.num_layers, num_directions, batch, lstm.hidden_size)
...: print(h_n[0, 0, 0]) # h_n of forward LSTM
...: print(h_n[0, 1, 0]) # h_n of backward LSTM
...:
tensor([ 0.7120, 0.2965, -0.3405], grad_fn=<SelectBackward>)
tensor([ 0.2547, 0.0456, -0.1274], grad_fn=<SelectBackward>)
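These relationships can also be checked with assertions rather than by eye; a sketch, re-running the same small BiLSTM setup with fresh random weights:

```python
import torch

lstm = torch.nn.LSTM(input_size=5, hidden_size=3, bidirectional=True)
seq_len, batch, num_directions = 3, 1, 2
in_data = torch.randint(10, (seq_len, batch, 5)).float()
output, (h_n, c_n) = lstm(in_data)

out = output.view(seq_len, batch, num_directions, lstm.hidden_size)
# h_n[0] is the forward LSTM's state, i.e. its output at the LAST token ...
assert torch.allclose(h_n[0, 0], out[-1, 0, 0])
# ... while h_n[1] is the backward LSTM's state, i.e. its output at the FIRST token
assert torch.allclose(h_n[1, 0], out[0, 0, 1])
print("forward/backward h_n checks passed")
```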
When using a BiLSTM, the hidden states of the two directions are simply concatenated (the second half is the hidden state of the direction that consumed the sequence in reverse order), so splitting at the middle works just fine.
Since reshaping works from the right-most dimension to the left, separating the two directions causes no problems.
Here is a small example:
# so these are your original hidden states for each direction
# in this case hidden size is 5, but this works for any size
direction_one_out = torch.tensor(range(5))
direction_two_out = torch.tensor(list(reversed(range(5))))
print('Direction one:')
print(direction_one_out)
print('Direction two:')
print(direction_two_out)
# before being output they will be concatenated
# I'm adding here batch dimension and sequence length, in this case seq length is 1
hidden = torch.cat((direction_one_out, direction_two_out), dim=0).view(1, 1, -1)
print('\nYour hidden output:')
print(hidden, hidden.shape)
# trivial case, reshaping for one hidden state
hidden_reshaped = hidden.view(1, 1, 2, -1)
print('\nReshaped:')
print(hidden_reshaped, hidden_reshaped.shape)
# This also works for arbitrary sequence lengths, as you can see here
# I've set the sequence length to 5, but this works for any other value as well
print('\nThis also works for more multiple hidden states in a tensor:')
multi_hidden = hidden.expand(5, 1, 10)
print(multi_hidden, multi_hidden.shape)
print('Directions can be split up just like this:')
multi_hidden = multi_hidden.view(5, 1, 2, 5)
print(multi_hidden, multi_hidden.shape)
Output:
Direction one:
tensor([0, 1, 2, 3, 4])
Direction two:
tensor([4, 3, 2, 1, 0])
Your hidden output:
tensor([[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]]]) torch.Size([1, 1, 10])
Reshaped:
tensor([[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]]]) torch.Size([1, 1, 2, 5])
This also works for more multiple hidden states in a tensor:
tensor([[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]],
[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]],
[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]],
[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]],
[[0, 1, 2, 3, 4, 4, 3, 2, 1, 0]]]) torch.Size([5, 1, 10])
Directions can be split up just like this:
tensor([[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]],
[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]],
[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]],
[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]],
[[[0, 1, 2, 3, 4],
[4, 3, 2, 1, 0]]]]) torch.Size([5, 1, 2, 5])
Hope this helps! :)