Keras LSTM with a Masking layer for variable-length inputs

Flo*_*tel 11 python masking lstm keras

I know there are already many questions on this topic, but none of them solved my problem.

I am training an LSTM network on variable-length inputs with a Masking layer, but the masking seems to have no effect.

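Input shape: (100, 362, 24), where 362 is the maximum sequence length, 24 is the number of features and 100 is the number of samples (split 75 train / 25 validation).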

Output shape: (100, 362, 1), later transformed to (100, 362 - N, 1).
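The following is a minimal NumPy sketch of the shape bookkeeping described above. It uses zero-filled placeholder arrays because the real data is not shown. Dropping the first N time steps from the targets turns (100, 362, 1) into (100, 362 - N, 1). This is the same slice the model's final Lambda layer applies to its outputs.

```python
import numpy as np

# Placeholder arrays with the shapes from the question (the real data is not shown)
n_samples, timesteps, features = 100, 362, 24
N = 5  # number of leading time steps that have no target

x = np.zeros((n_samples, timesteps, features), dtype=np.float32)
y = np.zeros((n_samples, timesteps, 1), dtype=np.float32)

# Drop the first N steps of the target, as in y = y[:, N:, :]
y_cut = y[:, N:, :]
print(x.shape, y_cut.shape)  # (100, 362, 24) (100, 357, 1)

# The same slice applied to a model output of shape (batch, timesteps, 1)
fake_output = np.zeros((15, timesteps, 1), dtype=np.float32)
print(fake_output[:, N:, :].shape)  # (15, 357, 1)
```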

Here is the code for my network:

from keras.models import Sequential
from keras.layers import Masking, LSTM, Lambda

# x: (100, 362, 24) inputs and y: (100, 362, 1) targets, zero-padded at the end
timesteps, features = x.shape[1], x.shape[2]

#                          O O O
#   example for N = 3      | | |
#                    O O O O O O
#                    | | | | | |
#                    O O O O O O

N = 5
y = y[:, N:, :]  # drop the first N target steps

x_train = x[:75]
x_test = x[75:]
y_train = y[:75]
y_test = y[75:]

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(1, return_sequences=True))
model.add(Lambda(lambda x: x[:, N:, :]))  # drop the first N output steps

model.compile('adam', 'mae')

print(model.summary())
history = model.fit(x_train, y_train,
                    epochs=3,
                    batch_size=15,
                    validation_data=(x_test, y_test))

My data is padded at the end. For example:

>> x_test[10,350]
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
   0., 0., 0., 0., 0., 0., 0.], dtype=float32)
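With zero padding at the end, the Masking layer has to decide for each time step whether to skip it. As far as I know, Keras's Masking layer keeps a step when at least one of its features differs from mask_value, using an any(not_equal(...)) check over the feature axis. Below is a NumPy sketch of that rule. `masking_mask` is a hypothetical helper name.

```python
import numpy as np

def masking_mask(x, mask_value=0.0):
    """Boolean mask of shape (batch, timesteps): True where at least one
    feature differs from mask_value, i.e. the step is kept."""
    return np.any(x != mask_value, axis=-1)

# Toy batch: 2 sequences, 4 time steps, 3 features, zero-padded at the end
x = np.zeros((2, 4, 3), dtype=np.float32)
x[0, :3] = 1.0    # first sequence has 3 real steps
x[1, :2] = 0.5    # second sequence has 2 real steps
x[1, 1, 0] = 0.0  # a real step with one zero feature is still kept

mask = masking_mask(x)
print(mask.tolist())  # [[True, True, True, False], [True, True, False, False]]
```

A real time step whose features are all exactly 0 would be masked as well.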

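The problem is that the Masking layer seems to have no effect. The loss printed during training equals the unmasked loss I compute afterwards, not the masked one: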

Layer (type)                 Output Shape              Param #   
=================================================================
masking_1 (Masking)          (None, 362, 24)           0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 362, 128)          78336     
_________________________________________________________________
lstm_2 (LSTM)                (None, 362, 64)           49408     
_________________________________________________________________
lstm_3 (LSTM)                (None, 362, 1)            264       
_________________________________________________________________
lambda_1 (Lambda)            (None, 357, 1)            0         
=================================================================
Total params: 128,008
Trainable params: 128,008
Non-trainable params: 0
_________________________________________________________________
None
Train on 75 samples, validate on 25 samples
Epoch 1/3
75/75 [==============================] - 8s 113ms/step - loss: 0.1711 - val_loss: 0.1814
Epoch 2/3
75/75 [==============================] - 5s 64ms/step - loss: 0.1591 - val_loss: 0.1307
Epoch 3/3
75/75 [==============================] - 5s 63ms/step - loss: 0.1057 - val_loss: 0.1034

>> from sklearn.metrics import mean_absolute_error
>> out = model.predict(x_test, batch_size=1)
>> print('wo mask', mean_absolute_error(y_test.ravel(), out.ravel()))
>> print('w mask', mean_absolute_error(y_test[~(x_test[:,N:] == 0).all(axis=2)].ravel(), out[~(x_test[:,N:] == 0).all(axis=2)].ravel()))
wo mask 0.10343371
w mask 0.16236152
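The gap between the two numbers fits the idea that the padded steps are counted in the unmasked MAE. Their targets are 0, so if the model predicts values close to 0 there, those steps add almost no error but still enlarge the denominator. Here is a toy NumPy illustration with made-up numbers (not the question's data):

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# One toy sequence of 6 steps; the last 3 are padding (target 0)
y_true = np.array([0.5, 0.75, 1.0, 0.0, 0.0, 0.0])
y_pred = np.array([0.25, 0.5, 0.75, 0.0, 0.0, 0.0])  # exact on the padding
keep = np.array([True, True, True, False, False, False])

print(mae(y_true, y_pred))              # 0.125 (error averaged over all 6 steps)
print(mae(y_true[keep], y_pred[keep]))  # 0.25  (error averaged over the 3 real steps)
```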

Also, if I use NaN as the mask value, I can see NaN propagating during training (the loss becomes nan).

What am I missing to make the Masking layer work as expected?

Yu-*_*ang 13

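By default, a Lambda layer does not propagate masks. In other words, the mask tensor computed by the Masking layer is discarded by the Lambda layer, so the Masking layer has no effect on the output loss.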

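If you want the compute_mask method of the Lambda layer to propagate the previous mask, you have to pass the mask argument when creating the layer. As you can see from the source code of the Lambda layer: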

def __init__(self, function, output_shape=None,
             mask=None, arguments=None, **kwargs):
    # ...
    if mask is not None:
        self.supports_masking = True
    self.mask = mask

# ...

def compute_mask(self, inputs, mask=None):
    if callable(self.mask):
        return self.mask(inputs, mask)
    return self.mask

Because the default value of mask is None, compute_mask returns None and the loss is not masked.
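To make the branching concrete, here is a plain-Python toy that copies the compute_mask logic quoted above. `ToyLambda` is a hypothetical stand-in, not the Keras class. For brevity it works on a single sequence with no batch axis, so the slices are `[N:]` rather than `[:, N:]`.

```python
# ToyLambda is a hypothetical stand-in that mimics the quoted Lambda source.
class ToyLambda:
    def __init__(self, function, mask=None):
        self.function = function
        self.supports_masking = mask is not None
        self.mask = mask

    def __call__(self, inputs):
        return self.function(inputs)

    def compute_mask(self, inputs, mask=None):
        # Same branching as the Keras source quoted above
        if callable(self.mask):
            return self.mask(inputs, mask)
        return self.mask

N = 2
incoming_mask = [True, True, True, True, False, False]  # one sequence, 6 steps

default_layer = ToyLambda(lambda x: x[N:])
print(default_layer.compute_mask(None, incoming_mask))  # None: the mask is dropped

propagating = ToyLambda(lambda x: x[N:], mask=lambda inputs, prev: prev[N:])
print(propagating.compute_mask(None, incoming_mask))  # [True, True, False, False]
```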

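To fix this: your Lambda layer does not introduce any additional masking of its own, so its compute_mask method should simply return the mask from the previous layer. Slice that mask so it matches the layer's output shape.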

# Pass the incoming mask through, dropping the first N steps just like the output
masking_func = lambda inputs, previous_mask: previous_mask[:, N:]
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(1, return_sequences=True))
model.add(Lambda(lambda x: x[:, N:, :], mask=masking_func))
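The mask passed between Keras layers has one entry per time step, with shape (batch, timesteps) and no feature axis. That is why masking_func slices with [:, N:] while the Lambda function slices its output with [:, N:, :]. Here is a NumPy sketch of the shape alignment, using placeholder arrays and a batch size of 15 as in the question's fit call:

```python
import numpy as np

batch, timesteps, N = 15, 362, 5

# Shapes as they arrive at the Lambda layer (placeholder values)
lstm_out = np.zeros((batch, timesteps, 1), dtype=np.float32)  # output of LSTM(1)
prev_mask = np.ones((batch, timesteps), dtype=bool)           # no feature axis

out = lstm_out[:, N:, :]     # what the Lambda function returns
new_mask = prev_mask[:, N:]  # what masking_func returns

print(out.shape, new_mask.shape)  # (15, 357, 1) (15, 357)
```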

Now you should see the correct loss value:

>> model.evaluate(x_test, y_test, verbose=0)
0.2660679519176483
>> out = model.predict(x_test)
>> print('wo mask', mean_absolute_error(y_test.ravel(), out.ravel()))
wo mask 0.26519736809498456
>> print('w mask', mean_absolute_error(y_test[~(x_test[:,N:] == 0).all(axis=2)].ravel(), out[~(x_test[:,N:] == 0).all(axis=2)].ravel()))
w mask 0.2660679670482195
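The evaluate() value now agrees with the "w mask" number because the masked loss is averaged over unmasked steps only. Below is a NumPy sketch of a multiply-by-mask scheme. It reflects my understanding of roughly what Keras 2.x does internally, and `masked_mean_loss` is a hypothetical helper. The per-step loss is multiplied by the 0/1 mask and divided by the mean of the mask, which equals the plain mean over the kept steps.

```python
import numpy as np

def masked_mean_loss(score, mask):
    """Zero out masked steps, then rescale by the fraction of kept steps,
    so the result is the mean over unmasked steps only."""
    mask = mask.astype(np.float64)
    return float(np.mean(score * mask) / np.mean(mask))

# Per-step absolute errors for 2 sequences x 4 steps (made-up numbers)
score = np.array([[0.5, 0.25, 0.75, 0.125],
                  [0.25, 0.25, 0.5, 0.5]])
mask = np.array([[True, True, True, False],
                 [True, True, False, False]])

print(masked_mean_loss(score, mask))  # 0.4
print(float(np.mean(score[mask])))    # 0.4, the mean over the kept steps only
```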

Padding with NaN values does not work, because masking is done by multiplying the loss tensor by a binary mask: 0 * nan is still nan, so the mean is nan as well.
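Here is a small NumPy demonstration of the NaN problem. It uses the same multiply-by-mask idea with made-up per-step losses. A NaN at a padded step survives multiplication by 0, while a finite padding value is zeroed out cleanly.

```python
import numpy as np

mask = np.array([1.0, 1.0, 0.0, 0.0])  # the last two steps are padding

# Padding filled with NaN: the per-step loss there is NaN as well
score_nan = np.array([0.5, 0.25, np.nan, np.nan])
loss_nan = np.mean(score_nan * mask) / np.mean(mask)
print(loss_nan)   # nan, because 0 * nan is still nan

# Padding filled with a finite value such as 0: the mask removes it
score_zero = np.array([0.5, 0.25, 0.0, 0.0])
loss_zero = np.mean(score_zero * mask) / np.mean(mask)
print(loss_zero)  # 0.375
```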