Shu*_*aai 22 masking lstm keras loss-function
我正在尝试使用Keras和Tensorflow后端使用LSTM实现序列到序列任务.输入是具有可变长度的英语句子.为了构建具有二维形状[batch_number,max_sentence_length]的数据集,我在行尾添加EOF并用足够的占位符填充每个句子,例如"#".然后将句子中的每个字符转换为单热矢量,现在数据集具有3-D形状[batch_number,max_sentence_length,character_number].在LSTM编码器和解码器层之后,计算输出和目标之间的softmax交叉熵.
为了消除模型训练中的填充效应,可以在输入和丢失功能上使用掩蔽.Keras中的掩码输入可以通过使用"layers.core.Masking"来完成.在Tensorflow中,可以按如下方式 屏蔽损失函数:Tensorflow中的自定义屏蔽损失函数
但是,我没有找到在Keras中实现它的方法,因为keras中使用定义的损失函数只接受参数y_true和y_pred.那么如何将真正的sequence_lengths输入到丢失函数和掩码?
此外,我在\ keras\engine\training.py中找到了一个函数"_weighted_masked_objective(fn)".它的定义是"为目标函数添加对屏蔽和样本加权的支持."但似乎该函数只能接受fn(y_true,y_pred).有没有办法使用这个函数来解决我的问题?
具体来说,我修改了余阳的例子.
from keras.models import Model
from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
import numpy as np
from numpy.random import seed as random_seed
random_seed(123)
max_sentence_length = 5
character_number = 3 # valid character 'a, b' and placeholder '#'
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
encoder_output = LSTM(10, return_sequences=False)(masked_input)
repeat_output = RepeatVector(max_sentence_length)(encoder_output)
decoder_output = LSTM(10, return_sequences=True)(repeat_output)
output = Dense(3, activation='softmax')(decoder_output)
model = Model(input_tensor, output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
[[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]], # the batch is ['##abb','#babb'], padding '#'
[[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
y_pred = model.predict(X)
print('y_pred:', y_pred)
print('y_true:', y_true)
print('model.evaluate:', model.evaluate(X, y_true))
# See if the loss computed by model.evaluate() is equal to the masked loss
import tensorflow as tf
logits=tf.constant(y_pred, dtype=tf.float32)
target=tf.constant(y_true, dtype=tf.float32)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits),axis=2))
losses = -tf.reduce_sum(target * tf.log(logits),axis=2)
sequence_lengths=tf.constant([3,4])
mask = tf.reverse(tf.sequence_mask(sequence_lengths,maxlen=max_sentence_length),[0,1])
losses = tf.boolean_mask(losses, mask)
masked_loss = tf.reduce_mean(losses)
with tf.Session() as sess:
c_e = sess.run(cross_entropy)
m_c_e=sess.run(masked_loss)
print("tf unmasked_loss:", c_e)
print("tf masked_loss:", m_c_e)
Run Code Online (Sandbox Code Playgroud)
如上所示,在某些层之后禁用屏蔽.那么在添加这些图层时如何屏蔽keras中的损失函数?
Yu-*_*ang 18
如果模型中有掩模,它将逐层传播并最终应用于损失.因此,如果您以正确的方式填充和屏蔽序列,则将忽略填充占位符的丢失.
解释整个过程有点涉及,所以我将其分解为几个步骤:
compile(),通过调用收集掩码compute_mask()并应用于损失(为了清楚起见,忽略不相关的行).weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]
# Prepare output masks.
masks = self.compute_mask(self.inputs, mask=None)
if masks is None:
masks = [None for _ in self.outputs]
if not isinstance(masks, list):
masks = [masks]
# Compute total loss.
total_loss = None
with K.name_scope('loss'):
for i in range(len(self.outputs)):
y_true = self.targets[i]
y_pred = self.outputs[i]
weighted_loss = weighted_losses[i]
sample_weight = sample_weights[i]
mask = masks[i]
with K.name_scope(self.output_names[i] + '_loss'):
output_loss = weighted_loss(y_true, y_pred,
sample_weight, mask)
Run Code Online (Sandbox Code Playgroud)
Model.compute_mask(),run_internal_graph()被称为.run_internal_graph(),模型中的掩模通过Layer.compute_mask()迭代地调用每个层,从模型的输入到输出逐层传播.因此,如果您Masking在模型中使用图层,则不必担心填充占位符的丢失.你可能已经在里面看到这些条目的损失将被掩盖_weighted_masked_objective().
max_sentence_length = 5
character_number = 2
input_tensor = Input(shape=(max_sentence_length, character_number))
masked_input = Masking(mask_value=0)(input_tensor)
output = LSTM(3, return_sequences=True)(masked_input)
model = Model(input_tensor, output)
model.compile(loss='mae', optimizer='adam')
X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
[[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
y_true = np.ones((2, max_sentence_length, 3))
y_pred = model.predict(X)
print(y_pred)
[[[ 0. 0. 0. ]
[ 0. 0. 0. ]
[-0.11980877 0.05803877 0.07880752]
[-0.00429189 0.13382857 0.19167568]
[ 0.06817091 0.19093043 0.26219055]]
[[ 0. 0. 0. ]
[ 0.0651961 0.10283815 0.12413475]
[-0.04420842 0.137494 0.13727818]
[ 0.04479844 0.17440712 0.24715884]
[ 0.11117355 0.21645413 0.30220413]]]
# See if the loss computed by model.evaluate() is equal to the masked loss
unmasked_loss = np.abs(1 - y_pred).mean()
masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()
print(model.evaluate(X, y_true))
0.881977558136
print(masked_loss)
0.881978
print(unmasked_loss)
0.917384
Run Code Online (Sandbox Code Playgroud)
从这个例子可以看出,被屏蔽部分的损失(零点y_pred)被忽略,输出model.evaluate()等于masked_loss.
如果有一个复现层return_sequences=False,则掩码停止传播(即,返回的掩码是None).在RNN.compute_mask():
def compute_mask(self, inputs, mask):
if isinstance(mask, list):
mask = mask[0]
output_mask = mask if self.return_sequences else None
if self.return_state:
state_mask = [None for _ in self.states]
return [output_mask] + state_mask
else:
return output_mask
Run Code Online (Sandbox Code Playgroud)
在你的情况,如果我理解正确的话,你希望那是基于一个面具y_true,而当值y_true是[0, 0, 1](的"#"的一个热编码)要被掩盖的损失.如果是这样,你需要以与Daniel的答案有些相似的方式掩盖损失值.
主要区别在于最终平均值.平均值应该取决于未屏蔽值的数量,这是正确的K.sum(mask).而且,y_true可以直接与单热编码矢量进行比较[0, 0, 1].
def get_loss(mask_value):
mask_value = K.variable(mask_value)
def masked_categorical_crossentropy(y_true, y_pred):
# find out which timesteps in `y_true` are not the padding character '#'
mask = K.all(K.equal(y_true, mask_value), axis=-1)
mask = 1 - K.cast(mask, K.floatx())
# multiply categorical_crossentropy with the mask
loss = K.categorical_crossentropy(y_true, y_pred) * mask
# take average w.r.t. the number of unmasked entries
return K.sum(loss) / K.sum(mask)
return masked_categorical_crossentropy
masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
model = Model(input_tensor, output)
model.compile(loss=masked_categorical_crossentropy, optimizer='adam')
Run Code Online (Sandbox Code Playgroud)
然后,上面代码的输出显示仅在未屏蔽的值上计算损失:
model.evaluate: 1.08339476585
tf unmasked_loss: 1.08989
tf masked_loss: 1.08339
Run Code Online (Sandbox Code Playgroud)
因为我已经改变了的值是从你的不同axis论点tf.reverse的[0,1]到[1].
| 归档时间: |
|
| 查看次数: |
10882 次 |
| 最近记录: |