Scheduled sampling in Tensorflow

Kev*_*eng 7 python machine-learning deep-learning tensorflow sequence-to-sequence

The latest Tensorflow API for seq2seq models includes scheduled sampling:

https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/ScheduledEmbeddingTrainingHelper
https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/ScheduledOutputTrainingHelper

The original paper on scheduled sampling can be found here: https://arxiv.org/abs/1506.03099

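I read the paper, but I don't understand the difference between ScheduledEmbeddingTrainingHelper and ScheduledOutputTrainingHelper. The documentation only says that ScheduledEmbeddingTrainingHelper is a training helper that adds scheduled sampling, while ScheduledOutputTrainingHelper is a training helper that adds scheduled sampling directly to outputs.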

What is the difference between these two helpers?

Pet*_*den 9

I contacted the engineer behind this, and he replied:

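The output sampler emits either the raw rnn output or the raw ground truth at that time step. The embedding sampler treats the rnn output as the logits of a distribution and emits either the embedding lookup of an id sampled from that categorical distribution, or the raw ground truth at that time step.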
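To make the engineer's description concrete, here is a minimal pure-Python sketch of what each sampler does at a single time step for a single sequence. The function names and the list-based vectors are hypothetical illustrations, not the actual TensorFlow implementation, which operates on batched tensors. As in the TF helpers, `sampling_probability` is the probability of using the model's output instead of the ground truth.

```python
import math
import random


def output_sampler_step(rnn_output, ground_truth, sampling_probability, rng):
    """Output sampler (hypothetical sketch): with probability
    sampling_probability pass the raw RNN output on as the next input,
    otherwise pass the raw ground truth."""
    if rng.random() < sampling_probability:
        return rnn_output
    return ground_truth


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def embedding_sampler_step(logits, ground_truth_id, embedding_matrix,
                           sampling_probability, rng):
    """Embedding sampler (hypothetical sketch): with probability
    sampling_probability treat the RNN output as logits, sample an id from
    that categorical distribution and return its embedding; otherwise return
    the embedding of the ground-truth id."""
    if rng.random() < sampling_probability:
        probs = softmax(logits)
        sampled_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        return embedding_matrix[sampled_id]
    return embedding_matrix[ground_truth_id]


rng = random.Random(0)
emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy embedding matrix, vocab of 3
# The output sampler returns a vector from the RNN or the ground truth as-is.
print(output_sampler_step([0.1, 0.9], [0.0, 1.0], 1.0, rng))
# The embedding sampler always returns a row of the embedding matrix.
print(embedding_sampler_step([-1000.0, -1000.0, 1000.0], 0, emb, 1.0, rng))
```

The key difference the sketch shows: the output sampler never looks anything up, so whatever it emits (RNN output or ground truth) is fed directly to the next step, while the embedding sampler always ends in an embedding lookup, either of a sampled id or of the true id.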


Mat*_*rro 5

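Here is a basic example of using ScheduledEmbeddingTrainingHelper with TensorFlow 1.3 and some of the higher-level tf.contrib APIs. It is a sequence2sequence model where the decoder's initial hidden state is the encoder's final hidden state. It only shows how to train on a single batch (the task is, obviously, "reverse this sequence"). For real training tasks, I suggest looking at the tf.contrib.learn APIs such as learn_runner, Experiment and tf.estimator.Estimator.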

import tensorflow as tf
import numpy as np
from tensorflow.python.layers.core import Dense

vocab_size = 7
embedding_size = 5
lstm_units = 10

src_batch = np.array([[1, 2, 3], [4, 5, 6]])
trg_batch = np.array([[3, 2, 1], [6, 5, 4]])

# *_seq will have shape (2, 3), *_seq_len will have shape (2)
source_seq = tf.placeholder(shape=(None, None), dtype=tf.int32)
target_seq = tf.placeholder(shape=(None, None), dtype=tf.int32)
source_seq_len = tf.placeholder(shape=(None,), dtype=tf.int32)
target_seq_len = tf.placeholder(shape=(None,), dtype=tf.int32)

# add Start of Sequence (SOS) tokens to each sequence
batch_size, sequence_size = tf.unstack(tf.shape(target_seq))
sos_slice = tf.zeros([batch_size, 1], dtype=tf.int32) # 0 = start of sentence token
decoder_input = tf.concat([sos_slice, target_seq], axis=1)

embedding_matrix = tf.get_variable(
    name="embedding_matrix",
    shape=[vocab_size, embedding_size],
    dtype=tf.float32)
source_seq_embedded = tf.nn.embedding_lookup(embedding_matrix, source_seq) # shape=(2, 3, 5)
decoder_input_embedded = tf.nn.embedding_lookup(embedding_matrix, decoder_input) # shape=(2, 4, 5)

unused_encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    tf.contrib.rnn.LSTMCell(lstm_units),
    source_seq_embedded,
    sequence_length=source_seq_len,
    dtype=tf.float32)

# Decoder:
# At each time step t and for each sequence in the batch, the input x_t is either
#   (1) the embedding of an id sampled from the categorical distribution given
#       by the output_layer logits at step t-1, or
#   (2) read from decoder_input_embedded (teacher forcing).
# We do (1) with probability sampling_probability and (2) with 1 - sampling_probability.
# Using sampling_probability=0.0 is equivalent to using TrainingHelper (no sampling).
# Using sampling_probability=1.0 is similar to sampling-based inference,
# where we don't supervise the decoder at all: output at t-1 is the input at t.
# sampling_prob is a non-trainable scalar that defaults to 0.0 and can be fed per run.
sampling_prob = tf.placeholder_with_default(0.0, shape=(), name="sampling_prob")
helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(
    decoder_input_embedded,
    target_seq_len,
    embedding_matrix,
    sampling_probability=sampling_prob)

output_layer = Dense(vocab_size)
decoder = tf.contrib.seq2seq.BasicDecoder(
    tf.contrib.rnn.LSTMCell(lstm_units),
    helper,
    encoder_state,
    output_layer=output_layer)

outputs, state, seq_len = tf.contrib.seq2seq.dynamic_decode(decoder)
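# Weight each position by a mask built from target_seq_len instead of hard-coding
# the NumPy batch shape, so the graph works for any batch size and ignores padding.
loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs.rnn_output,
    targets=target_seq,
    weights=tf.sequence_mask(target_seq_len, tf.shape(target_seq)[1], dtype=tf.float32))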

train_op = tf.contrib.layers.optimize_loss(
    loss=loss,
    global_step=tf.contrib.framework.get_global_step(),
    optimizer=tf.train.AdamOptimizer,
    learning_rate=0.001)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    _, _loss = session.run([train_op, loss], {
        source_seq: src_batch,
        target_seq: trg_batch,
        source_seq_len: [3, 3],
        target_seq_len: [3, 3],
        sampling_prob: 0.5
    })
    print("Loss: " + str(_loss))
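The example feeds a fixed `sampling_prob` of 0.5. In the scheduled sampling paper, the probability of feeding the ground truth, eps_i, is instead decayed over training (the paper proposes linear, exponential and inverse sigmoid decay). Since `sampling_probability` in these helpers is the probability of sampling from the model, the value to feed would be 1 - eps_i. Below is a minimal sketch of the inverse sigmoid schedule eps_i = k / (k + exp(i / k)); the function names and the default k = 1000 are assumptions to tune, not values from the paper or from TensorFlow.

```python
import math


def inverse_sigmoid_teacher_forcing_prob(step, k):
    """Probability of feeding the ground truth at training step `step`,
    using the inverse sigmoid decay eps_i = k / (k + exp(i / k)), k >= 1."""
    # Clamp the exponent so very large step counts do not overflow math.exp.
    return k / (k + math.exp(min(step / k, 700.0)))


def sampling_probability(step, k=1000.0):
    """Value to feed as sampling_prob: the probability of sampling from the
    model instead of reading the ground truth, i.e. 1 - eps_i."""
    return 1.0 - inverse_sigmoid_teacher_forcing_prob(step, k)


if __name__ == "__main__":
    for step in (0, 5000, 10000, 20000):
        print(step, round(sampling_probability(step), 3))
```

In the training loop this value would simply replace the constant 0.5, e.g. `sampling_prob: sampling_probability(step)` in the feed dict, so early training is almost pure teacher forcing and later training relies mostly on the model's own samples.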

For ScheduledOutputTrainingHelper, I expected to simply swap out the helper and use:

helper = tf.contrib.seq2seq.ScheduledOutputTrainingHelper(
    target_seq,
    target_seq_len,
    sampling_probability=sampling_prob)

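However, this produces an error, because the LSTM cell expects a multidimensional input at each time step (shape (batch_size, input_dims)). I will open an issue on GitHub to find out whether this is a bug or whether there is some other way to use ScheduledOutputTrainingHelper.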
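The error lines up with the engineer's description: the output sampler passes either the raw RNN output or the ground truth on as the next input, so the ground truth has to be a vector per time step with the same depth as the RNN output, while target_seq holds one integer id per step. The sketch below is a hypothetical pure-Python model of that constraint only; the helper names are made up, plain lists stand in for tensors, and one-hot vectors of depth vocab_size are just one assumed way to get matching shapes, not something verified against TF 1.3.

```python
# Hypothetical helpers: plain Python lists stand in for tensors, so this models
# only the per-step shape constraint, not TensorFlow's implementation.


def one_hot(ids, depth):
    """Convert integer ids into one-hot vectors of length `depth`."""
    return [[1.0 if j == i else 0.0 for j in range(depth)] for i in ids]


def step_inputs(batch, t):
    """Collect the ground-truth entries for time step t across the batch."""
    return [seq[t] for seq in batch]


def output_sampler_mix(rnn_outputs, ground_truth, sample_mask):
    """For each batch entry, emit the raw RNN output where sample_mask is True,
    otherwise the ground truth. Both must be vectors of the same depth, because
    either one becomes the next input to the cell."""
    for out, gt in zip(rnn_outputs, ground_truth):
        if not isinstance(gt, list) or len(gt) != len(out):
            raise ValueError("ground truth must be a vector with the same depth "
                             "as the RNN output (got %r)" % (gt,))
    return [out if s else gt
            for out, gt, s in zip(rnn_outputs, ground_truth, sample_mask)]


vocab_size = 7
trg_batch = [[3, 2, 1], [6, 5, 4]]
rnn_outputs = [[0.0] * vocab_size, [0.5] * vocab_size]  # dummy logits, depth 7

raw_ids = step_inputs(trg_batch, 0)  # [3, 6]: one scalar per sequence
try:
    output_sampler_mix(rnn_outputs, raw_ids, [True, False])
except ValueError as err:
    print("integer ids fail:", err)

vectors = step_inputs([one_hot(seq, vocab_size) for seq in trg_batch], 0)
mixed = output_sampler_mix(rnn_outputs, vectors, [True, False])
print([len(v) for v in mixed])  # [7, 7]
```

If this reading is right, the inputs given to ScheduledOutputTrainingHelper would need to be a float tensor of shape (batch_size, time, depth) rather than integer ids; that is an assumption worth checking in the GitHub issue.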