Generating text with an LSTM in TensorFlow

seb*_*rik 14 lstm tensorflow

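I want to use TensorFlow to generate text and have been modifying the LSTM tutorial (https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) code to do this. However, my initial solution seems to produce nonsense, and it does not improve even after training for a long time. I can't see why. The idea is to start with a zero matrix and then generate one word at a time.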

Here is the code. I added the two functions below to https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py.

The generator looks as follows:

def generate_text(session,m,eval_op):

    state = m.initial_state.eval()

    x = np.zeros((m.batch_size,m.num_steps), dtype=np.int32)

    output = str()
    for i in xrange(m.batch_size):
        for step in xrange(m.num_steps):
            try:
                # Run the batch
                # targets have to be set, but m is the validation model, so it should not train the neural network
                cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                            {m.input_data: x, m.targets: x, m.initial_state: state})

                # Sample a word-id and add it to the matrix and output
                word_id = sample(probabilities[0,:])
                output = output + " " + reader.word_from_id(word_id)
                x[i][step] = word_id

            except ValueError as e:
                print("ValueError")

    print(output)

I have added the variable "probabilities" to the ptb_model; it is simply a softmax over the logits.

self._probabilities = tf.nn.softmax(logits)

Sampling:

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))

小智 18

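I have been working toward exactly the same goal and just got it to work. You have many of the right modifications here, but I think you missed a few steps.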

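First, to generate text you need to create a different version of the model that represents only a single time step. The reason is that we need to sample each output y before we can feed it into the next step of the model. I did this by creating a new config in which num_steps and batch_size are both equal to 1.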

class SmallGenConfig(object):
  """Small config. for generation"""
  init_scale = 0.1
  learning_rate = 1.0
  max_grad_norm = 5
  num_layers = 2
  num_steps = 1 # this is the main difference
  hidden_size = 200
  max_epoch = 4
  max_max_epoch = 13
  keep_prob = 1.0
  lr_decay = 0.5
  batch_size = 1
  vocab_size = 10000

I also added the output probabilities to the model with the following lines:

self._output_probs = tf.nn.softmax(logits)

@property
def output_probs(self):
  return self._output_probs

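Then there are a few differences in my generate_text() function. The first is that I load the saved model parameters from disk using a tf.train.Saver() object. Note that we do this after instantiating the PTBModel with the new config above.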

def generate_text(train_path, model_path, num_sentences):
  gen_config = SmallGenConfig()

  with tf.Graph().as_default(), tf.Session() as session:
    initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                gen_config.init_scale)    
    with tf.variable_scope("model", reuse=None, initializer=initializer):
      m = PTBModel(is_training=False, config=gen_config)

    # Restore variables from disk.
    saver = tf.train.Saver() 
    saver.restore(session, model_path)
    print("Model restored from file " + model_path)

The second difference is that I get a lookup table from ids to word strings (I had to write this function; see the code below).

    words = reader.get_vocab(train_path)

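I set up the initial state the same way you do, but then I set up the initial token differently. I want to use the "end of sentence" token so that my sentence starts with the right type of word. I looked through the word index and found that <eos> happens to have index 2 (deterministically), so I just hard-coded it. Finally, I wrap it in a 1x1 NumPy matrix so that it is the right type for the model input.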

    state = m.initial_state.eval()
    x = 2 # the id for '<eos>' from the training set
    input = np.matrix([[x]])  # a 2D numpy matrix 

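Finally, here is the part where we generate sentences. Note that we tell session.run() to compute output_probs and final_state, and we give it the input and the state. In the first iteration the input is <eos> and the state is initial_state; in subsequent iterations we feed the last sampled output as the input and pass along the state from the previous iteration. Note also that we use the words list to look up the word string for each output index.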

    text = ""
    count = 0
    while count < num_sentences:
      output_probs, state = session.run([m.output_probs, m.final_state],
                                   {m.input_data: input,
                                    m.initial_state: state})
      x = sample(output_probs[0], 0.9)
      if words[x]=="<eos>":
        text += ".\n\n"
        count += 1
      else:
        text += " " + words[x]
      # now feed this new word as input into the next iteration
      input = np.matrix([[x]]) 
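
The feedback pattern in this loop (sample a word, then feed it back as the next input together with the returned state) can be tried without TensorFlow. The following is only a toy sketch: WORDS, NEXT_WORD_PROBS, toy_step and generate are hypothetical names, and a fixed transition table stands in for the trained LSTM.

```python
import numpy as np

# Hypothetical toy vocabulary; id 0 plays the role of '<eos>'.
WORDS = ["<eos>", "the", "cat", "sat"]

# Hypothetical stand-in for the trained model: row i is the next-word
# distribution given that the previous word had id i.
NEXT_WORD_PROBS = np.array([
    [0.0, 1.0, 0.0, 0.0],  # after <eos>: always "the"
    [0.0, 0.0, 1.0, 0.0],  # after "the": always "cat"
    [0.0, 0.0, 0.0, 1.0],  # after "cat": always "sat"
    [1.0, 0.0, 0.0, 0.0],  # after "sat": always <eos>
])

def toy_step(token_id, state):
    """Return (next-word probabilities, new state) for one time step."""
    # The state only counts steps here; a real LSTM would carry its
    # hidden and cell activations from one call to the next.
    return NEXT_WORD_PROBS[token_id], state + 1

def generate(num_sentences, rng):
    token_id, state = 0, 0  # start from <eos> and an initial state
    sentences, current = [], []
    while len(sentences) < num_sentences:
        probs, state = toy_step(token_id, state)
        # The sampled id becomes the input of the next iteration.
        token_id = int(rng.choice(len(probs), p=probs))
        if WORDS[token_id] == "<eos>":
            sentences.append(" ".join(current) + ".")
            current = []
        else:
            current.append(WORDS[token_id])
    return sentences

print(generate(2, np.random.default_rng(0)))
```

Because every row of the toy table is one-hot, the result is always the same two sentences. With a real model the distributions are spread out and the output varies from run to run.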

Then all we have to do is print out the text we have accumulated.

    print(text)
  return

That's it for the generate_text() function.

Finally, let me show you the definition of the get_vocab() function, which I put in reader.py.

def get_vocab(filename):
  data = _read_words(filename)

  counter = collections.Counter(data)
  count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

  words, _ = list(zip(*count_pairs))

  return words
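
Since the vocabulary is sorted by descending count and then alphabetically, the ordering is deterministic, so the id of <eos> could also be looked up instead of hard-coded. The sketch below is a self-contained illustration under assumptions: build_vocab is a hypothetical helper, it takes a string rather than a filename, and it assumes each newline is turned into an <eos> token.

```python
import collections

def build_vocab(text):
    """Return (id_to_word, word_to_id) for a whitespace-tokenized text."""
    # Assumption: each newline becomes an '<eos>' token.
    words = text.replace("\n", " <eos> ").split()
    counter = collections.Counter(words)
    # Most frequent first; ties broken alphabetically, so the order is stable.
    count_pairs = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    id_to_word = [word for word, _ in count_pairs]
    word_to_id = {word: i for i, word in enumerate(id_to_word)}
    return id_to_word, word_to_id

id_to_word, word_to_id = build_vocab("a b\na c\n")
eos_id = word_to_id["<eos>"]  # looked up rather than hard-coded
print(id_to_word, eos_id)
```

The id found this way depends on the corpus. In this toy text <eos> ties with "a" on count and sorts first, so it gets id 0 rather than the 2 reported above for the training set.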

The last thing you need is to be able to save the model after training it, which looks like this:

save_path = saver.save(session, "/tmp/model.ckpt")

This is the model that you will load from disk later when you generate text.

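There was one more problem: I found that the probability distribution produced by the TensorFlow softmax function sometimes does not sum to exactly 1.0. When the sum is greater than 1.0, np.random.multinomial() throws an error. So I had to write my own sampling function, which looks like this: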

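import random
import numpy as np

def sample(a, temperature=1.0):
  # rescale the distribution by the temperature and renormalize
  a = np.log(a) / temperature
  a = np.exp(a) / np.sum(np.exp(a))
  r = random.random() # range: [0,1)
  # walk the cumulative distribution until it exceeds r
  total = 0.0
  for i in range(len(a)):
    total += a[i]
    if total>r:
      return i
  # fall back to the last index if rounding keeps the total below r
  return len(a)-1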
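
One possible alternative to the hand-written loop, offered only as a sketch, is to cast the probabilities to float64, apply the temperature in log space, renormalize, and then draw with NumPy's Generator.choice. The name sample_id and the 1e-300 floor that guards against log(0) are assumptions of this example, not part of the original code.

```python
import numpy as np

def sample_id(probs, temperature=1.0, rng=None):
    """Sample an index from probs after temperature rescaling."""
    rng = np.random.default_rng() if rng is None else rng
    # float64 keeps the renormalized sum very close to 1.0
    p = np.asarray(probs, dtype=np.float64)
    # Work in log space; the tiny floor (an assumption) avoids log(0).
    logits = np.log(np.maximum(p, 1e-300)) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

# float32 input, as a softmax output typically would be
probs = np.array([0.1, 0.2, 0.7], dtype=np.float32)
idx = sample_id(probs, temperature=0.9, rng=np.random.default_rng(0))
print(idx)
```

A temperature below 1.0 sharpens the distribution toward the most likely word, while a temperature above 1.0 flattens it.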

When you put all of this together, the small model is able to generate some cool sentences. Good luck.


小智 0

I used your code and it did not seem right, so I modified it slightly and it now seems to work. Here is my code; I am not sure it is correct:

def generate_text(session, m, eval_op, word_list):
    output = []
    for i in xrange(20):
        state = m.initial_state.eval()
        x = np.zeros((1,1), dtype=np.int32)
        y = np.zeros((1,1), dtype=np.int32)
        output_str = ""
        for step in xrange(100):
            # Run the batch
            # targets have to be set, but m is the validation model, so it should not train the neural network
            cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                        {m.input_data: x, m.targets: y, m.initial_state: state})
            # Sample a word-id and add it to the matrix and output
            word_id = sample(probabilities[0,:])
            # skip ids that fall outside the vocabulary
            if (word_id < 0) or (word_id >= len(word_list)):
                continue
            output_str = output_str + " " + word_list[word_id]
            x[0][0] = word_id
        print(output_str)
        output.append(output_str)
    return output