How to train an RNN with LSTM cells for time series prediction

Jak*_*kob 21 time-series prediction lstm tensorflow

I am currently trying to build a simple model for predicting a time series. The goal is to train the model on a sequence so that it is able to predict future values.

I am using TensorFlow and an LSTM cell to do this. The model is trained with truncated backpropagation through time. My question is: how should I structure the training data?

For example, let's say we want to learn the given sequence:

[1,2,3,4,5,6,7,8,9,10,11,...]

We unroll the network with num_steps=4.

Option 1

input data               label     
1,2,3,4                  2,3,4,5
5,6,7,8                  6,7,8,9
9,10,11,12               10,11,12,13
...

Option 2

input data               label     
1,2,3,4                  2,3,4,5
2,3,4,5                  3,4,5,6
3,4,5,6                  4,5,6,7
...

Option 3

input data               label     
1,2,3,4                  5
2,3,4,5                  6
3,4,5,6                  7
...

Option 4

input data               label     
1,2,3,4                  5
5,6,7,8                  9
9,10,11,12               13
...
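To make the options concrete, here is a small NumPy sketch of how, for example, options 2 and 3 could be generated (the variable names are just illustrative):

import numpy as np

series = np.arange(1, 21)  # the toy sequence from above
num_steps = 4

# Option 2: overlapping windows, labels shifted by one step
x2 = np.array([series[i:i + num_steps] for i in range(len(series) - num_steps)])
y2 = np.array([series[i + 1:i + num_steps + 1] for i in range(len(series) - num_steps)])

# Option 3: the same overlapping windows, but the label is the single next value
x3 = x2
y3 = series[num_steps:]

# Options 1 and 4 are the same constructions with stride num_steps instead
# of 1, i.e. range(0, len(series) - num_steps, num_steps)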

Any help would be appreciated.

bsa*_*ter 5

I am just about to learn LSTMs in TensorFlow myself and tried to implement an example that (luckily) attempts to predict some time series / number sequences generated by a simple math function.

However, I structure the training data in a different way, motivated by Unsupervised Learning of Video Representations using LSTMs:

LSTM Future Predictor Model

Option 5:

input data               label     
1,2,3,4                  5,6,7,8
2,3,4,5                  6,7,8,9
3,4,5,6                  7,8,9,10
...
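On the toy sequence from the question, this option 5 layout could be built like this (just a sketch; my get_batch below samples from a math function instead):

import numpy as np

series = np.arange(1, 21)
num_steps = 4

# Option 5: overlapping input windows, labels are the next num_steps values
n = len(series) - 2 * num_steps + 1
x5 = np.array([series[i:i + num_steps] for i in range(n)])
y5 = np.array([series[i + num_steps:i + 2 * num_steps] for i in range(n)])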

Besides this paper, I (tried to) take inspiration from the given TensorFlow RNN examples. My current complete solution looks like this:

# NOTE: this targets the pre-1.0 TensorFlow API (tf.nn.rnn_cell,
# tf.concat(dim, values), tf.initialize_all_variables) and Python 2 (xrange)
from __future__ import print_function

import math
import random
import numpy as np
import tensorflow as tf

LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4


def ground_truth_func(i, j, t):
    return i * math.pow(t, 2) + j


def get_batch(batch_size):
    seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
    tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)

    for b in xrange(batch_size):
        i = float(random.randint(-25, 25))
        j = float(random.randint(-100, 100))
        for t in xrange(NUM_T_STEPS):
            value = ground_truth_func(i, j, t)
            seq[b, t, 0] = value

        for t in xrange(NUM_T_STEPS):
            tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
    return seq, tgt


# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])

fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))

# ENCODER
with tf.variable_scope('ENC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
    state = initial_state
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # feed one time step of the whole batch; the LSTM state carries
        # over to the next time step
        output, state = multi_lstm(sequence[:, t_step, :], state)

learned_representation = state

# DECODER
with tf.variable_scope('DEC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    state = learned_representation
    logits_stacked = None
    loss = 0.0
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # NOTE: the decoder re-reads the encoder's input sequence here
        # instead of feeding back its own predictions (or the targets),
        # which may be one reason why the model struggles to learn
        output, state = multi_lstm(sequence[:, t_step, :], state)
        # project the LSTM output down to one predicted value per sample
        logits = tf.matmul(output, fc1_weight) + fc1_bias

        if logits_stacked is None:
            logits_stacked = logits
        else:
            logits_stacked = tf.concat(1, [logits_stacked, logits])

        # the slice keeps the [BATCH_SIZE, 1] shape so the subtraction does
        # not broadcast to [BATCH_SIZE, BATCH_SIZE]
        loss += tf.reduce_sum(tf.square(logits - target[:, t_step:t_step + 1])) / BATCH_SIZE

reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))

train = tf.train.AdamOptimizer().minimize(reg_loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    total_loss = 0.0
    for step in xrange(MAX_STEPS):
        seq_batch, target_batch = get_batch(BATCH_SIZE)

        feed = {sequence: seq_batch, target: target_batch}
        _, current_loss = sess.run([train, reg_loss], feed)
        if step % 10 == 0:
            print("@{}: {}".format(step, current_loss))
        total_loss += current_loss

    print('Total loss:', total_loss)

    print('### SIMPLE EVAL: ###')
    seq_batch, target_batch = get_batch(BATCH_SIZE)
    feed = {sequence: seq_batch, target: target_batch}
    prediction = sess.run([logits_stacked], feed)
    for b in xrange(BATCH_SIZE):
        print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
        print(" `-> Prediction: {}".format(prediction[0][b]))

Sample output of this looks like:

### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
#  `-> Prediction: [model prediction]  
[  33.   53.  113.  213.] -> [  353.   533.   753.  1013.]
 `-> Prediction: [ 19.74548721  28.3149128   33.11489105  35.06603241]
[ -17.  -32.  -77. -152.] -> [-257. -392. -557. -752.]
 `-> Prediction: [-16.38951683 -24.3657589  -29.49801064 -31.58583832]
[ -7.  -4.   5.  20.] -> [  41.   68.  101.  140.]
 `-> Prediction: [ 14.14126873  22.74848557  31.29668617  36.73633194]
...

The model is an LSTM autoencoder, with 2 layers each in the encoder and the decoder.

Unfortunately, as you can see in the results, this model does not learn the sequence properly. It could be that I simply made a stupid mistake somewhere, or that 1000-10000 training steps are just far too few for an LSTM. As I said, I am also just beginning to properly understand/use LSTMs. But hopefully this can give you some inspiration for the implementation.


Rob*_*lak 4

After reading several LSTM introduction blogs (e.g. Jakob Aungiers'), option 3 seems to be the right one for a stateless LSTM.

If your LSTM needs to remember data further back than your num_steps, you can train in a stateful way - for a Keras example, see Philippe Remy's blog post "Stateful LSTM in Keras". However, Philippe does not show an example for batch sizes greater than one. I guess that in your case, a batch size of 4 with a stateful LSTM could be used with the following data (written as input -> label):

batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8

batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12

batch #2:
9,10,11,12 -> 13
...

With this, the state of e.g. the second sample of batch #0 is correctly reused to continue training with the second sample of batch #1.

This is somewhat similar to your option 4, except that you are not using all the available labels there.
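A minimal stateful Keras sketch of this scheme could look as follows (the layer size, series length and training loop are arbitrary choices of mine, not taken from Philippe's post):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

BATCH_SIZE = 4  # equal to num_steps here, as suggested above
NUM_STEPS = 4

# consecutive sliding windows; with shuffle=False, Keras slices them
# into exactly the batches shown above
series = np.arange(1, 101, dtype=np.float32)
x = np.array([series[i:i + NUM_STEPS] for i in range(len(series) - NUM_STEPS)])
y = series[NUM_STEPS:]
n = (len(x) // BATCH_SIZE) * BATCH_SIZE  # stateful LSTMs need full batches
x = x[:n].reshape(n, NUM_STEPS, 1)
y = y[:n]

model = Sequential()
model.add(LSTM(32, batch_input_shape=(BATCH_SIZE, NUM_STEPS, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

for epoch in range(10):
    # keep the batch order so that sample i of batch k+1 continues sample i
    # of batch k; reset the state only between passes over the series
    model.fit(x, y, batch_size=BATCH_SIZE, epochs=1, shuffle=False, verbose=0)
    model.reset_states()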

Update:

As an extension of my suggestion, where batch_size equals num_steps, Alexis Huet gives an answer for the case where batch_size is a divisor of num_steps, which can be used for larger num_steps. He describes it well on his blog.