LSTM predicts a straight line

Asked by VIS*_*SQL (score 4) · tags: python, forecasting, lstm, keras

I built an LSTM in Keras. It reads 9 time-lagged observations and predicts the next label. For some reason, the model I train predicts something that is almost a straight line. What could be wrong in the model architecture to cause such poor regression results?

Input data: an hourly financial time series with a clear upward trend, 1200+ records.

Input data dimensions:
- Raw:

X_train.shape (1212, 9)

- Reshaped for the LSTM:

Z_train.shape (1212, 1, 9)


array([[[0.45073171, 0.46783444, 0.46226164, ..., 0.47164819,
         0.47649667, 0.46017738]],

       [[0.46783444, 0.46226164, 0.4553289 , ..., 0.47649667,
         0.46017738, 0.47167775]],
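As a sketch (using random stand-in data, since the real series isn't shown), the reshape producing the array above looks like this:

```python
import numpy as np

# stand-in for the real lagged data: 1212 samples, 9 lagged observations each
X_train = np.random.rand(1212, 9)

# reshape for Keras: (samples, timesteps, features) --
# here the 9 lags are treated as 9 features of a single timestep
Z_train = X_train.reshape(X_train.shape[0], 1, 9)
print(Z_train.shape)  # (1212, 1, 9)
```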

Target data: y_train

69200    0.471678
69140    0.476364
69080    0.467761
       ...   
7055     0.924937
7017     0.923651
7003     0.906253
Name: Close, Length: 1212, dtype: float64

type(y_train)
<class 'pandas.core.series.Series'>

LSTM design:

my = Sequential()
my.add(LSTM(20, batch_input_shape=(None,1,9), return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(1))

Input layer of 9 features, 3 hidden LSTM layers of 20 units each, and an output layer of 1 unit.

The Keras default is return_sequences=False

The model is compiled with MSE loss and either the Adam or SGD optimizer.

curr_model.compile(optimizer=optmfunc, loss="mse")

The model is fit as follows. Batch size is 32, and shuffling can be True or False:

curr_model.fit(Z_train, y_train,
                           validation_data=(Z_validation,y_validation),
                           epochs=noepoch, verbose=0,
                           batch_size=btchsize,
                           shuffle=shufBOOL)

Config and Weights are saved to disk. Since I'm training several models, I load them afterward to test certain performance metrics.

spec_model.model.save_weights(mname_trn)
mkerascfg = spec_model.model.to_json()
with open(mname_cfg, "w") as json_file:
    json_file.write(mkerascfg)
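A minimal self-contained version of that save/load round trip (assuming `tensorflow.keras`; `mname_trn` and `mname_cfg` are the post's path variables, replaced by literal filenames here, and the model is a small stand-in) might look like:

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential, model_from_json
from tensorflow.keras.layers import LSTM

# small stand-in model with the same input shape as the post
model = Sequential()
model.add(Input(batch_shape=(None, 1, 9)))
model.add(LSTM(20, return_sequences=True))
model.add(LSTM(1))

# save architecture (JSON) and weights separately
model.save_weights("model.weights.h5")
with open("model.json", "w") as json_file:
    json_file.write(model.to_json())

# later: rebuild the architecture from JSON, then restore the weights
with open("model.json") as json_file:
    restored = model_from_json(json_file.read())
restored.load_weights("model.weights.h5")
```

Loading the weights into a freshly rebuilt architecture is what lets several trained models be compared afterward without keeping them all in memory.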


When I trained an MLP, I got this result against the validation set:

[image: MLP predictions vs. validation set]

I've trained several of the LSTMs, but the result against the validation set looks like this:

[image: LSTM predictions vs. validation set — nearly a straight line]

The 2nd plot (the LSTM plot) shows the validation data: y_validation versus predictions on Z_validation. They are the last 135 records of the respective arrays, split out of the full data set, and have the same type/properties as Z_train and y_train. The x-axis is simply the index 0 to 134, and the y-axis is the value of y_validation or the prediction. Both arrays are normalized, so all units are the same. The "straight" line is the prediction.

What ideas could you suggest for why this is happening?

- I've changed batch sizes; similar result.
- I've tried changing return_sequences, but it leads to various shape errors in subsequent layers.

Information about the LSTM's MSE loss progression

There are 4 models trained, all with the same issue of course. We'll just focus on the 3-hidden-layer, 20-units-per-layer LSTM defined above. (Mini-batch size was 32 and shuffling was disabled, but enabling it changed nothing.)

This is a slightly zoomed-in image of the loss progression for the first model (Adam optimizer):

[image: training loss curve, first model (Adam)]

From what I can tell by inspecting the index, the bounce in the loss values (which creates the thick band) starts somewhere in the 500s of epochs.

[image: training loss curve, zoomed out]

Answer by Ove*_*gon (score 6):

Your code has one critical problem: dimension shuffling. LSTMs expect inputs shaped (batch_size, timesteps, channels) (or (num_samples, timesteps, features)), whereas you are feeding one timestep with 9 channels. Backpropagation through time never even takes place.

Fix: reshape the input to (1212, 9, 1).
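In code, the suggested fix (again with random stand-in data) would be:

```python
import numpy as np

# stand-in for the lagged data: 1212 samples x 9 lags
X_train = np.random.rand(1212, 9)

# treat the 9 lags as 9 timesteps of a single feature,
# so backpropagation through time actually unrolls over the lags
Z_train = X_train.reshape(-1, 9, 1)
print(Z_train.shape)  # (1212, 9, 1)

# the first LSTM layer then takes batch_input_shape=(None, 9, 1)
```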


Suggestion: read this answer. It's long, but it will save you debugging time; the information isn't available anywhere else in such compact form, and I wish I'd had it when starting out with LSTMs.

The answers to related questions may also be useful, but the previous link is the more important one.