VIS*_*SQL 4 python forecasting lstm keras
我在 Keras 中构建了一个 LSTM。它读取 9 个时间滞后的观察结果,并预测下一个标签。由于某种原因,我训练的模型预测的东西几乎是一条直线。模型架构中可能存在什么问题导致如此糟糕的回归结果?
输入数据:每小时金融时间序列,上升趋势明显1200+条记录
输入数据维度:
- 原始:
X_train.shape (1212, 9)
Run Code Online (Sandbox Code Playgroud)
- 针对 LSTM 进行重塑:
Z_train.shape (1212, 1, 9)
array([[[0.45073171, 0.46783444, 0.46226164, ..., 0.47164819,
0.47649667, 0.46017738]],
[[0.46783444, 0.46226164, 0.4553289 , ..., 0.47649667,
0.46017738, 0.47167775]],
Run Code Online (Sandbox Code Playgroud)
目标数据:y_train
69200 0.471678
69140 0.476364
69080 0.467761
...
7055 0.924937
7017 0.923651
7003 0.906253
Name: Close, Length: 1212, dtype: float64
type(y_train)
<class 'pandas.core.series.Series'>
Run Code Online (Sandbox Code Playgroud)
LSTM设计:
my = Sequential()
my.add(LSTM((20),batch_input_shape=(None,1,9), return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(1))
Run Code Online (Sandbox Code Playgroud)
input layer of 9 nodes. 3 hidden layers of 20 units each. 1 output layers of 1 unit.
The Keras default is return_sequences=False
Model is compiled with mse loss, and adam or sgd optimizer.
curr_model.compile(optimizer=optmfunc, loss="mse")
Run Code Online (Sandbox Code Playgroud)
Model is fit in this manner. Batch is 32, shuffle can be True/False
curr_model.fit(Z_train, y_train,
validation_data=(Z_validation,y_validation),
epochs=noepoch, verbose=0,
batch_size=btchsize,
shuffle=shufBOOL)
Run Code Online (Sandbox Code Playgroud)
Config and Weights are saved to disk. Since I'm training several models, I load them afterward to test certain performance metrics.
spec_model.model.save_weights(mname_trn)
mkerascfg = spec_model.model.to_json()
with open(mname_cfg, "w") as json_file:
json_file.write(mkerascfg)
Run Code Online (Sandbox Code Playgroud)
I've trained several of the LSTMs, but the result against the validation set looks like this:
The 2nd plot (LSTM plot) is of the validation data. This is y_validation versus predictions on Z_validation. They are the last 135 records in respective arrays. These were split out of full data (i.e validation), and have the same type/properties as Z_train and y_train. The x-axis is just numbering 0 to 134 of the index, and y-axis it the value of y_validation or the prediction. Units are normalized in both arrays. So all the units are the same. The "straight" line is the prediction.
What idea could you suggest on why this is happening? - I've changed batch sizes. Similar result. - I've tried changing the return_sequences, but it leads to various errors around shape for subsequent layers, etc.
Information about LSTM progression of MSE loss
There are 4 models trained, all with the same issue of course. We'll just focus on the 3 hidden layer, 20-unit per layer, LSTM, as defined above.(Mini-batch size was 32, and shuffling was disabled. But enabling changed nothing).
This is a slightly zoomed in image of the loss progressionfor the first model (adam optimizer)
From what I can tell by messing with the index, that bounce in the loss values (which creates the thick area) starts after in the 500s of epochs.