Train-test split of time series data for LSTM

Question


War*_*ior 5 python regression time-series lstm keras

from sklearn.model_selection import train_test_split

values = df.values
train, test = train_test_split(values)

# Separate input features from the target (last column) in each set
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]


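Running the code above splits the time series dataset into 75% train and 25% test. I want to control the train-test ratio, for example 80-20 or 90-10. Can someone help me understand how to split the dataset into any ratio I want?

The approach is borrowed from https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/.

Note: I cannot split the dataset randomly into train and test; the most recent values must be used for testing. I have included a screenshot of my dataset.

If anyone can explain the code, please help me understand the above. Thanks.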

Answer 1

Aka*_*rey 4

First, you should split the data into train and test sets, using either slicing or sklearn's train_test_split (remember to pass shuffle=False for time series data).

#divide data into train and test
train_ind = int(len(df)*0.8)
train = df[:train_ind]
test = df[train_ind:]

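For reference, sklearn's train_test_split defaults to test_size=0.25 with shuffle=True, which is where the 75/25 split in the question comes from. A minimal sketch of an ordered split at any ratio is shown here; the helper name chronological_split and the toy array standing in for df.values are assumptions made for illustration, not part of the original code.

```python
import numpy as np

def chronological_split(values, train_fraction=0.8):
    """Split rows in order: the earliest rows train, the most recent rows test."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split_index = int(len(values) * train_fraction)
    return values[:split_index], values[split_index:]

# Toy data standing in for df.values (assumption): 10 time steps, 3 columns,
# with the last column treated as the target, as in the question.
values = np.arange(30, dtype=float).reshape(10, 3)

train, test = chronological_split(values, train_fraction=0.8)

# Separate input features from the target (last column) in each set
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

print(X_train.shape, y_train.shape)  # (8, 2) (8,)
print(X_test.shape, y_test.shape)    # (2, 2) (2,)
```

Passing train_fraction=0.9 gives a 90-10 split. Calling train_test_split(values, test_size=0.2, shuffle=False) should produce the same kind of ordered split, though the exact split index may differ by one row because of rounding.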

Then, use Keras's TimeseriesGenerator to generate the sequences that the LSTM takes as input. This blog explains its usage well.

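from keras.preprocessing.sequence import TimeseriesGenerator

n_input = 2  # number of past time steps in each input sequence
# Pass NumPy arrays: the generator indexes rows by integer position,
# which a DataFrame does not support directly
generator = TimeseriesGenerator(train.values, targets=train.values, length=n_input)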

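To make it clearer what the generator yields, here is a plain NumPy sketch of the same windowing as I understand it with the default sampling rate and stride: each sample is the previous `length` rows, and its target is the row that follows. The function name make_windows and the toy series are assumptions for illustration; the real TimeseriesGenerator also groups samples into batches.

```python
import numpy as np

def make_windows(data, targets, length):
    """Build (samples, targets) pairs: each sample holds `length` consecutive
    rows of data, and its target is the row of `targets` right after them."""
    indices = range(length, len(data))
    X = np.array([data[i - length:i] for i in indices])
    y = np.array([targets[i] for i in indices])
    return X, y

# Toy univariate series (assumption): 6 time steps, 1 feature
series = np.arange(6, dtype=float).reshape(-1, 1)

X, y = make_windows(series, series, length=2)

print(X.shape)  # (4, 2, 1) -> (samples, time steps, features)
print(y.shape)  # (4, 1)
print(X[0].ravel(), y[0])  # [0. 1.] [2.]
```

The (samples, time steps, features) layout is the 3D input shape a Keras LSTM layer expects. Windows for the test set would be built from the test rows in the same way, so no training values leak into the evaluation targets.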

Archived:	5 years, 3 months ago
Views:	13,988
Last recorded:	5 years, 3 months ago