Do I need a stateful or stateless LSTM?

jor*_*993 5 prediction lstm keras

I am trying to build an LSTM for time-series forecasting in Keras. In particular, once the model is trained, it should predict unseen values. A plot of the time series is shown below. [figure: the data]

The model is trained on the blue time series, and its predictions are compared against the orange one.

For forecasting, I want to take the last n points of the training data (where n is the sequence length), run one prediction, and feed that prediction into the next (second) prediction, i.e.:

prediction(t+1) = model(obs(t-1), obs(t-2), ..., obs(t-n))
prediction(t+2) = model(prediction(t+1), obs(t-1), ..., obs(t-n))
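This closed-loop scheme can be sketched in plain NumPy. The `forecast` helper below is hypothetical (not from the question); it assumes a `model` whose `predict` accepts a window of shape `(1, seq_len, 1)`, which is the usual Keras convention for a univariate LSTM:

```python
import numpy as np

def forecast(model, history, n_steps, seq_len):
    """Closed-loop forecast: feed each prediction back in as the newest input.

    history: 1-D array of observed values (at least seq_len long).
    Returns a list of n_steps predicted values.
    """
    window = list(history[-seq_len:])   # last seq_len observations
    preds = []
    for _ in range(n_steps):
        x = np.array(window, dtype=np.float32).reshape(1, seq_len, 1)
        yhat = float(model.predict(x)[0, 0])
        preds.append(yhat)
        window = window[1:] + [yhat]    # slide the window forward by one step
    return preds
```

Note that nothing here relies on state carried inside the model between calls; the "memory" lives entirely in the sliding window that is passed in each time.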

I have tried to get this working, but without success so far. I am at a loss as to whether I should use a stateful or a stateless model, and what a good value for the sequence length might be. Does anyone have experience with this?

I have read and tried various tutorials, but none of them seemed to apply to my kind of data.

Because I want to run successive predictions, I would need a stateful model to keep Keras from resetting the state after each call to model.predict; but training with a batch size of 1 takes forever... Or is there a way around this?

Big*_*dMe 8

A stateful LSTM is used when the whole sequence plays a part in forming the output. Take an extreme case: you have a 1000-length sequence, and the very first character of that sequence is what actually defines the output.

Stateful: If you were to batch this into 10 sequences of length 100, then with a stateful LSTM the connections (state) between sequences in the batch would be retained, and it would (with enough examples) learn that the first character is what matters for the output. In effect, the sequence length is immaterial because the network's state is persisted across the whole stretch of data; you batch it simply as a means of supplying the data.

Stateless: During training, the state is reset after each sequence. So in the example above, the network would never learn that it's the first character of the 1000-length sequence that defines the output: the first character and the final output value land in separate sequences, and the state isn't retained between them, so the long-term dependency is never seen.
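In Keras terms the difference shows up in how the layer is declared. A minimal sketch (the layer sizes and dimensions here are arbitrary, not from the question):

```python
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense

seq_len, n_features, batch_size = 100, 1, 10

# Stateless (the default): the hidden state is reset after every batch,
# so every sequence starts from a zero state.
stateless = keras.Sequential([
    keras.Input(shape=(seq_len, n_features)),
    LSTM(32),
    Dense(1),
])

# Stateful: the final state of sample i in one batch is carried over as
# the initial state of sample i in the next batch, so the batch size
# must be fixed up front via batch_shape.
stateful = keras.Sequential([
    keras.Input(batch_shape=(batch_size, seq_len, n_features)),
    LSTM(32, stateful=True),
    Dense(1),
])
```

With `stateful=True` you are also responsible for resetting the state yourself (via `reset_states`) between independent sequences or epochs; Keras will no longer do it for you.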

Summary: What you need to determine is whether the values at the end of your time series are likely to depend on what happened right at the start.

I would say that such long-term dependencies are actually quite rare, and you are probably better off with a stateless LSTM, treating the sequence length as a hyperparameter and searching for the length that models the data best, i.e. yields the most accurate predictions on validation data.
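Treating the sequence length as a hyperparameter amounts to a simple grid search over window sizes, comparing validation error for each. A sketch of that loop (the helper names are made up, and a trivial last-value baseline stands in for the model to keep the example self-contained; in practice you would fit a stateless LSTM on the training split instead):

```python
import numpy as np

def make_windows(series, seq_len):
    """Sliding windows: X has shape (num_samples, seq_len), y is the next value."""
    X = np.array([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]
    return X, y

def validation_mse(series, seq_len, split=0.8):
    X, y = make_windows(series, seq_len)
    cut = int(len(X) * split)
    # Placeholder "model": predict the last value of each validation window.
    # Replace this with training a stateless LSTM on X[:cut], y[:cut] and
    # predicting on X[cut:].
    preds = X[cut:, -1]
    return float(np.mean((preds - y[cut:]) ** 2))

series = np.sin(np.linspace(0, 20, 500))
best = min([4, 8, 16, 32], key=lambda n: validation_mse(series, n))
```

The winning `best` length is then the one used to train the final model.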


jor*_*993 0

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense


class LSTMNetwork(object):

    def __init__(self, hidden_dim1, hidden_dim2, batch_size, seq_size):
        super(LSTMNetwork, self).__init__()

        self.hidden_dim1 = hidden_dim1
        self.hidden_dim2 = hidden_dim2
        self.batch_size = batch_size
        self.seq_size = seq_size

        self.model = self.build_model(hidden_dim1, hidden_dim2, batch_size, seq_size)

    def build_model(self, hidden_dim1, hidden_dim2, batch_size, seq_size):
        """
        Build and return the model
        """
        model = Sequential()

        # First LSTM layer; returns the full sequence to feed the next LSTM
        model.add(LSTM(hidden_dim1, input_shape=(seq_size, 1), return_sequences=True))
        #model.add(Dropout(0.2))

        # Second LSTM layer; returns only the last output
        model.add(LSTM(hidden_dim2, return_sequences=False))
        #model.add(Dropout(0.2))

        # Fully connected output layer, with linear activation
        model.add(Dense(1, activation="linear"))

        model.compile(loss="mean_squared_error", optimizer="adam")

        return model

    def predict(self, x):
        """
        Given an input batch x, predict the output
        """
        return self.model.predict(x)

    def train_model(self, x, y, num_epochs):
        self.model.fit(x, y, epochs=num_epochs, batch_size=self.batch_size)

    def predict_sequence(self, x, n, seq_size):
        """
        Given samples of shape [num_samples x seq_size x num_features],
        predict the next n values, feeding each prediction back in
        """
        curr_window = x[-1, :, :]

        predicted = []

        for i in range(n):
            predicted.append(self.predict(curr_window[np.newaxis, :, :])[0, 0])
            # Drop the oldest value and append the newest prediction
            curr_window = curr_window[1:]
            curr_window = np.insert(curr_window, [seq_size - 1], predicted[-1], axis=0)

        return predicted

    def preprocess_data(self, data, seq_size):
        """
        Generate training and target samples in a sliding-window fashion.
        Training samples are of size [num_samples x seq_size x num_features];
        target samples are of size [num_samples, ]
        """
        x = []
        y = []

        for i in range(len(data) - seq_size - 1):
            window = data[i:(i + seq_size)]
            after_window = data[i + seq_size]

            # Add a feature dimension of size 1 to each value
            x.append([[value] for value in window])
            y.append(after_window)

        return np.array(x), np.array(y)

After training, running predict_sequence with the last row of the training set as input predicts a straight line. Could this be because the model's state is reset after every call to model.predict()?