Stateful LSTM and Stream Prediction

Shl*_*rtz 13 python stateful lstm keras tensorflow

I have trained an LSTM model (built with Keras and TF) on multiple batches of 7 samples each, where every sample has 3 features, shaped like the example below (the numbers are just placeholders for the purpose of explanation). Each batch is labeled 0 or 1:

Data:

[
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
   ...
]

That is: batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so a batch has shape (m*7*3)).

Targets:

[
   [1]
   [0]
   [1]
   ...
]

In my production environment the data is a stream of samples, each with 3 features ([1,2,3],[1,2,3]...). I would like to stream each sample to my model as it arrives and get the intermediate probability without waiting for the entire batch of 7 - see the animation below.

[Animation: samples arriving one at a time and being fed to the model]

One thought I had was to pad the batch with zeros for the missing samples, [[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[1,2,3]], but that seems inefficient.
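For reference, a minimal sketch of that zero-padding idea (only to illustrate it; received is a hypothetical list of the samples seen so far, and model is the trained model shown in the update below):

import numpy as np

SEQ_LEN = 7
received = [[1, 2, 3]]  # samples received so far in the current sequence (hypothetical)

# left-pad with zeros up to the trained sequence length, e.g. [[0,0,0],...,[1,2,3]]
padded = np.zeros((1, SEQ_LEN, 3), dtype='float32')
padded[0, SEQ_LEN - len(received):] = received

prob = model.predict(padded)  # re-runs all 7 steps for every newly arrived sample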

I would greatly appreciate any help pointing me in the right direction for both saving the LSTM intermediate state in a persistent way while waiting for the next sample, and predicting with a model trained on a specific batch size using partial data.


Update, including model code:

from keras import optimizers
from keras.models import Sequential
from keras.layers import LSTM, LeakyReLU, Dropout, Flatten, Dense
import keras_metrics

opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=10e-8, decay=0.001)
model = Sequential()

num_features = data.shape[2]
num_samples = data.shape[1]

first_lstm = LSTM(32, batch_input_shape=(None, num_samples, num_features), return_sequences=True, activation='tanh')
model.add(first_lstm)
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(LSTM(16, return_sequences=True, activation='tanh'))
model.add(Dropout(0.2))
model.add(LeakyReLU())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer=opt,
              metrics=['accuracy', keras_metrics.precision(), keras_metrics.recall(), f1])

Model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 100, 32)           6272      
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 100, 32)           0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 32)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 16)           3136      
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 16)           0         
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 100, 16)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 1601      
=================================================================
Total params: 11,009
Trainable params: 11,009
Non-trainable params: 0
_________________________________________________________________

Dan*_*ler 8

I think there may be an easier solution.

If your model does not have convolutional layers or any other layers that act on the length/steps dimension, you can simply mark it as stateful=True.

Warning: your model has layers acting on the length dimension!

The Flatten layer transforms the length dimension into a feature dimension. This will completely prevent you from achieving your goal: if the Flatten layer expects 7 steps, you will always need 7 steps.

So, before applying the answer below, fix your model so it does not use that Flatten layer. Instead, simply remove return_sequences=True from the last LSTM layer.

The following code fixes that issue and prepares a few things to be used with the rest of this answer:

def createModel(forTraining):

    #model for training, stateful=False, any batch size   
    if forTraining == True:
        batchSize = None
        stateful = False

    #model for predicting, stateful=True, fixed batch size
    else:
        batchSize = 1
        stateful = True

    model = Sequential()

    #note: for the predicting model you probably want the time dimension to be 1
    #(or None) instead of num_samples, so that one sample can be fed per step
    timeSteps = num_samples if forTraining else 1

    first_lstm = LSTM(32, 
        batch_input_shape=(batchSize, timeSteps, num_features), 
        return_sequences=True, activation='tanh', 
        stateful=stateful)   

    model.add(first_lstm)
    model.add(LeakyReLU())
    model.add(Dropout(0.2))

    #this is the last LSTM layer, use return_sequences=False
    model.add(LSTM(16, return_sequences=False, stateful=stateful,  activation='tanh'))

    model.add(Dropout(0.2))
    model.add(LeakyReLU())

    #don't add a Flatten!!!
    #model.add(Flatten())

    model.add(Dense(1, activation='sigmoid'))

    if forTraining == True:
        compileThisModel(model)   #placeholder for your own compile call

    return model

With this, you will be able to train with 7 steps and predict with one step. Otherwise it would not be possible.

Using a stateful model as a solution for your question

First, train this new model again, because it no longer has the Flatten layer:

trainingModel = createModel(forTraining=True)
trainThisModel(trainingModel)

Now, with this trained model, you can simply create a new model exactly the same way you created the trained one, but marking stateful=True in all of its LSTM layers, and copy the weights over from the trained model.

Since these new layers need a fixed batch size (Keras' rules), I assumed it would be 1 (a single stream is coming in, not m streams) and added that to the model creation above.

predictingModel = createModel(forTraining=False)
predictingModel.set_weights(trainingModel.get_weights())

Voilà. Just predict outputs of the model with a single step:

pseudo for loop as samples arrive to your model:
    prob = predictingModel.predict_on_batch(sample)

    #where sample.shape == (1, 1, 3)

Whenever you decide that you have reached the end of what you consider a continuous sequence, call predictingModel.reset_states() so you can safely start a new sequence without the model thinking it should be mended onto the end of the previous one.
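For concreteness, a minimal sketch of that streaming loop (sample_stream() and handle_prediction() are hypothetical placeholders for however your production pipeline delivers and consumes data):

import numpy as np

# hypothetical stream yielding (sample, is_end_of_sequence) pairs,
# where each sample is a 3-feature vector such as [1, 2, 3]
for sample, is_end_of_sequence in sample_stream():
    x = np.asarray(sample, dtype='float32').reshape(1, 1, 3)  # (batch, steps, features)
    prob = predictingModel.predict_on_batch(x)[0, 0]
    handle_prediction(prob)

    if is_end_of_sequence:
        predictingModel.reset_states()  # forget the finished sequence before starting a new one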


Saving and loading states

Just get and set them, saving with h5py:

import h5py
from keras.layers import RNN
import keras.backend as K

def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, 
            #consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s),
                                 data=K.eval(stat), 
                                 dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()

Working test for saving/loading states

import h5py, numpy as np
from keras.layers import RNN, LSTM, Dense, Input
from keras.models import Model
import keras.backend as K




def createModel():
    inp = Input(batch_shape=(1,None,3))
    out = LSTM(5,return_sequences=True, stateful=True)(inp)
    out = LSTM(2, stateful=True)(out)
    out = Dense(1)(out)
    model = Model(inp,out)
    return model


def saveStates(model, saveName):

    f = h5py.File(saveName,'w')

    for l, lay in enumerate(model.layers):
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(lay,RNN):
            for s, stat in enumerate(lay.states):
                f.create_dataset('states_' + str(l) + '_' + str(s), data=K.eval(stat), dtype=K.dtype(stat))

    f.close()


def loadStates(model, saveName):

    f = h5py.File(saveName, 'r')
    allStates = list(f.keys())

    for stateKey in allStates:
        name, layer, state = stateKey.split('_')
        layer = int(layer)
        state = int(state)

        K.set_value(model.layers[layer].states[state], f.get(stateKey))

    f.close()

def printStates(model):

    for l in model.layers:
        #if you have nested models, consider making this recurrent testing for layers in layers
        if isinstance(l,RNN):
            for s in l.states:
                print(K.eval(s))   

model1 = createModel()
model2 = createModel()
model1.predict_on_batch(np.ones((1,5,3))) #changes model 1 states

print('model1')
printStates(model1)
print('model2')
printStates(model2)

saveStates(model1,'testStates5')
loadStates(model2,'testStates5')

print('model1')
printStates(model1)
print('model2')
printStates(model2)

Considerations regarding the data

Your first model (if stateful=False) considers each of the m sequences to be individual and not connected to the others. It also considers each batch to contain unique sequences.

If this is not the case, you might want to train the stateful model instead (considering that each sequence is actually connected to the previous one). Then you would need m batches of 1 sequence each -> m x (1, 7 or None, 3).
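A rough sketch of what that stateful training could look like (this is not from the original answer: statefulTrainingModel, sequences, labels and num_epochs are hypothetical names, with the model built with stateful=True and batch_input_shape=(1, 7, 3), and the arrays of shapes (m, 7, 3) and (m, 1) holding connected sequences):

for epoch in range(num_epochs):
    statefulTrainingModel.reset_states()  # start each pass over the data from a clean state
    for seq, label in zip(sequences, labels):
        # one connected sequence per batch; no reset between sequences,
        # because each one is assumed to continue the previous one
        statefulTrainingModel.train_on_batch(seq.reshape(1, 7, 3), label.reshape(1, 1))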

  • I thought `reset_states()` would mess with the trained parameters, but I can confirm it doesn't, so this combined with `predict_on_batch` (which gets around having to specify a batch_size for `stateful=True`) makes it a very compact and elegant solution that I will definitely use myself! The small downside is that it doesn't address the requirement of "saving the LSTM intermediate state in a persistent way". (2 upvotes)

fr_*_*lio 5

If I understood correctly, you have batches of m sequences, each of length 7, whose elements are 3-dimensional vectors (so a batch has shape (m*7*3)). In any Keras RNN you can set the return_sequences flag to True to get the intermediate states, i.e., for every batch, instead of the definitive prediction you will get the corresponding 7 outputs, where output i represents the prediction at stage i given all inputs from 0 to i.
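As a small, hedged illustration of that (a toy model, not the asker's trained one): with return_sequences=True the model emits one prediction per time step, so a batch of shape (m, 7, 3) yields per-step probabilities of shape (m, 7, 1) - but only once the whole batch has been processed:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

m = 4
toy = Sequential([
    LSTM(32, input_shape=(7, 3), return_sequences=True),
    TimeDistributed(Dense(1, activation='sigmoid')),
])
per_step_probs = toy.predict(np.zeros((m, 7, 3)))
print(per_step_probs.shape)  # (4, 7, 1): one intermediate probability per step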

But you would be getting it all at once, at the end. As far as I know, Keras doesn't provide a straightforward interface for retrieving the throughput while a batch is being processed, and this may be further constrained if you are using any of the CuDNN-optimized variants. What you can do, basically, is regard your batch as 7 successive batches of shape (m*1*3) and feed them progressively to your LSTM, recording the hidden state and prediction at each step. For that, you can either set return_state to True and do it manually, or you can simply set stateful to True and let the object keep track of it.


The following Python2+Keras example should represent exactly what you want. Specifically:

  • it allows saving the whole LSTM intermediate state in a persistent way
  • while waiting for the next sample
  • and predicting on a model trained on a specific batch size that may be arbitrary and unknown.

For that, it includes an example of stateful=True for the simplest training, and return_state=True for the most precise inference, so you get a flavor of both approaches. It also assumes that you got a model that has been serialized and about which you don't know much. The structure closely follows the one in Andrew Ng's course, who is definitely more authoritative than me on the topic. Since you don't specify how the model was trained, I assumed a many-to-one training setup, but this can easily be adapted.

from __future__ import print_function
from keras.layers import Input, LSTM, Dense
from keras.models import Model, load_model
from keras.optimizers import Adam
import numpy as np

# globals
SEQ_LEN = 7
HID_DIMS = 32
OUTPUT_DIMS = 3 # output dimensionality (the dummy targets below are 3-dimensional)


##############################################################################
# define the model to be trained on a fixed batch size:
# assume many-to-one training setup (otherwise set return_sequences=True)
TRAIN_BATCH_SIZE = 20

x_in = Input(batch_shape=[TRAIN_BATCH_SIZE, SEQ_LEN, 3])
lstm = LSTM(HID_DIMS, activation="tanh", return_sequences=False, stateful=True)
dense = Dense(OUTPUT_DIMS, activation='linear')
m_train = Model(inputs=x_in, outputs=dense(lstm(x_in)))
m_train.summary()

# a dummy batch of training data of shape (TRAIN_BATCH_SIZE, SEQ_LEN, 3), with targets of shape (TRAIN_BATCH_SIZE, 3):
batch123 = np.repeat([[1, 2, 3]], SEQ_LEN, axis=0).reshape(1, SEQ_LEN, 3).repeat(TRAIN_BATCH_SIZE, axis=0)
targets = np.repeat([[123,234,345]], TRAIN_BATCH_SIZE, axis=0) # dummy [[1,2,3],,,]-> [123,234,345] mapping to be learned


# train the model on a fixed batch size and save it
print(">> INFERECE BEFORE TRAINING MODEL:", m_train.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))
m_train.compile(optimizer=Adam(lr=0.5), loss='mean_squared_error', metrics=['mae'])
m_train.fit(batch123, targets, epochs=100, batch_size=TRAIN_BATCH_SIZE)
m_train.save("trained_lstm.h5")
print(">> INFERECE AFTER TRAINING MODEL:", m_train.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))


##############################################################################
# Now, although we aren't training anymore, we want to do step-wise predictions
# that do alter the inner state of the model, and keep track of that.


m_trained = load_model("trained_lstm.h5")
print(">> INFERECE AFTER RELOADING TRAINED MODEL:", m_trained.predict(batch123, batch_size=TRAIN_BATCH_SIZE, verbose=0))

# now define an analogous model that allows a flexible batch size for inference:
x_in = Input(shape=[SEQ_LEN, 3])
h_in = Input(shape=[HID_DIMS])
c_in = Input(shape=[HID_DIMS])
pred_lstm = LSTM(HID_DIMS, activation="tanh", return_sequences=False, return_state=True, name="lstm_infer")
h, cc, c = pred_lstm(x_in, initial_state=[h_in, c_in])
prediction = Dense(OUTPUT_DIMS, activation='linear', name="dense_infer")(h)
m_inference = Model(inputs=[x_in, h_in, c_in], outputs=[prediction, h,cc,c])

#  Let's confirm that this model is able to load the trained parameters:
# first, check that the performance from scratch is not good:
print(">> INFERENCE BEFORE SWAPPING MODEL:")
predictions, hs, zs, cs = m_inference.predict([batch123,
                                               np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)),
                                               np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))],
                                              batch_size=1)
print(predictions)


# import state from the trained model state and check that it works:
print(">> INFERENCE AFTER SWAPPING MODEL:")
for layer in m_trained.layers:
    if "lstm" in layer.name:
        m_inference.get_layer("lstm_infer").set_weights(layer.get_weights())
    elif "dense" in layer.name:
        m_inference.get_layer("dense_infer").set_weights(layer.get_weights())

predictions, _, _, _ = m_inference.predict([batch123,
                                            np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)),
                                            np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))],
                                           batch_size=1)
print(predictions)


# finally perform granular predictions while keeping the recurrent activations. Starting the sequence with zeros is a common practice, but depending on how you trained, you might have an <END_OF_SEQUENCE> character that you might want to propagate instead:
h, c = np.zeros((TRAIN_BATCH_SIZE, HID_DIMS)), np.zeros((TRAIN_BATCH_SIZE, HID_DIMS))
for i in range(len(batch123)):
    # about output shape: https://keras.io/layers/recurrent/#rnn
    # h,z,c hold the network's throughput: h is the proper LSTM output, c is the accumulator and cc is (probably) the candidate
    current_input = batch123[i:i+1] # the length of this feed is arbitrary, doesn't have to be 1
    pred, h, cc, c = m_inference.predict([current_input, h, c])
    print("input:", current_input)
    print("output:", pred)
    print(h.shape, cc.shape, c.shape)
    raw_input("do something with your prediction and hidden state and press any key to continue")

Additional info:

Since we have two forms of state persistency:
1. The saved/trained parameters of the model, which are the same for every sequence
2. The a and c states, which evolve throughout the sequences and can be "restarted"

it is interesting to take a look at the guts of the LSTM object. In the Python example that I provide, the a and c states are explicitly handled, but the trained parameters aren't, and it may not be obvious how they are implemented internally or what they mean. They can be inspected as follows:

for w in lstm.weights:
    print(w.name, w.shape)

In our case (32 hidden states), it returns the following:

lstm_1/kernel:0 (3, 128)
lstm_1/recurrent_kernel:0 (32, 128)
lstm_1/bias:0 (128,)

We observe a dimensionality of 128. Why is that? This link describes the Keras LSTM implementation as follows:

[Image: the Keras LSTM gate equations]

Here g is the recurrent activation, p is the activation, the Ws are the kernels, the Us are the recurrent kernels, h is the hidden variable (which is also the output), and the notation * is an element-wise multiplication.
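Since the image itself is not reproduced here, a hedged reconstruction of those equations in that notation (the standard LSTM formulation, with x_t the current input, c the cell state, and i/f/o the input, forget and output gates):

i_t = g(W_i x_t + U_i h_{t-1} + b_i) \\
f_t = g(W_f x_t + U_f h_{t-1} + b_f) \\
o_t = g(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t = p(W_c x_t + U_c h_{t-1} + b_c) \\
c_t = f_t * c_{t-1} + i_t * \tilde{c}_t \\
h_t = o_t * p(c_t)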

This explains the 128 = 32*4: these are the parameters of the affine transformations happening inside each of the 4 gates, concatenated:

  • the matrix of shape (3, 128) (named kernel) handles the input for a given sequence element
  • the matrix of shape (32, 128) (named recurrent_kernel) handles the input for the last recurrent state h
  • the vector of shape (128,) (named bias), as usual in any other NN setup.
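For illustration, a small hedged snippet showing how those concatenated matrices map onto the four gates, assuming the conventional Keras gate ordering (input, forget, cell, output) and the lstm layer from the example above:

import numpy as np

kernel, recurrent_kernel, bias = lstm.get_weights()   # shapes (3, 128), (32, 128), (128,)

# each weight concatenates the four gates along the last axis, 32 columns per gate
W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=-1)            # each (3, 32)
U_i, U_f, U_c, U_o = np.split(recurrent_kernel, 4, axis=-1)  # each (32, 32)
b_i, b_f, b_c, b_o = np.split(bias, 4)                       # each (32,)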