Understanding ConvLSTM2D by stacking Convolution2D and LSTM layers with TimeDistributed to obtain similar results

Mus*_*ser 6 python conv-neural-network lstm keras

I have 950 training video samples and 50 test video samples. Each video sample has 10 frames, and each frame has shape (n_row=28, n_col=28, n_channels=1). My input (x) and output (y) have the same shape.

x_train shape: (950, 10, 28, 28, 1),

y_train shape: (950, 10, 28, 28, 1),

x_test shape: (50, 10, 28, 28, 1),

y_test shape: (50, 10, 28, 28, 1).

I want to feed the input video samples (x) into my model to predict the output video samples (y).
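The shapes above can be reproduced with random placeholder arrays (the real video frames are not part of this post, so NumPy noise stands in for them):

```python
import numpy as np

# Dummy data matching the shapes described above:
# (samples, frames, rows, cols, channels)
x_train = np.random.rand(950, 10, 28, 28, 1).astype("float32")
y_train = np.random.rand(950, 10, 28, 28, 1).astype("float32")
x_test = np.random.rand(50, 10, 28, 28, 1).astype("float32")
y_test = np.random.rand(50, 10, 28, 28, 1).astype("float32")

print(x_train.shape)  # (950, 10, 28, 28, 1)
```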

My model so far:

from keras.layers import Dense, Dropout, Activation, LSTM
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Reshape
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed

import numpy as np
########################################################################################
model = Sequential()

model.add(TimeDistributed(Convolution2D(16, (3, 3), padding='same'), input_shape=(None, 28, 28, 1))) 
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.2))

model.add(TimeDistributed(Convolution2D(32, (3, 3), padding='same'))) 
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.2))

model.add(TimeDistributed(Convolution2D(64, (3, 3), padding='same'))) 
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))

model.add(TimeDistributed(Flatten()))

model.add(LSTM(64, return_sequences=True, stateful=False))
model.add(LSTM(64, return_sequences=True, stateful=False))
model.add(Activation('sigmoid'))
model.add(Dense(784, activation='sigmoid'))
model.add(Reshape((-1, 28,28,1)))

model.compile(loss='mean_squared_error', optimizer='rmsprop')
print(model.summary())

The model summary is:

Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_1 (TimeDist (None, None, 28, 28, 16)  160       
_________________________________________________________________
activation_1 (Activation)    (None, None, 28, 28, 16)  0         
_________________________________________________________________
time_distributed_2 (TimeDist (None, None, 14, 14, 16)  0         
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 14, 14, 16)  0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, None, 14, 14, 32)  4640      
_________________________________________________________________
activation_2 (Activation)    (None, None, 14, 14, 32)  0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, None, 7, 7, 32)    0         
_________________________________________________________________
dropout_2 (Dropout)          (None, None, 7, 7, 32)    0         
_________________________________________________________________
time_distributed_5 (TimeDist (None, None, 7, 7, 64)    18496     
_________________________________________________________________
activation_3 (Activation)    (None, None, 7, 7, 64)    0         
_________________________________________________________________
time_distributed_6 (TimeDist (None, None, 3, 3, 64)    0         
_________________________________________________________________
time_distributed_7 (TimeDist (None, None, 576)         0         
_________________________________________________________________
lstm_1 (LSTM)                (None, None, 64)          164096    
_________________________________________________________________
lstm_2 (LSTM)                (None, None, 64)          33024     
_________________________________________________________________
activation_4 (Activation)    (None, None, 64)          0         
_________________________________________________________________
dense_1 (Dense)              (None, None, 784)         50960     
_________________________________________________________________
reshape_1 (Reshape)          (None, None, 28, 28, 1)   0         
=================================================================
Total params: 271,376
Trainable params: 271,376
Non-trainable params: 0

I know something is wrong with my model, but I don't know how to correct it.

My guess is that model.add(Reshape((-1, 28, 28, 1))) may not be working correctly. To be honest, I am not sure how to handle the output of model.add(Dense(784, activation='sigmoid')), so I added a Reshape layer afterwards to get the shape right. Alternatively, with my current design the LSTM layers may not be able to capture the temporal dependencies correctly.
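As a pure-NumPy illustration of what that reshape is supposed to do (Keras' Reshape target excludes the batch axis, so it operates on each sample's (timesteps, 784) block): each flat 784-vector per timestep should fold back into a 28x28x1 frame, which is what TimeDistributed(Reshape((28, 28, 1))) would make explicit.

```python
import numpy as np

# One batch element as the Dense(784) tail produces it:
# 10 timesteps, each a flat vector of 784 values.
sample = np.arange(10 * 784, dtype="float32").reshape(10, 784)

# Folding each timestep's 784 values back into a frame; -1 lets
# NumPy infer the timestep axis, mirroring Reshape((-1, 28, 28, 1)).
frames = sample.reshape(-1, 28, 28, 1)
print(frames.shape)  # (10, 28, 28, 1)
```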

Edit 1: I changed all Convolution2D activations from sigmoid to relu. This changed the model's predicted results, but as shown in the figure, it still cannot make reasonable predictions.

Edit 2: I changed model.add(Reshape((-1, 28, 28, 1))) to model.add(TimeDistributed(Reshape((28, 28, 1)))), increased the LSTM units to 512, and used two LSTM layers. I also added BatchNormalization and changed input_shape to (10, 28, 28, 1). With this input shape I can build a many-to-many model.

But the predictions did not change much. I think I am overlooking something basic. Here is the new model:

# from keras.layers import Dense, Dropout, Activation, LSTM 
from keras.layers.normalization import BatchNormalization
from keras.layers import Lambda, Convolution2D, MaxPooling2D, Flatten, Reshape, Conv2D
from keras.layers.convolutional import Conv3D
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.layers.pooling import GlobalAveragePooling1D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import Model
import keras.backend as K

import numpy as np

import pylab as plt
model = Sequential()


model.add(TimeDistributed(Convolution2D(16, (3, 3), activation='relu', kernel_initializer='glorot_uniform', padding='same'), input_shape=(10, 28, 28, 1))) 
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))

model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))

model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))

# extract features and dropout 
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.3))
model.add(Dense(784, activation='linear'))
model.add(TimeDistributed(BatchNormalization()))

# input to LSTM
model.add(LSTM(units=512, activation='tanh', recurrent_activation='hard_sigmoid', kernel_initializer='glorot_uniform', unit_forget_bias=True, dropout=0.3, recurrent_dropout=0.3, return_sequences=True))
model.add(LSTM(units=512, activation='tanh', recurrent_activation='hard_sigmoid', kernel_initializer='glorot_uniform', unit_forget_bias=True, dropout=0.3, recurrent_dropout=0.3, return_sequences=True))

# classifier with sigmoid activation for multilabel
model.add(Dense(784, activation='linear'))
# model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Reshape((28,28,1))))
model.compile(loss='mae', optimizer='rmsprop')
print(model.summary())

Edit 3: Since ConvLSTM2D does exactly what I want, and the purpose of writing this question was to understand ConvLSTM2D, I changed the question's title to better describe my problem.
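For comparison, a minimal ConvLSTM2D baseline for the same (10, 28, 28, 1) frame sequences might look like the sketch below. The filter counts and kernel sizes here are illustrative guesses, not tuned values, and the Conv3D head is just one common way to map the recurrent features back to one channel per frame:

```python
from keras.models import Sequential
from keras.layers import ConvLSTM2D, BatchNormalization, Conv3D

# Minimal ConvLSTM2D sketch: convolutional state transitions replace
# the TimeDistributed(Conv2D) + LSTM stack, so spatial structure is
# preserved through the recurrence instead of being flattened.
model = Sequential()
model.add(ConvLSTM2D(filters=32, kernel_size=(3, 3), padding='same',
                     return_sequences=True, input_shape=(10, 28, 28, 1)))
model.add(BatchNormalization())
model.add(ConvLSTM2D(filters=32, kernel_size=(3, 3), padding='same',
                     return_sequences=True))
model.add(BatchNormalization())
# Collapse the feature maps back to one channel per frame.
model.add(Conv3D(filters=1, kernel_size=(3, 3, 3), padding='same',
                 activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')

print(model.output_shape)  # (None, 10, 28, 28, 1)
```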