用于视频输入的 LSTM

siv*_*iva 4 neural-network lstm keras recurrent-neural-network

我是一个尝试 LSTM 的新手。

我基本上使用 LSTM 来确定动作类型(5 种不同的动作),例如跑步、跳舞等。我的输入是每个动作 60 帧,大致可以说大约 120 个这样的视频

train_x.shape = (120,192,192,60)

其中 120 是用于训练的样本视频数量,192X192 是帧大小,60 是帧数。

train_y.shape = (120*5) [1 0 0 0 0 ..... 0 0 0 0 1] 一个热编码

我不清楚如何将 3d 参数传递给 lstm (时间戳和功能)

model.add(LSTM(100, input_shape=(train_x.shape[1],train_x.shape[2])))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(len(uniquesegments), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_x, train_y, epochs=100, batch_size=batch_size, verbose=1)
Run Code Online (Sandbox Code Playgroud)

我收到以下错误

层顺序的输入 0 与层不兼容:预期 ndim=3,发现 ndim=4。收到的完整形状:(无、192、192、60)

训练数据算法

Loop through videos
            Loop through each frame of a video
                    logic
                    append to array
            convert to numpy array
            roll axis to convert 60 192 192 to 192 192 60
  add to training list
convert training list to numpy array
Run Code Online (Sandbox Code Playgroud)

训练列表形状 <120, 192, 192, 60>

Mr.*_*ple 6

首先你应该知道,解决视频分类任务的方法比LSTM或任何RNN Cell更适合卷积 RNN,就像CNN比MLP更适合图像分类任务一样

这些 RNN 单元(例如 LSTM、GRU)期望输入具有 shape (samples, timesteps, channels),因为您正在处理具有 shape 的输入(samples, timesteps, width, height, channels),所以您应该使用tf.keras.layers.ConvLSTM2D代替

以下示例代码将向您展示如何构建可以处理视频分类任务的模型:

import tensorflow as tf
from tensorflow.keras import models, layers

timesteps = 60
width = 192
height = 192
channels = 1
action_num = 5

model = models.Sequential(
    [
        layers.Input(
            shape=(timesteps, width, height, channels)
        ),
        layers.ConvLSTM2D(
            filters=64, kernel_size=(3, 3), padding="same", return_sequences=True, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool3D(
            pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"
        ),
        layers.BatchNormalization(),
        layers.ConvLSTM2D(
            filters=32, kernel_size=(3, 3), padding="same", return_sequences=True, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool3D(
            pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"
        ),
        layers.BatchNormalization(),
        layers.ConvLSTM2D(
            filters=16, kernel_size=(3, 3), padding="same", return_sequences=False, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool2D(
            pool_size=(2, 2), strides=(2, 2), padding="same"
        ),
        layers.BatchNormalization(),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dense(action_num, activation='softmax')
    ]
)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Run Code Online (Sandbox Code Playgroud)

输出:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv_lst_m2d (ConvLSTM2D)    (None, 60, 192, 192, 64)  150016    
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 60, 96, 96, 64)    0         
_________________________________________________________________
batch_normalization (BatchNo (None, 60, 96, 96, 64)    256       
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D)  (None, 60, 96, 96, 32)    110720    
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 60, 48, 48, 32)    0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 60, 48, 48, 32)    128       
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D)  (None, 48, 48, 16)        27712     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 16)        0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 24, 24, 16)        64        
_________________________________________________________________
flatten (Flatten)            (None, 9216)              0         
_________________________________________________________________
dense (Dense)                (None, 256)               2359552   
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 1285      
=================================================================
Total params: 2,649,733
Trainable params: 2,649,509
Non-trainable params: 224
_________________________________________________________________
Run Code Online (Sandbox Code Playgroud)

请注意,您应该在输入上述模型之前将数据重新排序为形状(samples, timesteps, width, height, channels)(即不是像np.reshape,而是像np.moveaxis),在您的情况下,形状应该是(120, 60, 192, 192, 1),然后您可以将120视频拆分为批次并输入模型