I'm a newbie experimenting with LSTMs.
I'm basically using an LSTM to classify the type of action (5 different actions), such as running, dancing, etc. My input is 60 frames per action, and there are roughly 120 such videos.
train_x.shape = (120,192,192,60)
where 120 is the number of sample videos used for training, 192x192 is the frame size, and 60 is the number of frames.
train_y.shape = (120*5) [1 0 0 0 0 ..... 0 0 0 0 1] one-hot encoded
I'm not clear on how to pass the 3D parameters to the LSTM (timesteps and features).
model.add(LSTM(100, input_shape=(train_x.shape[1],train_x.shape[2])))
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(len(uniquesegments), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_x, train_y, epochs=100, batch_size=batch_size, verbose=1)
I get the following error:
Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 192, 192, 60)
Training data algorithm:
Loop through videos
    Loop through each frame of a video
        logic
        append to array
    convert to numpy array
    roll axis to convert (60, 192, 192) to (192, 192, 60)
    add to training list
convert training list to numpy array
Training list shape: (120, 192, 192, 60)
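A rough Python sketch of that loop, for reference (the videos iterable and the preprocess step are placeholders for whatever loading and per-frame "logic" is actually used; this is an assumption about the pipeline, not the actual code):
import numpy as np

training_list = []
for video in videos:                      # loop through videos
    frames = []
    for frame in video:                   # loop through each frame of a video
        frames.append(preprocess(frame))  # per-frame "logic" (e.g. grayscale / resize to 192x192)
    frames = np.array(frames)             # shape (60, 192, 192)
    frames = np.rollaxis(frames, 0, 3)    # roll axis: (60, 192, 192) -> (192, 192, 60)
    training_list.append(frames)
train_x = np.array(training_list)         # shape (120, 192, 192, 60)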
First, you should know that for video classification tasks a convolutional RNN is a better fit than an LSTM or any plain RNN cell, just as a CNN is a better fit than an MLP for image classification tasks.
RNN cells such as LSTM and GRU expect input of shape (samples, timesteps, features). Since you are dealing with input of shape (samples, timesteps, width, height, channels), you should use tf.keras.layers.ConvLSTM2D instead.
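As a quick illustration of the input shapes these layer types accept (a minimal sketch with dummy tensors and toy sizes, not taken from the question's data):
import tensorflow as tf

# A plain LSTM only accepts 3-D input: (samples, timesteps, features)
lstm = tf.keras.layers.LSTM(100)
print(lstm(tf.zeros((2, 60, 128))).shape)               # (2, 100)

# ConvLSTM2D accepts 5-D input: (samples, timesteps, width, height, channels)
conv_lstm = tf.keras.layers.ConvLSTM2D(filters=8, kernel_size=(3, 3), padding="same")
print(conv_lstm(tf.zeros((2, 10, 32, 32, 1))).shape)    # (2, 32, 32, 8)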
The following example code shows how to build a model that can handle the video classification task:
import tensorflow as tf
from tensorflow.keras import models, layers

timesteps = 60
width = 192
height = 192
channels = 1
action_num = 5

model = models.Sequential(
    [
        layers.Input(shape=(timesteps, width, height, channels)),
        layers.ConvLSTM2D(
            filters=64, kernel_size=(3, 3), padding="same",
            return_sequences=True, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool3D(pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"),
        layers.BatchNormalization(),
        layers.ConvLSTM2D(
            filters=32, kernel_size=(3, 3), padding="same",
            return_sequences=True, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool3D(pool_size=(1, 2, 2), strides=(1, 2, 2), padding="same"),
        layers.BatchNormalization(),
        layers.ConvLSTM2D(
            filters=16, kernel_size=(3, 3), padding="same",
            return_sequences=False, dropout=0.1, recurrent_dropout=0.1
        ),
        layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding="same"),
        layers.BatchNormalization(),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dense(action_num, activation='softmax')
    ]
)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv_lst_m2d (ConvLSTM2D) (None, 60, 192, 192, 64) 150016
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 60, 96, 96, 64) 0
_________________________________________________________________
batch_normalization (BatchNo (None, 60, 96, 96, 64) 256
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D) (None, 60, 96, 96, 32) 110720
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 60, 48, 48, 32) 0
_________________________________________________________________
batch_normalization_1 (Batch (None, 60, 48, 48, 32) 128
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D) (None, 48, 48, 16) 27712
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 16) 0
_________________________________________________________________
batch_normalization_2 (Batch (None, 24, 24, 16) 64
_________________________________________________________________
flatten (Flatten) (None, 9216) 0
_________________________________________________________________
dense (Dense) (None, 256) 2359552
_________________________________________________________________
dense_1 (Dense) (None, 5) 1285
=================================================================
Total params: 2,649,733
Trainable params: 2,649,509
Non-trainable params: 224
_________________________________________________________________
Note that you should reorder your data to the shape (samples, timesteps, width, height, channels) before feeding it into the model above (i.e. not with something like np.reshape, but with something like np.moveaxis). In your case the shape should be (120, 60, 192, 192, 1); you can then split the 120 videos into batches and feed them to the model.
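A minimal sketch of that reordering step (assuming train_x still has the shape (120, 192, 192, 60) described in the question; the batch_size of 4 is just an example value):
import numpy as np

train_x = np.moveaxis(train_x, -1, 1)   # (120, 192, 192, 60) -> (120, 60, 192, 192)
train_x = train_x[..., np.newaxis]      # add channel axis    -> (120, 60, 192, 192, 1)
print(train_x.shape)                    # (120, 60, 192, 192, 1)

model.fit(train_x, train_y, epochs=100, batch_size=4, verbose=1)
model.fit will then split the 120 videos into batches of that size during training.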