Keras - freeze a model, then add trainable layers

ajl*_*123 5 keras tensorflow

I am taking a pre-trained CNN model and trying to build a CNN-LSTM with parallel CNNs, all of which share the same pre-trained weights.

import keras

# load in the pre-trained CNN
weightsfile = 'final_weights.h5'
modelfile = '2dcnn_model.json'

# load model from json
json_file = open(modelfile, 'r')
loaded_model_json = json_file.read()
json_file.close()
fixed_cnn_model = keras.models.model_from_json(loaded_model_json)
fixed_cnn_model.load_weights(weightsfile)

# remove the last 2 dense FC layers and freeze it
fixed_cnn_model.pop()
fixed_cnn_model.pop()
fixed_cnn_model.trainable = False

print(fixed_cnn_model.summary())
This will produce the summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32, 32, 4)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 30, 30, 32)        1184      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 28, 28, 32)        9248      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 26, 26, 32)        9248      
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 24, 24, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 10, 10, 64)        18496     
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 8, 8, 64)          36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 2, 2, 128)         73856     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 128)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 128)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               66048     
=================================================================
Total params: 224,256
Trainable params: 0
Non-trainable params: 224,256
_________________________________________________________________

Now I add to it and compile, and the summary shows that everything that was non-trainable has become trainable.

# create a sequential model to run everything before the LSTM
from keras.models import Sequential, Model
from keras.layers import TimeDistributed, LSTM, Dense

# initialize loss function, Adam optimizer and metrics
loss = 'binary_crossentropy'
optimizer = keras.optimizers.Adam(lr=1e-4,
                                  beta_1=0.9,
                                  beta_2=0.999,
                                  epsilon=1e-08,
                                  decay=0.0)
metrics = ['accuracy']

currmodel = Sequential()
currmodel.add(TimeDistributed(fixed_cnn_model,
                              input_shape=(num_timewins, imsize, imsize, n_colors)))
currmodel.add(LSTM(units=size_mem,
                   activation='relu',
                   return_sequences=False))
currmodel.add(Dense(1024, activation='relu'))
currmodel.add(Dense(2, activation='softmax'))

currmodel = Model(inputs=currmodel.input, outputs=currmodel.output)
config = currmodel.compile(optimizer=optimizer, loss=loss, metrics=metrics) 
print(currmodel.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_3_input (In (None, 5, 32, 32, 4)      0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 512)            224256    
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                112600    
_________________________________________________________________
dropout_1 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              52224     
_________________________________________________________________
dropout_2 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 2050      
=================================================================
Total params: 391,130
Trainable params: 391,130
Non-trainable params: 0
_________________________________________________________________

How do I freeze the layers in this case? I am almost 100% sure I had working code in this format in an earlier Keras version. It seems like the right approach: define a model, declare certain layers trainable or not, and then add new layers, which are trainable by default. However, this seems to convert all the layers to trainable ones.
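The expected behavior can be reproduced on a toy model. Below is a minimal sketch (using tf.keras; the layer sizes are illustrative stand-ins, not the original architecture) showing that setting `trainable = False` on the inner model before wrapping it in `TimeDistributed` keeps its weights out of the trainable set:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# stand-in for the pre-trained CNN (sizes are illustrative only)
inner = keras.Sequential([
    keras.Input(shape=(8, 8, 1)),
    layers.Conv2D(8, 3, activation='relu'),
    layers.Flatten(),
    layers.Dense(16),
])
inner.trainable = False  # freeze every layer of the sub-model

# wrap the frozen sub-model, then add trainable layers on top
outer = keras.Sequential([
    keras.Input(shape=(5, 8, 8, 1)),
    layers.TimeDistributed(inner),
    layers.LSTM(4),
    layers.Dense(2, activation='softmax'),
])
outer.compile(optimizer='adam', loss='categorical_crossentropy')

# all of the inner CNN's weights should be counted as non-trainable
frozen = sum(int(np.prod(w.shape)) for w in outer.non_trainable_weights)
print('non-trainable params:', frozen)
```

If the inner weights still show up as trainable, a common culprit in older Keras versions is that the flag was set after the outer model was already built, or that the model was not re-compiled after the flag changed.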

小智 3

Try adding:

    for layer in currmodel.layers[:5]:
        layer.trainable = False

  • While this answer may be correct and useful, it would be better to include some explanation of how it helps solve the problem. That becomes especially useful in the future if a (possibly unrelated) change causes it to stop working and users need to understand how it once worked. (3 upvotes)
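To expand on why the loop works: `trainable` is a per-layer attribute in Keras, and changes to it take effect at compile time, so the model should be re-compiled after the loop. A minimal sketch (using tf.keras, with illustrative layer sizes rather than the original architecture):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8),
    layers.Dense(8),
    layers.Dense(2),
])
model.compile(optimizer='adam', loss='mse')

# freeze all but the last layer, as in the answer's loop,
# then re-compile so training respects the new flags
for layer in model.layers[:-1]:
    layer.trainable = False
model.compile(optimizer='adam', loss='mse')

# only the final Dense layer's weights remain trainable
n_trainable = sum(int(np.prod(w.shape)) for w in model.trainable_weights)
print('trainable params:', n_trainable)
```

The slice index (`[:5]` in the answer, `[:-1]` here) just selects which layers to freeze; count the layers in your own model's summary to pick the right cutoff.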