Averaging the weights of Keras models

Mił*_*zak 6 neural-network deep-learning keras tensorflow keras-layer

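How can I average the weights of Keras models when I train several models that share the same architecture but start from different initializations?

Right now my code looks like this: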

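# Imports assume standalone Keras 2.x, matching the API used below.
import keras
from keras.models import Model
from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                          MaxPool2D, Flatten, Dense, Dropout)
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras.utils import plot_model

datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=2.0/28,
                             height_shift_range=2.0/28
                            )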

epochs = 40 
lr = (1.234e-3)
optimizer = Adam(lr=lr)

main_input = Input(shape= (28,28,1), name='main_input')

sub_models = []

for i in range(5):

    x = Conv2D(32, kernel_size=(3,3), strides=1)(main_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(64, kernel_size=(3,3), strides=1)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(64, kernel_size=(3,3), strides=1)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = Flatten()(x)

    x = Dense(1024)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.1)(x)

    x = Dense(256)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(0.4)(x)

    x = Dense(10, activation='softmax')(x)

    sub_models.append(x)

# Average the softmax outputs of the five branches
# (this ensembles predictions; it does not average weights).
main_output = keras.layers.average(sub_models)

model = Model(inputs=[main_input], outputs=[main_output])

model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
              optimizer=optimizer)

print(model.summary())

plot_model(model, to_file='model.png')

filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
tensorboard = TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)
callbacks = [checkpoint, tensorboard]

model.fit_generator(datagen.flow(X_train, y_train, batch_size=128),
                    steps_per_epoch=len(X_train) // 128,
                    epochs=epochs,
                    callbacks=callbacks,
                    verbose=1,
                    validation_data=(X_test, y_test))

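So right now I only average the outputs of the last layer, but I would like to train each model separately and then average the weights in all layers.

Thanks!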

小智 8

I can't comment on the accepted answer, but to get it working with tensorflow 2.0 and tf.keras I had to turn the list built inside the loop into a numpy array:

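import numpy as np

# `weights` is the list from the accepted answer:
# weights = [model.get_weights() for model in models]
new_weights = list()
for weights_list_tuple in zip(*weights):
    new_weights.append(
        np.array([np.array(w).mean(axis=0) for w in zip(*weights_list_tuple)])
    )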

If the input models should contribute with different weightings, replace np.array(w).mean(axis=0) with np.average(np.array(w), axis=0, weights=relative_weights), where relative_weights is an array holding one weighting factor per model.
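The weighted variant described above can be sketched as a standalone function. This is a minimal sketch, not the answerer's code: the function name `weighted_average_weights` and the toy arrays are made up for illustration, and it assumes every model has the same architecture, so corresponding tensors have identical shapes.

```python
import numpy as np

def weighted_average_weights(weights, relative_weights):
    """Average per-model weight lists with one weighting factor per model.

    weights: one entry per model, each a list of arrays
             (the structure returned by model.get_weights()).
    relative_weights: one factor per model (hypothetical name from the text).
    """
    new_weights = []
    for weights_list_tuple in zip(*weights):
        # Stack the same tensor from every model along a new leading axis,
        # then take the weighted mean over that axis.
        stacked = np.array(weights_list_tuple)
        new_weights.append(
            np.average(stacked, axis=0, weights=relative_weights))
    return new_weights

# Toy stand-ins for two models, each with a kernel and a bias.
model_a = [np.zeros((2, 3)), np.zeros(3)]
model_b = [np.ones((2, 3)), np.ones(3)]
averaged = weighted_average_weights([model_a, model_b],
                                    relative_weights=[1.0, 3.0])
```

np.average normalizes by the sum of the factors, so relative_weights does not have to sum to 1.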


Mar*_*jko 7

Let's assume that models is a collection of your models. First, collect all the weights:

weights = [model.get_weights() for model in models]

Now create the new, averaged weights:

import numpy

new_weights = list()

# Each weights_list_tuple holds the same tensor taken from every model.
for weights_list_tuple in zip(*weights):
    new_weights.append(
        [numpy.array(weights_).mean(axis=0)
            for weights_ in zip(*weights_list_tuple)])

All that's left is to set these weights in a new model:

new_model.set_weights(new_weights)

Of course, averaging weights may be a bad idea, but if you want to try it, this is the approach to follow.
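The steps above can be checked without training anything. The sketch below uses random arrays in place of model.get_weights() output (the shapes are made up) and compares the nested loop from this answer with a shorter np.stack formulation. The helper names are illustrative, not part of Keras.

```python
import numpy as np

def average_nested(weights):
    """Nested-loop averaging, as in the answer (rows are averaged one by one)."""
    new_weights = []
    for weights_list_tuple in zip(*weights):
        new_weights.append(np.array(
            [np.array(w).mean(axis=0) for w in zip(*weights_list_tuple)]))
    return new_weights

def average_stacked(weights):
    """Same result: stack each tensor across models and average over axis 0."""
    return [np.mean(np.stack(tensors), axis=0) for tensors in zip(*weights)]

# Stand-in for [model.get_weights() for model in models] with 5 models.
rng = np.random.default_rng(0)
shapes = [(3, 3, 1, 32), (32,), (32, 10), (10,)]
weights = [[rng.normal(size=s) for s in shapes] for _ in range(5)]

nested = average_nested(weights)
stacked = average_stacked(weights)
```

With real models, the resulting list would be passed to new_model.set_weights(...), provided new_model has the same architecture as the averaged models.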

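  • Why is it a bad idea? I was inspired by http://cs231n.github.io/neural-networks-3/#ensemble, where it is said to be a good idea ;) (2 upvotes)
  • Just to give one example of why this might go wrong: take a model and permute all of its filters in a consistent manner. The network is mathematically equivalent, but the mean of the two sets of weights may compute something very different from the original function. I'm not saying it is a bad idea, I'm saying it might be a bad idea ;) (2 upvotes)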
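The permutation argument in the last comment can be reproduced on a toy network. This is an illustrative NumPy sketch, not Keras code: a two-layer ReLU network with random weights, a copy whose hidden units are reordered consistently, and the element-wise average of the two.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Tiny two-layer network: dense -> ReLU -> dense."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
w1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
w2, b2 = rng.normal(size=(8, 2)), rng.normal(size=2)

# Reorder the 8 hidden units consistently: columns of w1, entries of b1,
# and the matching rows of w2. A cyclic shift keeps the demo deterministic.
perm = np.roll(np.arange(8), 1)
w1_p, b1_p, w2_p = w1[:, perm], b1[perm], w2[perm, :]

original = mlp(x, w1, b1, w2, b2)
permuted = mlp(x, w1_p, b1_p, w2_p, b2)

# Element-wise average of the two equivalent weight sets.
averaged = mlp(x, (w1 + w1_p) / 2, (b1 + b1_p) / 2, (w2 + w2_p) / 2, b2)

same_function = np.allclose(original, permuted)
average_matches = np.allclose(original, averaged)
```

The permuted copy computes the same outputs as the original, while the averaged weights generally do not. Independently trained models have no reason to order their hidden units the same way, which is one reason plain weight averaging can hurt.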