Add Batch Normalization immediately before the non-linearity or right after it in Keras?

dao*_*ker 6 theano deep-learning keras tensorflow

from keras import backend as K
from keras.layers import Convolution2D, BatchNormalization

def conv2d_bn(x, nb_filter, nb_row, nb_col,
              border_mode='same', subsample=(1, 1),
              name=None):
    '''Utility function to apply conv + BN.
    '''
    # Layer naming and channel axis, as in the Keras application code
    if name is not None:
        conv_name = name + '_conv'
        bn_name = name + '_bn'
    else:
        conv_name = None
        bn_name = None
    bn_axis = 1 if K.image_dim_ordering() == 'th' else 3

    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample,
                      activation='relu',
                      border_mode=border_mode,
                      name=conv_name)(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
    return x

When I use the official inception_v3 model that ships with Keras, I see that it applies BatchNormalization after the 'relu' non-linearity, as in the code above.

But in the Batch Normalization paper, the authors say:

We add the BN transform immediately before the nonlinearity, by normalizing x = Wu + b.
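
In the paper's notation this makes the layer z = g(BN(Wu)) rather than z = g(Wu + b); the bias b can be dropped because BN's own shift parameter β subsumes it.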

Then I looked at the Inception implementation in TensorFlow, and there BN is added immediately before the non-linearity. See the Inception ops.py for more details.
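
For reference, here is a minimal sketch of that conv -> BN -> ReLU ordering written against the TF 1.x tf.layers API; it only illustrates the placement and is not the actual ops.py code (the function name conv_bn_relu and the is_training flag are made up for the example):

import tensorflow as tf

def conv_bn_relu(inputs, filters, kernel_size, is_training):
    # Bias is redundant when BN follows, so it is disabled here
    x = tf.layers.conv2d(inputs, filters, kernel_size,
                         padding='same', use_bias=False)
    # Normalize the linear (pre-activation) output ...
    x = tf.layers.batch_normalization(x, training=is_training)
    # ... and only then apply the non-linearity
    return tf.nn.relu(x)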

I'm confused. Why do people use the former style in Keras rather than something like the following?

def conv2d_bn(x, nb_filter, nb_row, nb_col,
              border_mode='same', subsample=(1, 1),
              name=None):
    '''Utility function to apply conv + BN,
    with BN placed before the non-linearity.
    '''
    # conv_name, bn_name and bn_axis derived as in the snippet above;
    # Activation is keras.layers.Activation
    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample,
                      border_mode=border_mode,
                      name=conv_name)(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
    x = Activation('relu')(x)
    return x

And in the Dense case:

x = Dense(1024, name='fc')(x)
x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
x = Activation('relu')(x)

gde*_*lab 1

I also use it before the activation, and that is indeed how it was designed; other libraries do the same, for example Lasagne's batch_norm: http://lasagne.readthedocs.io/en/latest/modules/layers/normalization.html#lasagne.layers.batch_norm
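
Lasagne's batch_norm is a convenience wrapper: it removes the wrapped layer's nonlinearity, inserts a BatchNormLayer on the linear output, and re-applies the nonlinearity on top, so BN ends up before the ReLU. A minimal sketch, assuming the standard Lasagne API:

from lasagne.layers import InputLayer, DenseLayer, batch_norm
from lasagne.nonlinearities import rectify

l_in = InputLayer(shape=(None, 784))
# batch_norm() rewires this as: dense (linear) -> BN -> rectify
l_hidden = batch_norm(DenseLayer(l_in, num_units=1024, nonlinearity=rectify))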

In practice, however, placing it after the activation seems to work a bit better:

https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md (but it's only a single benchmark)