from keras import backend as K
from keras.layers import BatchNormalization, Convolution2D

def conv2d_bn(x, nb_filter, nb_row, nb_col,
              border_mode='same', subsample=(1, 1),
              name=None):
    '''Utility function to apply conv + BN.'''
    if name is not None:
        conv_name, bn_name = name + '_conv', name + '_bn'
    else:
        conv_name = bn_name = None
    # channel axis: 1 for Theano dim ordering, 3 for TensorFlow
    bn_axis = 1 if K.image_dim_ordering() == 'th' else 3
    # ReLU is applied inside the conv layer, so BN ends up *after* the nonlinearity
    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample,
                      activation='relu',
                      border_mode=border_mode,
                      name=conv_name)(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
    return x
When I use the official inception_v3 model in Keras, I noticed that BatchNormalization is applied after the 'relu' nonlinearity, as in the code above.
But in the Batch Normalization paper, the authors say:
"We add the BN transform immediately before the nonlinearity, by normalizing x = Wu + b."
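Written out (using g for the ReLU and the paper's Wu + b notation for the affine part of the layer), the two orderings being compared are roughly:

z = g(BN(Wu))        # paper / TensorFlow Inception: BN on the pre-activation
                     # (the bias b becomes redundant, since BN subtracts the mean)
z = BN(g(Wu + b))    # Keras inception_v3 snippet above: BN after the nonlinearity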
Then I looked at the Inception implementation in TensorFlow, and there BN is added immediately before the nonlinearity. See the Inception ops.py for more details.
I am confused. Why do people use the style above in Keras instead of the following one?
from keras.layers import Activation

def conv2d_bn(x, nb_filter, nb_row, nb_col,
              border_mode='same', subsample=(1, 1),
              name=None):
    '''Utility function to apply conv + BN.'''
    if name is not None:
        conv_name, bn_name = name + '_conv', name + '_bn'
    else:
        conv_name = bn_name = None
    bn_axis = 1 if K.image_dim_ordering() == 'th' else 3
    x = Convolution2D(nb_filter, nb_row, nb_col,
                      subsample=subsample,
                      border_mode=border_mode,
                      name=conv_name)(x)
    x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
    # ReLU applied as a separate layer, *after* BN (the paper's ordering)
    x = Activation('relu')(x)
    return x
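For reference, a minimal usage sketch of this helper (the input shape, filter counts and block names are made-up placeholders, assuming Theano dim ordering):

from keras.layers import Input
from keras.models import Model

img = Input(shape=(3, 299, 299))                          # placeholder input size
x = conv2d_bn(img, 32, 3, 3, subsample=(2, 2), name='block1')
x = conv2d_bn(x, 64, 3, 3, name='block2')
model = Model(input=img, output=x)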
And in the dense (fully-connected) case:
x = Dense(1024, name='fc')(x)
x = BatchNormalization(axis=bn_axis, name=bn_name)(x)
x = Activation('relu')(x)
I also apply it before the activation, and that is indeed how it was designed; other libraries do the same, for example Lasagne's batch_norm: http://lasagne.readthedocs.io/en/latest/modules/layers/normalization.html#lasagne.layers.batch_norm.
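A minimal sketch of what that wrapper does, based on Lasagne's documented API (the layer sizes here are arbitrary): batch_norm() takes over the wrapped layer's nonlinearity and bias, inserts a BatchNormLayer after the affine transform, and then re-applies the nonlinearity, giving the Dense -> BN -> relu ordering.

from lasagne.layers import InputLayer, DenseLayer, batch_norm
from lasagne.nonlinearities import rectify

l_in = InputLayer((None, 784))                       # arbitrary input shape
# batch_norm() removes the rectifier and bias from the DenseLayer (the bias
# would be cancelled by BN's mean subtraction anyway), adds a BatchNormLayer
# on the affine output, then re-applies the rectifier on top.
l_hidden = batch_norm(DenseLayer(l_in, num_units=1024, nonlinearity=rectify))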
However, in practice it seems to work slightly better when placed after the activation:
https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md (but that is only one benchmark).