Difference between tf.nn.conv2d and tf.nn.depthwise_conv2d

Question

Cha*_*ine 4 python deep-learning conv-neural-network tensorflow

What is the difference between tf.nn.conv2d and tf.nn.depthwise_conv2d in TensorFlow?

Answer 1

For depthwise_conv2d,

output[b, i, j, k * channel_multiplier + q] =
    sum_{di, dj} input[b, strides[1] * i + rate[0] * di,
                          strides[2] * j + rate[1] * dj, k] *
                 filter[di, dj, k, q]

The filter has shape [filter_height, filter_width, in_channels, channel_multiplier].
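
The depthwise formula above can be sketched as a naive reference loop. This is only an illustration in plain Python, not TensorFlow's implementation: the function name depthwise_conv2d_ref is hypothetical, and it assumes NHWC nested lists, stride 1, rate 1 and no padding.

```python
def depthwise_conv2d_ref(inp, filt):
    """Naive depthwise convolution (assumed: NHWC, stride 1, rate 1, no padding).

    inp  : nested lists, shape [batch, height, width, in_channels]
    filt : nested lists, shape [fh, fw, in_channels, channel_multiplier]
    """
    batch, height, width = len(inp), len(inp[0]), len(inp[0][0])
    fh, fw = len(filt), len(filt[0])
    in_ch, mult = len(filt[0][0]), len(filt[0][0][0])
    out_h, out_w = height - fh + 1, width - fw + 1
    out = [[[[0.0] * (in_ch * mult) for _ in range(out_w)]
            for _ in range(out_h)] for _ in range(batch)]
    for b in range(batch):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(in_ch):        # input channel
                    for q in range(mult):     # filter index for channel k
                        # Only spatial offsets are summed; channels never mix.
                        out[b][i][j][k * mult + q] = sum(
                            inp[b][i + di][j + dj][k] * filt[di][dj][k][q]
                            for di in range(fh) for dj in range(fw))
    return out


# One 2x2 image with 2 channels, a 2x2 filter, channel_multiplier = 2.
image = [[[[1.0, 10.0], [2.0, 20.0]],
          [[3.0, 30.0], [4.0, 40.0]]]]
filt = [[[[1.0, 2.0], [1.0, 0.5]] for _ in range(2)] for _ in range(2)]
result = depthwise_conv2d_ref(image, filt)
print(result)  # [[[[10.0, 20.0, 100.0, 50.0]]]]
```

Each of the 2 input channels yields channel_multiplier = 2 outputs, so the output depth is 4.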

For conv2d,

output[b, i, j, k] =
    sum_{di, dj, q} input[b, strides[1] * i + di,
                             strides[2] * j + dj, q] *
                    filter[di, dj, q, k]

The filter has shape [filter_height, filter_width, in_channels, out_channels].
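
For comparison, the conv2d formula can be sketched the same way. Again this is a plain-Python illustration under the same assumptions (NHWC nested lists, stride 1, no padding); conv2d_ref is a hypothetical name, not a TensorFlow function.

```python
def conv2d_ref(inp, filt):
    """Naive standard convolution (assumed: NHWC, stride 1, no padding).

    inp  : nested lists, shape [batch, height, width, in_channels]
    filt : nested lists, shape [fh, fw, in_channels, out_channels]
    """
    batch, height, width = len(inp), len(inp[0]), len(inp[0][0])
    fh, fw = len(filt), len(filt[0])
    in_ch, out_ch = len(filt[0][0]), len(filt[0][0][0])
    out_h, out_w = height - fh + 1, width - fw + 1
    out = [[[[0.0] * out_ch for _ in range(out_w)]
            for _ in range(out_h)] for _ in range(batch)]
    for b in range(batch):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(out_ch):       # output channel
                    # The sum runs over spatial offsets AND input channels q.
                    out[b][i][j][k] = sum(
                        inp[b][i + di][j + dj][q] * filt[di][dj][q][k]
                        for di in range(fh) for dj in range(fw)
                        for q in range(in_ch))
    return out


# One 2x2 image with 2 channels, a 2x2 filter with 2 output channels.
image = [[[[1.0, 10.0], [2.0, 20.0]],
          [[3.0, 30.0], [4.0, 40.0]]]]
filt = [[[[1.0, 2.0], [1.0, 0.5]] for _ in range(2)] for _ in range(2)]
result = conv2d_ref(image, filt)
print(result)  # [[[[110.0, 70.0]]]]
```

Here the last filter axis directly sets the output depth (2), because every output channel sums over all input channels.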

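Focusing on k and q, we can see the difference in the formulas above.

The default format is NHWC, where b indexes the batch and (i, j) is a coordinate in the feature map.

(Note that k and q refer to different things in the two functions.)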

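For depthwise_conv2d, k refers to an input channel and q, with 0 <= q < channel_multiplier, indexes the filters applied to that channel. Each input channel k is expanded into channel_multiplier output channels (indices k * channel_multiplier + q), each produced by its own [filter_height, filter_width] kernel. No cross-channel operation is performed; some literature calls this channel-wise spatial convolution. The process amounts to applying the kernels of each filter to each channel separately and concatenating the outputs.
For conv2d, k refers to an output channel and q refers to an input channel. It sums over all input channels, meaning that each output channel k is connected to all q input channels by a [filter_height, filter_width, in_channels] filter.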

For example,

input_size: (_, 14, 14, 32)
filter of conv2d: (3, 3, 32, 64)
params of conv2d filter: 3x3x32x64
filter of depthwise_conv2d: (3, 3, 32, 64)
params of depthwise_conv2d filter: 3x3x32x64

Assuming stride = 1 with padding that preserves the spatial size (padding='SAME'), then

output of conv2d: (_, 14, 14, 64)
output of depthwise_conv2d: (_, 14, 14, 32*64)
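
The shape arithmetic in this example can be checked with a short sketch. The helper name shapes_for_same_padding is hypothetical, and it assumes stride 1 with size-preserving padding, as stated above.

```python
def shapes_for_same_padding(input_shape, filter_shape):
    """Return (parameter count, conv2d output shape, depthwise output shape).

    Assumes NHWC input, stride 1 and padding that keeps height and width.
    The last filter axis is out_channels for conv2d and channel_multiplier
    for depthwise_conv2d.
    """
    _, height, width, in_channels = input_shape
    fh, fw, filter_in, last = filter_shape
    if filter_in != in_channels:
        raise ValueError("filter in_channels must match the input depth")
    n_params = fh * fw * filter_in * last
    conv_out = (None, height, width, last)
    depthwise_out = (None, height, width, in_channels * last)
    return n_params, conv_out, depthwise_out


n_params, conv_out, depthwise_out = shapes_for_same_padding(
    (None, 14, 14, 32), (3, 3, 32, 64))
print(n_params)       # 18432
print(conv_out)       # (None, 14, 14, 64)
print(depthwise_out)  # (None, 14, 14, 2048)
```

The same filter shape holds the same number of weights in both cases; only the output depth differs (64 versus 32 * 64 = 2048).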

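More insights:

A standard convolution can be split into two steps: a depthwise convolution followed by a reduction (sum) over the input channels.
Depthwise convolution is equivalent to group convolution with the number of groups set to the number of input channels.
Usually, depthwise_conv2d is followed by pointwise_conv2d (a 1x1 convolution used for reduction), which together form a separable_conv2d. See Xception and MobileNet for more details.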
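
The first point can be checked numerically at a single output position. This is a minimal sketch in plain Python with hypothetical helper names (depthwise_at, conv_at, pointwise_at) and random integer data so that equality is exact. It also writes the sum as a fixed 1x1 convolution, which is the special case that a learned pointwise convolution generalizes.

```python
import random


def depthwise_at(patch, filt):
    """Depthwise step at one position; output index is k * mult + q."""
    fh, fw = len(filt), len(filt[0])
    in_ch, mult = len(filt[0][0]), len(filt[0][0][0])
    return [sum(patch[di][dj][k] * filt[di][dj][k][q]
                for di in range(fh) for dj in range(fw))
            for k in range(in_ch) for q in range(mult)]


def conv_at(patch, filt):
    """Standard convolution at one position; one value per output channel."""
    fh, fw = len(filt), len(filt[0])
    in_ch, out_ch = len(filt[0][0]), len(filt[0][0][0])
    return [sum(patch[di][dj][q] * filt[di][dj][q][k]
                for di in range(fh) for dj in range(fw) for q in range(in_ch))
            for k in range(out_ch)]


def pointwise_at(features, weights):
    """1x1 convolution at one position; weights[c][o] mixes channel c into o."""
    return [sum(features[c] * weights[c][o] for c in range(len(features)))
            for o in range(len(weights[0]))]


random.seed(0)
FH, FW, IN_CH, OUT_CH = 3, 3, 4, 5
patch = [[[random.randint(-3, 3) for _ in range(IN_CH)]
          for _ in range(FW)] for _ in range(FH)]
filt = [[[[random.randint(-3, 3) for _ in range(OUT_CH)]
          for _ in range(IN_CH)] for _ in range(FW)] for _ in range(FH)]

# Step 1: depthwise, treating the last filter axis as channel_multiplier.
depthwise_out = depthwise_at(patch, filt)          # length IN_CH * OUT_CH
# Step 2: reduction, summing over input channels k for each q.
reduced = [sum(depthwise_out[k * OUT_CH + q] for k in range(IN_CH))
           for q in range(OUT_CH)]
# The same reduction expressed as a fixed (not learned) 1x1 convolution.
selector = [[1 if c % OUT_CH == o else 0 for o in range(OUT_CH)]
            for c in range(IN_CH * OUT_CH)]
via_pointwise = pointwise_at(depthwise_out, selector)

standard = conv_at(patch, filt)
print(reduced == standard, via_pointwise == standard)  # True True
```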

Answer 2

小智 9

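I'm not an expert on this, but as far as I understand, the difference is this:

Let's say you have an input colour image of height 100 and width 100, so its dimensions are 100x100x3. For both examples we use the same filter with width and height 5. Let's say we want the next layer to have a depth of 8.

In tf.nn.conv2d you define the kernel shape as [height, width, in_channels, out_channels]. In our case this means the kernel has shape [5,5,3,out_channels]. The weight kernel that is strided over the image has a shape of 5x5x3, and it is strided over the whole image 8 times to produce 8 different feature maps.

In tf.nn.depthwise_conv2d you define the kernel shape as [height, width, in_channels, channel_multiplier]. Now the output is produced differently. Separate 5x5x1 filters are strided over each channel of the input image, one filter per channel, each producing one feature map per channel. So here, a kernel of shape [5,5,3,1] would produce an output with depth 3. The channel_multiplier tells you how many different filters you want to apply per channel. Therefore the originally desired output depth of 8 is not possible with 3 input channels; only multiples of 3 are possible.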
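
The depth constraint in this example can be sketched as simple arithmetic. The helper names below are hypothetical and only illustrate the reasoning; they are not part of TensorFlow.

```python
def depthwise_output_depth(in_channels, channel_multiplier):
    """Output depth of a depthwise convolution."""
    return in_channels * channel_multiplier


def multiplier_for_depth(in_channels, desired_depth):
    """Return the channel_multiplier giving desired_depth, or None if impossible."""
    if desired_depth % in_channels != 0:
        return None
    return desired_depth // in_channels


depth_with_multiplier_1 = depthwise_output_depth(3, 1)
multiplier_for_8 = multiplier_for_depth(3, 8)
multiplier_for_9 = multiplier_for_depth(3, 9)
print(depth_with_multiplier_1)  # 3
print(multiplier_for_8)         # None
print(multiplier_for_9)         # 3
```

With 3 input channels a depth of 8 has no valid channel_multiplier, while conv2d reaches it directly with a [5,5,3,8] kernel.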
