What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of TensorFlow?

Question

kar*_*TUM 265 python deep-learning tensorflow

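What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?

In my opinion, 'VALID' means there will be no zero padding outside the edges when we do max pool.

According to A guide to convolution arithmetic for deep learning, there is no padding in the pool operator, i.e. just use 'VALID' of tensorflow. But what is 'SAME' padding of max pool in tensorflow?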

Answer 1

Min*_*ark 544

If you like ascii art:

"VALID" = without padding:

   inputs:         1  2  3  4  5  6  7  8  9  10 11 (12 13)
                  |________________|                dropped
                                 |_________________|

"SAME" = with zero padding:

               pad|                                      |pad
   inputs:      0 |1  2  3  4  5  6  7  8  9  10 11 12 13|0  0
               |________________|
                              |_________________|
                                             |________________|

In this example:

Input width = 13
Filter width = 6
Stride = 5
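
The two diagrams can be reproduced with a few lines of plain Python. This is only a sketch for the 1-D case: `window_spans` is a hypothetical helper name (not a TensorFlow function), and it assumes TensorFlow's rule that an odd amount of 'SAME' padding puts the extra column on the right. A start below 0 or an end above the input width means the window reaches into the padding.

```python
import math

def window_spans(in_width, filter_width, stride, padding):
    """Return (start, end) spans, end exclusive, of each pooling window.

    Indices are 0-based positions in the unpadded input.
    """
    if padding == "VALID":
        out_width = (in_width - filter_width) // stride + 1
        pad_left = 0
    elif padding == "SAME":
        out_width = math.ceil(in_width / stride)
        pad_total = max((out_width - 1) * stride + filter_width - in_width, 0)
        pad_left = pad_total // 2  # the odd extra column goes to the right
    else:
        raise ValueError("padding must be 'VALID' or 'SAME'")
    return [(i * stride - pad_left, i * stride - pad_left + filter_width)
            for i in range(out_width)]

print(window_spans(13, 6, 5, "VALID"))  # [(0, 6), (5, 11)]
print(window_spans(13, 6, 5, "SAME"))   # [(-1, 5), (4, 10), (9, 15)]
```

With 'VALID' the last window ends at index 11, so inputs 12 and 13 are never read. With 'SAME' the first window starts one position before the input and the last one ends two positions after it.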

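Notes:

"VALID" only ever drops the right-most columns (or bottom-most rows).
"SAME" tries to pad evenly left and right, but if the number of columns to be added is odd, it adds the extra column on the right, as is the case in this example (the same logic applies vertically: there may be an extra row of zeros at the bottom).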

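Is it fair to say "SAME" means "use zero-padding to make sure the filter size doesn't have to change if the image width is not a multiple of the filter width or the image height is not a multiple of the filter height"? As in "pad with zeros up to a multiple of the filter width" if width is the problem? (4 upvotes)
Answering my own question: no, that is not the point of zero padding. You choose the filter size to work with the input (including zero padding), but you don't choose the zero padding after the filter size. (2 upvotes)
I don't understand your own answer @StatsSorceress. It seems to me that you add enough zeros (in as symmetric a way as possible) so that all inputs are covered by some filter, am I right? (2 upvotes)
Great answer, just to add: in case the tensor values can be negative, the padding for max_pooling is with `-inf`. (2 upvotes)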

Answer 2

Oli*_*rot 144

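I'll give an example to make it clearer:

x: input image of shape [2, 3], 1 channel
valid_pad: max pool with 2x2 kernel, stride 2 and VALID padding.
same_pad: max pool with 2x2 kernel, stride 2 and SAME padding (this is the classic way to go)

The output shapes are:

valid_pad: here there is no padding, so the output shape is [1, 1]
same_pad: here we pad the image to the shape [2, 4] (with -inf, and then apply max pool), so the output shape is [1, 2]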

import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

# reshape to [batch, height, width, channels], the layout tf.nn.max_pool accepts
x = tf.reshape(x, [1, 2, 3, 1])

# 2x2 window, stride 2 along height and width
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

print(valid_pad.get_shape())  # (1, 1, 1, 1); valid_pad is [5.]
print(same_pad.get_shape())   # (1, 1, 2, 1); same_pad is  [5., 6.]
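
For readers without TensorFlow at hand, the same result can be checked in plain Python. This is a rough sketch, not TensorFlow's implementation: `max_pool_2d` is a hypothetical helper that mimics the behaviour described above by treating every cell outside the image as -inf, so padded cells can never win the max.

```python
import math

def max_pool_2d(image, ksize, stride, padding):
    """Max-pool a 2-D list of floats with a square window.

    For 'SAME', cells outside the image count as -inf; any odd extra
    padding falls on the bottom/right.
    """
    in_h, in_w = len(image), len(image[0])
    if padding == "SAME":
        out_h, out_w = math.ceil(in_h / stride), math.ceil(in_w / stride)
        pad_h = max((out_h - 1) * stride + ksize - in_h, 0)
        pad_w = max((out_w - 1) * stride + ksize - in_w, 0)
    else:  # "VALID": no padding, windows must fit inside the image
        out_h = (in_h - ksize) // stride + 1
        out_w = (in_w - ksize) // stride + 1
        pad_h = pad_w = 0
    top, left = pad_h // 2, pad_w // 2

    def at(r, c):
        # Out-of-image positions behave like -inf padding.
        if r in range(in_h) and c in range(in_w):
            return image[r][c]
        return -math.inf

    return [[max(at(i * stride - top + dr, j * stride - left + dc)
                 for dr in range(ksize) for dc in range(ksize))
             for j in range(out_w)]
            for i in range(out_h)]

x = [[1., 2., 3.],
     [4., 5., 6.]]
print(max_pool_2d(x, 2, 2, "VALID"))  # [[5.0]]
print(max_pool_2d(x, 2, 2, "SAME"))   # [[5.0, 6.0]]
```

The second 'SAME' window covers the column [3, 6] plus one padded column, which is why its maximum is 6.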

Answer 3

Yve*_*reY 138

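When stride is 1 (more typical with convolution than pooling), we can think of the following distinction:

"SAME": the output size is the same as the input size. This requires the filter window to slip outside the input map, hence the need to pad.
"VALID": the filter window stays at valid positions inside the input map, so the output size shrinks by filter_size - 1. No padding occurs.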

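This is finally helpful. Up to this point, it appeared that `SAME` and `VALID` may as well have been called `foo` and `bar` (51 upvotes)
I think "the output size is the same as the input size" is true only when the stride length is 1. (4 upvotes)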

Answer 4

Roy*_*eIX 85

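The TensorFlow Convolution example gives an overview of the difference between SAME and VALID:

For the SAME padding, the output height and width are computed as: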

out_height = ceil(float(in_height) / float(strides[1]))
out_width  = ceil(float(in_width) / float(strides[2]))

And

For the VALID padding, the output height and width are computed as:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))
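
The two formulas are easy to compare side by side in plain Python. A minimal sketch for a single dimension; `output_size` is a hypothetical helper name, and the sizes used (input 13, filter 6) are just illustrative values.

```python
import math

def output_size(in_size, filter_size, stride, padding):
    """Output length along one dimension, following the formulas above."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

for stride in (1, 2, 5):
    print(stride,
          output_size(13, 6, stride, "SAME"),
          output_size(13, 6, stride, "VALID"))
# 1 13 8
# 2 7 4
# 5 3 2
```

Note that the filter size does not appear in the 'SAME' formula at all: it only influences how much padding is added, not the output size.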

Answer 5

zmx*_*zmx 54

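Complementing YvesgereY's great answer, I found this visualization extremely helpful:

Padding 'valid' is the first figure. The filter window stays inside the image.

Padding 'same' is the third figure. The output is the same size.

Found it in this article

Visualization credits: vdumoulin@GitHub

Very immediate answer! (3 upvotes)
This is the best solution for me. Visualization tells the story. Thanks (3 upvotes)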

Answer 6

Sal*_*ali 44

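Padding is an operation that increases the size of the input data. For 1-dimensional data you just append/prepend a constant to the array; in 2-dim you surround the matrix with these constants. In n-dim you surround your n-dim hypercube with the constant. In most cases this constant is zero, and it is called zero-padding.

Here is an example of zero-padding with p=1 applied to a 2-d tensor: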
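
The original figure is not preserved here, so the following plain-Python sketch illustrates the same idea. `zero_pad_2d` is a hypothetical helper, and the 2x3 matrix is an arbitrary example rather than the one from the figure.

```python
def zero_pad_2d(matrix, p):
    """Surround a 2-D list with p rows/columns of zeros on every side."""
    width = len(matrix[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    body = [[0] * p + list(row) + [0] * p for row in matrix]
    bottom = [[0] * width for _ in range(p)]
    return top + body + bottom

for row in zero_pad_2d([[1, 2, 3], [4, 5, 6]], 1):
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 2, 3, 0]
# [0, 4, 5, 6, 0]
# [0, 0, 0, 0, 0]
```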

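You can use arbitrary padding for your kernel, but some padding values are used more frequently than others:

VALID padding. The easiest case: no padding at all. Just leave your data as it was.
SAME padding, sometimes called HALF padding. It is called SAME because for a convolution with stride = 1 (or for pooling) it should produce output of the same size as the input. It is called HALF because for a kernel of size k [formula not preserved in the source]
FULL padding is the maximum padding that does not result in a convolution over only padded elements. For a kernel of size k, this padding is equal to k - 1.

To use arbitrary padding in TF, you can use tf.pad()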

Answer 7

Shi*_*hah 28

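Quick Explanation

VALID: Don't apply any padding, i.e. assume that all dimensions are valid so that the input image is fully covered by the filter and stride you specified.

SAME: Apply padding to the input (if needed) so that the input image is fully covered by the filter and stride you specified. For stride 1, this ensures that the output image size is the same as the input.

Notes

This applies to conv layers as well as max pool layers in the same way
The term "valid" is a bit of a misnomer, because things don't become "invalid" if you drop part of the image. Sometimes you might even want that. This should probably have been called NO_PADDING instead.
The term "same" is a misnomer too, because the output dimension equals the input dimension only for a stride of 1. For a stride of 2, for example, the output dimensions will be half. This should probably have been called AUTO_PADDING instead.
In SAME (i.e. auto-pad mode), Tensorflow will try to spread the padding evenly on both left and right.
In VALID (i.e. no padding mode), Tensorflow will drop right and/or bottom cells if your filter and stride don't fully cover the input image.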
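
To make the last two notes concrete, here is a small sketch for one dimension. Both helper names are hypothetical; `valid_dropped` counts the trailing cells that no 'VALID' window reaches, and `same_output` shows the "half the size" effect of stride 2 under 'SAME'.

```python
import math

def valid_dropped(in_size, filter_size, stride):
    """Number of trailing cells that no 'VALID' window covers."""
    out_size = max((in_size - filter_size) // stride + 1, 0)
    if out_size == 0:
        return in_size  # the filter does not fit even once
    last_covered = (out_size - 1) * stride + filter_size
    return in_size - last_covered

def same_output(in_size, stride):
    """Output length under 'SAME' padding."""
    return math.ceil(in_size / stride)

print(valid_dropped(13, 6, 5))  # 2
print(valid_dropped(4, 2, 2))   # 0
print(same_output(8, 2))        # 4
print(same_output(7, 2))        # 4
```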

Answer 8

Vai*_*xit 14

I am quoting this answer from the official tensorflow docs https://www.tensorflow.org/api_guides/python/nn#Convolution

For the 'SAME' padding, the output height and width are computed as:

out_height = ceil(float(in_height) / float(strides[1]))
out_width  = ceil(float(in_width) / float(strides[2]))

and the padding on the top and left is computed as:

pad_along_height = max((out_height - 1) * strides[1] +
                    filter_height - in_height, 0)
pad_along_width = max((out_width - 1) * strides[2] +
                   filter_width - in_width, 0)
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left
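
The same arithmetic for a single dimension, as a plain-Python sketch (`same_padding` is a hypothetical helper name). It highlights two details of the formulas: the `max(..., 0)` clamp, which matters when the stride is larger than the filter, and the odd remainder going to the bottom/right side.

```python
import math

def same_padding(in_size, filter_size, stride):
    """Return (pad_before, pad_after) for 'SAME' padding along one dimension."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + filter_size - in_size, 0)
    pad_before = pad_total // 2
    return pad_before, pad_total - pad_before

print(same_padding(13, 6, 5))  # (1, 2)  odd total: extra cell at the end
print(same_padding(80, 4, 1))  # (1, 2)
print(same_padding(9, 2, 3))   # (0, 0)  unclamped total would be -1
```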

For the 'VALID' padding, the output height and width are computed as:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))

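and the padding values are always zero.

Frankly, this is the only valid and complete answer, not limited to strides of 1. And all it takes is a quote from the docs. +1 (2 upvotes)
Very useful to have this answer around, especially because the link you point to doesn't work anymore and it seems that Google removed that information from the tf website! (2 upvotes)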

Answer 9

Cha*_*rld 12

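There are three choices of padding: valid (no padding), same (or half), and full. You can find explanations (in Theano) here: http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

Valid or no padding:

Valid padding involves no zero padding, so it covers only the valid input, not including artificially generated zeros. If the stride s = 1, the length of the output is ((the length of input) - (k-1)) for kernel size k.

Same or half padding:

Same padding makes the size of the output the same as that of the input when s = 1. If s = 1, the number of zeros padded is (k-1).

Full padding:

Full padding means that the kernel runs over the whole input, so at the ends the kernel may meet only one input value and zeros otherwise. If s = 1, the number of zeros padded is 2(k-1). If s = 1, the length of the output is ((the length of input) + (k-1)).

Therefore, the amount of padding: (valid) ≤ (same) ≤ (full)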
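
The stride-1 figures above can be tabulated with a short plain-Python sketch. `stride1_padding` is a hypothetical helper; it uses the fact that for stride 1 the output length is (input length) + (zeros padded) - (k-1).

```python
def stride1_padding(in_len, k):
    """Map each padding mode to (zeros padded, output length) for stride 1."""
    zeros = {"valid": 0, "same": k - 1, "full": 2 * (k - 1)}
    return {mode: (pad, in_len + pad - (k - 1)) for mode, pad in zeros.items()}

print(stride1_padding(7, 3))
# {'valid': (0, 5), 'same': (2, 7), 'full': (4, 9)}
```

For an input of length 7 and a kernel of size 3, the output shrinks by 2 with valid padding, stays at 7 with same padding, and grows by 2 with full padding.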

Answer 10

Fre*_*ong 9

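To sum up, 'valid' padding means no padding. The output size of the convolutional layer depends on the input size and the kernel size.

On the contrary, 'same' padding means using padding. When the stride is set to 1, the output size of the convolutional layer is kept equal to the input size by appending a certain number of '0-border' cells around the input data when calculating the convolution.

Hope this intuitive description helps.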

Answer 11

ahm*_*sny 7

Based on the explanation here and following up on Tristan's answer, I usually use these quick functions for sanity checks.

import math

# a function to help us stay clean:
# split each total padding into its two sides; when the total is odd,
# the bottom / right side gets the one extra pixel
def getPaddings(pad_along_height, pad_along_width):
    pad_top = pad_along_height // 2
    pad_bottom = pad_along_height - pad_top
    pad_left = pad_along_width // 2
    pad_right = pad_along_width - pad_left
    return pad_top, pad_bottom, pad_left, pad_right

# strides [image index, y, x, depth]
# padding 'SAME' or 'VALID'
# bottom and right sides always get the one additional padded pixel (if padding is odd)
def getOutputDim(inputWidth, inputHeight, filterWidth, filterHeight, strides, padding):
    if padding == 'SAME':
        out_height = math.ceil(inputHeight / strides[1])
        out_width = math.ceil(inputWidth / strides[2])
        # total padding per axis, clamped at 0
        # (it would go negative when the stride exceeds the filter size)
        pad_along_height = max((out_height - 1) * strides[1] + filterHeight - inputHeight, 0)
        pad_along_width = max((out_width - 1) * strides[2] + filterWidth - inputWidth, 0)
        # now get padding
        pad_top, pad_bottom, pad_left, pad_right = getPaddings(pad_along_height, pad_along_width)
        print('output height', out_height)
        print('output width', out_width)
        print('total pad along height', pad_along_height)
        print('total pad along width', pad_along_width)
        print('pad at top', pad_top)
        print('pad at bottom', pad_bottom)
        print('pad at left', pad_left)
        print('pad at right', pad_right)

    elif padding == 'VALID':
        out_height = math.ceil((inputHeight - filterHeight + 1) / strides[1])
        out_width = math.ceil((inputWidth - filterWidth + 1) / strides[2])
        print('output height', out_height)
        print('output width', out_width)
        print('no padding')


# use like so
getOutputDim(80, 80, 4, 4, [1, 1, 1, 1], 'SAME')

Answer 12

GPr*_*hap 6

VALID padding: this means no padding at all (a padding amount of zero). Hope there is no confusion.

import tensorflow as tf

x = tf.constant([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.], [7., 8., 9.]])
x = tf.reshape(x, [1, 4, 3, 1])
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
print(valid_pad.get_shape())  # output --> (1, 2, 1, 1)

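SAME padding: this is a bit tricky to understand at first, because we have to consider two conditions separately, as mentioned in the official docs.

Let the input size be $n_i$, the output size $n_o$, the total padding $p_i$, the stride $s$ and the kernel size $k$ (only a single dimension is considered).

Case 01: $n_i \bmod s = 0$: $p_i = \max(k - s, 0)$

Case 02: $n_i \bmod s \neq 0$: $p_i = \max(k - (n_i \bmod s), 0)$

$p_i$ is calculated as the minimum total padding needed along that dimension. Once the value of $p_i$ is known, the value of $n_o$ can be found with the formula $n_o = \lfloor (n_i - k + p_i)/s \rfloor + 1$.

Let's work out this example:

x = tf.constant([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.], [7., 8., 9.]])
x = tf.reshape(x, [1, 4, 3, 1])
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
print(same_pad.get_shape())  # output --> (1, 2, 2, 1)

Here x has height 4 and width 3. Taking the horizontal direction (width 3):

$n_i = 3,\ k = 2,\ s = 2,\ p_i = 2 - (3 \bmod 2) = 1,\ n_o = \lfloor \frac{3 - 2 + 1}{2} \rfloor + 1 = 2$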
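
The two cases can be checked for both dimensions of this example with a short sketch. It assumes that $p_i$ is the total padding along the dimension (both sides together), so that $n_o = \lfloor (n_i - k + p_i)/s \rfloor + 1$; the helper names are hypothetical.

```python
def same_pad_total(n_i, k, s):
    """Total 'SAME' padding along one dimension (Case 01 / Case 02)."""
    if n_i % s == 0:
        return max(k - s, 0)
    return max(k - (n_i % s), 0)

def same_output(n_i, k, s):
    """Output size, assuming p_i counts the padding on both sides together."""
    p_i = same_pad_total(n_i, k, s)
    return (n_i - k + p_i) // s + 1

print(same_pad_total(3, 2, 2), same_output(3, 2, 2))  # 1 2  (width)
print(same_pad_total(4, 2, 2), same_output(4, 2, 2))  # 0 2  (height)
```

Both dimensions give an output size of 2, which agrees with the printed shape (1, 2, 2, 1).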