Adaptive pooling is a great feature, but how does it work? It seems to insert padding or shrink/expand the kernel size in a way that looks fiddly but fairly arbitrary. The PyTorch documentation I can find is no more descriptive than "put your desired output size here." Does anyone know how this works, or can point to where it is explained?
Some test code on a 1x1x6 tensor, (1, 2, 3, 4, 5, 6), with an adaptive output size of 8:
import torch
import torch.nn as nn

class TestNet(nn.Module):
    def __init__(self):
        super(TestNet, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool1d(8)

    def forward(self, x):
        print(x)
        x = self.avgpool(x)
        print(x)
        return x

def test():
    x = torch.Tensor([[[1, 2, 3, 4, 5, 6]]])
    net = TestNet()
    y = net(x)
    return y

test()
Output:
tensor([[[ 1., 2., 3., 4., 5., 6.]]])
tensor([[[ 1.0000, 1.5000, 2.5000, 3.0000, 4.0000, 4.5000, 5.5000,
6.0000]]])
If it mirrors left and right (operating on (1, 1, 2, 3, 4, 5, 6, 6)) with a kernel of 2, the outputs at all positions except 4 and 5 make sense, though of course the output isn't the right size. Is it also padding the 3 and 4 internally? If so, it would be operating on (1, 1, 2, 3, 3, 4, 4, 5, 6, 6), which with a size-2 kernel gives the wrong output size and would also miss the 3.5 output. Does it change the size of the kernel?
Is there something obvious I'm missing about how this works?
hkchengrex (score 32):
In general, pooling reduces dimensions. If you want to increase dimensions, you might want to look at interpolation instead.
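For reference, a minimal sketch (not from the original answer) of that interpolation route for the question's 1x1x6 input; the choice of linear mode and align_corners is an assumption for illustration:

import torch
import torch.nn.functional as F

x = torch.Tensor([[[1., 2., 3., 4., 5., 6.]]])  # shape (1, 1, 6)
# Upsample the length-6 signal to length 8 by interpolation rather than pooling.
# mode='linear' and align_corners=True are illustrative choices only.
y = F.interpolate(x, size=8, mode='linear', align_corners=True)
print(y.shape)  # torch.Size([1, 1, 8])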
Anyway, let's talk about adaptive pooling in general. You can look at the source code here. Some have claimed that adaptive pooling is the same as standard pooling with a stride and kernel size computed from the input and output size, specifically using the following parameters:

stride = input_size // output_size
kernel_size = input_size - (output_size - 1) * stride
padding = 0

These are worked backwards from the pooling formula. While they DO produce output of the desired size, that output is not necessarily the same as that of adaptive pooling. Here is a test snippet:
import torch
import torch.nn as nn

in_length = 5
out_length = 3

x = torch.arange(0, in_length).view(1, 1, -1).float()
print(x)

stride = (in_length // out_length)
avg_pool = nn.AvgPool1d(
    stride=stride,
    kernel_size=(in_length - (out_length - 1) * stride),
    padding=0,
)
adaptive_pool = nn.AdaptiveAvgPool1d(out_length)

print(avg_pool.stride, avg_pool.kernel_size)

y_avg = avg_pool(x)
y_ada = adaptive_pool(x)
print(y_avg)
print(y_ada)
print('Error:', (y_avg - y_ada).abs().sum().item())  # total difference between the two outputs
Output:
tensor([[[0., 1., 2., 3., 4.]]])
(1,) (3,)
tensor([[[1., 2., 3.]]])
tensor([[[0.5000, 2.0000, 3.5000]]])
Error: 1.0
The average pooling pools from elements (0, 1, 2), (1, 2, 3) and (2, 3, 4).
The adaptive pooling pools from elements (0, 1), (1, 2, 3) and (3, 4). (Change the code a bit to see that it is not pooling from (2) only.)
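As a small verification sketch (not part of the original answer), averaging over those groupings by hand reproduces the two outputs shown above:

x = [0., 1., 2., 3., 4.]

def avg(vals):
    return sum(vals) / len(vals)

# Standard pooling windows with stride 1, kernel 3: (0,1,2), (1,2,3), (2,3,4)
print([avg(x[0:3]), avg(x[1:4]), avg(x[2:5])])  # [1.0, 2.0, 3.0]

# Adaptive pooling windows: (0,1), (1,2,3), (3,4)
print([avg(x[0:2]), avg(x[1:4]), avg(x[3:5])])  # [0.5, 2.0, 3.5]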
In some cases you can get the two to match by using count_include_pad=True, but in general I don't think they can be exactly the same in 2D or higher for all input/output sizes. I would imagine that would require different paddings for the left and right sides, which pooling layers do not currently support.

algal (score 15):
As hkchengrex's answer points out, the PyTorch documentation does not explain what rule is used by adaptive pooling layers to determine the size and locations of the pooling kernels. (In fact, there is a fixme in the PyTorch code indicating the documentation needs to be improved.)
However, the calculation of the kernel sizes and locations is implemented by this cpp function and the key logic is actually in the calls to the functions start_index and end_index, which define the location and offset of the kernels.
I believe this Python code re-implements that code and shows how kernels are calculated:
from typing import List
import math

def kernels(ind, outd) -> List:
    """Returns a List [(kernel_offset_start, kernel_length)] defining all the pooling
    kernels for a 1-D adaptive pooling layer that takes an input of dimension `ind`
    and yields an output of dimension `outd`"""
    def start_index(a, b, c):
        return math.floor((float(a) * float(c)) / b)
    def end_index(a, b, c):
        return math.ceil((float(a + 1) * float(c)) / b)
    results = []
    for ow in range(outd):
        start = start_index(ow, outd, ind)
        end = end_index(ow, outd, ind)
        sz = end - start
        results.append((start, sz))
    return results

def kernel_indexes(ind, out) -> List:
    """Returns a List [[*ind]] containing the indexes of the pooling kernels"""
    startsLengths = kernels(ind, out)
    return [list(range(start, start + length)) for (start, length) in startsLengths]
Here are the key points to notice.
First, it matters a lot whether the input dimension (ind) is an integer multiple of the output dimension (outd).
Second, when this is the case, then the adaptive layer's kernels are equally-sized and non-overlapping, and are exactly what would be produced by defining kernels and a stride based on the following rule:
stride = ind // outd
kernel_size = ind - (outd-1)*stride
padding = 0
In other words, in this case it is possible to reproduce the effect of an adaptive pooling layer by using a non-adaptive pooling layer defined with suitable stride, kernel_size, and padding. (Example further below.)
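As a quick check of that rule (a sketch, not from the original answer) for the 6-to-3 case used in the example further below:

ind, outd = 6, 3
stride = ind // outd                     # 6 // 3 = 2
kernel_size = ind - (outd - 1) * stride  # 6 - 2*2 = 2
print(stride, kernel_size)               # 2 2  -> three non-overlapping windows of size 2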
Finally, when instead it is the case that the input size is not an integer multiple of the output size, then PyTorch's adaptive pooling rule produces kernels which overlap and are of variable size.
Since the non-adaptive pooling API does not allow for variably-sized kernels, in this case it seems to me there is no way to reproduce the effect of adaptive pooling by feeding suitable values into a non-adaptive pooling layer.
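Calling the kernel_indexes helper above (an added usage example) makes the two cases concrete:

print(kernel_indexes(6, 3))  # [[0, 1], [2, 3], [4, 5]]    equal-sized, non-overlapping
print(kernel_indexes(5, 3))  # [[0, 1], [1, 2, 3], [3, 4]]  variable-sized, overlapping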
Here's an example which shows both cases. This helper function lets us compare what's happening with an adaptive average pooling layer and an ordinary average pooling layer which uses a fixed stride and kernel:
import torch
import torch.nn as nn

def compare1DAdaptivity(ind, outd, inputpattern):
    c = 1
    padding = 0
    input = torch.Tensor(inputpattern).view(1, c, ind)
    stride = ind // outd
    kernel_size = (ind - (outd - 1) * stride)
    avg_pool = nn.AvgPool1d(stride=stride, kernel_size=kernel_size, padding=padding)
    avg_out = avg_pool(input)
    adap_avg_pool = torch.nn.AdaptiveAvgPool1d(outd)
    adap_avg_out = adap_avg_pool(input)
    try:
        equal_output = torch.allclose(avg_out, adap_avg_out)
    except:
        equal_output = False

    print("input.shape: {}".format(input.shape))
    print("in_dims: {}".format(ind))
    print("out_dims: {}".format(outd))
    print("")
    print("AAL strides: {}".format(stride))
    print("AAL kernel_sizes: {}".format(kernel_size))
    print("AAL pad: {}".format(padding))
    print("")
    print("outputs equal: {}".format(equal_output))
    print("")
    print("AAL input -> output: {} -> {}".format(input, avg_out))
    print("adap input -> output: {} -> {}".format(input, adap_avg_out))
    return equal_output
So, to give an example of the first case, where the input dimension is a multiple of the output dimension, we can go from 6 to 3. We can see that the approximate adaptive layer and the true adaptive layer give the same output:
compare1DAdaptivity(6,3,[1,0,0,0,0,0]) # => True
AAL input -> output: tensor([[[1., 0., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])
adap input -> output: tensor([[[1., 0., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])
However, this no longer works if we go from 5 to 3.
compare1DAdaptivity(5,3,[1,0,0,0,0]) # => False
AAL input -> output: tensor([[[1., 0., 0., 0., 0.]]]) -> tensor([[[0.3333, 0.0000, 0.0000]]])
adap input -> output: tensor([[[1., 0., 0., 0., 0.]]]) -> tensor([[[0.5000, 0.0000, 0.0000]]])
But we can reproduce the result of the adaptive layers by manually computing over the indexes:
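Here is a sketch of that manual computation (not the original answer's code) for the 5-to-3 case, using the kernel_indexes helper defined earlier; the result matches the adaptive layer's output shown above:

x = [1., 0., 0., 0., 0.]
manual = [sum(x[i] for i in idxs) / len(idxs) for idxs in kernel_indexes(5, 3)]
print(manual)  # [0.5, 0.0, 0.0] -- matches tensor([[[0.5000, 0.0000, 0.0000]]])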