How to implement a superpixel pooling layer?

Question

Par*_*kar 5 neural-network theano deep-learning keras tensorflow

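I want to implement the superpixel pooling layer defined in the paper "Weakly Supervised Semantic Segmentation using Superpixel Pooling Network", which was originally implemented in Torch (implementation unavailable). I would like to do it in Keras with the Theano backend (preferably).

I will give a small example to show what the layer does. It takes the following inputs:

feature_map: shape = (batch_size, height, width, feature_dim)

superpixel_map: shape = (batch_size, height, width)

Let us assume two small matrices with batch_size = 1, height = width = 2, feature_dim = 1: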

import numpy as np

feature_map = np.array([[[[0.1], [0.2]], [[0.3], [0.4]]]])
superpixel_map = np.array([[[0, 0], [1, 2]]])

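Now, the output has shape = (batch_size, n_superpixels, feature_dim). Here n_superpixels is basically np.amax(superpixel_map) + 1.

The output is computed as follows.

Find the positions where superpixel_map == i, where i varies from 0 to n_superpixels - 1. Let's consider i = 0. The positions for i = 0 are (0, 0, 0) and (0, 0, 1).

Now average the elements at those positions in the feature map. This gives the value (0.1 + 0.2) / 2 = 0.15. Doing the same for i = 1 and i = 2 gives the values 0.3 and 0.4, respectively.

Now, the problem gets complicated because usually batch_size > 1 and height, width >> 1.

I implemented a new layer in Keras that basically does this, but I used for loops. Now, if height = width = 32, Theano gives a maximum recursion depth error. Does anyone know how to solve this? If TensorFlow offers something new, I am also ready to switch to the TensorFlow backend.

The code for my new layer is as follows: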
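
For reference, the computation described above can be sketched in plain NumPy. This is only an illustration of the definition, not the Keras/Theano layer: the function name `superpixel_mean_pool` is hypothetical, and returning 0 for labels that never occur in an image is an assumption the question does not cover.

```python
import numpy as np

def superpixel_mean_pool(feature_map, superpixel_map, n_superpixels):
    """Mean-pool features per superpixel (hypothetical reference helper)."""
    batch_size, height, width, feature_dim = feature_map.shape
    sums = np.zeros((batch_size, n_superpixels, feature_dim))
    counts = np.zeros((batch_size, n_superpixels, 1))
    # Batch index of every pixel, broadcast to the superpixel map's shape.
    b_idx = np.broadcast_to(np.arange(batch_size)[:, None, None],
                            superpixel_map.shape)
    # np.add.at accumulates correctly when a (batch, label) pair repeats.
    np.add.at(sums, (b_idx, superpixel_map), feature_map)
    np.add.at(counts, (b_idx, superpixel_map), 1.0)
    # Assumption: labels with no pixels yield 0 instead of dividing by zero.
    return sums / np.maximum(counts, 1.0)

feature_map = np.array([[[[0.1], [0.2]], [[0.3], [0.4]]]])
superpixel_map = np.array([[[0, 0], [1, 2]]])
n_superpixels = int(np.amax(superpixel_map)) + 1
pooled = superpixel_mean_pool(feature_map, superpixel_map, n_superpixels)
print(pooled.shape)  # (1, 3, 1)
```

For the example inputs, `pooled` holds approximately 0.15, 0.3 and 0.4, matching the hand computation.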

import theano.tensor as T
from keras import backend as K
from keras.layers import Layer

class SuperpixelPooling(Layer):
    def __init__(self, n_superpixels=None, n_features=None, batch_size=None, 
                 input_shapes=None, **kwargs):
        super(SuperpixelPooling, self).__init__(**kwargs)
        self.n_superpixels = n_superpixels
        self.n_features = n_features
        self.batch_size = batch_size
        self.input_shapes = input_shapes  # has to be a length-2 tuple, First tuple has the
                                          # shape of feature map and the next tuple has the
                                          # length of superpixel map. Shapes are of the
                                          # form (height, width, feature_dim)
    def compute_output_shape(self, input_shapes):
        return (input_shapes[0][0],
                    self.n_superpixels,
                    self.n_features)
    def call(self, inputs):
        # x = feature map
        # y = superpixel map, index from [0, n-1]
        x = inputs[0]  # batch_size x m x n x k
        y = inputs[1]  # batch_size x m x n
        ht = self.input_shapes[0][0]
        wd = self.input_shapes[0][1]
        z = K.zeros(shape=(self.batch_size, self.n_superpixels, self.n_features), 
                    dtype=float)
        count = K.zeros(shape=(self.batch_size, self.n_superpixels, self.n_features), 
                        dtype=int)
        for b in range(self.batch_size):
            for i in range(ht):
                for j in range(wd):
                    z = T.inc_subtensor(z[b, y[b, i, j], :], x[b, i, j, :])
                    count = T.inc_subtensor(count[b, y[b, i, j], :], 1)
        z /= count   
        return z

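I think the recursion depth exceeded problem is caused by the nested for loops I use. I do not see a way to avoid these loops. If anyone has any suggestions, please let me know.

Cross-posted here. I will update this post if I get any answers there.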

Answer 1

Par*_*kar 4

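My preliminary implementation is on my GitHub. It is not ready for use yet. Please read on for more details. For completeness, I am posting the implementation here along with a brief explanation (basically taken from the README).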

import theano.tensor as T
from keras import backend as K
from keras.layers import Layer

class SuperpixelPooling(Layer):
    def __init__(self, n_superpixels=None, n_features=None, batch_size=None,
                 input_shapes=None, positions=None, superpixel_positions=None,
                 superpixel_hist=None, **kwargs):
        super(SuperpixelPooling, self).__init__(**kwargs)

        # self.input_spec = InputSpec(ndim=4)
        self.n_superpixels = n_superpixels
        self.n_features = n_features
        self.batch_size = batch_size
        self.input_shapes = input_shapes  # has to be a length-2 tuple, First tuple has shape of feature map and the next tuple has 
                                          # length of superpixel map. Shapes are of the form (height, width, feature_dim)
        self.positions = positions  # has three columns
        self.superpixel_positions = superpixel_positions  # has two columns
        self.superpixel_hist = superpixel_hist  # is a vector
    def compute_output_shape(self, input_shapes):
        return (self.batch_size, self.n_superpixels, self.n_features)
    def call(self, inputs):
        # x = feature map
        # y = superpixel map, index from [0, n-1]
        x = inputs[0]  # batch_size x k x m x n
        y = inputs[1]  # batch_size x m x n
        ht = self.input_shapes[0][0]
        wd = self.input_shapes[0][1]
        z = K.zeros(shape=(self.batch_size, self.n_superpixels, self.n_features), dtype=float)
        z = T.inc_subtensor(z[self.superpixel_positions[:, 0], self.superpixel_positions[:, 1], :], x[self.positions[:, 0], :, self.positions[:, 1], self.positions[:, 2]])
        z /= self.superpixel_hist
        return z

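Explanation:

An implementation of a superpixel pooling layer in Keras. See the implementation in keras.layers.pooling.

The concept of the superpixel pooling layer can be found in the paper "Weakly Supervised Semantic Segmentation using Superpixel Pooling Network", AAAI 2017. The layer takes two inputs, a superpixel map (size M x N) and a feature map (size K x M x N). It pools the features belonging to the same superpixel (average pooling in this implementation) and forms a 1 x K vector, where K is the feature map depth/channels.

A naive implementation would need three for loops: one iterating over the batch, another over the rows, and the last over the columns of the feature map, pooling on the fly. However, this gives a "maximum recursion depth exceeded" error in Theano whenever you try to compile a model containing this layer. The error occurs even when the feature map width and height are only 32.

To overcome this, I figured that passing everything as parameters to the layer would get rid of at least two for loops. Eventually, I was able to create a one-liner that implements the core of the whole average pooling operation. You need to pass: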

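1. The number of superpixels in the image
2. The feature map depth/channels
3. The batch size
4. The shapes of the feature map and the superpixel map
5. An N x 3 matrix called positions, containing all possible index combinations of the form (batch_size, row, column). As long as the input image size and batch size stay the same, it only needs to be generated once during training.
6. An N x 2 matrix called superpixel_positions. Row i contains the superpixel index corresponding to the indices in row i of the positions matrix. For example, if row i of the positions matrix contains (12, 10, 20), then the same row of superpixel_positions will contain (12, sp_i), where sp_i = superpixel_map[12, 10, 20].
7. An N x S matrix, superpixel_hist, where S is the number of superpixels in that image. As the name suggests, this matrix holds the histogram of the superpixels present in the current image.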
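
The three index arrays described above could be precomputed along these lines. This is a rough NumPy sketch, not the code from the author's repository: the helper name `build_index_arrays` is hypothetical, and laying out `superpixel_hist` as one row of pixel counts per image, shape (batch_size, S), is an assumption, since the answer does not pin down its exact layout.

```python
import numpy as np

def build_index_arrays(superpixel_map, n_superpixels):
    """Build positions, superpixel_positions and superpixel_hist (hypothetical helper)."""
    batch_size, height, width = superpixel_map.shape
    # positions: one (batch, row, column) row per pixel, N = batch * height * width.
    b, r, c = np.meshgrid(np.arange(batch_size), np.arange(height),
                          np.arange(width), indexing="ij")
    positions = np.stack([b.ravel(), r.ravel(), c.ravel()], axis=1)
    # superpixel_positions: (batch, superpixel label) in the same pixel order.
    labels = superpixel_map[positions[:, 0], positions[:, 1], positions[:, 2]]
    superpixel_positions = np.stack([positions[:, 0], labels], axis=1)
    # superpixel_hist: pixel count per (image, superpixel); layout is assumed.
    superpixel_hist = np.zeros((batch_size, n_superpixels), dtype=np.int64)
    np.add.at(superpixel_hist,
              (superpixel_positions[:, 0], superpixel_positions[:, 1]), 1)
    return positions, superpixel_positions, superpixel_hist

superpixel_map = np.array([[[0, 0], [1, 2]]])
positions, superpixel_positions, superpixel_hist = build_index_arrays(superpixel_map, 3)
```

For the 2 x 2 example from the question, `positions` has four rows, `superpixel_positions` is `[[0, 0], [0, 0], [0, 1], [0, 2]]`, and `superpixel_hist` is `[[2, 1, 1]]`.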

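The drawback of this implementation is that these parameters (specifically, the ones mentioned in points 6 and 7) have to be changed for every image. That is impractical when the GPU processes a whole batch at once. I think this can be solved by passing all these parameters to the layer as inputs from outside. Basically, they could be read from (say) an HDF5 file. I plan to do that soon and will update this once it is done.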
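
One way to avoid per-image parameters altogether, which is a different technique from the one in this answer, is to derive a one-hot membership matrix from the superpixel map itself and reduce with a batched matrix product. The NumPy sketch below only illustrates that idea: the function name is hypothetical, the feature map is assumed to be channels-last as in the question, and porting it to Theano or TensorFlow backend operations is not shown.

```python
import numpy as np

def superpixel_mean_pool_onehot(feature_map, superpixel_map, n_superpixels):
    """Loop-free mean pooling via one-hot membership (hypothetical sketch)."""
    batch_size, height, width, feature_dim = feature_map.shape
    flat_features = feature_map.reshape(batch_size, height * width, feature_dim)
    flat_labels = superpixel_map.reshape(batch_size, height * width)
    # one_hot[b, p, s] is 1 when pixel p of image b belongs to superpixel s.
    one_hot = (flat_labels[..., None] == np.arange(n_superpixels)).astype(
        feature_map.dtype)
    # Per-superpixel feature sums: (B, S, P) @ (B, P, F) -> (B, S, F).
    sums = np.matmul(one_hot.transpose(0, 2, 1), flat_features)
    # Pixel count per superpixel, shaped (B, S, 1) for broadcasting.
    counts = one_hot.sum(axis=1)[..., None]
    # Assumption: superpixels with no pixels yield 0 instead of NaN.
    return sums / np.maximum(counts, 1.0)
```

The one-hot matrix costs batch_size x height x width x n_superpixels entries, so this trades memory for the absence of Python-level loops and of precomputed index arrays.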

Archived:	8 years, 9 months ago
Views:	1183
Last recorded:	6 years, 7 months ago