Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Question

xla*_*lax 100 signal-processing machine-learning convolution deep-learning conv-neural-network

Can anyone clearly explain, with examples, the difference between 1D, 2D, and 3D convolutions in CNNs (deep learning)?

Answer 1

run*_*ani 339

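I'd like to explain this with pictures from C3D.

In a nutshell, the convolution direction and the output shape are what matter!

↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑

Only 1 direction (the time axis) is used to compute the conv
input = [W], filter = [k], output = [W]
ex) input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [1,1,1,1,1] (interior values; with zero 'SAME' padding the two edge values come out as 0.75)
Output shape is a 1D array
Example) graph smoothing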
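
The smoothing example above can be checked without TensorFlow. The following is a minimal pure-Python sketch, not the answer's own code: the helper name `conv1d_same` is hypothetical, and it assumes an odd kernel length and zero padding in the spirit of `padding='SAME'`.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding, so output length == input length."""
    pad = len(kernel) // 2  # assumes an odd kernel length
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))  # the window slides along one direction only
    ]

smoothed = conv1d_same([1, 1, 1, 1, 1], [0.25, 0.5, 0.25])
print(smoothed)  # [0.75, 1.0, 1.0, 1.0, 0.75]
```

The interior values stay at 1, while the two edge values drop to 0.75 because the padded zeros enter the weighted average there.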

tf.nn.conv1d - toy example

import tensorflow as tf
import numpy as np

sess = tf.Session()

ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1

in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)

in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])

input_1d   = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))

↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑

2 directions (x, y) are used to compute the conv
Output shape is a 2D matrix
input = [W, H], filter = [k, k], output = [W, H]
Example) Sobel edge filter
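
To make the two sliding directions concrete, here is a small pure-Python sketch of a 2D convolution on a 5x5 input of ones with a 3x3 filter of ones. The function name `conv2d_same` is hypothetical, and the zero padding only imitates `padding='SAME'` for odd kernel sizes.

```python
def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding; output keeps the input's 2D shape."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # assumes odd kernel sizes
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):        # first sliding direction
        for x in range(w):    # second sliding direction
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:  # positions outside count as zero
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out

ones = [[1.0] * 5 for _ in range(5)]
box = [[1.0] * 3 for _ in range(3)]
result = conv2d_same(ones, box)
for row in result:
    print(row)
```

Corners sum 4 input values, edges 6, and interior positions all 9, so the output is still a 5x5 matrix.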

tf.nn.conv2d - toy example

ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]

in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)

in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])

filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])

input_2d   = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑

3 directions (x, y, z) are used to compute the conv
Output shape is a 3D volume
input = [W, H, L], filter = [k, k, d], output = [W, H, M]
d < L is important! It is what makes the output a volume
Example) C3D
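
The role of d < L can be seen by comparing two filter depths. This is a pure-Python sketch with a hypothetical helper `conv3d_valid`; it uses no padding (unlike the `padding='SAME'` toy example), so the output is smaller than the input in every direction the filter can slide.

```python
def conv3d_valid(volume, kernel):
    """3D cross-correlation without padding over a [W][H][L] volume."""
    W, H, L = len(volume), len(volume[0]), len(volume[0][0])
    k1, k2, d = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for x in range(W - k1 + 1):
        plane = []
        for y in range(H - k2 + 1):
            line = []
            for z in range(L - d + 1):  # more than one step only when d < L
                acc = 0.0
                for i in range(k1):
                    for j in range(k2):
                        for m in range(d):
                            acc += volume[x + i][y + j][z + m] * kernel[i][j][m]
                line.append(acc)
            plane.append(line)
        out.append(plane)
    return out

volume = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
small = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]  # d = 3 < L = 5
full = [[[1.0] * 5 for _ in range(3)] for _ in range(3)]   # d = 5 == L

out_small = conv3d_valid(volume, small)
out_full = conv3d_valid(volume, full)
print(len(out_small), len(out_small[0]), len(out_small[0][0]))  # 3 3 3
print(len(out_full), len(out_full[0]), len(out_full[0][0]))     # 3 3 1
```

With d < L the filter has room to move along the third axis and the result is a volume; with d equal to L the third axis collapses to a single slice, which is the 2D-convolution case.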

tf.nn.conv3d - toy example

ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])

input_3d   = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])

output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
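print(sess.run(output_3d))

↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ... ↑↑↑↑↑

Even though the input is 3D, ex) 224x224x3, 112x112x32
Output shape is not a 3D volume, but a 2D matrix
because the filter depth = L must match the input channels = L
2 directions (x, y) are used to compute the conv! Not 3D
input = [W, H, L], filter = [k, k, L], output = [W, H]
Output shape is a 2D matrix
What if we want to train N filters (N is the number of filters)?
Then the output shape is (stacked 2D) 3D = 2D x N matrix.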
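
The point that the channel axis is summed over rather than slid over can be sketched in pure Python. The helper `conv2d_multichannel` and the small sizes (4 input channels, 2 filters) are illustrative assumptions, not values from the answer.

```python
def conv2d_multichannel(image, filt):
    """image: [H][W][C], filt: [k][k][C]; zero padding; returns a 2D [H][W] map."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    k = len(filt)
    p = k // 2  # assumes an odd kernel size
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - p, x + j - p
                    if 0 <= yy < h and 0 <= xx < w:
                        for ch in range(c):  # channels are summed, not slid over
                            acc += image[yy][xx][ch] * filt[i][j][ch]
            out[y][x] = acc
    return out

in_channels, n_filters = 4, 2
image = [[[1.0] * in_channels for _ in range(5)] for _ in range(5)]
filters = [
    [[[1.0] * in_channels for _ in range(3)] for _ in range(3)]
    for _ in range(n_filters)
]
maps = [conv2d_multichannel(image, f) for f in filters]  # N stacked 2D maps
print(len(maps), len(maps[0]), len(maps[0][0]))  # 2 5 5
print(maps[0][2][2])  # 36.0
```

Each filter yields one 2D map; stacking the N maps gives the 3D output. This sketch stacks them as [N][H][W] for readability, whereas tf.nn.conv2d in its default layout puts the N maps on the last axis.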

conv2d - LeNet, VGG, ... for 1 filter

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have a 3D shape with in_channels
weight_3d = np.ones((3,3,in_channels)) 
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])

output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))

conv2d - LeNet, VGG, ... for N filters

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have 3D shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
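print(sess.run(output_3d))

↑↑↑↑↑ Bonus 1x1 conv in CNN - GoogLeNet, ... ↑↑↑↑↑

When you think of this as a 2D image filter like Sobel, 1x1 conv is confusing
For 1x1 conv in a CNN, the input has a 3D shape, as in the picture above
It computes depth-wise filtering (a weighted sum over the L channels at each pixel)
input = [W, H, L], filter = [1, 1, L], output = [W, H]
Output stacked shape is 3D = 2D x N matrix.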
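
Seen this way, a 1x1 conv is just a per-pixel dot product across channels. The following pure-Python sketch uses a hypothetical helper `conv1x1` and made-up weights to show the [W, H, L] to [W, H, N] mapping.

```python
def conv1x1(image, weights):
    """image: [H][W][L]; weights: [L][N]; returns [H][W][N]."""
    n_out = len(weights[0])
    return [
        [
            [sum(pixel[l] * weights[l][n] for l in range(len(pixel)))
             for n in range(n_out)]
            for pixel in row
        ]
        for row in image
    ]

image = [[[1.0, 2.0, 3.0] for _ in range(4)] for _ in range(4)]  # W = H = 4, L = 3
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                   # L = 3 -> N = 2
out = conv1x1(image, weights)
print(len(out), len(out[0]), len(out[0][0]))  # 4 4 2
print(out[0][0])  # [4.0, 5.0]
```

No neighbouring pixels are involved, so the spatial size is unchanged and only the number of channels changes, which is why 1x1 convs are commonly used to shrink or expand the channel dimension.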

tf.nn.conv2d - special case 1x1 conv

in_channels = 32 # 3 for RGB, 32, 64, 128, ... 
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# a 1x1 filter must have shape 1 x 1 x in_channels x number of filters = 4D
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]

in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)

in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])

filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])

input_3d   = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])

#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
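print(sess.run(output_3d))

Animation (2D Conv with 3D inputs)

- Original link: LINK
- Author: Martin Görner
- Twitter: @martin_gorner
- Google+: plus.google.com/+MartinGorne

Bonus 1D Convolutions with 2D input

↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑

↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑

Even though the input is 2D, ex) 20x14
Output shape is not 2D, but a 1D matrix
because the filter height = L must match the input height = L
1 direction (x) is used to compute the conv! Not 2D
input = [W, L], filter = [k, L], output = [W]
Output shape is a 1D matrix
What if we want to train N filters (N is the number of filters)?
Then the output shape is (stacked 1D) 2D = 1D x N matrix.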
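
The 20x14 case can be sketched the same way in pure Python. The helper `conv1d_2d_input` is hypothetical and the filter width k = 3 is an assumption for illustration; the filter covers all L = 14 rows at once and slides only along W.

```python
def conv1d_2d_input(x, filt):
    """x: [W][L], filt: [k][L]; zero padding along W only; returns a list of length W."""
    w, l = len(x), len(x[0])
    k = len(filt)
    p = k // 2  # assumes an odd filter width
    out = []
    for pos in range(w):  # the only sliding direction
        acc = 0.0
        for i in range(k):
            src = pos + i - p
            if 0 <= src < w:
                acc += sum(x[src][c] * filt[i][c] for c in range(l))
        out.append(acc)
    return out

x = [[1.0] * 14 for _ in range(20)]    # the 20x14 example: W = 20, L = 14
filt = [[1.0] * 14 for _ in range(3)]  # k = 3, filter height matches L = 14
out = conv1d_2d_input(x, filt)
print(len(out))        # 20
print(out[0], out[1])  # 28.0 42.0
```

Running N such filters and stacking their 1D outputs gives the 2D = 1D x N result described above.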

Bonus C3D

in_channels = 32 # 3, 32, 64, 128, ... 
out_channels = 64 # 3, 32, 64, 128, ... 
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]

in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)

in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])
filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])

input_4d   = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])

output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print(sess.run(output_4d))

sess.close()

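Input and output in TensorFlow

Summary

Considering the effort and clarity in your explanation, 8 upvotes is far too few. (12 upvotes)
The 2D conv with 3D input is a nice touch. I would suggest an edit to include 1D conv with 2D input (e.g. a multi-channel array) and compare how it differs from a 2D conv with 2D input. (2 upvotes)
Amazing answer! (2 upvotes)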

Answer 2

thu*_*v89 15

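Following @runhani's answer, I am adding a few more details to make the explanation a bit clearer, and I will try to explain this a bit more (with examples from both TF1 and TF2, of course).

The main additional bits I'm including are:

Emphasis on applications
Usage of tf.Variable
Clearer explanation of the inputs/kernels/outputs of 1D/2D/3D convolution
The effects of stride/padding

1D Convolution

Here's how you might compute a 1D convolution using TF 1 and TF 2.

To be specific, my data has the following shapes:

1D vector - [batch size, width, in channels] (e.g. 1, 5, 1)
Kernel - [width, in channels, out channels] (e.g. 5, 1, 4)
Output - [batch size, width, out_channels] (e.g. 1, 5, 4)

TF1 Example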

import tensorflow as tf
import numpy as np

inp = tf.placeholder(shape=[None, 5, 1], dtype=tf.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  print(sess.run(out, feed_dict={inp: np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]])}))

TF2 Example

import tensorflow as tf
import numpy as np

inp = np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]]).astype(np.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')
print(out)

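It's a lot less work with TF2, since TF2 does not need a Session or a variable initializer, for example.

What might this look like in real life?

So let's understand what this is doing using a signal smoothing example. On the left you have the original signal and on the right the output of a 1D convolution with 3 output channels.

What do multiple channels mean?

Multiple channels are basically multiple feature representations of an input. In this example you have three representations obtained by three different filters. The first channel is the equally weighted smoothing filter. The second is a filter that weights the middle of the filter more than the boundaries. The final filter does the opposite of the second. So you can see how these different filters bring about different effects.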
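
The three-filter idea can be tried on a single spike, which makes each filter's shape visible in its output channel. This is a pure-Python sketch: the helper `conv1d_multi` and the exact filter weights are assumptions chosen to match the description (equal, centre-heavy, border-heavy), not the values behind the original figure.

```python
def conv1d_multi(signal, filters):
    """signal: list of length W; filters: N kernels; returns [W][N] with zero padding."""
    w = len(signal)
    out = []
    for pos in range(w):
        channels = []
        for kernel in filters:
            p = len(kernel) // 2  # assumes odd kernel lengths
            acc = 0.0
            for j, weight in enumerate(kernel):
                src = pos + j - p
                if 0 <= src < w:
                    acc += signal[src] * weight
            channels.append(acc)
        out.append(channels)  # one value per output channel at this position
    return out

filters = [
    [1 / 3, 1 / 3, 1 / 3],  # equal weights
    [0.25, 0.5, 0.25],      # centre weighted more than the borders
    [0.375, 0.25, 0.375],   # the opposite: borders weighted more than the centre
]
spike = [0.0, 0.0, 8.0, 0.0, 0.0]
out = conv1d_multi(spike, filters)
for row in out:
    print(row)
```

Each column of the printed result is one output channel: the same input, seen through a different filter.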

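Deep learning applications of 1D convolution

1D convolution has been used successfully for the sentence classification task.

2D Convolution

Off to 2D convolution. If you are a deep learning person, the chances that you haven't come across 2D convolution are ... well, about zero. It is used in CNNs for image classification, object detection, etc., as well as in NLP problems that involve images (e.g. image caption generation).

Let's try an example. I have a convolution kernel with the following filters here:

Edge detection kernel (3x3 window)
Blur kernel (3x3 window)
Sharpen kernel (3x3 window)

To be specific, my data has the following shapes:

Image (black and white) - [batch_size, height, width, 1] (e.g. 1, 340, 371, 1)
Kernel (aka filters) - [height, width, in channels, out channels] (e.g. 3, 3, 1, 3)
Output (aka feature maps) - [batch_size, height, width, out_channels] (e.g. 1, 340, 371, 3)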
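
Because the kernel layout is [height, width, in channels, out channels], the three filters are interleaved along the last axis of the kernel array. The sketch below uses the same values as the kernel_init array in this answer's TF examples and pulls each filter out as a plain 3x3 grid; the helper name `extract_filter` is hypothetical.

```python
kernel_init = [
    [[[-1, 1.0 / 9, 0]], [[-1, 1.0 / 9, -1]], [[-1, 1.0 / 9, 0]]],
    [[[-1, 1.0 / 9, -1]], [[8, 1.0 / 9, 5]], [[-1, 1.0 / 9, -1]]],
    [[[-1, 1.0 / 9, 0]], [[-1, 1.0 / 9, -1]], [[-1, 1.0 / 9, 0]]],
]

def extract_filter(kernel, in_ch, out_ch):
    """Return the 2D [height][width] slice for one (input channel, output channel) pair."""
    return [[cell[in_ch][out_ch] for cell in row] for row in kernel]

edge = extract_filter(kernel_init, 0, 0)     # -1 everywhere, 8 in the centre
blur = extract_filter(kernel_init, 0, 1)     # 1/9 everywhere
sharpen = extract_filter(kernel_init, 0, 2)  # 5 in the centre, -1 on the cross, 0 in the corners

for name, filt in (("edge", edge), ("blur", blur), ("sharpen", sharpen)):
    print(name, filt)
```

Reading the array this way makes it easier to check that each output channel really holds the filter you intended.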

TF1 Example

import tensorflow as tf
import numpy as np
from PIL import Image

im = np.array(Image.open(<some image>).convert('L'))#/255.0

kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
     [[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
     ])

image_height, image_width = im.shape  # dimensions of the grayscale image loaded above
inp = tf.placeholder(shape=[None, image_height, image_width, 1], dtype=tf.float32)
kernel = tf.Variable(kernel_init, dtype=tf.float32)
out = tf.nn.conv2d(inp, kernel, strides=[1,1,1,1], padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  res = sess.run(out, feed_dict={inp: np.expand_dims(np.expand_dims(im,0),-1)})

TF2 Example

import tensorflow as tf
import numpy as np
from PIL import Image

im = np.array(Image.open(<some image>).convert('L'))#/255.0
x = np.expand_dims(np.expand_dims(im,0),-1).astype(np.float32)  # conv2d needs a float input matching the kernel dtype

kernel_init = np.array(
    [
     [[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
     [[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
     [[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
     ])

kernel = tf.Variable(kernel_init, dtype=tf.float32)

out = tf.nn.conv2d(x, kernel, strides=[1,1,1,1], padding='SAME')

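What might this look like in real life?

Here you can see the output produced by the code above. The first image is the original, and going clockwise you have the outputs of the 1st, 2nd, and 3rd filters.

What do multiple channels mean?

In the context of 2D convolution, it is much easier to understand what these multiple channels mean. Say you are doing face recognition. You can think of each filter (this is a very unrealistic simplification, but it gets the point across) as representing an eye, a mouth, a nose, etc. So each feature map would be a binary representation of whether that feature is present in the image you provided. I don't think I need to stress that these are very valuable features for a face recognition model. More information in this article.

This is an illustration of what I'm trying to articulate.

Deep learning applications of 2D convolution

2D convolution is very prevalent in the realm of deep learning.

CNNs (Convolutional Neural Networks) use the 2D convolution operation for almost all computer vision tasks (e.g. image classification, object detection, video classification).

3D Convolution

Now it becomes increasingly difficult to illustrate what is going on as the number of dimensions increases. But with a good understanding of how 1D and 2D convolution work, it is very straightforward to generalize that understanding to 3D convolution. So here goes.

To be specific, my data has the following shapes:

3D data (LIDAR) - [batch size, height, width, depth, in channels] (e.g. 1, 200, 200, 200, 1)
Kernel - [height, width, depth, in channels, out channels] (e.g. 5, 5, 5, 1, 3)
Output - [batch size, height, width, depth, out_channels] (e.g. 1, 200, 200, 200, 3)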
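
The shape bookkeeping in the 1D, 2D, and 3D sections of this answer follows one pattern, which can be written down as a small pure-Python check. The helper `conv_output_shape` is hypothetical and assumes stride 1 with `padding='SAME'`, as in the examples here, so the spatial dimensions are preserved and only the channel count changes.

```python
def conv_output_shape(input_shape, kernel_shape):
    """Output shape for stride 1 and 'SAME' padding, channels-last layout."""
    batch, *spatial, in_channels = input_shape
    *window, k_in, k_out = kernel_shape
    if len(window) != len(spatial):
        raise ValueError("kernel rank does not match the number of spatial dims")
    if k_in != in_channels:
        raise ValueError("kernel in_channels must equal input in_channels")
    return (batch, *spatial, k_out)

print(conv_output_shape((1, 5, 1), (5, 1, 4)))                    # (1, 5, 4)
print(conv_output_shape((1, 340, 371, 1), (3, 3, 1, 3)))          # (1, 340, 371, 3)
print(conv_output_shape((1, 200, 200, 200, 1), (5, 5, 5, 1, 3)))  # (1, 200, 200, 200, 3)
```

The only difference between the three cases is how many spatial axes sit between the batch axis and the channel axis.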

TF1 Example

import tensorflow as tf
import numpy as np

tf.reset_default_graph()

inp = tf.placeholder(shape=[None, 200, 200, 200, 1], dtype=tf.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(inp, kernel, strides=[1,1,1,1,1], padding='SAME')

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  res = sess.run(out, feed_dict={inp: np.random.normal(size=(1,200,200,200,1))})

TF2 Example

import tensorflow as tf
import numpy as np

x = np.random.normal(size=(1,200,200,200,1)).astype(np.float32)  # match the float32 kernel
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(x, kernel, strides=[1,1,1,1,1], padding='SAME')

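Deep learning applications of 3D convolution

3D convolution has been used when developing machine learning applications involving LIDAR (Light Detection and Ranging) data, which is 3-dimensional in nature.

What... more jargon?: Stride and padding

Alright, you're nearly there. So hold on. Let's see what stride and padding are. They are quite intuitive if you think about them.

If you stride across a corridor, you get there faster in fewer steps. But it also means that you observe less of your surroundings than if you walked across the room. Let's now reinforce our understanding with a pretty picture too! Let's understand these via 2D convolution.

Understanding stride

When you use tf.nn.conv2d, for example, you need to set the stride as a vector of 4 elements. There's no reason to be intimidated by this. It just contains the strides in the following order.

2D convolution - [batch stride, height stride, width stride, channel stride]. Here, the batch stride and channel stride you just set to 1 (I have been implementing deep learning models for 5 years and have never had to set them to anything other than 1). So that leaves you only 2 strides to set.
3D convolution - [batch stride, height stride, width stride, depth stride, channel stride]. Here you worry about the height/width/depth strides only.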
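
The effect of a larger stride is easiest to see in one dimension. This is a pure-Python sketch with a hypothetical helper `conv1d_valid_strided`; it uses no padding and made-up input values, and the last lines show how the two strides you actually choose fit into the 4-element vector for the default NHWC layout.

```python
def conv1d_valid_strided(signal, kernel, stride):
    """Cross-correlation without padding, moving the window `stride` positions at a time."""
    k = len(kernel)
    starts = range(0, len(signal) - k + 1, stride)
    return [sum(signal[s + j] * kernel[j] for j in range(k)) for s in starts]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 1.0, 1.0]
out_s1 = conv1d_valid_strided(signal, kernel, 1)
out_s2 = conv1d_valid_strided(signal, kernel, 2)
print(out_s1)  # [6.0, 9.0, 12.0, 15.0, 18.0]
print(out_s2)  # [6.0, 12.0, 18.0]

# Only the middle entries change; batch and channel strides stay at 1.
height_stride, width_stride = 2, 2
strides_2d = [1, height_stride, width_stride, 1]
print(strides_2d)  # [1, 2, 2, 1]
```

With stride 2 every other window position is skipped, so the output is roughly half as long: fewer steps, less of the surroundings observed.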

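Understanding padding

Now, you notice that no matter how small your stride is (i.e. 1), an unavoidable dimension reduction happens during convolution whenever the kernel is wider than 1 (e.g. the width is 3 after convolving a 4-unit-wide image with a 2-unit-wide kernel). This is undesirable, especially when building deep convolutional neural networks. This is where padding comes to the rescue. There are two most commonly used padding types:

SAME and VALID

Below you can see the difference.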
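
The difference can also be expressed as output sizes. The sketch below uses the output-size rules TensorFlow documents for the two padding modes (SAME: ceil(n / stride); VALID: ceil((n - k + 1) / stride)); the helper names `output_size` and `same_padding_total` are hypothetical, and the kernel width 2 for the 4-unit example is an assumption consistent with the width-3 result mentioned above.

```python
import math

def output_size(n, k, stride, padding):
    """Output length along one spatial axis for input length n and kernel length k."""
    if padding == "SAME":
        return math.ceil(n / stride)
    if padding == "VALID":
        return math.ceil((n - k + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def same_padding_total(n, k, stride):
    """Total number of zeros added along one axis under 'SAME' padding."""
    out = output_size(n, k, stride, "SAME")
    return max((out - 1) * stride + k - n, 0)

print(output_size(4, 2, 1, "VALID"))    # 3
print(output_size(4, 2, 1, "SAME"))     # 4
print(same_padding_total(4, 2, 1))      # 1
print(output_size(200, 5, 2, "SAME"))   # 100
print(output_size(200, 5, 2, "VALID"))  # 98
```

With stride 1, SAME keeps the size fixed and VALID shrinks it by k - 1; with a larger stride both shrink, but SAME depends only on the stride.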

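Final word: If you are very curious, you might be wondering: we just dropped a bomb on the whole automatic dimension reduction, and now we are talking about different strides. But the best thing about stride is that you control when, where, and how the dimensions get reduced.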

Answer 3

小智 5

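In summary, in a 1D CNN the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional (not counting the batch dimension). It is mostly used on time-series data.

In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional. It is mostly used on image data.

In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional. It is mostly used on 3D image data (MRI, CT scans).

You can find more details here: https://medium.com/@xzz201920/conv1d-conv2d-and-conv3d-8a59182c4d6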

Archived:	8 years, 9 months ago
Views:	55977
Last activity:	6 years ago