Keras: What is model.inputs in VGG16

Question


I recently started playing with Keras and VGG16, and I'm using keras.applications.vgg16.

But I have a question about model.inputs, because I saw it used in https://github.com/keras-team/keras/blob/master/examples/conv_filter_visualization.py even though the example never initializes it:

    ...
    input_img = model.input
    ...
    layer_output = layer_dict[layer_name].output
    if K.image_data_format() == 'channels_first':
        loss = K.mean(layer_output[:, filter_index, :, :])
    else:
        loss = K.mean(layer_output[:, :, :, filter_index])

    # we compute the gradient of the input picture wrt this loss
    grads = K.gradients(loss, input_img)[0]

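I checked the Keras site, but it only says that this is an input tensor of shape (1,224,224,3), and I still don't understand what exactly that is. Is it an image from ImageNet? Or a default image that Keras provides for Keras models?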

Sorry if I don't know enough about deep learning yet, but could someone explain this to me? Thanks

Answer 1

dat*_*sta 5

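The 4 dimensions of (1, 224, 224, 3) are batch_size, image_height, image_width and image_channels, respectively (Keras' default channels_last layout). (1, 224, 224, 3) describes a batch of 1 image (one image at a time) of size 224x224 with three channels (RGB). If you print model.input for VGG16 you will actually see (None, 224, 224, 3): None means the batch size is not fixed, so the model accepts any number of images at once.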
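
To make the axis order concrete, here is a small NumPy-only sketch (an all-zeros array stands in for a real image batch; no Keras is involved). In the default channels_last layout a batch is indexed as x[image, row, column, channel]; the transpose shows the channels_first layout that the K.image_data_format() check in the question's code handles.

```python
import numpy as np

# Stand-in for one image batch in channels_last layout:
# (batch_size, height, width, channels)
batch = np.zeros((1, 224, 224, 3), dtype=np.float32)

# Set the green channel (index 1) of the pixel at row 10, column 20 of image 0
batch[0, 10, 20, 1] = 255.0

batch_size, height, width, channels = batch.shape
print(batch_size, height, width, channels)  # 1 224 224 3

# Same data in channels_first layout: (batch_size, channels, height, width)
batch_cf = np.transpose(batch, (0, 3, 1, 2))
print(batch_cf.shape)          # (1, 3, 224, 224)
print(batch_cf[0, 1, 10, 20])  # 255.0, the same pixel value, new index order
```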

For more on what a batch and a batch size are, you can check this Cross Validated question.

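Back to VGG16: model.input is neither an ImageNet image nor a default image shipped with Keras. It is the model's symbolic input tensor, a placeholder with that shape that only holds concrete values once you feed data through the model (which is why the visualization example can use it without initializing it). So what does the shape mean for you? To feed an image into the network, you need to: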

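Preprocess it to reach a size of (224, 224) with 3 channels (RGB)
Convert it into an actual matrix of shape (224, 224, 3)
Group the images together into a batch (here a batch of 1, but you still need to add one dimension to the matrix to get (1, 224, 224, 3))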
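
The grouping step is plain NumPy. A minimal sketch, assuming the images have already been resized and converted to (224, 224, 3) arrays (random arrays stand in for them here): np.expand_dims turns one image into a batch of 1, and np.stack builds a larger batch, which VGG16 also accepts since its batch dimension is not fixed.

```python
import numpy as np

# Hypothetical stand-ins for three images already resized to 224x224 RGB
# and converted to arrays (e.g. by image.img_to_array)
images = [np.random.rand(224, 224, 3).astype(np.float32) for _ in range(3)]

# One image: add a leading batch axis -> batch of size 1
single = np.expand_dims(images[0], axis=0)
print(single.shape)  # (1, 224, 224, 3)

# Several images: stack along a new leading axis -> batch of size 3
batch = np.stack(images, axis=0)
print(batch.shape)   # (3, 224, 224, 3)

# In both cases element 0 of the batch is the first image
print(np.array_equal(single[0], batch[0]))  # True
```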

Once this is done, you can feed the image into the model.

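Keras provides a few utility functions for these tasks. Below is a modified version of the code snippet from "Extract features with VGG16" in the usage examples for image classification models in the documentation.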

To actually run it, you need a jpg named elephant.jpg. You can get one by running this bash command:

    wget https://upload.wikimedia.org/wikipedia/commons/f/f9/Zoorashia_elephant.jpg -O elephant.jpg

For clarity, I'll split the code into image preprocessing and model prediction:

Load the image

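    import numpy as np
    from keras.preprocessing import image
    from keras.applications.vgg16 import preprocess_input

    img_path = 'elephant.jpg'
    img = image.load_img(img_path, target_size=(224, 224))  # PIL image, RGB, resized
    x = image.img_to_array(img)      # float array of shape (224, 224, 3)
    x = np.expand_dims(x, axis=0)    # add batch axis -> (1, 224, 224, 3)
    x = preprocess_input(x)          # VGG16-specific ImageNet preprocessing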

You can add prints along the way to see what is happening, but here is a short summary:

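image.load_img() loads a PIL image, already in RGB and already resized to (224, 224)
image.img_to_array() converts this image into a matrix of shape (224, 224, 3). If you access x[0, 0, 0] you get the red component of the first (top-left) pixel, as a number between 0 and 255
np.expand_dims(x, axis=0) adds the first (batch) dimension. Afterwards x has shape (1, 224, 224, 3)
preprocess_input does the extra preprocessing required by the ImageNet-trained architecture. From its docstring (run help(preprocess_input)) you can see that it: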

will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling

In other words, the input is brought into the same format as the ImageNet data the pretrained weights were trained on.
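
To see what that docstring means numerically, here is a rough NumPy re-creation of it for a channels_last batch. This is a sketch, not Keras' actual code: caffe_style_preprocess is a hypothetical name, and the BGR mean values are the ones I believe Keras uses for VGG16, so check keras.applications.imagenet_utils in your installed version before relying on them.

```python
import numpy as np

# Per-channel ImageNet means in BGR order. ASSUMPTION: these are the values
# I believe Keras uses for VGG16's "caffe"-style preprocessing; verify them in
# your installed keras.applications.imagenet_utils.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(x):
    """Hypothetical sketch of VGG16's preprocess_input for a channels_last batch.

    x: float array of shape (batch, height, width, 3), RGB order, values 0..255.
    """
    x = x[..., ::-1]              # RGB -> BGR by reversing the channel axis
    return x - IMAGENET_MEAN_BGR  # zero-center each channel, no scaling


# A single pure-red pixel, shape (1, 1, 1, 3): R=255, G=0, B=0
x = np.array([[[[255.0, 0.0, 0.0]]]], dtype=np.float32)
out = caffe_style_preprocess(x)

# After the swap the channels are (B, G, R) = (0, 0, 255); subtracting the
# means makes blue and green negative while red stays positive.
print(out[0, 0, 0])
```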

That's it for preprocessing. Now you just feed the image into the pretrained model and get a prediction:

Prediction

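    from keras.applications.vgg16 import VGG16
    base_model = VGG16(weights='imagenet')  # pretrained VGG16 with its 1000-class top
    y_hat = base_model.predict(x)
    print(y_hat.shape)  # (1, 1000)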

y_hat contains, for each of the 1000 ImageNet classes, the probability the model assigns to this image.
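
With the default top (include_top=True), VGG16 ends in a 1000-unit Dense layer with a softmax activation, so each row of y_hat is a probability distribution. A small NumPy sketch of those properties, where random softmax output is a hypothetical stand-in for base_model.predict(x):

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, the activation of VGG16's final Dense layer."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical stand-in for base_model.predict(x): random scores for 1000 classes
rng = np.random.default_rng(0)
y_hat = softmax(rng.normal(size=(1, 1000)))

print(y_hat.shape)                 # (1, 1000): one row per image in the batch
print(np.isclose(y_hat.sum(), 1))  # True: the 1000 probabilities sum to 1
best = int(np.argmax(y_hat[0]))    # index of the most likely class
print(0 <= best < 1000)            # True
```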

To get the class names and a readable output, Keras also provides a utility function:

    from keras.applications.vgg16 import decode_predictions
    decode_predictions(y_hat)

The output, for the Zoorashia_elephant.jpg image I downloaded earlier:

    [[('n02504013', 'Indian_elephant', 0.48041093),
      ('n02504458', 'African_elephant', 0.47474155),
      ('n01871265', 'tusker', 0.03912963),
      ('n02437312', 'Arabian_camel', 0.0038948185),
      ('n01704323', 'triceratops', 0.00062475674)]]

This looks good!
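
decode_predictions itself needs the real ImageNet class index (as far as I know Keras downloads it as imagenet_class_index.json), but the idea is simple: sort each row of scores and keep the top few with their labels. A hypothetical sketch of that idea with a made-up 4-class label table (top_k_decode and the fake_id_* ids are illustrative, not Keras API):

```python
import numpy as np


def top_k_decode(probs, labels, top=5):
    """Hypothetical sketch of the idea behind decode_predictions.

    probs:  array of shape (batch, num_classes)
    labels: list of (class_id, name) tuples, one per class index
    Returns, per image, the `top` (class_id, name, score) tuples, best first.
    """
    results = []
    for row in probs:
        best = np.argsort(row)[::-1][:top]  # class indices, highest score first
        results.append([(labels[i][0], labels[i][1], float(row[i])) for i in best])
    return results


# Hypothetical 4-class label table; the real one has 1000 ImageNet classes
labels = [("fake_id_0", "cat"), ("fake_id_1", "dog"),
          ("fake_id_2", "tusker"), ("fake_id_3", "Indian_elephant")]
probs = np.array([[0.05, 0.10, 0.25, 0.60]])

print(top_k_decode(probs, labels, top=2))
# [[('fake_id_3', 'Indian_elephant', 0.6), ('fake_id_2', 'tusker', 0.25)]]
```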

Archived: 7 years, 1 month ago
Views: 933
Last activity: 7 years, 1 month ago