Tags: python, feature-extraction, computer-vision, pytorch
I'm trying to learn more about computer vision models and to do some exploring of how they work. In an effort to understand how to interpret feature vectors, I'm trying to use PyTorch to extract a feature vector. Below is my code, pieced together from various places.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from PIL import Image
img=Image.open("Documents/01235.png")
# Load the pretrained model
model = models.resnet18(pretrained=True)
# Use the model object to select the desired layer
layer = model._modules.get('avgpool')
# Set model to evaluation mode
model.eval()
transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
def get_vector(image_name):
    # Load the image with Pillow library
    img = Image.open("Documents/Documents/Driven Data Competitions/Hateful Memes Identification/data/01235.png")
    # Create a PyTorch Variable with the transformed image
    t_img = transforms(img)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.data)
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    model(t_img)
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding
pic_vector = get_vector(img)
When I do this, I get the following error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 224, 224] instead
I'm sure this is a basic error, but I can't seem to figure out how to fix it. It was my impression that the ToTensor transform would make my data 4-d, but it either isn't working correctly or I'm misunderstanding it. Any help or resources I can use to learn more would be appreciated!
All of the default nn.Modules in PyTorch expect an additional batch dimension. If the input to a module has shape (B, ...), then the output will be (B, ...) as well (though the later dimensions may change depending on the layer). This behavior allows efficient inference on a batch of B inputs simultaneously. To make your code conform, you can unsqueeze an additional unitary dimension onto the front of the t_img tensor before sending it into your model, making it a (1, ...) tensor. You will also need to flatten the output of layer before storing it if you want to copy it into your one-dimensional my_embedding tensor.
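To see the shapes involved, here is a minimal sketch; the random tensors below are just stand-ins for a real transformed image and for the hook's output:

import torch

t_img = torch.rand(3, 224, 224)   # stand-in for a single transformed image: (C, H, W)
batched = t_img.unsqueeze(0)      # prepend a unitary batch dimension
print(batched.shape)              # torch.Size([1, 3, 224, 224])

# resnet18's avgpool emits (1, 512, 1, 1) for a single-image batch;
# flatten collapses it to the (512,) shape that my_embedding expects
pooled = torch.rand(1, 512, 1, 1) # stand-in for the hook's output tensor
print(pooled.flatten().shape)     # torch.Size([512])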
A few other things:
You should be inferring within a torch.no_grad() context to avoid computing gradients, since you won't be needing them (note that model.eval() only changes the behavior of certain layers, such as dropout and batch normalization; it does not disable the construction of the computation graph, but torch.no_grad() does). See the short sketch after this list.
I assume this is just a copy-paste issue, but transforms is the name of an imported module as well as a global variable.
o.data just returns o. In the old Variable interface (circa PyTorch 0.3.1 and earlier) this used to be necessary, but the Variable interface was deprecated way back in PyTorch 0.4.0 and no longer does anything useful; its use now just creates confusion. Unfortunately, many tutorials are still written using this old and unnecessary interface.
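Both of these points are easy to verify yourself; a minimal sketch, with a random tensor standing in for a real preprocessed image batch:

import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)
model.eval()
x = torch.rand(1, 3, 224, 224)   # stand-in for a real image batch

out = model(x)
print(out.grad_fn is not None)   # True: eval() alone still builds a computation graph

with torch.no_grad():
    out = model(x)
print(out.grad_fn is None)       # True: no graph is built inside no_grad()

print(out.data.equal(out))       # True: .data adds nothing here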
The updated code is then as follows:
import torch
import torchvision
import torchvision.models as models
from PIL import Image
img = Image.open("Documents/01235.png")
# Load the pretrained model
model = models.resnet18(pretrained=True)
# Use the model object to select the desired layer
layer = model._modules.get('avgpool')
# Set model to evaluation mode
model.eval()
transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.flatten())    # <-- flatten
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():                  # <-- no_grad context
        model(t_img.unsqueeze(0))          # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding
pic_vector = get_vector(img)
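Once get_vector works, a natural next step for interpreting these feature vectors is to compare the embeddings of different images, for example with cosine similarity. A minimal sketch; the second image path here is hypothetical:

import torch.nn.functional as F

vec_a = get_vector(Image.open("Documents/01235.png"))
vec_b = get_vector(Image.open("Documents/other_image.png"))  # hypothetical second image
similarity = F.cosine_similarity(vec_a, vec_b, dim=0).item()
print(similarity)  # values near 1.0 indicate similar avgpool features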