Binary image classifier in PyTorch with a progress bar, and a way to check whether the training is valid

Question

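I want to build and train a binary classifier in PyTorch that reads images from a path, processes them, and trains a classifier using their labels. My images can be found in the following folders: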

-data
  - class_1_folder
  - class_2_folder

So, to read them as tensors, I am doing the following:

PATH = "data/"

transform = transforms.Compose([transforms.Resize(256),
                            transforms.RandomCrop(224),
                            transforms.ToTensor(),
                            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224,0.225])])

dataset = datasets.ImageFolder(PATH, transform=transform)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=356, shuffle=True)
images, labels = next(iter(dataloader))

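This code reads the images and applies the necessary preprocessing transforms. The next step is to create the model and run the training: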

acc = model_train(images, labels, images, labels)

where model_train is:

import pdb
import torch
from torchvision import datasets, transforms, models
from matplotlib import pyplot as plt
from torchvision.transforms.functional import to_pil_image
import torch.nn as nn
import numpy as np
import torch.optim as optim
import tqdm
import copy
from time import sleep

def my_model():
   device = "cuda" if torch.cuda.is_available() else "cpu"

   model = models.resnet18(pretrained=False)
   num_features = model.fc.in_features
   model.fc = nn.Linear(num_features, 1)  # Binary classifier with 2 output classes

   # Move the model to the device
   model = model.to(device)
   return model

def model_train(X_train, y_train, X_val, y_val):

   model = my_model()
   dtype = torch.FloatTensor
   loss_fn = nn.CrossEntropyLoss().type(dtype) # binary cross entropy
   optimizer = optim.Adam(model.parameters(), lr=0.0001)

   n_epochs = 20   # number of epochs to run
   batch_size = 10  # size of each batch
   batch_start = torch.arange(0, len(X_train), batch_size)

   # Hold the best model
   best_acc = - np.inf   # init to negative infinity
   best_weights = None

   for epoch in range(n_epochs):
       model.train()
       with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
           bar.set_description(f"Epoch {epoch}")
           for start in bar:
               # take a batch
               bar.set_description(f"Epoch {epoch}")
               X_batch = X_train[start:start+batch_size]
               y_batch = y_train[start:start+batch_size]
               # forward pass
               y_pred = model(X_batch)
               y_pred = torch.max(y_pred, 1)[0]
            
               loss = loss_fn(y_pred, y_batch.float())
               # backward pass
               print(loss)
            
               optimizer.zero_grad()
               loss.backward()
               # update weights
               optimizer.step()
               # print progress
               acc = (y_pred.round() == y_batch).float().mean()
               bar.set_postfix(
                   loss=float(loss),
                   acc=float(acc)
               )
            
               bar.set_postfix(loss=loss.item(), accuracy=100. * acc)
               sleep(0.1)
       # evaluate accuracy at end of each epoch
       model.eval()
       y_pred = model(X_val)
       acc = (y_pred.round() == y_val).float().mean()
       acc = float(acc)
       if acc > best_acc:
           best_acc = acc
           best_weights = copy.deepcopy(model.state_dict())
   # restore model and return best accuracy
   torch.save(model.state_dict(), "model/my_model.pth")
   model.load_state_dict(best_weights)
   return best_acc

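I am trying to understand, first, how to display the progress bar properly during training and, second, how I can validate that the training process took place correctly. For the latter, I noticed strange behaviour: for class zero I always get zero loss, while for class one the loss is in the range 13-24. This does not seem correct, but I am not sure how to dig into it.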

tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.0986, -0.0806, -0.0161,  0.0287, -0.0279,  0.0083, -0.0526, -0.1393,
        -0.2082, -0.0141], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.1779,  0.0936, -0.0341, -0.1531, -0.1222, -0.1169, -0.0160, -0.0674,
         0.1230, -0.1181], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.0438, -0.1269, -0.1624, -0.0976, -0.0132, -0.1944, -0.0034, -0.0454,
        -0.1559,  0.0657], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.1655,  0.0222, -0.0801, -0.1390, -0.0905, -0.1472, -0.0395, -0.0180,
        -0.1492,  0.0914], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.7035, -0.1989,  0.0921, -0.1082, -0.2588, -0.3557,  0.3093,  0.0909,
         0.1603,  0.1838], grad_fn=<MaxBackward0>)
tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(20.4545, grad_fn=<DivBackward1>)
tensor([-0.4783, -0.1027, -0.0357,  0.0882, -0.2955, -0.0968,  0.3323, -0.0472,
         0.1017, -0.2186], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.2550, grad_fn=<DivBackward1>)
tensor([ 0.1554, -0.2664,  0.1419,  0.0203,  0.0895, -0.0085, -0.2867, -0.1957,
        -0.1315, -0.2340], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.1584, grad_fn=<DivBackward1>)
tensor([-0.0406, -0.2144,  0.1997,  0.2196, -0.3464,  0.1311, -0.0743, -0.2440,
        -0.1751, -0.2371], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.2112, grad_fn=<DivBackward1>)
tensor([-0.0080, -0.1138, -0.1035,  0.0697, -0.1745, -0.1438, -0.2360, -0.1308,
         0.0146,  0.1209], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.0853, grad_fn=<DivBackward1>)
tensor([-0.1235,  0.0081, -0.1073, -0.1036, -0.2037, -0.1204, -0.0570, -0.1146,
         0.0849,  0.0798], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.0666, grad_fn=<DivBackward1>)
tensor([-0.0660, -0.0832, -0.0414, -0.0334, -0.0123, -0.0203, -0.0549, -0.0747,
        -0.0779, -0.1629], grad_fn=<MaxBackward0>)

What could be going wrong in this case?

Answer 1

Mac*_*ski 2

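Summary

The attached code has several problems:

- the torch API is not fully leveraged (which makes the code longer than it needs to be),
- the dataset is not fed in properly (the training part iterates over a single chunk of data returned once by iter),
- the last layer and the loss look incorrect (there is a single neuron in nn.Linear(num_features, 1), and a non-binary cross-entropy is used).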
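
The loss values in the question's log can be reproduced in isolation. After torch.max(y_pred, 1)[0] the prediction is a 1-D tensor of 10 values, and the label batch is passed as a float tensor of the same shape. Recent PyTorch versions appear to read this as one unbatched sample with 10 classes whose target is a vector of class probabilities, so the loss becomes -sum(target * log_softmax(input)). The sketch below assumes such a PyTorch version and uses all-zero logits as a stand-in for the near-zero outputs in the log; the variable names are illustrative only.

```python
import math

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# Stand-in for the ten near-zero model outputs printed in the question's log
# (assumption: exact zeros, chosen so the arithmetic is easy to follow).
logits = torch.zeros(10)

# A float target with the same shape as the input is treated as class
# probabilities, not as ten separate binary labels.
loss_zeros = loss_fn(logits, torch.zeros(10))  # -sum(0 * log_softmax) = 0
loss_ones = loss_fn(logits, torch.ones(10))    # -sum(log(1/10)) = 10 * ln(10)

print(loss_zeros.item(), loss_ones.item(), 10 * math.log(10))
```

An all-zero label batch therefore gives zero loss whatever the model predicts, and an all-one batch gives roughly 23, which is consistent with the values in the log. Neither number reflects classification quality.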

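While the first two issues may affect quality and efficiency, I would blame the "binary setup" first. To fix it, I suggest first setting up training with 2 classes, that is, model outputs of shape [batch, 2] with integer class labels, and, once that works, carefully moving to the binary-encoded case.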
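
For the binary-encoded case, one common option is a single output logit combined with nn.BCEWithLogitsLoss. The sketch below is only an illustration under stated assumptions: a small nn.Linear stands in for resnet18 with fc = nn.Linear(num_features, 1), and the batch is random data rather than images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumption: a tiny linear layer stands in for the single-output ResNet.
model = nn.Linear(8, 1)
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X_batch = torch.randn(10, 8)            # hypothetical feature batch
y_batch = torch.randint(0, 2, (10,))    # int64 labels, as ImageFolder yields

logits = model(X_batch).squeeze(1)      # [batch, 1] -> [batch]
loss = loss_fn(logits, y_batch.float()) # targets must be float, same shape

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Threshold the probability at 0.5 (equivalently, the logit at 0);
# rounding the raw logit, as in the question, is not the same thing.
preds = (torch.sigmoid(logits) > 0.5).long()
acc = (preds == y_batch).float().mean().item()
print(f"loss={loss.item():.4f} acc={acc:.2f}")
```

Here squeeze(1) replaces the torch.max(y_pred, 1)[0] call from the question: it only drops the trailing dimension, so each sample keeps its own logit and its own label.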

Working solution

Let's work through the task of telling classes 3 and 8 apart on FashionMNIST:

curl -L -o data.zip https://github.com/DeepLenin/fashion-mnist_png/raw/master/data.zip
unzip data.zip

Here is a suitable dataloader:

## Dataset

import torch
from torchvision import datasets
import torchvision.transforms as transforms

PATH = "data/train"

transform = transforms.Compose([transforms.Resize(256),
                            transforms.RandomCrop(224),
                            transforms.ToTensor(),
                            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224,0.225])])

dataset = datasets.ImageFolder(PATH, transform=transform)
dataset.classes = ['3','8']
dataset.class_to_idx = {'3':0,'8':1}
# Note: this keeps the samples whose ImageFolder class index is 0 or 1,
# i.e. the first two class folders in sorted order.
dataset.samples = list(filter(lambda s: s[1] in [0,1], dataset.samples))

dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

As the second step, we build the model:

## Model construction

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models

device = "cuda" if torch.cuda.is_available() else "cpu"


def my_model():

   model = models.resnet18(pretrained=False)
   num_features = model.fc.in_features
   model.fc = nn.Linear(num_features, 2)  # classifier with 2 output classes

   # Move the model to the device
   model = model.to(device)
   return model


model = my_model()
dtype = torch.FloatTensor
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Finally, we train with a progress bar (see Adam Oudad's post on tqdm):

## Model training

from tqdm import tqdm

model.train()
for epoch in range(1, 6):
    with tqdm(dataloader, unit="batch") as tepoch:
        for X_batch, y_batch in tepoch:
            tepoch.set_description(f"Epoch {epoch}")
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            optimizer.zero_grad()
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            loss.backward()
            optimizer.step()
            tepoch.set_postfix(loss=loss.item())

The progress bar reports the loss as follows:

Epoch 1: 100%|██████████| 375/375 [01:01<00:00,  6.13batch/s, loss=0.211]
Epoch 2: 100%|██████████| 375/375 [00:59<00:00,  6.25batch/s, loss=0.0149]
Epoch 3: 100%|██████████| 375/375 [01:00<00:00,  6.24batch/s, loss=0.00139]
Epoch 4: 100%|██████████| 375/375 [00:59<00:00,  6.27batch/s, loss=0.00149]
Epoch 5: 100%|██████████| 375/375 [00:59<00:00,  6.29batch/s, loss=0.00252]

If in doubt, the accuracy can be evaluated on the test dataset.
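
A minimal sketch of how accuracy on held-out data could be computed for a two-logit classifier is given here. It is not the answer's own evaluation code: the helper name evaluate_accuracy and the toy model and loader used to exercise it are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, loader, device="cpu"):
    """Return the fraction of samples in `loader` that `model` classifies correctly."""
    model.eval()                      # switch off dropout / batch-norm updates
    correct, total = 0, 0
    with torch.no_grad():             # no gradients needed for evaluation
        for X_batch, y_batch in loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            logits = model(X_batch)   # shape [batch, 2]
            preds = logits.argmax(dim=1)
            correct += (preds == y_batch).sum().item()
            total += y_batch.numel()
    return correct / total


# Toy check (hypothetical data): a model that always predicts class 0,
# evaluated on labels [0, 0, 1, 1].
toy_model = nn.Linear(4, 2)
with torch.no_grad():
    toy_model.weight.zero_()
    toy_model.bias.copy_(torch.tensor([1.0, 0.0]))
toy_loader = DataLoader(
    TensorDataset(torch.randn(4, 4), torch.tensor([0, 0, 1, 1])),
    batch_size=2,
)
toy_acc = evaluate_accuracy(toy_model, toy_loader)
print(toy_acc)
```

With the objects from the training code, the same helper would be called with the trained model, a DataLoader built over the held-out images, and device. For evaluation it is usually preferable to replace RandomCrop with a deterministic crop so that repeated runs give the same number.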

And with that, we're done!

Reproducible code

See this notebook, which can be run on a GPU in Colab.
