Binary image classifier in PyTorch with a progress bar, and a way to check whether the training is valid

Jos*_*mon 5 python pytorch

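I want to build and train a binary classifier in PyTorch that reads images from a path, processes them, and trains a classifier using their labels. My images are in the following folders: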

-data
  - class_1_folder
  - class_2_folder

So, to read them as tensors, I am doing the following:

PATH = "data/"

transform = transforms.Compose([transforms.Resize(256),
                            transforms.RandomCrop(224),
                            transforms.ToTensor(),
                            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224,0.225])])

dataset = datasets.ImageFolder(PATH, transform=transform)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=356, shuffle=True)
images, labels = next(iter(dataloader))

This code reads the images and applies the necessary preprocessing transforms. The next step creates the model and runs the training:

acc = model_train(images, labels, images, labels)

where model_train is:

import pdb
import torch
from torchvision import datasets, transforms, models
from matplotlib import pyplot as plt
from torchvision.transforms.functional import to_pil_image
import torch.nn as nn
import numpy as np
import torch.optim as optim
import tqdm
import copy
from time import sleep

def my_model():
   device = "cuda" if torch.cuda.is_available() else "cpu"

   model = models.resnet18(pretrained=False)
   num_features = model.fc.in_features
   model.fc = nn.Linear(num_features, 1)  # Binary classifier with 2 output classes

   # Move the model to the device
   model = model.to(device)
   return model

def model_train(X_train, y_train, X_val, y_val):

   model = my_model()
   dtype = torch.FloatTensor
   loss_fn = nn.CrossEntropyLoss().type(dtype) # binary cross entropy
   optimizer = optim.Adam(model.parameters(), lr=0.0001)

   n_epochs = 20   # number of epochs to run
   batch_size = 10  # size of each batch
   batch_start = torch.arange(0, len(X_train), batch_size)

   # Hold the best model
   best_acc = - np.inf   # init to negative infinity
   best_weights = None

   for epoch in range(n_epochs):
       model.train()
       with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
           bar.set_description(f"Epoch {epoch}")
           for start in bar:
               # take a batch
               bar.set_description(f"Epoch {epoch}")
               X_batch = X_train[start:start+batch_size]
               y_batch = y_train[start:start+batch_size]
               # forward pass
               y_pred = model(X_batch)
               y_pred = torch.max(y_pred, 1)[0]
            
               loss = loss_fn(y_pred, y_batch.float())
               # backward pass
               print(loss)
            
               optimizer.zero_grad()
               loss.backward()
               # update weights
               optimizer.step()
               # print progress
               acc = (y_pred.round() == y_batch).float().mean()
               bar.set_postfix(
                   loss=float(loss),
                   acc=float(acc)
               )
            
               bar.set_postfix(loss=loss.item(), accuracy=100. * acc)
               sleep(0.1)
       # evaluate accuracy at end of each epoch
       model.eval()
       y_pred = model(X_val)
       acc = (y_pred.round() == y_val).float().mean()
       acc = float(acc)
       if acc > best_acc:
           best_acc = acc
           best_weights = copy.deepcopy(model.state_dict())
   # restore model and return best accuracy
   torch.save(model.state_dict(), "model/my_model.pth")
   model.load_state_dict(best_weights)
   return best_acc

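I am trying to understand how to display a progress bar correctly during training and, secondly, how to validate that the training process took place correctly. For the latter, I have noticed some weird behavior: for class zero I always get zero loss, while for class one the loss is in the range 13-24. This does not seem correct, but I am not sure how to dig into it!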

    tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.0986, -0.0806, -0.0161,  0.0287, -0.0279,  0.0083, -0.0526, -0.1393,
        -0.2082, -0.0141], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.1779,  0.0936, -0.0341, -0.1531, -0.1222, -0.1169, -0.0160, -0.0674,
         0.1230, -0.1181], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.0438, -0.1269, -0.1624, -0.0976, -0.0132, -0.1944, -0.0034, -0.0454,
        -0.1559,  0.0657], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.1655,  0.0222, -0.0801, -0.1390, -0.0905, -0.1472, -0.0395, -0.0180,
        -0.1492,  0.0914], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.7035, -0.1989,  0.0921, -0.1082, -0.2588, -0.3557,  0.3093,  0.0909,
         0.1603,  0.1838], grad_fn=<MaxBackward0>)
tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(20.4545, grad_fn=<DivBackward1>)
tensor([-0.4783, -0.1027, -0.0357,  0.0882, -0.2955, -0.0968,  0.3323, -0.0472,
         0.1017, -0.2186], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.2550, grad_fn=<DivBackward1>)
tensor([ 0.1554, -0.2664,  0.1419,  0.0203,  0.0895, -0.0085, -0.2867, -0.1957,
        -0.1315, -0.2340], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.1584, grad_fn=<DivBackward1>)
tensor([-0.0406, -0.2144,  0.1997,  0.2196, -0.3464,  0.1311, -0.0743, -0.2440,
        -0.1751, -0.2371], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.2112, grad_fn=<DivBackward1>)
tensor([-0.0080, -0.1138, -0.1035,  0.0697, -0.1745, -0.1438, -0.2360, -0.1308,
         0.0146,  0.1209], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.0853, grad_fn=<DivBackward1>)
tensor([-0.1235,  0.0081, -0.1073, -0.1036, -0.2037, -0.1204, -0.0570, -0.1146,
         0.0849,  0.0798], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.0666, grad_fn=<DivBackward1>)
tensor([-0.0660, -0.0832, -0.0414, -0.0334, -0.0123, -0.0203, -0.0549, -0.0747,
        -0.0779, -0.1629], grad_fn=<MaxBackward0>)

What could be going wrong here?

Mac*_*ski 2

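Summary

The attached code has several problems:

  • the torch API is underused (which makes the code longer than it needs to be),
  • the dataset is not fed correctly (training iterates over a single chunk of data returned once by iter),
  • the last layer and the loss look wrong (the layer has a single neuron, nn.Linear(num_features, 1), yet a non-binary cross-entropy is used).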
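The third point also seems to account for the loss values printed in the question. After torch.max(y_pred, 1)[0] the prediction is a 1-D tensor of 10 values and the labels are cast to float, so nn.CrossEntropyLoss appears to read the whole batch as a single sample with 10 classes and the labels as class probabilities. Below is a minimal sketch of that reading. The all-zero logits are an assumption that stands in for the near-zero outputs of the untrained network.

```python
import math

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# Stand-in for one batch of 10 scalar outputs (assumed all zero for simplicity).
logits = torch.zeros(10)

# A 1-D float target of the same shape is read as class probabilities for a
# single 10-class sample, so the loss is -sum(target * log_softmax(logits)).
loss_all_zero = loss_fn(logits, torch.zeros(10))  # every label is 0
loss_all_one = loss_fn(logits, torch.ones(10))    # every label is 1

print(loss_all_zero.item())  # zero: every term is multiplied by a 0 label
print(loss_all_one.item())   # 10 * ln(10), roughly 23.03
print(10 * math.log(10))
```

Under this reading, a batch containing only class 0 gives zero loss whatever the network outputs, and a batch containing only class 1 gives about 23 for an untrained network. Both match the printed values.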

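While the first two issues may affect quality and efficiency, I would blame the "binary setup" first. To fix it, I suggest first setting up training with 2 classes, i.e. model outputs of shape [batch, 2], and, if that works, carefully moving to the binary-encoded case.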
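For the binary-encoded case, one common arrangement keeps nn.Linear(num_features, 1) and pairs it with nn.BCEWithLogitsLoss, which expects one raw logit per sample and float labels of the same shape. This is a sketch, not the code from the question. The tiny linear model and the random tensors are placeholders for ResNet-18 and real images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model: any network whose last layer has one output would do.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1))
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

# Placeholder batch: 4 random "images" with integer 0/1 labels,
# the label dtype that ImageFolder produces.
X_batch = torch.randn(4, 3, 8, 8)
y_batch = torch.tensor([0, 1, 1, 0])

logits = model(X_batch).squeeze(1)       # shape [batch]
loss = loss_fn(logits, y_batch.float())  # targets must be float

# A logit above 0 corresponds to a sigmoid probability above 0.5.
preds = (logits > 0).long()
acc = (preds == y_batch).float().mean()
print(loss.item(), acc.item())
```

Thresholding the logit at 0 replaces the y_pred.round() call from the question, which rounds raw network outputs rather than probabilities.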

Working solution

Let's take the task of distinguishing classes 3 and 8 of FashionMNIST:

curl -L -o data.zip https://github.com/DeepLenin/fashion-mnist_png/raw/master/data.zip
unzip data.zip

Here is a proper dataloader:

## Dataset

import torch
from torchvision import datasets
import torchvision.transforms as transforms

PATH = "data/train"

transform = transforms.Compose([transforms.Resize(256),
                                transforms.RandomCrop(224),
                                transforms.ToTensor(),
                                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

dataset = datasets.ImageFolder(PATH, transform=transform)

# Keep only the folders named "3" and "8" and remap their labels to 0 and 1.
keep = {dataset.class_to_idx["3"]: 0, dataset.class_to_idx["8"]: 1}
dataset.samples = [(path, keep[idx]) for path, idx in dataset.samples if idx in keep]
dataset.targets = [label for _, label in dataset.samples]
dataset.classes = ["3", "8"]
dataset.class_to_idx = {"3": 0, "8": 1}

dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

Second, we build the model:

## Model construction

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"


def my_model():
    model = models.resnet18(pretrained=False)
    num_features = model.fc.in_features
    model.fc = nn.Linear(num_features, 2)  # classifier with 2 output classes

    # Move the model to the device
    model = model.to(device)
    return model


model = my_model()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Finally, we train with a progress bar (see Adam Oudad's post on tqdm):

## Model training

from tqdm import tqdm

model.train()
for epoch in range(1, 6):
    with tqdm(dataloader, unit="batch") as tepoch:
        for X_batch, y_batch in tepoch:
            tepoch.set_description(f"Epoch {epoch}")
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            optimizer.zero_grad()
            y_pred = model(X_batch)          # logits of shape [batch, 2]
            loss = loss_fn(y_pred, y_batch)  # integer class labels of shape [batch]
            loss.backward()
            optimizer.step()
            tepoch.set_postfix(loss=loss.item())

The loss is monitored by the progress bar, as follows:

Epoch 1: 100%|██████████| 375/375 [01:01<00:00,  6.13batch/s, loss=0.211]
Epoch 2: 100%|██████████| 375/375 [00:59<00:00,  6.25batch/s, loss=0.0149]
Epoch 3: 100%|██████████| 375/375 [01:00<00:00,  6.24batch/s, loss=0.00139]
Epoch 4: 100%|██████████| 375/375 [00:59<00:00,  6.27batch/s, loss=0.00149]
Epoch 5: 100%|██████████| 375/375 [00:59<00:00,  6.29batch/s, loss=0.00252]

If in doubt, the accuracy can be evaluated on the test dataset.
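A minimal sketch of such an evaluation, assuming a model that returns two logits per image. The evaluate helper and the toy model and data are hypothetical names introduced here for illustration. In practice one would pass the trained model and a DataLoader built over held-out images.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device="cpu"):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()  # use running BatchNorm statistics, disable dropout
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for X_batch, y_batch in loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            preds = model(X_batch).argmax(dim=1)  # index of the larger logit
            correct += (preds == y_batch).sum().item()
            total += y_batch.numel()
    return correct / total


# Toy check with a placeholder model and random data.
toy_model = nn.Linear(5, 2)
toy_loader = DataLoader(
    TensorDataset(torch.randn(20, 5), torch.randint(0, 2, (20,))),
    batch_size=8,
)
toy_acc = evaluate(toy_model, toy_loader)
print(f"accuracy: {toy_acc:.3f}")
```

Accuracy measured on images that were never used for training is a more reliable check than the training loss alone, since the question passes the same tensors as both training and validation data.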

And that's it!

Reproducible code

See this notebook, which can be run on a GPU in Colab.