I want to build and train a binary classifier in PyTorch that reads images from a path, preprocesses them, and uses them together with their labels to train the classifier. My images can be found in the following folders:
- data
  - class_1_folder
  - class_2_folder
So, to read them in as tensors, I am doing the following:
PATH = "data/"
transform = transforms.Compose([transforms.Resize(256),
transforms.RandomCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224,0.225])])
dataset = datasets.ImageFolder(PATH, transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=356, shuffle=True)
images, labels = next(iter(dataloader))
This code does read the images and applies the necessary preprocessing transforms. The next step creates the model and performs the training:
acc = model_train(images, labels, images, labels)
where model_train is:
import pdb
import torch
from torchvision import datasets, transforms, models
from matplotlib import pyplot as plt
from torchvision.transforms.functional import to_pil_image
import torch.nn as nn
import numpy as np
import torch.optim as optim
import tqdm
import copy
from time import sleep


def my_model():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = models.resnet18(pretrained=False)
    num_features = model.fc.in_features
    model.fc = nn.Linear(num_features, 1)  # Binary classifier with 2 output classes

    # Move the model to the device
    model = model.to(device)
    return model


def model_train(X_train, y_train, X_val, y_val):
    model = my_model()
    dtype = torch.FloatTensor
    loss_fn = nn.CrossEntropyLoss().type(dtype)  # binary cross entropy
    optimizer = optim.Adam(model.parameters(), lr=0.0001)

    n_epochs = 20    # number of epochs to run
    batch_size = 10  # size of each batch
    batch_start = torch.arange(0, len(X_train), batch_size)

    # Hold the best model
    best_acc = -np.inf  # init to negative infinity
    best_weights = None

    for epoch in range(n_epochs):
        model.train()
        with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
            bar.set_description(f"Epoch {epoch}")
            for start in bar:
                # take a batch
                bar.set_description(f"Epoch {epoch}")
                X_batch = X_train[start:start+batch_size]
                y_batch = y_train[start:start+batch_size]
                # forward pass
                y_pred = model(X_batch)
                y_pred = torch.max(y_pred, 1)[0]
                loss = loss_fn(y_pred, y_batch.float())
                # backward pass
                print(loss)
                optimizer.zero_grad()
                loss.backward()
                # update weights
                optimizer.step()
                # print progress
                acc = (y_pred.round() == y_batch).float().mean()
                bar.set_postfix(
                    loss=float(loss),
                    acc=float(acc)
                )
                bar.set_postfix(loss=loss.item(), accuracy=100. * acc)
                sleep(0.1)
        # evaluate accuracy at end of each epoch
        model.eval()
        y_pred = model(X_val)
        acc = (y_pred.round() == y_val).float().mean()
        acc = float(acc)
        if acc > best_acc:
            best_acc = acc
            best_weights = copy.deepcopy(model.state_dict())
    # restore model and return best accuracy
    torch.save(model.state_dict(), "model/my_model.pth")
    model.load_state_dict(best_weights)
    return best_acc
I am trying to understand, first, how to render the progress bar correctly during training, and second, how I can validate that the training process took place correctly. Regarding the latter, I noticed some strange behavior: for class zero I always get a loss of zero, while for class one it lies in the range 13-24. This seems incorrect, but I do not know how to dig deeper! Here is what gets printed:
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.0986, -0.0806, -0.0161, 0.0287, -0.0279, 0.0083, -0.0526, -0.1393,
-0.2082, -0.0141], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.1779, 0.0936, -0.0341, -0.1531, -0.1222, -0.1169, -0.0160, -0.0674,
0.1230, -0.1181], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.0438, -0.1269, -0.1624, -0.0976, -0.0132, -0.1944, -0.0034, -0.0454,
-0.1559, 0.0657], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.1655, 0.0222, -0.0801, -0.1390, -0.0905, -0.1472, -0.0395, -0.0180,
-0.1492, 0.0914], grad_fn=<MaxBackward0>)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
torch.float32
torch.int64
tensor(-0., grad_fn=<DivBackward1>)
tensor([-0.7035, -0.1989, 0.0921, -0.1082, -0.2588, -0.3557, 0.3093, 0.0909,
0.1603, 0.1838], grad_fn=<MaxBackward0>)
tensor([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(20.4545, grad_fn=<DivBackward1>)
tensor([-0.4783, -0.1027, -0.0357, 0.0882, -0.2955, -0.0968, 0.3323, -0.0472,
0.1017, -0.2186], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.2550, grad_fn=<DivBackward1>)
tensor([ 0.1554, -0.2664, 0.1419, 0.0203, 0.0895, -0.0085, -0.2867, -0.1957,
-0.1315, -0.2340], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.1584, grad_fn=<DivBackward1>)
tensor([-0.0406, -0.2144, 0.1997, 0.2196, -0.3464, 0.1311, -0.0743, -0.2440,
-0.1751, -0.2371], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.2112, grad_fn=<DivBackward1>)
tensor([-0.0080, -0.1138, -0.1035, 0.0697, -0.1745, -0.1438, -0.2360, -0.1308,
0.0146, 0.1209], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.0853, grad_fn=<DivBackward1>)
tensor([-0.1235, 0.0081, -0.1073, -0.1036, -0.2037, -0.1204, -0.0570, -0.1146,
0.0849, 0.0798], grad_fn=<MaxBackward0>)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
torch.float32
torch.int64
tensor(23.0666, grad_fn=<DivBackward1>)
tensor([-0.0660, -0.0832, -0.0414, -0.0334, -0.0123, -0.0203, -0.0549, -0.0747,
-0.0779, -0.1629], grad_fn=<MaxBackward0>)
What could be going wrong here?
Summary
There are a few problems with the attached code:
the whole dataset is consumed as one huge batch (images, labels = next(iter(dataloader))); the head is nn.Linear(num_features, 1); and a non-binary cross entropy (nn.CrossEntropyLoss) is used with it. While the first two issues may affect quality and efficiency, I would blame the "binary setup" first. To work around it, I suggest first setting up training with 2 classes, i.e., a model with two outputs of shape [batch, 2], and, once that works, carefully moving to the binary-encoded case.
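Incidentally, the loss values in the log (-0. on all-zero batches, roughly 20-23 on all-one batches) fit that last point: after torch.max, y_pred is a 1-D tensor of 10 raw scores, and nn.CrossEntropyLoss treats a 1-D float target of the same shape as class probabilities for a single 10-class sample rather than as 10 binary labels. A minimal standalone reproduction (my own snippet, not from the original post; it assumes PyTorch 1.10 or later, where CrossEntropyLoss accepts probability targets):

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(10)  # one raw score per image, like y_pred after torch.max

# A 1-D float target of the same shape is read as class *probabilities* over
# 10 classes for one sample: loss = -sum(target * log_softmax(logits)).
print(loss_fn(logits, torch.zeros(10)))  # tensor(-0.): all-zero targets zero out every term
print(loss_fn(logits, torch.ones(10)))   # roughly 10 * log(10), i.e. ~23, for near-uniform logits

If you later move to the binary-encoded case with a single logit, nn.BCEWithLogitsLoss on float targets of shape [batch] is the matching loss.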
Working solution
Let's work through the task of telling classes 3 and 8 apart on FashionMNIST:
\ncurl -L -o data.zip https://github.com/DeepLenin/fashion-mnist_png/raw/master/data.zip\nunzip data.zip\nRun Code Online (Sandbox Code Playgroud)\n这是适当的dataloader:
curl -L -o data.zip https://github.com/DeepLenin/fashion-mnist_png/raw/master/data.zip\nunzip data.zip\nRun Code Online (Sandbox Code Playgroud)\n第二步,我们构建模型
\n## Dataset\n\nimport torch\nfrom torchvision import datasets\nimport torchvision.transforms as transforms\n\nPATH = "data/train"\n\ntransform = transforms.Compose([transforms.Resize(256),\n transforms.RandomCrop(224),\n transforms.ToTensor(),\n transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224,0.225])])\n\ndataset = datasets.ImageFolder(PATH, transform=transform)\ndataset.classes = [\'3\',\'8\']\ndataset.class_to_idx = {\'3\':0,\'8\':1}\ndataset.samples = list(filter(lambda s: s[1] in [0,1], dataset.samples))\n\ndataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)\nRun Code Online (Sandbox Code Playgroud)\n最后,我们用进度条进行训练(参见Adam Oudad 的帖子tqdm):
## Model construction\n\nimport torch\nimport torch.nn as nn\nimport torch.optim as optim\nfrom torchvision import datasets, transforms, models\n\ndevice = "cuda" if torch.cuda.is_available() else "cpu"\n\n\ndef my_model():\n \n model = models.resnet18(pretrained=False)\n num_features = model.fc.in_features\n model.fc = nn.Linear(num_features, 2) # classifier with 2 output classes\n\n # Move the model to the device\n model = model.to(device)\n return model\n\n\nmodel = my_model()\ndtype = torch.FloatTensor\nloss_fn = nn.CrossEntropyLoss()\noptimizer = optim.Adam(model.parameters(), lr=0.001)\nRun Code Online (Sandbox Code Playgroud)\n损失由进度条监视,如下所示:
\nEpoch 1: 100%|\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88| 375/375 [01:01<00:00, 6.13batch/s, loss=0.211]\nEpoch 2: 100%|\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88| 375/375 [00:59<00:00, 6.25batch/s, loss=0.0149]\nEpoch 3: 100%|\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88| 375/375 [01:00<00:00, 6.24batch/s, loss=0.00139]\nEpoch 4: 100%|\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88| 375/375 [00:59<00:00, 6.27batch/s, loss=0.00149]\nEpoch 5: 100%|\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88\xe2\x96\x88| 375/375 [00:59<00:00, 6.29batch/s, loss=0.00252]\nRun Code Online (Sandbox Code Playgroud)\n如果有任何疑问,可以在测试数据集上评估准确性:
\n## Model traning\n\nfrom tqdm import tqdm\n\nmodel.train()\nfor epoch in range(1, 6):\n with tqdm(dataloader, unit="batch") as tepoch:\n for X_batch, y_batch in tepoch:\n tepoch.set_description(f"Epoch {epoch}")\n X_batch, y_batch = X_batch.to(device), y_batch.to(device)\n optimizer.zero_grad()\n y_pred = model(X_batch)\n loss = loss_fn(y_pred, y_batch)\n loss.backward()\n optimizer.step()\n tepoch.set_postfix(loss=loss.item())\nRun Code Online (Sandbox Code Playgroud)\n这样我们就完成了!
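A minimal sketch of such an evaluation (the data/test split, the class filtering, and the variable names are assumptions carried over from the training setup above, not code from the original answer):

## Evaluation

import torch
from torchvision import datasets

# Same trick as for training: keep only the two classes of interest.
test_dataset = datasets.ImageFolder("data/test", transform=transform)
test_dataset.classes = ['3', '8']
test_dataset.class_to_idx = {'3': 0, '8': 1}
test_dataset.samples = list(filter(lambda s: s[1] in [0, 1], test_dataset.samples))
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32)

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for X_batch, y_batch in test_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        preds = model(X_batch).argmax(dim=1)  # index of the larger of the two logits
        correct += (preds == y_batch).sum().item()
        total += y_batch.size(0)

print(f"Test accuracy: {correct / total:.4f}")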