Loss does not converge in PyTorch but does converge in TensorFlow

Muh*_*fil 5 python neural-network deep-learning pytorch loss-function

Epoch: 1    Training Loss: 0.816370     Validation Loss: 0.696534
Validation loss decreased (inf --> 0.696534).  Saving model ...
Epoch: 2    Training Loss: 0.507756     Validation Loss: 0.594713
Validation loss decreased (0.696534 --> 0.594713).  Saving model ...
Epoch: 3    Training Loss: 0.216438     Validation Loss: 1.119294
Epoch: 4    Training Loss: 0.191799     Validation Loss: 0.801231
Epoch: 5    Training Loss: 0.111334     Validation Loss: 1.753786
Epoch: 6    Training Loss: 0.064309     Validation Loss: 1.348847
Epoch: 7    Training Loss: 0.058158     Validation Loss: 1.839139
Epoch: 8    Training Loss: 0.015489     Validation Loss: 1.370469
Epoch: 9    Training Loss: 0.082856     Validation Loss: 1.701200
Epoch: 10   Training Loss: 0.003859     Validation Loss: 2.657933
Epoch: 11   Training Loss: 0.018133     Validation Loss: 0.593986
Validation loss decreased (0.594713 --> 0.593986).  Saving model ...
Epoch: 12   Training Loss: 0.160197     Validation Loss: 1.499911
Epoch: 13   Training Loss: 0.012942     Validation Loss: 1.879732
Epoch: 14   Training Loss: 0.002037     Validation Loss: 2.399405
Epoch: 15   Training Loss: 0.035908     Validation Loss: 1.960887
Epoch: 16   Training Loss: 0.051137     Validation Loss: 2.226335
Epoch: 17   Training Loss: 0.003953     Validation Loss: 2.619108
Epoch: 18   Training Loss: 0.000381     Validation Loss: 2.746541
Epoch: 19   Training Loss: 0.094646     Validation Loss: 3.555713
Epoch: 20   Training Loss: 0.022620     Validation Loss: 2.833098
Epoch: 21   Training Loss: 0.004800     Validation Loss: 4.181845
Epoch: 22   Training Loss: 0.014128     Validation Loss: 1.933705
Epoch: 23   Training Loss: 0.026109     Validation Loss: 2.888344
Epoch: 24   Training Loss: 0.000768     Validation Loss: 3.029443
Epoch: 25   Training Loss: 0.000327     Validation Loss: 3.079959
Epoch: 26   Training Loss: 0.000121     Validation Loss: 3.578420
Epoch: 27   Training Loss: 0.148478     Validation Loss: 3.297387
Epoch: 28   Training Loss: 0.030328     Validation Loss: 2.218535
Epoch: 29   Training Loss: 0.001673     Validation Loss: 2.934132
Epoch: 30   Training Loss: 0.000253     Validation Loss: 3.215722

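My loss is not converging. I am working on the Horses vs. Humans dataset. There is an official TensorFlow notebook for it that works like a charm. When I try to replicate the same thing in PyTorch, the loss does not converge. Could you take a look?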

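I am using criterion = nn.BCEWithLogitsLoss() and optimizer = optim.RMSprop(model.parameters(), lr=0.001). Although this seems to have some effect on the training loss, the validation loss looks like random numbers and does not follow any pattern. What are the possible reasons for the loss not converging?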
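For reference, a minimal sketch of how this criterion is typically called. The logits and labels below are made-up values, not data from the notebook; the point is only that nn.BCEWithLogitsLoss takes raw logits (no sigmoid in the model) and float targets with the same shape as the logits.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# Hypothetical batch of raw model outputs (logits), shape (4, 1),
# as a final nn.Linear(512, 1) layer would return them.
logits = torch.tensor([[2.0], [-1.0], [0.5], [-3.0]])
# Hypothetical integer class labels, shape (4,).
labels = torch.tensor([1, 0, 1, 0])

# The criterion expects float targets with the same shape as the logits.
targets = labels.float().unsqueeze(1)
loss = criterion(logits, targets)

# Same quantity computed the long way: explicit sigmoid, then plain BCE.
reference = nn.BCELoss()(torch.sigmoid(logits), targets)
```

Because the sigmoid is folded into the loss, the network itself should return the output of the last linear layer unchanged, which is what the architecture below does.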

Here is my CNN architecture:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # input sizes below assume 300x300 RGB images
        # convolutional layer (sees 300x300x3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3)
        # convolutional layer (sees 149x149x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3)
        # convolutional layer (sees 73x73x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3)
        # convolutional layer (sees 35x35x64 tensor)
        self.conv4 = nn.Conv2d(64, 64, 3)
        # convolutional layer (sees 16x16x64 tensor)
        self.conv5 = nn.Conv2d(64, 64, 3)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 7 * 7 -> 512)
        self.fc1 = nn.Linear(3136, 512)
        # linear layer (512 -> 1)
        self.fc2 = nn.Linear(512, 1)
        # dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = self.pool(F.relu(self.conv5(x)))

        # flatten image input
        x = x.view(-1, 64 * 7 * 7)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # add 2nd hidden layer
        x = self.fc2(x)
        return x
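The flatten size 64 * 7 * 7 = 3136 used above can be checked with a little arithmetic. This is a sketch that assumes 300x300 input images (the resolution is not stated explicitly in the post) and uses the standard output-size formulas for an unpadded 3x3 convolution and a 2x2 max pool with stride 2; the helper names conv_out and pool_out are made up for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling with floor rounding."""
    return (size - kernel) // stride + 1


size = 300  # assumed input resolution
sizes = []
for _ in range(5):  # five conv + pool stages, as in forward()
    size = pool_out(conv_out(size))
    sizes.append(size)

flattened = 64 * size * size  # 64 channels come out of conv5

print(sizes)      # spatial size after each conv + pool stage
print(flattened)  # number of features fed into fc1
```

With a different input resolution the final spatial size can change, in which case the hard-coded 3136 in fc1 and the view call would need to change with it.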

Here is the complete Jupyter notebook. I apologize for not being able to provide a minimal reproducible example.

trs*_*chn 0

I think the problem is in the dataloaders. I noticed that you do not pass the samplers to the loaders here:

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=16,
        num_workers=0,
        shuffle=True
    )

test_loader = torch.utils.data.DataLoader(
        test_dataset,
        batch_size=16,
        num_workers=0,
        shuffle=True
    )

I have never used samplers, so I do not know exactly how to use them correctly, but I suppose you want to do something like this:

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

train_loader = torch.utils.data.DataLoader(
        train_dataset,
        sampler=train_sampler,
        batch_size=16,
        num_workers=0,
        shuffle=False  # must be False when a sampler is passed
    )

test_loader = torch.utils.data.DataLoader(
        test_dataset,
        sampler=valid_sampler,
        batch_size=16,
        num_workers=0,
        shuffle=False  # must be False when a sampler is passed
    )

According to the documentation:

sampler (Sampler, optional) – defines the strategy to draw samples from the dataset. If specified, shuffle must be False.

So if you use a sampler, you should turn shuffling off (shuffle=False).
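A small self-contained sketch of the idea, using a made-up ten-element dataset and a hypothetical index split in place of the notebook's train_idx and valid_idx. Each loader only yields the indices given to its sampler, and combining a sampler with shuffle=True raises a ValueError.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Hypothetical stand-in dataset: 10 samples, one feature each (values 0..9).
dataset = TensorDataset(torch.arange(10).float().unsqueeze(1))

# Hypothetical split; the notebook's real indices are not shown in the post.
train_idx = [0, 1, 2, 3, 4, 5, 6, 7]
valid_idx = [8, 9]

train_loader = DataLoader(dataset, batch_size=4,
                          sampler=SubsetRandomSampler(train_idx))
valid_loader = DataLoader(dataset, batch_size=4,
                          sampler=SubsetRandomSampler(valid_idx))

# Collect the sample values each loader actually yields.
train_seen = sorted(int(v) for (batch,) in train_loader for v in batch.flatten())
valid_seen = sorted(int(v) for (batch,) in valid_loader for v in batch.flatten())

# A sampler together with shuffle=True is rejected by DataLoader.
try:
    DataLoader(dataset, batch_size=4,
               sampler=SubsetRandomSampler(train_idx), shuffle=True)
    shuffle_conflict = False
except ValueError:
    shuffle_conflict = True
```

SubsetRandomSampler already draws its indices in random order, so nothing is lost by leaving shuffle off.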
