PyTorch dynamic number of layers?

SLu*_*uck 7 pytorch pytorch-lightning

I am trying to specify a dynamic number of layers, but I seem to be doing it wrong. My problem is that when I define the 100 layers here, I get an error in the forward step, yet when I define a layer directly it works. Below is a simplified example:

import torch
from torch import nn
from pytorch_lightning import LightningModule


class PredictFromEmbeddParaSmall(LightningModule):
    def __init__(self, hyperparams={'lr': 0.0001}):
        super(PredictFromEmbeddParaSmall, self).__init__()
        # Input is something like tensor.size=[768*100]
        self.TO_ILLUSTRATE = nn.Linear(768, 5)
        self.para_count = 100
        self.enc_red = []
        for i in range(self.para_count):
            self.enc_red.append(nn.Linear(768, 5))
        # gather the layers' outputs
        self.dense_simple1 = nn.Linear(5*100, 2)
        self.output = nn.Sigmoid()

    def forward(self, x):
        # first input to enc_red
        x_vecs = []
        for i in range(self.para_count):
            layer = self.enc_red[i]
            # The first dim is the batch size here, output is correct
            processed_slice = x[:, i * 768:(i + 1) * 768]
            # This works and gives an output of size 5
            rand = self.TO_ILLUSTRATE(processed_slice)
            # This will fail? Error below
            ret = layer(processed_slice)
            # more things happening we can ignore right now since we fail earlier

I get this error when executing "ret = layer.forward(processed_slice)":

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_addmm

Is there a smarter way to program this, or a way to fix the error?

Vic*_*zzi 10

You should use a ModuleList from PyTorch instead of a plain Python list: https://pytorch.org/docs/master/generated/torch.nn.ModuleList.html. This is because PyTorch has to keep track of all the modules of your model; if you just add them to a Python list they are not registered as submodules, so their parameters are never moved to the GPU, which causes the error you are facing.
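
A quick way to see the difference is the minimal sketch below (the toy classes WithPlainList and WithModuleList are my own illustration, not from the original post): layers held in a plain Python list never show up in model.parameters(), so .cuda() / .to(device) leaves them on the CPU, while nn.ModuleList registers them properly:

import torch
from torch import nn

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # plain list: the Linear layers are NOT registered as submodules
        self.layers = [nn.Linear(768, 5) for _ in range(100)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # ModuleList: every Linear is registered, so .to(device) moves it
        self.layers = nn.ModuleList([nn.Linear(768, 5) for _ in range(100)])

print(len(list(WithPlainList().parameters())))   # 0   -> .cuda() misses these layers
print(len(list(WithModuleList().parameters())))  # 200 -> weight + bias for each of the 100 layers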

Your code should then look something like this:

class PredictFromEmbeddParaSmall(LightningModule):
    def __init__(self, hyperparams={'lr': 0.0001}):
        super(PredictFromEmbeddParaSmall, self).__init__()
        # Input is something like tensor.size=[768*100]
        self.TO_ILLUSTRATE = nn.Linear(768, 5)
        self.para_count = 100
        self.enc_red = nn.ModuleList()                   # << MODIFIED LINE <<
        for i in range(self.para_count):
            self.enc_red.append(nn.Linear(768, 5))
        # gather the layers' outputs
        self.dense_simple1 = nn.Linear(5*100, 2)
        self.output = nn.Sigmoid()

    def forward(self, x):
        # first input to enc_red
        x_vecs = []
        for i in range(self.para_count):
            layer = self.enc_red[i]
            # The first dim is the batch size here, output is correct
            processed_slice = x[:, i * 768:(i + 1) * 768]
            # This works and gives an output of size 5
            rand = self.TO_ILLUSTRATE(processed_slice)
            # This now works as well
            ret = layer(processed_slice)
            # more things happening we can ignore right now

Then it should work fine!

EDIT: An alternative way.

Instead of using a ModuleList you can also just use nn.Sequential, which lets you avoid the for loop in the forward pass. This also means you will not have access to the intermediate activations, so if you need those, this is not the solution for you.

class PredictFromEmbeddParaSmall(LightningModule):
    def __init__(self, hyperparams={'lr': 0.0001}):
        super(PredictFromEmbeddParaSmall, self).__init__()
        # Input is something like tensor.size=[768*100]
        self.TO_ILLUSTRATE = nn.Linear(768, 5)
        self.enc_ref = []
        for i in range(100):
            self.enc_ref.append(nn.Linear(768, 5))

        self.enc_red = nn.Sequential(*self.enc_ref)      # << MODIFIED LINE <<
        # gather the layers' outputs
        self.dense_simple1 = nn.Linear(5*100, 2)
        self.output = nn.Sigmoid()

    def forward(self, x):
        # all layers are applied in a single call, no for loop
        out = self.enc_red(x)                            # << MODIFIED LINE <<

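Note that nn.Sequential chains the layers, feeding each layer's output into the next, so consecutive layer shapes have to line up; it does not apply each layer to a separate slice of the input the way the forward loop in the ModuleList version does.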


Out*_*ime 5

Here is a more adjustable solution, depending on your taste or on the complexity of your exact situation.

For reference, I post an adjusted version of the code here:

import torch
from torch import nn, optim
from torch.nn.modules import Module


class Model(nn.Module):
    def __init__(self, input_size, layers_data: list, learning_rate=0.01, optimizer=optim.Adam):
        super().__init__()
        self.layers = nn.ModuleList()
        self.input_size = input_size  # Can be useful later ...
        for size, activation in layers_data:
            self.layers.append(nn.Linear(input_size, size))
            input_size = size  # For the next layer
            if activation is not None:
                assert isinstance(activation, Module), \
                    "Each tuple should contain a size (int) and a torch.nn.modules.Module."
                self.layers.append(activation)

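        # note: storing the device below does not move the parameters by itself;
        # call self.to(self.device) afterwards if you want the model on the GPU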
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.learning_rate = learning_rate
        self.optimizer = optimizer(params=self.parameters(), lr=learning_rate)

    def forward(self, input_data):
        for layer in self.layers:
            input_data = layer(input_data)
        return input_data


# test that the net is working properly 
if __name__ == "__main__":
    data_size = 5
    layer1, layer2 = 10, 10
    output_size = 2
    data = torch.randn(data_size)
    mlp = Model(data_size, [(layer1, nn.ReLU()), (layer2, nn.ReLU()), (output_size, nn.Sigmoid())])
    output = mlp(data)
    print("done")
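
If you also want to exercise the optimizer the class builds for itself, a minimal training step could look like the sketch below; the dummy target and the choice of BCELoss are my illustrative assumptions, not part of the original answer:

# continuing from the __main__ block above
target = torch.rand(output_size)   # dummy target in (0, 1), matching the Sigmoid output
loss_fn = nn.BCELoss()             # illustrative loss choice (assumption)
mlp.optimizer.zero_grad()
loss = loss_fn(output, target)
loss.backward()
mlp.optimizer.step()
print(loss.item())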