Why does PyTorch nn.Module.cuda() move only parameters and buffers to the GPU, and not the module's member tensors?

Question

nn.Module.cuda() moves all of the model's parameters and buffers to the GPU.

But why doesn't it also move tensors that are plain members of the model?

import torch


class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        self.expected_moved_cuda_tensor = torch.tensor([0, 2, 3])

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)

toy_module = ToyModule()
toy_module.cuda()

>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)

For the member tensor, however, the device stays unchanged.

>>> toy_module.expected_moved_cuda_tensor.device
device(type='cpu')

Answer 1

jod*_*dag 7

If you define a tensor inside a module, you need to register it as either a parameter or a buffer so that the module knows about it.
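
To see what "the module knows about it" means in practice, here is a minimal sketch that lists what a module actually tracks. The class name PlainAttributeModule and the attribute name plain_tensor are made up for illustration; they are not from the question.

```python
import torch


class PlainAttributeModule(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self) -> None:
        super().__init__()
        self.layer = torch.nn.Linear(2, 2)
        # A plain tensor attribute: stored on the object, but not registered
        self.plain_tensor = torch.tensor([0, 2, 3])


module = PlainAttributeModule()
param_names = [name for name, _ in module.named_parameters()]
buffer_names = [name for name, _ in module.named_buffers()]
state_keys = list(module.state_dict().keys())

print(param_names)   # ['layer.weight', 'layer.bias']
print(buffer_names)  # []
print(state_keys)    # ['layer.weight', 'layer.bias']
```

The plain tensor shows up in none of these listings, which is why module-wide calls such as .cuda() leave it alone.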

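Parameters are tensors that get trained, and they are returned by model.parameters(). They are easy to register: wrap the tensor in nn.Parameter and assign it as an attribute of the module, and it is registered automatically. Note that a parameter requires gradients by default, so the tensor needs a floating-point (or complex) dtype.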

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a trainable parameter
        self.expected_moved_cuda_tensor = torch.nn.Parameter(torch.tensor([0., 2., 3.]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)

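Buffers are tensors that are registered with the module, so methods like .cuda() affect them, but they are not returned by model.parameters(). Buffers are not restricted to a particular dtype.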

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a buffer
        # Note: this creates a new member variable named expected_moved_cuda_tensor
        self.register_buffer('expected_moved_cuda_tensor', torch.tensor([0, 2, 3]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
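
As a side note on buffers, the following sketch shows how they relate to state_dict(). The names BufferDemo, saved_buffer and scratch_buffer are hypothetical, and the persistent keyword of register_buffer is assumed to be available, which should hold for reasonably recent PyTorch releases.

```python
import torch


class BufferDemo(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self) -> None:
        super().__init__()
        # Integer buffer; included in state_dict (persistent=True is the default)
        self.register_buffer("saved_buffer", torch.tensor([0, 2, 3]))
        # Non-persistent buffer: still a registered buffer, but left out of state_dict
        self.register_buffer("scratch_buffer", torch.zeros(3), persistent=False)


demo = BufferDemo()
buffer_names = [name for name, _ in demo.named_buffers()]
state_keys = list(demo.state_dict().keys())
param_count = len(list(demo.parameters()))

print(buffer_names)  # ['saved_buffer', 'scratch_buffer']
print(state_keys)    # ['saved_buffer']
print(param_count)   # 0
```

Both buffers are registered, so both would follow the module to another device, but neither is handed to an optimizer through parameters().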

In both of the cases above, the following code behaves the same way:

>>> toy_module = ToyModule()
>>> toy_module.cuda()
>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)
>>> toy_module.expected_moved_cuda_tensor.device
device(type='cuda', index=0)
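
If no GPU is at hand, the same distinction can be observed with a dtype conversion instead of a device move. This is only a sketch: it uses .double() as a stand-in for .cuda(), on the assumption that both apply their conversion to registered parameters and buffers only, and that the default dtype is still float32. The class and attribute names are hypothetical.

```python
import torch


class DtypeDemo(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self) -> None:
        super().__init__()
        # Registered as a trainable parameter
        self.as_parameter = torch.nn.Parameter(torch.tensor([0.0, 2.0, 3.0]))
        # Registered as a buffer
        self.register_buffer("as_buffer", torch.tensor([0.0, 2.0, 3.0]))
        # Plain attribute, not registered with the module
        self.as_plain_attribute = torch.tensor([0.0, 2.0, 3.0])


# .double() casts floating-point parameters and buffers to float64
demo = DtypeDemo().double()

print(demo.as_parameter.dtype)        # torch.float64
print(demo.as_buffer.dtype)           # torch.float64
print(demo.as_plain_attribute.dtype)  # torch.float32
```

The parameter and the buffer are converted, while the plain attribute keeps its original dtype, mirroring what happens to the device in the question.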
