Invalid device id when using PyTorch DataParallel?

Tou*_*ind 5 python-3.x deep-learning pytorch

Environment:

  • Windows 10
  • PyTorch 1.3.0
  • Python 3.7

Problem:

I am using two 2080 Ti GPUs in PyTorch with DataParallel. The code is as follows:

    import torch
    import torch.nn as nn

    # Darknet and weights_init_normal come from the PyTorch-YOLOv3 code base;
    # opt holds the parsed command-line options of train.py
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = Darknet(opt.model_def)
    model.apply(weights_init_normal)

    model = nn.DataParallel(model, device_ids=[0, 1]).to(device)

But when I run this code, I get the following error:

    Traceback (most recent call last):
      File "C:/Users/Administrator/Desktop/PyTorch-YOLOv3-master/train.py", line 74, in <module>
        model = nn.DataParallel(model, device_ids=[0, 1]).to(device)
      File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 133, in __init__
        _check_balance(self.device_ids)
      File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 19, in _check_balance
        dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
      File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 19, in <listcomp>
        dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
      File "C:\Users\Administrator\Anaconda3\envs\py37_torch1.3\lib\site-packages\torch\cuda\__init__.py", line 337, in get_device_properties
        raise AssertionError("Invalid device id")
    AssertionError: Invalid device id

When I debug it, I find that device_count(), called inside get_device_properties(), returns 1, although my machine has 2 GPUs. Yet torch._C._cuda_getDeviceCount() returns 2 in the Anaconda Prompt. What is wrong?
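get_device_properties() rejects any id that is not below torch.cuda.device_count(), so with only one visible GPU the id 1 in device_ids=[0, 1] fails. One likely reason for the count to be 1 on a two-GPU machine is that CUDA_VISIBLE_DEVICES is set to a single device in the environment of the training process, which can differ from the Anaconda Prompt session. A minimal diagnostic sketch, assuming it is run with the same interpreter and environment as train.py (the helper name visible_devices_setting is hypothetical):

```python
import os


def visible_devices_setting(env=None):
    """Return CUDA_VISIBLE_DEVICES as this process sees it, or None if unset.

    When the variable is unset, CUDA exposes every GPU in the machine.
    """
    env = os.environ if env is None else env
    return env.get("CUDA_VISIBLE_DEVICES")


if __name__ == "__main__":
    # Run with the same Python environment as train.py so the output
    # reflects what the training process sees, not a separate shell.
    print("CUDA_VISIBLE_DEVICES =", visible_devices_setting())
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch.cuda.device_count() =", torch.cuda.device_count())
```

If the first line prints a single id such as "0", PyTorch will only ever report one GPU in that process.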


Question:

How can I solve this problem?
How can I use both GPUs through DataParallel?
Thank you all!


cer*_*rou 5

Basically, as @ToughMind pointed out, we need to specify

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

This has to run before PyTorch initializes CUDA, so the safest place is at the top of the script, before import torch.

But this depends on which CUDA devices are available on your machine, so if someone has only one GPU, it may be appropriate to set, for example,

    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
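The two settings above differ only in which GPUs are exposed. A minimal sketch, assuming the variable is set at the very top of the training script, that exports it before PyTorch initializes CUDA and then derives device_ids from the number of visible GPUs instead of hard-coding [0, 1]. The helpers pick_device_ids and wrap_model are hypothetical names, not part of PyTorch; the ids are indices into the visible devices, so after CUDA_VISIBLE_DEVICES="0" the only valid id is 0.

```python
import os

# Must be set before PyTorch initializes CUDA. Use "0" instead of "0,1"
# on a single-GPU machine.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"


def pick_device_ids(visible_count):
    """Choose device_ids for nn.DataParallel from the number of visible GPUs.

    Returns None when fewer than two GPUs are visible, i.e. there is
    nothing to split the batch across.
    """
    if visible_count < 2:
        return None
    return list(range(visible_count))


def wrap_model(model):
    """Move `model` to the best device, wrapping it in DataParallel if useful."""
    # Importing torch lazily guarantees the variable above is already set.
    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    device_ids = pick_device_ids(torch.cuda.device_count())
    if device_ids is not None:
        model = nn.DataParallel(model, device_ids=device_ids)
    return model.to(device)
```

For the model in the question this would be model = wrap_model(Darknet(opt.model_def)). Passing device_ids=None to nn.DataParallel would also use every visible GPU; the explicit list only makes the choice easy to print and check.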