I am trying to create a PyTorch distributed data loader with torchmeta, but it fails with a deadlock:
python ~/ultimate-utils/tutorials_for_myself/my_torchmeta/torchmeta_ddp.py
test_basic_ddp_example
ABOUT TO SPAWN WORKERS (via mp.spawn)
-> started ps with rank=0
-> rank=0
-> mp.current_process()=<SpawnProcess name='SpawnProcess-1' parent=54167 started>
-> os.getpid()=54171
device=device(type='cpu')
----> setting up rank=0 (with world_size=4)
---> MASTER_ADDR='127.0.0.1'
---> 57813
---> backend='gloo'
-> started ps with rank=2
-> rank=2
-> mp.current_process()=<SpawnProcess name='SpawnProcess-3' parent=54167 started>
-> os.getpid()=54173
device=device(type='cpu')
----> setting up rank=2 (with world_size=4)
---> MASTER_ADDR='127.0.0.1'
---> 57813
---> backend='gloo'
-> started ps with rank=1
-> rank=1
-> mp.current_process()=<SpawnProcess name='SpawnProcess-2' parent=54167 started> …