This article is a short set of notes on PyTorch distributed training, intended as a quick reference for developers setting it up or debugging it.
1. Single machine, multiple GPUs: data parallelism
```
model = xxx  # placeholder: any nn.Module whose parameters live on device_ids[0]
# functional-style data parallelism: scatter `inputs` along `dim`, replicate the
# model onto each listed GPU, run forward, and gather the results on device_ids[0]
outputs = torch.nn.parallel.data_parallel(model, inputs=(batch,), device_ids=[0, 1], dim=0)
```
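The same thing can also be written module-style with torch.nn.DataParallel, which wraps the model once and parallelizes every forward call. A minimal sketch (the linear model, batch shape, and device ids are placeholder assumptions, not taken from the snippet above):

```
import torch
import torch.nn as nn

model = nn.Linear(128, 10).cuda()                  # placeholder model, parameters on GPU 0
model = nn.DataParallel(model, device_ids=[0, 1])  # replicate onto GPUs 0 and 1
batch = torch.randn(64, 128).cuda()
outputs = model(batch)  # batch is scattered along dim 0, outputs gathered back on GPU 0
```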
2. PyTorch 1.0 distributed (torch.distributed)
2.1 Single machine, multiple GPUs
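With torch.distributed, each GPU gets its own process and the model is wrapped in DistributedDataParallel. A minimal single-machine sketch, assuming a hypothetical MyModel class and the torch.distributed.launch launcher:

```
# launch with: python -m torch.distributed.launch --nproc_per_node=4 train.py
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # filled in by the launcher
args = parser.parse_args()

dist.init_process_group(backend='nccl')  # reads MASTER_ADDR/RANK/... set by the launcher
torch.cuda.set_device(args.local_rank)

model = MyModel().cuda(args.local_rank)  # hypothetical model class
model = DistributedDataParallel(model, device_ids=[args.local_rank])
```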
2.2 Multiple machines, multiple GPUs
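Across machines, the only real difference is how the processes find each other. A sketch assuming 2 nodes with 2 GPUs each (world_size 4) and a rank-0 node reachable at 10.0.0.1:23456; the address, port, sizes, and MyModel are made-up values:

```
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

parser = argparse.ArgumentParser()
parser.add_argument('--rank', type=int, required=True)        # global rank, 0..3
parser.add_argument('--local_rank', type=int, required=True)  # GPU index on this node
args = parser.parse_args()

dist.init_process_group(
    backend='nccl',
    init_method='tcp://10.0.0.1:23456',  # address:port of the rank-0 node (assumed)
    world_size=4,
    rank=args.rank,
)
torch.cuda.set_device(args.local_rank)
model = DistributedDataParallel(MyModel().cuda(args.local_rank),  # hypothetical model
                                device_ids=[args.local_rank])
```

One such process is started per GPU on every node; torch.distributed.launch can do this with its --nnodes, --node_rank, --master_addr, and --master_port options instead of passing ranks by hand.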
----
Reference: https://pytorch.org/docs/stable/distributed.html
----
Error encountered when loading a saved checkpoint:
File "inference.py", line 111, in <module>model.load_state_dict(torch.load('./output/state-ckpt-epoch-final', map_location='cpu')) # ['model']File "/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 367, in loadreturn _load(f, map_location, pickle_module)File "/home/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 545, in _loaddeserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 4363873583357797660 got 1
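One common cause of this "storage has wrong size" error is that every rank in a distributed job wrote the same checkpoint file at once, corrupting it. A minimal guard, assuming a DDP setup like the one above and the checkpoint path from the traceback, is to save only from rank 0:

```
import torch
import torch.distributed as dist

def save_checkpoint(model, path='./output/state-ckpt-epoch-final'):
    # Only rank 0 writes the file; concurrent writes from all ranks can corrupt it.
    if dist.get_rank() == 0:
        # for a DDP-wrapped model, model.module.state_dict() drops the 'module.' prefix
        torch.save(model.module.state_dict(), path)
    dist.barrier()  # make the other ranks wait until the checkpoint exists
```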