当使用单卡训练时运行命令:python tools/train.py ${CONFIG_FILE} [ARGS]是可以跑通的,但是使用官方提供的:bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]进行单机多卡训练时却报如下错误: ....torch.cuda.OutOfMemoryError: CUDA out of m
现象:exits with return code -7原因:Setting the shm-size to a large number instead of default 64MB when creating docker container solves the problem in my case. It appears that multi-gpu training relies on