ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)

Problem

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)

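This error appears when training code runs inside a Docker container on a server with a batch size that is too large: the container runs out of shared memory, because Docker limits the size of shm (/dev/shm is 64 MB by default).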
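
To confirm that shared memory is the bottleneck, it can help to check how much space /dev/shm actually has inside the container. The following is a minimal sketch using only the Python standard library. The helper names `shm_usage_mib` and `pick_num_workers`, and the 512 MiB threshold, are assumptions made for illustration; they are not part of PyTorch.

```python
import os
import shutil


def shm_usage_mib(path="/dev/shm"):
    """Return (total, free) size of the shared-memory mount in MiB, or None if the path is missing."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.total // mib, usage.free // mib


def pick_num_workers(requested, min_free_mib=512, path="/dev/shm"):
    """Hypothetical helper: fall back to 0 workers when free shared memory is below the threshold.

    With 0 workers, data is loaded in the main process, so no batches
    are passed between processes through shared memory.
    """
    usage = shm_usage_mib(path)
    if usage is None:
        # No shm mount found (e.g. not on Linux): leave the setting untouched.
        return requested
    _, free = usage
    return requested if free >= min_free_mib else 0


if __name__ == "__main__":
    print("shm (total MiB, free MiB):", shm_usage_mib())
    print("num_workers to use:", pick_num_workers(4))
```

The returned value could be passed as `num_workers` to a `torch.utils.data.DataLoader`. This only works around the limit at the cost of slower data loading; enlarging the container's shared memory removes the cause.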

According to the PyTorch README:

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
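
The two options from the README can be sketched as a small Python helper that assembles the container command line. The function name `docker_run_args`, the image name `my-training-image`, the script `train.py`, and the `8g` size are all placeholders chosen for illustration; only `--ipc=host`, `--shm-size`, and `--rm` are real `docker run` options. Adjust `docker_bin` if the container is launched through `nvidia-docker`.

```python
import shlex


def docker_run_args(image, command, shm_size=None, ipc_host=False, docker_bin="docker"):
    """Build an argv list for `docker run` with a larger shared-memory setting.

    Either share the host IPC namespace (--ipc=host) or set an explicit
    size for /dev/shm (--shm-size); the sketch applies only one of them.
    """
    args = [docker_bin, "run", "--rm"]
    if ipc_host:
        args.append("--ipc=host")
    elif shm_size:
        args.append(f"--shm-size={shm_size}")
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    # Placeholder image and script names; replace with your own.
    argv = docker_run_args("my-training-image", ["python", "train.py"], shm_size="8g")
    print(shlex.join(argv))
    # prints: docker run --rm --shm-size=8g my-training-image python train.py
```

The resulting list can be passed to `subprocess.run`, or the printed line can be pasted into a shell. `--ipc=host` removes the limit entirely but also weakens isolation between the container and the host, so an explicit `--shm-size` is often the more conservative choice.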