Describe the bug
I run CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/ssd512_coco.py 4 --validate for multi-GPU training. After the message about loading the COCO annotations, it fails with the error messages shown in the Error traceback section.
It seems to be an OpenMP problem, but I have no idea how to solve it.
Reproduction
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/ssd512_coco.py 4 --validate
Environment
Error traceback
OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Bug fix
The referenced issue reports errors with intel-openmp=2019.5.
The suggested solution is to downgrade intel-openmp by running
conda install -y intel-openmp=2019.4
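Before downgrading, it may help to confirm that the active conda environment actually has the version named above. The following is only a rough sketch, not part of the original report: the helper names (`is_affected`, `find_version`, `installed_version`) are hypothetical, and treating every 2019.5.x release as affected is an assumption. It relies on `conda list --json`, which prints the installed packages as a JSON list.

```python
import json
import subprocess

# Version named in this thread as the problematic one (assumption: any
# 2019.5.x release is treated the same way).
AFFECTED = "2019.5"


def is_affected(version):
    """True for '2019.5' or any '2019.5.x' version string."""
    if not version:
        return False
    return version == AFFECTED or version.startswith(AFFECTED + ".")


def find_version(packages, name="intel-openmp"):
    """Pick the version of `name` out of parsed `conda list --json` output."""
    for pkg in packages:
        if pkg.get("name") == name:
            return pkg.get("version")
    return None


def installed_version(name="intel-openmp"):
    """Ask conda for the installed version in the active environment."""
    out = subprocess.run(
        ["conda", "list", "--json", name],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_version(json.loads(out), name)
```

With conda on the PATH, `is_affected(installed_version())` returning True would suggest the downgrade above is worth trying; False means the package is missing or at a different version, and the cause may lie elsewhere.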
Hi @wakananai ,
I got the same error and solved it with conda install -y intel-openmp=2019.4. I followed this issue.
Hi @LcDog ,
After downgrading intel-openmp with conda install -y intel-openmp=2019.4, the multi-GPU training runs without error.
Thank you for your kind assistance.