Fairseq: OpenMP error

Created on 6 Jan 2020  路  2Comments  路  Source: pytorch/fairseq

馃悰 Bug

OpenMP error on running fairseq-train (even with --num-workers 0 )
fairseq-generate works crash-free

To Reproduce

RUN:
DATA_FOLDER=errorsim-pw1; ARCH=fconv; FOLDER=fconv_pw1_test2 fairseq-train ./data-bin/$DATA_FOLDER --lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000
--arch $ARCH --save-dir ./checkpoints/$FOLDER --no-progress-bar --log-interval 50
--num-workers 0 --cpu
(--cpu can be removed if you have a GPU, and the error is the same with any value of --num-workers)

ERROR LOG:
OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
OMP: Error #13: Assertion failure at z_Linux_util.cpp(2361).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
Traceback (most recent call last):
File "/homes/3/serai/pytorch_vib/bin/fairseq-train", line 11, in
load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
File "/homes/3/serai/fairseq_vib/fairseq_cli/train.py", line 329, in cli_main
nprocs=args.distributed_world_size,
File "/homes/3/serai/pytorch_vib/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/homes/3/serai/pytorch_vib/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 107, in join
(error_index, name)
Exception: process 1 terminated with signal SIGABRT

Environment

  • fairseq Version: 0.8.0
  • PyTorch Version: 1.3.1
  • OS: RHEL7
  • How you installed fairseq (pip, source): Using pip, as an editable
  • Build command you used (if compiling from source): git clone https://github.com/pytorch/fairseq; cd fairseq; pip install --editable .
  • Python version: 3.7.5
  • CUDA/cuDNN version:
  • GPU models and configuration: Nvidia GeForce GTX 1080
  • Any other relevant information: OpenMP version is 3.1
    echo |cpp -fopenmp -dM |grep -i open prints "#define _OPENMP 201107"

This is a new-ish RHEL7 environment I'm trying to work with. So far whatever I've done with pytorch, fairseq, other libraries I'm using works fine.

bug

All 2 comments

Maybe this is relevant? It looks like they've since pulled the packages from the registry, so maybe you can just try to upgrade the package? Otherwise it looks like setting KMP_INIT_AT_FORK=FALSE helped others.

Looks like a bullseye! Setting KMP_INIT_AT_FORK=FALSE helped me too. Gonna look into upgrading my Anaconda to see if that helps.

Was this page helpful?
0 / 5 - 0 ratings