Insightface: "reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error" ImportError. Can anybody help me?

Created on 30 Nov 2020 · 7 Comments · Source: deepinsight/insightface

I ran run.sh but got the error shown further down. I have tried mxnet 1.6.0 and mxnet-cu101, but neither works. The output of horovodrun --check looks like this:

Horovod v0.19.2:

Available Frameworks:
    [X] TensorFlow
    [X] PyTorch
    [ ] MXNet

Available Controllers:
    [X] MPI
    [X] Gloo

Available Tensor Operations:
    [X] NCCL
    [ ] DDL
    [ ] CCL
    [X] MPI
    [X] Gloo
My CUDA version is 10.02. Is my CUDA version wrong?
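
For anyone who wants to read that check output programmatically, here is a minimal sketch that turns it into a dictionary of enabled and disabled features. The function name `parse_check_build` and the `SAMPLE` constant are made up for illustration and are not part of Horovod; the sample text is copied from the output above.

```python
import re

# Copied from the check output in this issue.
SAMPLE = """\
Horovod v0.19.2:

Available Frameworks:
    [X] TensorFlow
    [X] PyTorch
    [ ] MXNet

Available Controllers:
    [X] MPI
    [X] Gloo

Available Tensor Operations:
    [X] NCCL
    [ ] DDL
    [ ] CCL
    [X] MPI
    [X] Gloo
"""


def parse_check_build(text):
    """Map each 'Available ...' section to {feature name: enabled?}."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        item = re.match(r"^\[( |X)\]\s+(.+)$", stripped)
        if item and current is not None:
            # "[X] Name" means built in, "[ ] Name" means missing.
            sections[current][item.group(2)] = item.group(1) == "X"
        elif stripped.endswith(":") and not stripped.startswith("Horovod"):
            # Section header such as "Available Frameworks:".
            current = stripped[:-1]
            sections[current] = {}
    return sections


report = parse_check_build(SAMPLE)
missing = [name for name, ok in report["Available Frameworks"].items() if not ok]
```

With the output from this issue, `missing` contains only MXNet, which matches the ImportError: Horovod itself is installed, but its MXNet extension was not built.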

When I run run.sh, the problem looks like this:

[ps-SYS-4028GR-TR:13182] Warning: could not find environment variable "LD_LIBRARY_PATH"
Traceback (most recent call last):
  File "train_memory.py", line 14, in <module>
    import horovod.mxnet as hvd
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
    __file__, 'mpi_lib')
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built.  If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
  File "train_memory.py", line 14, in <module>
    import horovod.mxnet as hvd
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
    __file__, 'mpi_lib')
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built.  If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
  File "train_memory.py", line 14, in <module>
    import horovod.mxnet as hvd
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
    __file__, 'mpi_lib')
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built.  If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
  File "train_memory.py", line 14, in <module>
    import horovod.mxnet as hvd
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
    __file__, 'mpi_lib')
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built.  If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
Traceback (most recent call last):
  File "train_memory.py", line 14, in <module>
    import horovod.mxnet as hvd
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
    __file__, 'mpi_lib')
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built.  If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
  File "train_memory.py", line 14, in <module>
    import horovod.mxnet as hvd
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
    __file__, 'mpi_lib')
  File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built.  If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[35340,1],1]
  Exit code:    1
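
The ImportError above suggests reinstalling Horovod with HOROVOD_WITH_MXNET=1 so that the build fails loudly instead of silently skipping the MXNet extension. A rough sketch of how that reinstall could be prepared from Python follows. The helper name `horovod_reinstall_plan` is hypothetical, the default version is simply the one reported by the check output in this issue, and the pip flags are one reasonable choice rather than the project's documented procedure.

```python
import os
import sys


def horovod_reinstall_plan(version="0.19.2", base_env=None):
    """Return (command, environment) for rebuilding Horovod with MXNet required.

    Nothing is executed here; the caller decides whether to run the command.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Named in the ImportError message: makes a failed MXNet build an error.
    env["HOROVOD_WITH_MXNET"] = "1"
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir",      # do not reuse a previously built wheel
        "--force-reinstall",   # rebuild even if this version is installed
        "--no-deps",           # leave mxnet and other packages untouched
        "horovod=={}".format(version),
    ]
    return cmd, env


cmd, env = horovod_reinstall_plan(base_env={})
```

To actually run it, pass both values to `subprocess.run(cmd, env=env, check=True)` inside the same conda environment that run.sh uses, with the intended mxnet package already installed, and read the build log for the real error.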

All 7 comments

@yingfeng

@nttstar

@ppwwyyxx

@leondgarse

If you can help me, thank you so much.

Maybe you have the CPU version of mxnet installed. The specific version of mxnet we use is [mxnet-cu101 1.6.0.post0]. You can check this.
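
One way to do that check is to look at which MXNet distributions pip has installed in the environment. The sketch below is illustrative only: `find_mxnet_builds` and `installed_distributions` are hypothetical helper names, and it assumes the usual pip naming scheme where CUDA builds are called `mxnet-cuNNN` (optionally with an `mkl` suffix) and CPU builds are `mxnet` or `mxnet-mkl`.

```python
import re


def find_mxnet_builds(installed):
    """Split installed MXNet distributions into CPU-only and CUDA builds.

    `installed` maps distribution names to version strings, for example
    the parsed output of `pip list`.
    """
    cuda_name = re.compile(r"^mxnet-cu\d+(mkl)?$")
    cpu, cuda = [], []
    for name, version in installed.items():
        key = name.lower().replace("_", "-")
        if cuda_name.match(key):
            cuda.append((key, version))
        elif key in ("mxnet", "mxnet-mkl"):
            cpu.append((key, version))
    return {"cpu": sorted(cpu), "cuda": sorted(cuda)}


def installed_distributions():
    """Collect installed distributions; importlib.metadata needs Python 3.8+."""
    from importlib import metadata
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    }
```

Calling `find_mxnet_builds(installed_distributions())` in the training environment shows what is present. If the `cpu` list is non-empty, a CPU-only mxnet may be the one that gets imported, even when a CUDA build is installed next to it. On the Python 3.6 environment from this issue, `importlib.metadata` is not available, so the dictionary would have to be built from `pip list` output instead.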

Thank you so much! Have a good day!