I ran run.sh but got the error shown below. I have tried mxnet 1.6.0 and mxnet-cu101, but neither works. The output of horovodrun --check is:
Horovod v0.19.2:
Available Frameworks:
[X] TensorFlow
[X] PyTorch
[ ] MXNet
Available Controllers:
[X] MPI
[X] Gloo
Available Tensor Operations:
[X] NCCL
[ ] DDL
[ ] CCL
[X] MPI
[X] Gloo
When I run run.sh, I get the following error:
[ps-SYS-4028GR-TR:13182] Warning: could not find environment variable "LD_LIBRARY_PATH"
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
Traceback (most recent call last):
File "train_memory.py", line 14, in <module>
import horovod.mxnet as hvd
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/mxnet/__init__.py", line 23, in <module>
__file__, 'mpi_lib')
File "/home/liuyang/anaconda3/envs/mxnet_partial/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.mxnet has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_MXNET=1 to debug the build error.
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[35340,1],1]
Exit code: 1
@yingfeng
@nttstar
@ppwwyyxx
@leondgarse
If you can help me, thank you so much.
Maybe you have a CPU-only build of MXNet installed. The specific version we use is [mxnet-cu101 1.6.0.post0]. You can check which one is in your environment.
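One way to do that check is with a small script. This is only a rough sketch, not an official tool: the helper names find_mxnet_packages and is_gpu_build are made up for illustration, and it assumes that GPU wheels from pip are named mxnet-cuXXX (as mxnet-cu101 is). It uses importlib.metadata, which needs Python 3.8 or newer; on the Python 3.6 environment in the traceback, pip list shows the same information.

```python
from importlib import metadata


def find_mxnet_packages(names):
    """Return the distribution names that look like MXNet builds, sorted."""
    return sorted(
        n for n in names
        if n.lower() == "mxnet" or n.lower().startswith("mxnet-")
    )


def is_gpu_build(name):
    """Assumption: pip GPU wheels are named mxnet-cuXXX, e.g. mxnet-cu101."""
    return name.lower().startswith("mxnet-cu")


if __name__ == "__main__":
    # Collect the names of every distribution installed in this environment,
    # skipping any entry whose metadata has no usable name.
    installed = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            installed.append(name)

    found = find_mxnet_packages(installed)
    if not found:
        print("No MXNet package is installed in this environment.")
    for name in found:
        kind = "GPU (CUDA) build" if is_gpu_build(name) else "CPU-only build"
        print(f"{name} {metadata.version(name)}: {kind}")
    if len(found) > 1:
        # More than one MXNet distribution in the same environment can
        # shadow each other, so it is worth keeping only the one you need.
        print("Warning: more than one MXNet package is installed.")
```

If this reports a CPU-only build, or more than one MXNet package, uninstalling them and installing only the CUDA build is a reasonable first step. The ImportError in the traceback then suggests reinstalling Horovod with HOROVOD_WITH_MXNET=1 so that the MXNet extension gets built, or the build error gets shown.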
Thank you so much! Have a good day!