Incubator-mxnet: run errors

Created on 17 Jan 2017 · 28 Comments · Source: apache/incubator-mxnet

For bugs or installation issues, please provide the following information.
The more information you provide, the more likely people will be able to help you.

Environment info

Operating System: Ubuntu 14

Compiler: AuthenticAMD

Package used (Python/R/Scala/Julia): Python

MXNet version: 0.9

Or if installed from source: yes

MXNet commit hash (git rev-parse HEAD): git

If you are using python package, please provide

Python version and distribution: 2.7

If you are using R package, please provide

R sessionInfo():

Error Message:

mxnet.base.MXNetError: src/c_api/c_api_ndarray.cc:270: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

Please paste the full error message, including stack trace.

Minimum reproducible example

if you are using your own code, please provide a short script that reproduces the error.

Steps to reproduce

or if you are running standard examples, please provide the commands you have run that lead to the error.

1. Run demo.py from https://github.com/Seanlinx/mtcnn

What have you tried to solve it?

1. I have no idea how to solve it.

Source

Cv9527

Most helpful comment

I looked deeper into it and noticed that this error occurs when the code uses mx.gpu(0) instead of mx.cpu(0) on a machine where the GPU is not available. This might shed some light on the OP's original problem: maybe the GPU is not recognized.

mateuszr on 20 Feb 2017

👍4 👎1

All 28 comments

please update to latest master and make clean && make

piiswrong on 17 Jan 2017

Thanks. However, I updated to the latest master and the error is still not solved.

Cv9527 on 18 Jan 2017

@Cv9527 @piiswrong I encountered the same problem. Have you already solved it? The problem only occurs when I set GPU mode, so I cannot run tests in a GPU environment. My mxnet is the latest version, and there are no errors when compiling.

[16:37:30] src/c_api/c_api_ndarray.cc:270: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

Stack trace returned 40 entries:
[bt] (0) 0   libmxnet.so                         0x000000010efc8238 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x000000010f58a37d MXImperativeInvoke + 13389
[bt] (2) 2   ndarray.so                          0x000000011b3e3ff7 _ZL67__pyx_pf_7ndarray_22_make_ndarray_function_generic_ndarray_functionP7_objectS0_S0_ + 6519
[bt] (3) 3   ndarray.so                          0x000000011b3e25a0 _ZL68__pyx_pw_7ndarray_22_make_ndarray_function_1generic_ndarray_functionP7_objectS0_S0_ + 64
[bt] (4) 4   python                              0x000000010e133663 PyObject_Call + 99
[bt] (5) 5   python                              0x000000010e1ddf3d PyEval_EvalFrameEx + 30349
[bt] (6) 6   python                              0x000000010e1d6567 PyEval_EvalCodeEx + 2119
[bt] (7) 7   python                              0x000000010e1e29a6 fast_function + 118
[bt] (8) 8   python                              0x000000010e1ddc24 PyEval_EvalFrameEx + 29556
[bt] (9) 9   python                              0x000000010e1d6567 PyEval_EvalCodeEx + 2119
[bt] (10) 10  python                              0x000000010e1e29a6 fast_function + 118
[bt] (11) 11  python                              0x000000010e1ddc24 PyEval_EvalFrameEx + 29556
[bt] (12) 12  python                              0x000000010e1e2a82 fast_function + 338
[bt] (13) 13  python                              0x000000010e1ddc24 PyEval_EvalFrameEx + 29556
[bt] (14) 14  python                              0x000000010e1e2a82 fast_function + 338
[bt] (15) 15  python                              0x000000010e1ddc24 PyEval_EvalFrameEx + 29556
[bt] (16) 16  python                              0x000000010e1d6567 PyEval_EvalCodeEx + 2119
[bt] (17) 17  python                              0x000000010e15a88b function_call + 363
[bt] (18) 18  python                              0x000000010e133663 PyObject_Call + 99
[bt] (19) 19  python                              0x000000010e141786 instancemethod_call + 182
[bt] (20) 20  python                              0x000000010e133663 PyObject_Call + 99
[bt] (21) 21  python                              0x000000010e192bef slot_tp_init + 175
[bt] (22) 22  python                              0x000000010e18cfdb type_call + 347
[bt] (23) 23  python                              0x000000010e133663 PyObject_Call + 99
[bt] (24) 24  python                              0x000000010e1ddf3d PyEval_EvalFrameEx + 30349
[bt] (25) 25  python                              0x000000010e1d6567 PyEval_EvalCodeEx + 2119
[bt] (26) 26  python                              0x000000010e1e29a6 fast_function + 118
[bt] (27) 27  python                              0x000000010e1ddc24 PyEval_EvalFrameEx + 29556
[bt] (28) 28  python                              0x000000010e1d6567 PyEval_EvalCodeEx + 2119
[bt] (29) 29  python                              0x000000010e1e29a6 fast_function + 118
[bt] (30) 30  python                              0x000000010e1ddc24 PyEval_EvalFrameEx + 29556
[bt] (31) 31  python                              0x000000010e1d6567 PyEval_EvalCodeEx + 2119
[bt] (32) 32  python                              0x000000010e1e29a6 fast_function + 118
[bt] (33) 33  python                              0x000000010e1ddc24 PyEval_EvalFrameEx + 29556
[bt] (34) 34  python                              0x000000010e1d6567 PyEval_EvalCodeEx + 2119
[bt] (35) 35  python                              0x000000010e1d5d16 PyEval_EvalCode + 54
[bt] (36) 36  python                              0x000000010e205284 PyRun_FileExFlags + 164
[bt] (37) 37  python                              0x000000010e204dbe PyRun_SimpleFileExFlags + 702
[bt] (38) 38  python                              0x000000010e21b41d Py_Main + 2925
[bt] (39) 39  libdyld.dylib                       0x00007fffabb48255 start + 1

YuliangXiu on 19 Jan 2017

did you compile with gpu support?

piiswrong on 19 Jan 2017

Yes. I use OS X 10.12, Python 2.7, and OpenBLAS. My CUDA installation is working, because I can use CUDA and cuDNN in Torch. @piiswrong

YuliangXiu on 19 Jan 2017

@YuliangXiu Do you use -D_GLIBCXX_USE_CXX11_ABI=0 ?
If so, try with -D_GLIBCXX_USE_CXX11_ABI=1

loofahcus on 29 Jan 2017

@loofahcus I've tried this with the latest version, but it didn't work.

dongwu92 on 12 Feb 2017

I can confirm this bug, but with slightly different setup: latest mxnet from master branch, Ubuntu linux, no GPU, compiled against libopenblas 0.2.18, running the python "cnn text classification example"

```
Traceback (most recent call last):
  File "text_cnn.py", line 405, in <module>
    main()
  File "text_cnn.py", line 357, in main
    cnn_model = setup_cnn_model(mx.gpu(0), batch_size, sentence_size, num_embed, vocab_size, dropout=0.50)
  File "text_cnn.py", line 82, in setup_cnn_model
    arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/ndarray.py", line 1064, in zeros
    return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/_ctypes/ndarray.py", line 131, in generic_ndarray_function
    c_array(ctypes.c_char_p, [c_str(str(i)) for i in kwargs.values()])))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/base.py", line 77, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [20:16:02] src/c_api/c_api_ndarray.cc:274: Operator _zeros cannot be run; requires at least one of FCompute<xpu>, NDArrayFunction, FCreateOperator be registered

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f41a5d6a17c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.9.4-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvoke+0x636e) [0x7f41a68216ee]
[bt] (2) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f41b2905e40]
[bt] (3) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f41b29058ab]
[bt] (4) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7f41b2b153df]
[bt] (5) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7f41b2b19d82]
[bt] (6) python(PyObject_Call+0x43) [0x4b0cb3]
[bt] (7) python(PyEval_EvalFrameEx+0x5faf) [0x4c9faf]
[bt] (8) python(PyEval_EvalCodeEx+0x255) [0x4c2765]
[bt] (9) python(PyEval_EvalFrameEx+0x68d1) [0x4ca8d1]
```

mateuszr on 20 Feb 2017

👍4 👎1

So, is installing mxnet-ssd on a GPU-enabled machine the simplest solution?

k-hashimoto on 7 Mar 2017

Yes, or if you're running one of the provided examples, just change every occurrence of mx.gpu(0) to mx.cpu(0) to use the CPU instead. This will be an order of magnitude slower, though.
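A minimal sketch of that fallback, for scripts that should run on either kind of machine. `pick_context` is a hypothetical helper name, not part of MXNet; it assumes that a failed GPU allocation surfaces as `mxnet.base.MXNetError`, as it does in the tracebacks in this thread.

```python
def pick_context(mx, gpu_id=0):
    """Return mx.gpu(gpu_id) if a tiny allocation works there, else mx.cpu(0).

    `mx` is the already imported mxnet module. Passing it in as an argument
    is a choice made for this sketch so the helper can be tried in isolation.
    """
    try:
        ctx = mx.gpu(gpu_id)
        # Allocating an array is the call that failed in the reported
        # tracebacks; wait_to_read() forces any deferred error to show up here.
        mx.nd.zeros((1,), ctx=ctx).wait_to_read()
        return ctx
    except mx.base.MXNetError:
        # No usable GPU, or a CPU-only build of MXNet: fall back to the CPU.
        return mx.cpu(0)
```

In an example script this would be used as `ctx = pick_context(mx)` after `import mxnet as mx`, with `ctx` passed wherever the script currently hard-codes `mx.gpu(0)`.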

mateuszr on 9 Mar 2017

I'm having the same error.
Installed latest mxnet with pip.

CentOS Linux release 7.3.1611 (Core)
Python 3.6.1
Cuda compilation tools, release 8.0, V8.0.61

undertherain on 11 Jun 2017

I just tried recompiling from source, using the latest code from GitHub: same error.

undertherain on 12 Jun 2017

Okay, it looks like the pip package always comes without GPU support.

As for compiling from source, python3 setup.py install somehow did not overwrite the existing pip-installed package.

After I manually deleted mxnet from dist-packages and reinstalled from source, it works.

It would be nice to have a GPU-enabled version on pip, though.
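A quick way to check for this kind of shadowing is to ask Python which file it would import and which mxnet distributions are registered. This is only a sketch: `locate_package` and `installed_variants` are hypothetical helper names, and the second one assumes Python 3.8 or later for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata  # standard library from Python 3.8 on


def locate_package(name):
    """Return the file Python would load for `name`, without importing it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def installed_variants(prefix="mxnet"):
    """List installed distributions whose name starts with `prefix`."""
    names = set()
    for dist in metadata.distributions():
        dist_name = dist.metadata.get("Name")
        if dist_name and dist_name.lower().startswith(prefix):
            names.add(dist_name)
    return sorted(names)


if __name__ == "__main__":
    print("import path:", locate_package("mxnet"))
    print("distributions:", installed_variants("mxnet"))
```

If the import path points at an old dist-packages copy, or more than one mxnet distribution is listed, the stale one is likely the package being loaded.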

undertherain on 13 Jun 2017

Thanks, @undertherain is right: delete /usr/local/lib/python2.7/mxnet_* , then install again.

qxtian on 14 Jun 2017

pip install mxnet-cu80 worked for me

haehn on 12 Aug 2017

👍3

This issue is closed due to lack of activity in the last 90 days. Feel free to ping me to reopen if this is still an active issue. Thanks!
Also, do please check out our forum (and Chinese version) for general "how-to" questions.

szha on 11 Nov 2017

I still have this issue. Installing via pip (pip install mxnet-cu92) on a machine with multiple GPUs throws this error:

Python 3.7.2 (default, Dec 29 2018, 06:19:36) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> mx.nd.ones((3,4))

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
<NDArray 3x4 @cpu(0)>
>>> mx.nd.ones((3,4), mx.gpu())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/john/miniconda3/envs/apachenet/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 2367, in ones
    return _internal._ones(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
  File "<string>", line 34, in _ones
  File "/home/john/miniconda3/envs/apachenet/lib/python3.7/site-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/home/john/miniconda3/envs/apachenet/lib/python3.7/site-packages/mxnet/base.py", line 251, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [01:44:30] src/engine/threaded_engine.cc:320: Check failed: device_count_ > 0 (-1 vs. 0) GPU usage requires at least 1 GPU

kirk86 on 19 Jan 2019

Usually this happens when CUDA or the GPU driver is not properly installed. Are you able to run nvidia-smi successfully?

szha on 19 Jan 2019

@szha Thanks for the response, but that's not the case. Everything else works perfectly fine on this system (TensorFlow, PyTorch, you name it) except mxnet. The only combination that works is mxnet-cu90, and even that exits with errors at the end of any example script. Please make binaries available through conda like everyone else does. Keeping packages updated with pip is painful because it doesn't work well with conda: conda update --all updates all conda packages but not those installed via pip.

kirk86 on 19 Jan 2019

Thanks for the suggestion. Which CUDA version is available on the system?

szha on 19 Jan 2019

I have versions 9.0 up to 10 installed on the system.

kirk86 on 19 Jan 2019

Given that you have multiple versions of CUDA (and assuming NV drivers too), it might be caused by a mix-up of libraries from different versions being loaded.

Since you have CUDA 9.0, let's make that work first.

conda activate $YOUR_ENV
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
# assuming that you are using python3
python3 -m pip install --upgrade mxnet-cu90==1.3.1
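To see which CUDA runtime libraries the directories on LD_LIBRARY_PATH actually expose, a small check like the following may help. It is a sketch under assumptions: `find_cuda_runtimes` is a hypothetical helper, it assumes the runtime library follows the usual `libcudart.so*` naming on Linux, and it looks only at LD_LIBRARY_PATH, not at the ldconfig cache or other locations the dynamic loader may search.

```python
import glob
import os


def find_cuda_runtimes(ld_library_path=None):
    """Return libcudart files found on LD_LIBRARY_PATH, in search order."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        pattern = os.path.join(directory, "libcudart.so*")
        hits.extend(sorted(glob.glob(pattern)))
    return hits


if __name__ == "__main__":
    for path in find_cuda_runtimes():
        print(path)
```

If libraries from several CUDA versions show up, the entries listed first come from directories that appear earlier on the path.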

szha on 20 Jan 2019

> Given that you have multiple versions of CUDA (and assuming NV drivers too), it might be caused by a mix-up of libraries from different versions being loaded.

There are no NV drivers installed.

> Since you have CUDA 9.0, let's make that work first.

Yes, mxnet-cu90 is the only version that actually works. But that doesn't explain why I can't use, for instance, mxnet-cu92. With TensorFlow or PyTorch I can install whichever CUDA version I want in my conda environment, and both pick up the CUDA version installed in that environment and work well.

There's also an issue with train_imagenet.py: running that example complains about the base_lr argument in the code base. Only after completely removing that default argument from one of the function definitions was I able to run the example code.

kirk86 on 20 Jan 2019

> There are no NV drivers installed.

I assume you meant that the Nvidia drivers already exist on the platform and were not installed through conda.

> But it doesn't really explain why I can't use for instance mxnet-cu92

If you'd like to use the cu92 version, the mxnet package currently relies on users to install CUDA 9.2 and make it available in LD_LIBRARY_PATH. This can be done through any runtime-environment management tool you like.

I will look into releasing mxnet and integrating with cuda toolkit on conda, given the positive feedback.

For train_imagenet.py, would you mind sharing what command you ran here or in a new issue? (cc @hetong007).

szha on 20 Jan 2019

👍1

@kirk86 Would you please share what version of GluonCV and what command you were using? Also feel free to open a separate issue in GluonCV repository instead: https://github.com/dmlc/gluon-cv/issues/new

hetong007 on 20 Jan 2019

@hetong007 I haven't installed GluonCV, only mxnet-cu90:

pip list
Package    Version   
---------- ----------
certifi    2018.11.29
chardet    3.0.4     
graphviz   0.8.4     
idna       2.6       
mxnet-cu90 1.3.1     
numpy      1.14.6    
pip        18.1      
requests   2.21.0    
setuptools 40.6.3    
urllib3    1.22      
wheel      0.32.3

This is the command I execute to train resnet on imagenet:

python ./incubator-mxnet/example/image-classification/train_imagenet.py --network resnet --num-layers 152 --data-train ./train_rec/train_imgnet_list_1.rec --data-val ./val_rec/val_imgnet_lst.rec --gpus 0,1,2,3 --batch-size 128 --model ./model/resnet152 --num-epochs 100 --kv-store device

I didn't open this issue on GluonCV because I'm not sure if it's 100% related with GluonCV.

Also, please note that --data-train and --data-val have to point to the .rec files themselves; otherwise it doesn't work. This is contrary to the example given here, which points only to the directory where the *.rec files live. Also, multiple *.rec files don't work; they have to be combined into one .rec file.

kirk86 on 20 Jan 2019

@kirk86 Sorry I misunderstood the reference of your train_imagenet.py, as this is irrelevant to GluonCV. Seems that you have MXNet 1.3.1 while the example script works for 1.4.0 and later versions. Could you update your mxnet and retry? Specifically the argument base_lr was added to MultiFactorScheduler after 1.3.0.
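One way to check for this kind of mismatch before running the script is to inspect the installed class's signature. The helper below is a sketch; `accepts_argument` is a hypothetical name and uses only the standard `inspect` module.

```python
import inspect


def accepts_argument(func, name):
    """Return True if `func` (or a class constructor) has a parameter `name`.

    This checks explicitly named parameters only; a callable that takes
    **kwargs is not treated as accepting every name.
    """
    return name in inspect.signature(func).parameters
```

With mxnet imported as `mx`, `accepts_argument(mx.lr_scheduler.MultiFactorScheduler, "base_lr")` would be expected to return False on an installation that predates the argument.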

On the --data-train part, @rahul003 would you share your knowledge on the evolution of this argument?

hetong007 on 20 Jan 2019

👍1
