Incubator-mxnet: Aborted python session when using nd.concat

Created on 10 Jul 2018 · 8 Comments · Source: apache/incubator-mxnet

Note: Providing complete information in the most concise form is the best way to get help. This issue template serves as the checklist for essential information to most of the technical issues and bug reports. For non-technical issues and feature requests, feel free to present the information in what you believe is the best form.

For Q & A and discussion, please start a discussion thread at https://discuss.mxnet.io

Description

(Brief description of the problem in no more than 2 sentences.)
Aborted python session when using mx.nd.concat.

Environment info (Required)

What to do:
1. Download the diagnosis script from https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
2. Run the script using `python diagnose.py` and paste its output here.

----------Python Info----------
Version : 3.6.4
Compiler : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
Build : ('default', 'Jan 16 2018 12:04:33')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 10.0.1
Directory : /anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.1.0
Directory : /anaconda3/lib/python3.6/site-packages/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Darwin-16.7.0-x86_64-i386-64bit
system : Darwin
node : (redacted)
release : 16.7.0
version : Darwin Kernel Version 16.7.0: Fri Apr 27 17:59:46 PDT 2018; root:xnu-3789.73.13~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT FPU_CSDS'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0202 sec, LOAD: 0.6980 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0241 sec, LOAD: 0.1687 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1739 sec, LOAD: 0.1249 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0207 sec, LOAD: 0.1206 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0088 sec, LOAD: 0.3548 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0203 sec, LOAD: 0.0632 sec.

Package used (Python/R/Scala/Julia):
Python

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):

MXNet commit hash:
(Paste the output of git rev-parse HEAD here.)

Build config:
(Paste the content of config.mk, or the build command.)

Error Message:

(Paste the complete error message, including stack trace.)

Minimum reproducible example

(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)

Steps to reproduce

(Paste the commands you ran that produced the error.)

import mxnet as mx
a = mx.nd.empty((0, 1))
b = mx.nd.ones((1, 1))
mx.nd.concat(a, b, dim=1)

I get this error:

terminate called after throwing an instance of 'dmlc::Error'
  what():  [21:01:32] src/engine/./threaded_engine.h:359: [21:01:32] src/operator/./mkl/mkl_concat-inl.h:196: Check failed: e == E_SUCCESS (-1 vs. 0)

Stack trace returned 10 entries:
[bt] (0) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276938) [0x7f32dfa2e938]
[bt] (1) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276d48) [0x7f32dfa2ed48]
[bt] (2) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2622c98) [0x7f32e1ddac98]
[bt] (3) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x25e9a63) [0x7f32e1da1a63]
[bt] (4) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x246b0fd) [0x7f32e1c230fd]
[bt] (5) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x246b436) [0x7f32e1c23436]
[bt] (6) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23ea163) [0x7f32e1ba2163]
[bt] (7) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23f0a3d) [0x7f32e1ba8a3d]
[bt] (8) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23f5109) [0x7f32e1bad109]
[bt] (9) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23f1ddb) [0x7f32e1ba9ddb]


A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276938) [0x7f32dfa2e938]
[bt] (1) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276d48) [0x7f32dfa2ed48]
[bt] (2) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23f0d40) [0x7f32e1ba8d40]
[bt] (3) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23f5109) [0x7f32e1bad109]
[bt] (4) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23f1ddb) [0x7f32e1ba9ddb]
[bt] (5) /ec2-user-anaconda3/envs/mxnet_p36/bin/../lib/libstdc++.so.6(+0xafc5c) [0x7f3315463c5c]
[bt] (6) /lib64/libpthread.so.0(+0x7123) [0x7f3321ad9123]
[bt] (7) /lib64/libc.so.6(clone+0x6d) [0x7f332180ac3d]


Aborted

and the Python session is aborted. I tried it on another computer and got:

libc++abi.dylib: terminating with uncaught exception of type dmlc::Error: [14:04:59] src/engine/./threaded_engine.h:359: [14:04:59] src/operator/./mkl/mkl_concat-inl.h:196: Check failed: e == E_SUCCESS (-1 vs. 0)

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x0000000109960981 dmlc::StackTrace() + 305
[bt] (1) 1   libmxnet.so                         0x000000010996070f dmlc::LogMessageFatal::~LogMessageFatal() + 47
[bt] (2) 2   libmxnet.so                         0x000000010abc5af3 mxnet::op::MKLConcatOp<mshadow::cpu, float>::Forward(mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) + 2659
[bt] (3) 3   libmxnet.so                         0x000000010aac878a mxnet::op::OperatorState::Forward(mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) + 2042
[bt] (4) 4   libmxnet.so                         0x000000010a995964 mxnet::imperative::PushOperator(mxnet::OpStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::Resource, std::__1::allocator<mxnet::Resource> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::'lambda'(mxnet::RunContext, mxnet::engine::CallbackOnComplete)::operator()(mxnet::RunContext, mxnet::engine::CallbackOnComplete) const + 420
[bt] (5) 5   libmxnet.so                         0x000000010a995772 std::__1::__function::__func<mxnet::imperative::PushOperator(mxnet::OpStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::Resource, std::__1::allocator<mxnet::Resource> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::'lambda0'(mxnet::RunContext), std::__1::allocator<mxnet::imperative::PushOperator(mxnet::OpStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::Resource, std::__1::allocator<mxnet::Resource> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::'lambda0'(mxnet::RunContext)>, void (mxnet::RunContext)>::operator()(mxnet::RunContext&&) + 82
[bt] (6) 6   libmxnet.so                         0x000000010a93a650 std::__1::__function::__func<mxnet::engine::ThreadedEngine::PushSync(std::__1::function<void (mxnet::RunContext)>, mxnet::Context, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::$_1, std::__1::allocator<mxnet::engine::ThreadedEngine::PushSync(std::__1::function<void (mxnet::RunContext)>, mxnet::Context, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, std::__1::vector<mxnet::engine::Var*, std::__1::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::$_1>, void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>::operator()(mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&) + 64
[bt] (7) 7   libmxnet.so                         0x000000010a92f14b mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) + 571
[bt] (8) 8   libmxnet.so                         0x000000010a932b71 void std::__1::__invoke_void_return_wrapper<void>::__call<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)&, std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent> >(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)&&&, std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>&&) + 161
[bt] (9) 9   libmxnet.so                         0x000000010a92fc65 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void (std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)>, std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent> > >(void*) + 101


A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) 0   libmxnet.so                         0x0000000109960981 dmlc::StackTrace() + 305
[bt] (1) 1   libmxnet.so                         0x000000010996070f dmlc::LogMessageFatal::~LogMessageFatal() + 47
[bt] (2) 2   libmxnet.so                         0x000000010a92f442 mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) + 1330
[bt] (3) 3   libmxnet.so                         0x000000010a932b71 void std::__1::__invoke_void_return_wrapper<void>::__call<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)&, std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent> >(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)&&&, std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>&&) + 161
[bt] (4) 4   libmxnet.so                         0x000000010a92fc65 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::function<void (std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent>)>, std::__1::shared_ptr<mxnet::engine::ThreadPool::SimpleEvent> > >(void*) + 101
[bt] (5) 5   libsystem_pthread.dylib             0x00007fffc8aa693b _pthread_body + 180
[bt] (6) 6   libsystem_pthread.dylib             0x00007fffc8aa6887 _pthread_body + 0
[bt] (7) 7   libsystem_pthread.dylib             0x00007fffc8aa608d thread_start + 13


Abort trap: 6
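The log above suggests setting MXNET_ENGINE_TYPE to NaiveEngine so that operations run synchronously while debugging. A minimal sketch of one way to do that without changing the parent shell is to build a modified environment for a child process. The helper name `naive_engine_env` is hypothetical and not part of MXNet; only the variable name and value come from the log message.

```python
import os
from typing import Dict, Mapping, Optional


def naive_engine_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the synchronous engine selected.

    Hypothetical helper: only the MXNET_ENGINE_TYPE name and the NaiveEngine
    value are taken from the MXNet error message.
    """
    env = dict(os.environ if base is None else base)
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the reproduction script (for example a hypothetical `repro.py`). Because the parent environment is left untouched, nothing needs to be reset after debugging.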
Labels: Bug, Unclear Error/Doc, good first issue

All 8 comments

@alireza202 I think the reason is that you are concatenating along the wrong axis. NumPy raises an error if you do the following:

>>> a = np.empty((0,1))
>>> b = np.ones((1,1))
>>> np.concatenate((a,b),axis=1)

That being said, MXNet should handle this error gracefully rather than aborting with a core dump. @sandeep-krishnamurthy Could you please label this _Unclear Error/Doc_, _Bug_, _Good for first time_. Thanks

The issue could possibly be fixed by changing the InferShape of the concat operator to adopt mutual shape inference.
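To make the axis rule concrete, here is a minimal pure-Python sketch of the shape check that NumPy enforces for concatenation: every axis except the concatenation axis must match. This is not MXNet's InferShape implementation, and the function name `concat_output_shape` is an assumption made for illustration.

```python
from typing import Sequence, Tuple


def concat_output_shape(shapes: Sequence[Tuple[int, ...]], dim: int) -> Tuple[int, ...]:
    """Return the shape produced by concatenating arrays with `shapes` along `dim`.

    Raises ValueError for mismatched inputs, so a bad call can be rejected
    before any kernel runs. Illustrative only, not MXNet code.
    """
    if not shapes:
        raise ValueError("need at least one input shape")
    ndim = len(shapes[0])
    if not 0 <= dim < ndim:
        raise ValueError(f"dim {dim} is out of range for {ndim}-d inputs")
    out = list(shapes[0])
    for shape in shapes[1:]:
        if len(shape) != ndim:
            raise ValueError(f"rank mismatch: {shapes[0]} vs {shape}")
        for axis, (expected, actual) in enumerate(zip(shapes[0], shape)):
            # Every axis except the concatenation axis must agree exactly.
            if axis != dim and expected != actual:
                raise ValueError(
                    f"axis {axis} differs ({expected} vs {actual}); "
                    f"only axis {dim} may differ"
                )
        out[dim] += shape[dim]
    return tuple(out)
```

With the shapes from the report, `(0, 1)` and `(1, 1)`, the check rejects `dim=1` because axis 0 differs (0 vs 1), while `dim=0` yields `(1, 1)`.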

You should switch to 1.2.0 to prevent the Python process from crashing.

The Python crash aside, how would I actually concatenate an empty tensor with another tensor? I tried both dimensions, and it fails:

>>> import mxnet as mx
>>> a = mx.nd.empty((0, 1))
>>> b = mx.nd.ones((1, 1))
>>> mx.nd.concat(a, b, dim=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 63, in concat
  File "/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:52:18] src/imperative/./imperative_utils.h:107: Check failed: infershape[attrs.op](attrs, &in_shapes, &out_shapes)

Stack trace returned 10 entries:
[bt] (0) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276938) [0x7fd8d40d0938]
[bt] (1) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x276d48) [0x7fd8d40d0d48]
[bt] (2) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2473416) [0x7fd8d62cd416]
[bt] (3) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x24757e9) [0x7fd8d62cf7e9]
[bt] (4) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x23b6ceb) [0x7fd8d6210ceb]
[bt] (5) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x63) [0x7fd8d6211283]
[bt] (6) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fd90e019ec0]
[bt] (7) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7fd90e01987d]
[bt] (8) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7fd90e22edee]
[bt] (9) /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x12825) [0x7fd90e22f825]

@alireza202 I'm not fully sure what you mean by "concatenate an empty tensor with another tensor". Using the a and b above as an example case, could you paste the output that you're looking to get?

I am able to reproduce the issue. I will raise a PR with the fix by tomorrow.

Thanks for your report.
MXNet currently does not support 0-size tensors, such as mx.nd.empty((0, 1)).
We will fix it.
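Until 0-size tensors are supported, one possible workaround is to drop zero-size inputs before calling the concat operator. The following is a sketch under that assumption. The helper name `concat_skipping_empty` is hypothetical, and the `concat` argument is expected to be a callable with the calling convention of `mx.nd.concat` (positional arrays plus a `dim` keyword).

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def concat_skipping_empty(arrays: Sequence[T], concat: Callable[..., T], dim: int = 0) -> T:
    """Concatenate `arrays` along `dim`, dropping inputs that have a zero-length axis.

    Hypothetical helper: `concat` is any callable invoked as concat(*arrays, dim=dim).
    """
    # Keep only inputs whose shape contains no zero-length axis.
    non_empty = [a for a in arrays if 0 not in a.shape]
    if not non_empty:
        raise ValueError("all inputs are zero-size; nothing to concatenate")
    if len(non_empty) == 1:
        # Only one real input: return it as-is (same object, not a copy).
        return non_empty[0]
    return concat(*non_empty, dim=dim)
```

For the arrays in this thread the call would be `concat_skipping_empty([a, b], mx.nd.concat, dim=0)`, which returns `b` unchanged. Note that dropped inputs are not validated, so a zero-size array whose other axes do not match is silently ignored rather than reported as an error.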

0-size tensors are supported in mx.np. The abort has also been fixed in the meantime (at least on the latest master).
