import mxnet as mx
import numpy as np
from collections import namedtuple

# Minimal data batch wrapper expected by Module.forward()
Batch = namedtuple('Batch', ['data'])

# Build a symbol that applies LeakyReLU with the 'rrelu' activation type
data = mx.sym.Variable('data')
out = mx.sym.LeakyReLU(data=data, act_type='rrelu')

# Bind the module to a (1, 10) input and initialize parameters
mod = mx.mod.Module(symbol=out, label_names=None)
mod.bind(data_shapes=[('data', (1, 10))])
mod.init_params()

# Run a forward pass on a batch of ones and read back the output
data1 = [mx.nd.ones((1, 10))]
mod.forward(Batch(data1))
print(mod.get_outputs()[0].asnumpy())
With the rrelu activation type of the LeakyReLU operator, I either get a segfault or it errors out with the following stack trace:
Traceback (most recent call last):
File "/Users/aanirud/Code/scripts/bug.py", line 15, in <module>
print(mod.get_outputs()[0].asnumpy())
File "/Users/aanirud/anaconda2/envs/mxnet2.7/lib/python2.7/site-packages/mxnet-1.5.0-py2.7.egg/mxnet/ndarray/ndarray.py", line 1995, in asnumpy
ctypes.c_size_t(data.size)))
File "/Users/aanirud/anaconda2/envs/mxnet2.7/lib/python2.7/site-packages/mxnet-1.5.0-py2.7.egg/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [21:18:55] include/mxnet/./resource.h:155: Check failed: req.type == ResourceRequest::kTempSpace (459100160 vs. 1)
Stack trace returned 10 entries:
[bt] (0) 0 libmxnet.so 0x00000001063f0034 dmlc::StackTrace() + 276
[bt] (1) 1 libmxnet.so 0x00000001063efdef dmlc::LogMessageFatal::~LogMessageFatal() + 47
[bt] (2) 2 libmxnet.so 0x0000000106855685 mshadow::Tensor<mshadow::cpu, 1, unsigned int> mxnet::Resource::get_space_typed<mshadow::cpu, 1, unsigned int>(mshadow::Shape<1>, mshadow::Stream<mshadow::cpu>*) const + 277
[bt] (3) 3 libmxnet.so 0x0000000107aa667e mxnet::op::LeakyReLUOp<mshadow::cpu, float>::Forward(mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) + 894
[bt] (4) 4 libmxnet.so 0x0000000107a16283 mxnet::op::OperatorState::Forward(mxnet::OpContext const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&, std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType> > const&, std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob> > const&) + 1795
[bt] (5) 5 libmxnet.so 0x0000000107871cc7 mxnet::exec::StatefulComputeExecutor::Run(mxnet::RunContext, bool) + 87
[bt] (6) 6 libmxnet.so 0x000000010789d105 std::__1::__function::__func<mxnet::exec::GraphExecutor::CreateCachedSegOpr(unsigned long, unsigned long)::$_7, std::__1::allocator<mxnet::exec::GraphExecutor::CreateCachedSegOpr(unsigned long, unsigned long)::$_7>, void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>::operator()(mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&) + 117
[bt] (7) 7 libmxnet.so 0x0000000107865cdc mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) + 652
[bt] (8) 8 libmxnet.so 0x0000000107869421 mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>)::operator()(std::__1::shared_ptr<dmlc::ManualEvent>) const + 129
[bt] (9) 9 libmxnet.so 0x0000000107869337 std::__1::__function::__func<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>), std::__1::allocator<mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::'lambda'()::operator()() const::'lambda'(std::__1::shared_ptr<dmlc::ManualEvent>)>, void (std::__1::shared_ptr<dmlc::ManualEvent>)>::operator()(std::__1::shared_ptr<dmlc::ManualEvent>&&) + 39
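For what it's worth, an even shorter imperative repro may also trigger it (an untested sketch, assuming the ndarray LeakyReLU mirrors the symbol API and exercises the same operator code path):

import mxnet as mx

# Hypothetical one-line check via the imperative API; not verified to
# crash here, but it is expected to run the same LeakyReLU forward code.
print(mx.nd.LeakyReLU(mx.nd.ones((1, 10)), act_type='rrelu').asnumpy())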
Other activation types work fine.
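For comparison, a minimal sketch with only the activation type changed (the other supported types behave the same way):

import mxnet as mx
from collections import namedtuple

Batch = namedtuple('Batch', ['data'])

# Identical to the repro above, except act_type='leaky' instead of 'rrelu';
# this forward pass completes without error.
data = mx.sym.Variable('data')
out = mx.sym.LeakyReLU(data=data, act_type='leaky')
mod = mx.mod.Module(symbol=out, label_names=None)
mod.bind(data_shapes=[('data', (1, 10))])
mod.init_params()
mod.forward(Batch([mx.nd.ones((1, 10))]))
print(mod.get_outputs()[0].asnumpy())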
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug
@mxnet-label-bot add [Backend, Operator, Bug]
@anirudh2290 can we close this, as #12894 (Training crash SSD with LeakyReLU(rrelu)) is tracking the same issue?
@Vikas89 I would prefer to keep this open, as it has a minimal reproducible example. From the description of #12894, that issue appears to be broader, since it says "Replacing LeakyReLU with activations at other positions also causes the training to crash".
This issue tracks a specific bug in a specific operator, with an example that will need to be included as a test case once the fix is made.
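For reference, a rough sketch of what such a regression test might look like (hypothetical, not the actual test that will ship with the fix):

import mxnet as mx
from collections import namedtuple

def test_leakyrelu_forward_act_types():
    # Hypothetical regression test: a forward pass with each of these
    # activation types should complete without crashing and keep the shape.
    Batch = namedtuple('Batch', ['data'])
    for act_type in ['leaky', 'elu', 'prelu', 'rrelu']:
        data = mx.sym.Variable('data')
        out = mx.sym.LeakyReLU(data=data, act_type=act_type)
        mod = mx.mod.Module(symbol=out, label_names=None)
        mod.bind(data_shapes=[('data', (1, 10))])
        mod.init_params()
        mod.forward(Batch([mx.nd.ones((1, 10))]))
        assert mod.get_outputs()[0].shape == (1, 10)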
@anirudhacharya, which MXNet version are you using? In case you are using master, can you specify the build flags?
FYI, PR #14582 is trying to solve this issue.
I used the latest master; I cannot recall the compile flags I used back then. But this error is reproducible even with the latest PyPI package.
Hello, I installed the latest 2019-08-23 build using sudo -H pip3 install mxnet-cu100==1.6.0b20190823; the issue is still present there.
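In case it helps with reproducing, a quick way to confirm which build and feature flags are actually installed (a minimal sketch, assuming the mx.runtime API available since 1.5):

import mxnet as mx

# Print the installed version and the compile-time feature flags
print(mx.__version__)
print(mx.runtime.Features())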