Insightface: MXNetError after first detection and recognition

Created on 29 Oct 2018 · 10 Comments · Source: deepinsight/insightface

Hi all,

I am trying to use this repository for a server-side face registration and recognition service. I detected faces and generated embeddings for all the photos in a directory (300 photos) and it worked fine. However, when I attach it to my local server code, it can only perform a single face detection and embedding generation. As soon as a second detection is called, it raises an error.

My configuration is CUDA 9.0, MXNet 1.3.0, cuDNN 7, Python 2.7.

Detailed error messages here:

File "server.py", line 118, in login
login_res, message = face_verification(file_path, regis_path, username)
File "server.py", line 14, in face_verification
result, data = server_function.verify(embedding_dir, photo_dir, login_id)
File "/home/wenbin/project/mxnet_faceID/server_function.py", line 88, in verify
img_tmp = model.get_input(image)
File "/home/wenbin/project/mxnet_faceID/face_model.py", line 71, in get_input
ret = self.detector.detect_face(face_img, det_type = self.args.det)
File "/home/wenbin/project/mxnet_faceID/mtcnn_detector.py", line 493, in detect_face
output = self.LNet.predict(input_buf)
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/model.py", line 717, in predict
o_list.append(o_nd[0:real_size].asnumpy())
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 1894, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/base.py", line 210, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [11:00:54] src/operator/nn/./cudnn/cudnn_convolution-inl.h:156: Check failed: e == CUDNN_STATUS_SUCCESS (7 vs. 0) cuDNN: CUDNN_STATUS_MAPPING_ERROR

Stack trace returned 10 entries:
[bt] (0) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f1372ee4dcb]
[bt] (1) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f1372ee5938]
[bt] (2) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::op::CuDNNConvolutionOp::Forward(mxnet::OpContext const&, std::vector > const&, std::vector > const&, std::vector > const&)+0x389) [0x7f1377346829]
[bt] (3) /home/wenbin/mxnet/lib/libmxnet.so(void mxnet::op::ConvolutionCompute(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector > const&, std::vector > const&, std::vector > const&)+0xbfc) [0x7f137733bbec]
[bt] (4) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0x59) [0x7f13754883f9]
[bt] (5) /home/wenbin/mxnet/lib/libmxnet.so(+0x317c8d3) [0x7f13754348d3]
[bt] (6) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0x8e5) [0x7f1375a92185]
[bt] (7) /home/wenbin/mxnet/lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>
, std::shared_ptr const&)+0xeb) [0x7f1375aa931b]
[bt] (8) /home/wenbin/mxnet/lib/libmxnet.so(std::_Function_handler), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#3}::operator()() const::{lambda(std::shared_ptr)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr&&)+0x4e) [0x7f1375aa958e]
[bt] (9) /home/wenbin/mxnet/lib/libmxnet.so(std::thread::_Impl)> (std::shared_ptr)> >::_M_run()+0x4a) [0x7f1375a9178a]

[11:00:54] src/resource.cc:262: Ignore CUDA Error [11:00:54] src/storage/./pooled_storage_manager.h:85: CUDA: an illegal memory access was encountered

Stack trace returned 10 entries:
[bt] (0) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f1372ee4dcb]
[bt] (1) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f1372ee5938]
[bt] (2) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::storage::GPUPooledStorageManager::DirectFreeNoLock(mxnet::Storage::Handle)+0x95) [0x7f1375ab5815]
[bt] (3) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::storage::GPUPooledStorageManager::DirectFree(mxnet::Storage::Handle)+0x3d) [0x7f1375ab81bd]
[bt] (4) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::StorageImpl::DirectFree(mxnet::Storage::Handle)+0x68) [0x7f1375ab1418]
[bt] (5) /home/wenbin/mxnet/lib/libmxnet.so(std::_Function_handler [bt] (6) /home/wenbin/mxnet/lib/libmxnet.so(+0x37dfe01) [0x7f1375a97e01]
[bt] (7) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0x8e5) [0x7f1375a92185]
[bt] (8) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock
, bool)+0x65) [0x7f1375aad085]
[bt] (9) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function > const&, std::vector > const&, mxnet::FnProperty, int, char const*, bool)+0x1b0) [0x7f1375a98400]

Any help would be appreciated!

Most helpful comment

@diggerdu I solved the problem by ensuring that only a single thread calls the initialized MXNet model. If multiple threads call the same model, this kind of error occurs.

To be more specific, my server script uses Flask, which by default enables multithreading to handle incoming requests. After I disabled multithreading (threaded=False), everything worked perfectly.

You can also find some more information here: https://github.com/apache/incubator-mxnet/issues/3946

Hope this will help you solve your problem.
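
If turning off threading in the web server is not an option, another way to get the same "only one thread calls the model" guarantee is to route every model call through one dedicated worker thread. The sketch below is only an illustration, not code from this repository: SingleThreadModel and FakeModel are hypothetical names, FakeModel stands in for the real face model, and the code is written for Python 3 (the original setup used Python 2.7). It has not been tested against MXNet itself.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadModel:
    """Hypothetical wrapper: every model call runs on one dedicated worker thread."""

    def __init__(self, load_model):
        # A pool with a single worker never uses more than one thread.
        self._pool = ThreadPoolExecutor(max_workers=1)
        # Load the model on the worker as well, so creation and use share a thread.
        self._model = self._pool.submit(load_model).result()

    def call(self, method_name, *args, **kwargs):
        method = getattr(self._model, method_name)
        # The calling (request) thread blocks until the worker returns the result.
        return self._pool.submit(method, *args, **kwargs).result()


class FakeModel:
    """Stand-in for the real model (assumption); records which threads call it."""

    def __init__(self):
        self.thread_ids = set()

    def get_input(self, image):
        self.thread_ids.add(threading.get_ident())
        return image


def demo():
    shared = SingleThreadModel(FakeModel)
    # Simulate several request-handler threads hitting the model at once.
    callers = [
        threading.Thread(target=shared.call, args=("get_input", i))
        for i in range(8)
    ]
    for t in callers:
        t.start()
    for t in callers:
        t.join()
    return shared
```

In a real server, the request handlers would call something like shared.call("get_input", image) instead of touching the model object directly. Requests are then processed one at a time by the model, which trades throughput for safety.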

All 10 comments

Another error that can occur looks like this:

terminate called after throwing an instance of 'dmlc::Error'
what(): [15:44:39] src/engine/./threaded_engine.h:379: array::at: __n (which is 18) >= _Nm (which is 7)
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 9 entries:
[bt] (0) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f4f5fe2ddcb]
[bt] (1) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f4f5fe2e938]
[bt] (2) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0xfa9) [0x7f4f629db849]
[bt] (3) /home/wenbin/mxnet/lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>
, std::shared_ptr const&)+0xeb) [0x7f4f629f231b]
[bt] (4) /home/wenbin/mxnet/lib/libmxnet.so(std::_Function_handler), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#3}::operator()() const::{lambda(std::shared_ptr)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr&&)+0x4e) [0x7f4f629f258e]
[bt] (5) /home/wenbin/mxnet/lib/libmxnet.so(std::thread::_Impl)> (std::shared_ptr)> >::_M_run()+0x4a) [0x7f4f629da78a]
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f4f77694c80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f4f960066ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f4f95d3c41d]
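
The error message above suggests setting MXNET_ENGINE_TYPE to NaiveEngine while debugging and resetting it afterwards. One way to do that without touching the shell environment is to set the variable only for a child process. This is a minimal sketch; run_with_naive_engine is a hypothetical helper name, and the command passed to it is up to you.

```python
import os
import subprocess
import sys


def run_with_naive_engine(argv):
    """Run a command with MXNET_ENGINE_TYPE=NaiveEngine set only for that child."""
    # Copy the current environment and override one variable; the parent
    # process keeps its own setting, so nothing has to be reset afterwards.
    env = dict(os.environ, MXNET_ENGINE_TYPE="NaiveEngine")
    return subprocess.run(
        argv,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
```

For example, run_with_naive_engine([sys.executable, "server.py"]) would start the server script from the traceback with the synchronous engine; the script name here is taken from the original post and the rest is an assumption about how it is launched.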

Same problem; it occurs unpredictably.

Thank you very much @WIll-Xu35

Hi, @WIll-Xu35 , Thank you.
You are my life saver!

Thank you @WIll-Xu35, I struggled the whole afternoon with this error :(
To be more specific:

app.run(threaded=False)

You're amazing, I'm in awe.

Nice job!

Thanks a lot, this solved my problem perfectly.

Thank you!! It works for me!!
