Hi all,
I am trying to use this repository for a server-side face registration and recognition service. I have tried detecting faces and generating embeddings for all the photos in a directory (containing 300 photos), and it worked fine. However, when I integrate it into my local server code, it can only perform a single face detection and embedding generation. As soon as a second detection is called, it raises an error.
My configuration is CUDA 9.0, MXNet 1.3.0, cuDNN 7, Python 2.7.
Here is the detailed error message:
File "server.py", line 118, in login
login_res, message = face_verification(file_path, regis_path, username)
File "server.py", line 14, in face_verification
result, data = server_function.verify(embedding_dir, photo_dir, login_id)
File "/home/wenbin/project/mxnet_faceID/server_function.py", line 88, in verify
img_tmp = model.get_input(image)
File "/home/wenbin/project/mxnet_faceID/face_model.py", line 71, in get_input
ret = self.detector.detect_face(face_img, det_type = self.args.det)
File "/home/wenbin/project/mxnet_faceID/mtcnn_detector.py", line 493, in detect_face
output = self.LNet.predict(input_buf)
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/model.py", line 717, in predict
o_list.append(o_nd[0:real_size].asnumpy())
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/ndarray/ndarray.py", line 1894, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/wenbin/.local/lib/python2.7/site-packages/mxnet/base.py", line 210, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [11:00:54] src/operator/nn/./cudnn/cudnn_convolution-inl.h:156: Check failed: e == CUDNN_STATUS_SUCCESS (7 vs. 0) cuDNN: CUDNN_STATUS_MAPPING_ERROR
Stack trace returned 10 entries:
[bt] (0) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f1372ee4dcb]
[bt] (1) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f1372ee5938]
[bt] (2) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::op::CuDNNConvolutionOp
[bt] (3) /home/wenbin/mxnet/lib/libmxnet.so(void mxnet::op::ConvolutionCompute
[bt] (4) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0x59) [0x7f13754883f9]
[bt] (5) /home/wenbin/mxnet/lib/libmxnet.so(+0x317c8d3) [0x7f13754348d3]
[bt] (6) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0x8e5) [0x7f1375a92185]
[bt] (7) /home/wenbin/mxnet/lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>, std::shared_ptr
[bt] (8) /home/wenbin/mxnet/lib/libmxnet.so(std::_Function_handler
[bt] (9) /home/wenbin/mxnet/lib/libmxnet.so(std::thread::_Impl
[11:00:54] src/resource.cc:262: Ignore CUDA Error [11:00:54] src/storage/./pooled_storage_manager.h:85: CUDA: an illegal memory access was encountered
Stack trace returned 10 entries:
[bt] (0) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f1372ee4dcb]
[bt] (1) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f1372ee5938]
[bt] (2) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::storage::GPUPooledStorageManager::DirectFreeNoLock(mxnet::Storage::Handle)+0x95) [0x7f1375ab5815]
[bt] (3) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::storage::GPUPooledStorageManager::DirectFree(mxnet::Storage::Handle)+0x3d) [0x7f1375ab81bd]
[bt] (4) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::StorageImpl::DirectFree(mxnet::Storage::Handle)+0x68) [0x7f1375ab1418]
[bt] (5) /home/wenbin/mxnet/lib/libmxnet.so(std::_Function_handler
[bt] (7) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0x8e5) [0x7f1375a92185]
[bt] (8) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock, bool)+0x65) [0x7f1375aad085]
[bt] (9) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::PushAsync(std::function
Any help would be appreciated!
Another error that can occur looks like this:
terminate called after throwing an instance of 'dmlc::Error'
what(): [15:44:39] src/engine/./threaded_engine.h:379: array::at: __n (which is 18) >= _Nm (which is 7)
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
Stack trace returned 9 entries:
[bt] (0) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f4f5fe2ddcb]
[bt] (1) /home/wenbin/mxnet/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f4f5fe2e938]
[bt] (2) /home/wenbin/mxnet/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock)+0xfa9) [0x7f4f629db849]
[bt] (3) /home/wenbin/mxnet/lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>, std::shared_ptr
[bt] (4) /home/wenbin/mxnet/lib/libmxnet.so(std::_Function_handler
[bt] (5) /home/wenbin/mxnet/lib/libmxnet.so(std::thread::_Impl
[bt] (6) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f4f77694c80]
[bt] (7) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f4f960066ba]
[bt] (8) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f4f95d3c41d]
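Following the suggestion in that error message, here is a minimal sketch of selecting the synchronous engine from inside the server script. It assumes you set the variable before MXNet is loaded; exporting MXNET_ENGINE_TYPE=NaiveEngine in the shell before launching server.py should have the same effect.

```python
import os

# Select MXNet's synchronous NaiveEngine for debugging. Set it before
# `import mxnet`: MXNet reads this variable when it creates its engine,
# which can happen as soon as the library is loaded.
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

# import mxnet as mx  # only import MXNet after the variable is set
```

With every operation running synchronously, the Python traceback (or a gdb backtrace) should point at the call that actually failed. As the message says, remove the override once you are done debugging.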
Same problem here; it occurs unpredictably.
@diggerdu I solved the problem by making sure that only a single thread calls the initialized MXNet model. If multiple threads call the same model concurrently, errors like these occur.
To be more specific, my server script uses Flask, which by default handles incoming requests in multiple threads. After I disabled multithreading, everything worked perfectly.
You can also find some more information here: https://github.com/apache/incubator-mxnet/issues/3946
Hope this will help you solve your problem.
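If you would rather keep Flask's threaded request handling, one option is to serialize access to the shared model with a lock so that only one thread runs inference at a time. This is a minimal sketch, untested against MXNet 1.3.0, that assumes a model object like the one in face_model.py (the traceback shows its get_input() method). The LockedModel wrapper is hypothetical and not part of the repository or of MXNet.

```python
import threading


class LockedModel(object):
    """Hypothetical wrapper: every method call on the wrapped model runs while
    holding one lock, so at most one thread is inside the model at a time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only called for names not defined on the wrapper itself.
        attr = getattr(self._model, name)
        if not callable(attr):
            return attr

        def locked_call(*args, **kwargs):
            with self._lock:
                return attr(*args, **kwargs)

        return locked_call
```

You would wrap the model once at startup (e.g. model = LockedModel(model)) and keep calling model.get_input(image) from the request handlers as before. Requests still arrive concurrently, but inference queues behind the lock, so throughput is roughly that of the single-threaded server.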
Thank you very much @WIll-Xu35
Hi @WIll-Xu35, thank you.
You are my life saver!
Thank you @WIll-Xu35, I had been struggling with this error the whole afternoon :(
To be more specific:
app.run(threaded=False)
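An alternative that keeps threaded=True is to let exactly one background thread own the model and have request handlers submit work to it through a queue. That way only a single thread ever touches MXNet, which is the condition described above. This is a minimal sketch using only the standard library; InferenceWorker and the model_factory argument are hypothetical names, not part of the repository or of Flask.

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7


class InferenceWorker(object):
    """Hypothetical helper: one daemon thread builds the model and runs every
    job against it, so no other thread ever touches the model."""

    def __init__(self, model_factory):
        self._jobs = queue.Queue()
        self._ready = threading.Event()
        self._init_error = None
        thread = threading.Thread(target=self._loop, args=(model_factory,))
        thread.daemon = True  # do not keep the process alive on shutdown
        thread.start()
        self._ready.wait()  # block until the model is built (or failed)
        if self._init_error is not None:
            raise self._init_error

    def _loop(self, model_factory):
        try:
            model = model_factory()  # the model is created on this thread
        except Exception as exc:
            self._init_error = exc
            self._ready.set()
            return
        self._ready.set()
        while True:
            job, done, box = self._jobs.get()
            try:
                box["result"] = job(model)
            except Exception as exc:  # hand the error back to the caller
                box["error"] = exc
            done.set()

    def run(self, job):
        """Run job(model) on the worker thread and return its result."""
        done = threading.Event()
        box = {}
        self._jobs.put((job, done, box))
        done.wait()
        if "error" in box:
            raise box["error"]
        return box["result"]
```

In a Flask view this would look like worker.run(lambda m: m.get_input(image)), with the worker created once at startup from whatever function currently builds your model. Unlike threaded=False, the server can still accept other requests concurrently, while all MXNet calls stay on one thread.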
You're a legend, I bow down to you!
Nice job!
Thanks a lot, this solved my problem perfectly!
Thank you!! It works for me!!