dlib face descriptor generates NaNs when run on Jetson Nano

Created on 5 Apr 2019 · 19 comments · Source: davisking/dlib

Expected Behavior

Running example/face_recognition.py on an Nvidia Jetson Nano should print descriptor values that are not NaN.

Current Behavior

In reality, when run on the Nvidia Jetson Nano, the descriptors come back as NaNs or very large numbers (on the order of 1e+15). Note that the same process works as expected (i.e. descriptor values between 0 and 1) on the Jetson TX2.
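
A minimal sketch of how the symptom described above could be detected automatically. It assumes the descriptor has already been converted to a plain sequence of floats; the helper name `find_bad_values` and the `max_abs` threshold are hypothetical choices for illustration, not part of dlib.

```python
import math

def find_bad_values(descriptor, max_abs=1e3):
    """Return (index, value) pairs that are NaN, infinite, or implausibly large.

    max_abs is an arbitrary illustrative threshold, far above the small
    values reported on the TX2 and far below the ~1e+15 seen on the Nano.
    """
    bad = []
    for i, v in enumerate(descriptor):
        v = float(v)
        if not math.isfinite(v) or abs(v) > max_abs:
            bad.append((i, v))
    return bad

# Illustrative values only, not real output from either board.
healthy = [0.12, -0.05, 0.33]
broken = [float("nan"), 2.5e15, 0.1]

print(find_bad_values(healthy))
print(find_bad_values(broken))
```

Running a check like this on every descriptor makes the failure visible even when it occurs only intermittently.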

Steps to Reproduce

Install dlib on Jetson Nano: pip3 install dlib
Run the example code to get face descriptors from the example images:
python3 face_recognition.py

Version: 19.17.0
Where did you get dlib: pip3 install dlib
Platform: Nvidia Jetson Nano, Ubuntu 18.04.2 LTS

inactive

jonnboyd

All 19 comments

That's weird. You don't modify anything?

I don't have a jetson nano to test on so someone else will have to debug this.

davisking on 6 Apr 2019

I have the same problem. I have been testing the C++ code and get NaN or very large numbers.

safavakili on 7 Apr 2019

> That's weird. You don't modify anything?
>
> I don't have a jetson nano to test on so someone else will have to debug this.

Nope, I did not change anything; I used the Python example file face_recognition.py as is. I also compiled the equivalent C++ example and got the same result.
I asked @e-fominov, and he suggested debugging the implementation layer by layer; there could be a bug at the input or output end of the CUDA implementation.
Testing is still pending. I will update if I reach a resolution.

jonnboyd on 9 Apr 2019

👍3

Apparently, if you run with cuda-memcheck you get this result:

cuda-memcheck python myApp.py

========= CUDA-MEMCHECK
========= Internal Memcheck Error: Initialization failed
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuDevicePrimaryCtxRetain + 0x154) [0x1fd7d4]

========= Host Frame:/usr/local/lib/python3.6/dist-packages/dlib.cpython-36m-aarch64-linux-gnu.so [0x8389c4]

safavakili on 15 Apr 2019

Hello, I also encountered the same problem with NaN. Have you found a solution? Please tell me. Thank you very much!

zhougz17520495180 on 25 Apr 2019

I found the same issue on the NVIDIA forum. It seems to be a problem only with the Jetson Nano. Interestingly, face location works with CNN, but face encoding does not.

NVidia Forum - issues with dlib library

kdwayne on 30 Apr 2019

It appears that NVIDIA is currently looking into this on their end with the cuDNN libraries, based on their last update in the thread linked above.
FWIW, the memcheck error appears to come from not running the utility as root. I reproduced it by running cuda-memcheck as a regular user and resolved it by running it as root.

mgraves03 on 5 May 2019

For others reading the thread, NVIDIA's suggested temporary fix is to apply this diff to dlib source and re-compile dlib python extensions:

diff --git a/dlib/cuda/cudnn_dlibapi.cpp b/dlib/cuda/cudnn_dlibapi.cpp
index a32fcf6..6952584 100644
--- a/dlib/cuda/cudnn_dlibapi.cpp
+++ b/dlib/cuda/cudnn_dlibapi.cpp
@@ -851,7 +851,7 @@ namespace dlib
                         dnn_prefer_fastest_algorithms()?CUDNN_CONVOLUTION_FWD_PREFER_FASTEST:CUDNN_CONVOLUTION_FWD_NO_WORKSPACE,
                         std::numeric_limits<size_t>::max(),
                         &forward_best_algo));
-                forward_algo = forward_best_algo;
+                //forward_algo = forward_best_algo;
                 CHECK_CUDNN(cudnnGetConvolutionForwardWorkspaceSize( 
                         context(),
                         descriptor(data),

See https://devtalk.nvidia.com/default/topic/1049660/jetson-nano/issues-with-dlib-library/2

ageitgey on 9 May 2019

Dang, so it's a bug in cudnn? Is there a preprocessor macro that can be used to identify this platform and toggle this change?

davisking on 10 May 2019
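
The question above is about a compile-time macro, which this does not answer. As a rough runtime alternative, a script could check which board it is running on before trusting the descriptors. This is only a sketch: it assumes the board exposes its model name in the device-tree file /proc/device-tree/model and that the name contains "Jetson Nano". Both function names are hypothetical.

```python
from pathlib import Path

def read_board_model(path="/proc/device-tree/model"):
    """Return the board model string, or "" if the file cannot be read."""
    try:
        raw = Path(path).read_bytes()
    except OSError:
        return ""
    # Device-tree strings are NUL-terminated, so strip the trailing byte.
    return raw.decode("utf-8", errors="replace").rstrip("\x00").strip()

def looks_like_jetson_nano(model):
    """Case-insensitive match on the assumed model name."""
    return "jetson nano" in model.lower()

if __name__ == "__main__":
    model = read_board_model()
    print("board model:", model or "(unknown)")
    print("Jetson Nano detected:", looks_like_jetson_nano(model))
```

On a machine without that file the helper returns an empty string, so the check simply reports that no Jetson Nano was detected.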

I haven't found one yet. I'll dig around again. But I did confirm that the patch avoids the bug and the output is correct with it.

ageitgey on 14 May 2019

👍1

Warning: this issue has been inactive for 36 days and will be automatically closed on 2019-06-28 if there is no further activity.

If you are waiting for a response but haven't received one it's possible your question is somehow inappropriate. E.g. it is off topic, you didn't follow the issue submission instructions, or your question is easily answerable by reading the FAQ, dlib's official compilation instructions, dlib's API documentation, or a Google search.

dlib-issue-bot on 20 Jun 2019

Is this fixed now that cuDNN 7.6.1 is out? The devtalk thread claims that release resolved the issue but it has to be manually compiled for now.

c-x-berger on 26 Jun 2019

It's fixed in 7.6.1, but AFAIK that version isn't available for the Jetson architecture yet.

dsegel on 18 Jul 2019
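
Since the thread reports cuDNN 7.6.1 as the first release with the fix, a version comparison along these lines could tell a script whether the workaround patch is still needed. This is a sketch under that assumption only; how the installed cuDNN version string is obtained is left open, and the helper names are hypothetical.

```python
FIXED_IN = (7, 6, 1)  # release reported in this thread as containing the fix

def parse_version(text):
    """Turn a string such as '7.6.1' into (7, 6, 1), padding with zeros."""
    parts = [int(p) for p in text.strip().split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))

def has_reported_fix(cudnn_version):
    """True if the given version is at least the release reported as fixed."""
    return parse_version(cudnn_version) >= FIXED_IN

# Example version strings, chosen for illustration.
for version in ("7.3.1", "7.6", "7.6.1", "8.0.0"):
    print(version, has_reported_fix(version))
```

Comparing tuples rather than raw strings avoids mistakes such as treating "7.10.0" as older than "7.6.1".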

Warning: this issue has been inactive for 35 days and will be automatically closed on 2019-09-01 if there is no further activity.

dlib-issue-bot on 23 Aug 2019

Warning: this issue has been inactive for 43 days and will be automatically closed on 2019-09-01 if there is no further activity.

dlib-issue-bot on 31 Aug 2019

Notice: this issue has been closed because it has been inactive for 45 days. You may reopen this issue if it has been closed in error.

dlib-issue-bot on 2 Sep 2019

Has the problem been solved? I am facing a similar issue. The funny thing is that I am getting NaN, but not always.

a-kanaan on 29 Mar 2020

👀1

Any updates? I would be glad to hear something here...

ozett on 24 Jul 2020

Great news: the problem has been solved in JetPack SDK 4.4. I have tested it.

pi-null-mezon on 27 Aug 2020

🎉2
