Openpose: Jetson TX1 - pooling_layer.cu:212] Check failed: error == cudaSuccess (8 vs. 0) invalid device function

Created on 8 Jun 2017 · 23 Comments · Source: CMU-Perceptual-Computing-Lab/openpose

Issue summary

Executed command (if any)

a) build/examples/openpose/openpose.bin --image_dir /home/ubuntu/Dev/openpose/examples/media
(gives the error below)

b) build/examples/openpose/openpose.bin --no_gpu 0 --image_dir /home/ubuntu/Dev/openpose/examples/media
(opens a window and displays the images, but no detections are made.)

Type of issue

You might select multiple topics, delete the rest:

Execution error

Your system configuration

Operating system (lsb_release -a on Ubuntu):
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04 LTS
Release: 16.04
Codename: xenial

CUDA version (cat /usr/local/cuda/version.txt in most cases):
CUDA Version 8.0.34

Caffe version:
Default from OpenPose

OpenCV version: 2.4 installed from JetPack 3.0.

build/examples/openpose/openpose.bin --image_dir /home/ubuntu/Dev/openpose/examples/media
Starting pose estimation demo.
Starting thread(s)
F0608 00:53:13.197923 29939 pooling_layer.cu:212] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***
@ 0x7f935c6718 google::LogMessage::Fail()
@ 0x7f935c8614 google::LogMessage::SendToLog()
@ 0x7f935c6290 google::LogMessage::Flush()
@ 0x7f935c8eb4 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f92b7ef40 caffe::PoolingLayer<>::Forward_gpu()
@ 0x7f92a085b0 caffe::Net<>::ForwardFromTo()
@ 0x7f936873dc op::NetCaffe::forwardPass()
@ 0x7f936ee710 op::PoseExtractorCaffe::forwardPass()
@ 0x7f936fa274 op::WPoseExtractor<>::work()
@ 0x7f93719c2c op::Worker<>::checkAndWork()
@ 0x7f9371ce98 op::SubThread<>::workTWorkers()
@ 0x7f937261e4 op::SubThreadQueueInOut<>::work()
@ 0x7f93721df0 op::Thread<>::threadFunction()
@ 0x7f934b6280 (unknown)
@ 0x7f91fadfb4 start_thread
Aborted

help wanted · question

Source

cortinas

Most helpful comment

Finally have it working on the Jetson TX1. I needed to fix a few issues with the build files for Caffe and OpenPose, as follows:

My CUDA_ARCH settings:
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_53,code=sm_53

INCLUDE_DIRS := /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := /usr/local/lib /usr/lib /usr/lib/aarch64-linux-gnu/hdf5/serial

I also forced some flags in the Makefile (this may not be necessary, but it's late and I'm tired, so I'm not doing any more since it's working for me):
-DCUDA_ARCH_NAME="Manual" -DCUDA_ARCH_BIN="53" -DCUDA_ARCH_PTX="53" -DUSE_CUDNN=1

I also built using the latest OpenPose source.

cortinas on 10 Jun 2017

👍2

All 23 comments

Hi, 2 quick questions:

1. Which cuDNN version are you using?
2. --no_gpu 0? There is no such option. I guess you meant --num_gpu. For that one, you need at least 1 GPU: --num_gpu 1.

Thanks!

gineshidalgo99 on 8 Jun 2017

Thanks.

Here is the output of a program with the cudnn version....

$ ./mnistCUDNN
cudnnGetVersion() : 5105 , CUDNN_VERSION from cudnn.h : 5105 (5.1.5)
Host compiler version : GCC 4.9.2
There are 1 CUDA capable devices on your machine :
device 0 : sms 2 Capabilities 5.3, SmClock 72.0 Mhz, MemSize (Mb) 3994, MemClock 12.8 Mhz, Ecc=0, boardGroupID=0
Using device 0

As for --num_gpu 0, I was just experimenting to see whether I could get the program to do anything!

cortinas on 8 Jun 2017

I am slightly confused. Is it working with --num_gpu 1, so that this issue can be closed? If not, what is the output when --num_gpu 1 is used? Thanks

gineshidalgo99 on 8 Jun 2017

It does not work with either --num_gpu setting.

With --num_gpu=1, I get the "Check failed: error == cudaSuccess (8 vs. 0) invalid device function"
[I assume this is the correct way to enable gpu]

With --num_gpu=0, the program finishes without any errors but does not detect anything in the sample images.
[I was just playing to see if I could get the program to run at all]

cortinas on 8 Jun 2017

OK got it.

Since you are using a custom Ubuntu (the one from Nvidia), we cannot give you too much more help for the Caffe part (where it is failing), since we do not have that device to try.

Try to run Caffe and some Caffe demo (maybe the Caffe tests) there. Once Caffe is working with the GPU, OpenPose just uses C++11, Caffe and Caffe's dependencies.

Let us know your results. Thanks

gineshidalgo99 on 8 Jun 2017

Ok.
Caffe is working fine for all its tests and at least some demos. But let me run in a debugger to see what is actually failing.
My guess is that there is a version mismatch somewhere between Caffe, CUDA, cuDNN and the Jetson.

Do you know if anyone else has got it working on a Jetson?

cortinas on 8 Jun 2017

Yeah please, let me know the exact function where it fails, so I can make more guesses about OpenPose.

No idea about people using OpenPose on Jetson.

gineshidalgo99 on 8 Jun 2017

I now have "openpose.bin" running. I needed to change some of the CUDA arch parameters in Makefile.config for the Jetson TX1.
However, I still do not see any useful or interesting output:

cortinas on 10 Jun 2017

Finally have it working on the Jetson TX1. I needed to fix a few issues with the build files for Caffe and OpenPose, as follows:

My CUDA_ARCH settings:
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_53,code=sm_53

INCLUDE_DIRS := /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := /usr/local/lib /usr/lib /usr/lib/aarch64-linux-gnu/hdf5/serial

I also built using the latest OpenPose source.

cortinas on 10 Jun 2017

👍2
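The sm_53 entry in the CUDA_ARCH settings above lines up with the "Capabilities 5.3" that mnistCUDNN reported for this board earlier in the thread. As a rough illustration (not part of the original post), the Python sketch below builds the matching -gencode flag from a compute capability. The board table and the helper names gencode_flag and cuda_arch_line are hypothetical, introduced only for this example.

```python
# Compute capabilities of the Jetson boards mentioned in this thread.
# This table is an illustration, not something the thread itself provides.
JETSON_COMPUTE_CAPABILITY = {
    "TK1": "3.2",
    "TX1": "5.3",
    "TX2": "6.2",
}


def gencode_flag(capability: str) -> str:
    """Turn a capability such as '5.3' into an nvcc -gencode flag."""
    major, minor = capability.split(".")
    arch = f"{major}{minor}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


def cuda_arch_line(board: str) -> str:
    """Build a CUDA_ARCH line in the style of Caffe's Makefile.config."""
    return "CUDA_ARCH := " + gencode_flag(JETSON_COMPUTE_CAPABILITY[board])


if __name__ == "__main__":
    for board in JETSON_COMPUTE_CAPABILITY:
        print(board, "->", cuda_arch_line(board))
```

The installed CUDA toolkit must also know the architecture being requested, so the generated line is only a starting point to check against the local nvcc.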

Thank you for posting the solution, so other people can use it too!

In conclusion, the only changes were located in the Makefile and Makefile.config files. This is good, so you and Jetson users will be able to easily update OpenPose at any point.

I am closing this issue then.

gineshidalgo99 on 12 Jun 2017

I am curious to know whether cortinas finally managed to run OpenPose on the Jetson TK1!

Cortinas, could you please email me at smanismech[at]me[dot]com?

I have a Jetson TX2 and I run into an out-of-memory issue when I run OpenPose on it.

Thank you in advance

IoaSman1 on 16 Jun 2017

@cortinas
Hi, I am trying to run OpenPose on my Jetson TK1, and I've tried the method you gave above.
I edited the Makefile.config file in 3rdparty/caffe/, changed the CUDA_ARCH settings and added:
INCLUDE_DIRS := /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := /usr/local/lib /usr/lib /usr/lib/aarch64-linux-gnu/hdf5/serial
Then I ran make all -j4 && make distribute -j4 to build.
But I got this error:

NVCC src/caffe/solvers/adadelta_solver.cu
nvcc fatal   : Unsupported gpu architecture 'compute_53'
make: *** [.build_release/cuda/src/caffe/solvers/adadelta_solver.o] Error 1
make: *** Waiting for unfinished jobs....

Did I do anything wrong?
My CUDA version is 6.5.
Thanks.

YorksonChang on 21 Jul 2017

😕1
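The nvcc error above shows that the installed toolkit (CUDA 6.5) does not recognise compute_53. The Jetson TK1 GPU has compute capability 3.2, so an sm_53 entry would not be used on that board anyway. The Python sketch below, which is not from the thread, filters a CUDA_ARCH value down to the entries that are no newer than a given device. The helper name keep_supported is hypothetical. Whether the remaining sm_30 entry is sufficient for a TK1 build is an assumption to verify; adding a compute_32 entry may also be needed.

```python
import re
from typing import List

# Matches entries such as "-gencode arch=compute_53,code=sm_53".
_GENCODE = re.compile(r"-gencode\s+arch=compute_(\d+),code=sm_(\d+)")


def requested_archs(cuda_arch: str) -> List[int]:
    """Return the sm_XX numbers requested in a CUDA_ARCH value."""
    return [int(sm) for _, sm in _GENCODE.findall(cuda_arch)]


def keep_supported(cuda_arch: str, device_sm: int) -> str:
    """Keep only -gencode entries whose sm_XX is <= the device's capability.

    device_sm is the capability without the dot, e.g. 32 for 3.2.
    """
    flags = [
        f"-gencode arch=compute_{compute},code=sm_{sm}"
        for compute, sm in _GENCODE.findall(cuda_arch)
        if int(sm) <= device_sm
    ]
    # Join with the backslash-newline continuation used in Makefile.config.
    return " \\\n".join(flags)


# The CUDA_ARCH value quoted earlier in the thread.
THREAD_CUDA_ARCH = (
    "-gencode arch=compute_30,code=sm_30 \\\n"
    "-gencode arch=compute_53,code=sm_53"
)

if __name__ == "__main__":
    print("CUDA_ARCH := " + keep_supported(THREAD_CUDA_ARCH, 32))
```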

Hello?

YorksonChang on 26 Jul 2017

Don't even try it.

On a Jetson TX2 with JetPack 3.1 I get 1 FPS, both for prerecorded video and for real-time input.
I don't think it is worth running it on the TK1. It is a GPU-hungry model!

IoaSman1 on 27 Jul 2017

👍1

@IoaSman1 Have you tested how long OpenPose takes to process one image?

YorksonChang on 27 Jul 2017

Is it that slow even when using the tips from the FAQ in the doc/installation file (although they will decrease accuracy)?

gineshidalgo99 on 28 Jul 2017

@IoaSman1 I am doing the same thing here with a Jetson TX2. Is it straightforward to make the whole thing work? I would appreciate it very much if you could share the steps.

yangroupaomo on 10 Sep 2017

If someone wants to share the steps, feel free to make a pull request with the steps for any other OS or embedded board! I'll merge it. Thanks!

gineshidalgo99 on 10 Sep 2017

Awesome. Looking forward to that!
Thanks!

yangroupaomo on 10 Sep 2017

Got it working on the TX2 last night, PR incoming. With a heavy reduction of net_resolution (128x96) I got to 10+ fps. I used an external webcam, as the onboard camera wasn't straightforward to use. Hands and Face work (256x256 nets), but running both at the same time is too memory-intensive and crashes with an out-of-memory error.

After I finish the PR I'll take a look at TensorRT, hoping for higher real-time performance.

bushibushi on 13 Sep 2017

https://github.com/CMU-Perceptual-Computing-Lab/openpose/pull/245

bushibushi on 13 Sep 2017

@IoaSman1 Have you tried reducing the net_resolution? I can push it up to 4-7 fps, depending on how low I am willing to go on net_resolution, and the accuracy drop is not significant either.
Hope this helps.

vinitmuchhala on 13 Jul 2018
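To get a feel for how much work a smaller net_resolution removes, the Python sketch below compares input pixel counts. It is a back-of-the-envelope illustration, not a benchmark. It assumes that the default net_resolution at the time was 656x368, that OpenPose expects both dimensions to be multiples of 16, and that cost scales roughly with the number of input pixels. All helper names are hypothetical.

```python
from typing import Tuple


def parse_resolution(text: str) -> Tuple[int, int]:
    """Parse a 'WIDTHxHEIGHT' string such as '128x96'."""
    width, height = text.lower().split("x")
    return int(width), int(height)


def is_multiple_of_16(text: str) -> bool:
    """Check the assumed constraint that both dimensions are multiples of 16."""
    width, height = parse_resolution(text)
    return width % 16 == 0 and height % 16 == 0


def pixel_ratio(reference: str, reduced: str) -> float:
    """How many times fewer input pixels 'reduced' has than 'reference'."""
    ref_w, ref_h = parse_resolution(reference)
    red_w, red_h = parse_resolution(reduced)
    return (ref_w * ref_h) / (red_w * red_h)


if __name__ == "__main__":
    default = "656x368"  # assumed default, see the note above
    for candidate in ("128x96", "256x256"):
        print(
            candidate,
            "multiple of 16:", is_multiple_of_16(candidate),
            "pixel reduction: %.1fx" % pixel_ratio(default, candidate),
        )
```

Pixel count is only a proxy. Actual frame rates on a Jetson depend on the model, cuDNN and memory bandwidth, so measured numbers such as those reported in the thread are the better guide.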

Hello
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
F0123 10:55:27.467897 13141 pooling_layer.cu:212] Check failed: error == cudaSuccess (48 vs. 0) no kernel image is available for execution on the device
*** Check failure stack trace: ***
@ 0x7f92b39718 google::LogMessage::Fail()
@ 0x7f92b3b614 google::LogMessage::SendToLog()
@ 0x7f92b39290 google::LogMessage::Flush()
@ 0x7f92b3beb4 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f92f40bc8 caffe::PoolingLayer<>::Forward_gpu()
@ 0x7f92d66058 caffe::Net<>::ForwardFromTo()
@ 0x7f93e68a2c op::NetCaffe::forwardPass()
@ 0x7f93f9897c op::PoseExtractorCaffe::forwardPass()
@ 0x7f93f8e178 op::PoseExtractor::forwardPass()
@ 0x7f93f9cc18 op::WPoseExtractor<>::work()
@ 0x7f93e96bac op::Worker<>::checkAndWork()
@ 0x7f93e9b528 op::SubThread<>::workTWorkers()
@ 0x7f93ea57cc op::SubThreadQueueInOut<>::work()
@ 0x7f93ea1308 op::Thread<>::threadFunction()
@ 0x7f9394f280 (unknown)
@ 0x7f91f77fc4 start_thread
Aborted
I am facing the above error on a TX1. I tried the changes mentioned above. Please advise.
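Both failures in this thread ("invalid device function" and "no kernel image is available for execution on the device") typically mean that the Caffe binary contains no GPU code for the architecture of the device it runs on. That makes the CUDA_ARCH settings, and a full rebuild after changing them, the first thing to check. The Python sketch below, which is not from the thread, extracts the failing file, error code and message from such a glog line. The names CudaCheckFailure, parse_check_failure and looks_like_arch_mismatch are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CudaCheckFailure(NamedTuple):
    source_file: str
    line: int
    code: int
    message: str


# Matches e.g. "pooling_layer.cu:212] Check failed: error == cudaSuccess (8 vs. 0) ..."
_PATTERN = re.compile(
    r"(?P<file>[\w./]+):(?P<line>\d+)\] Check failed: error == cudaSuccess "
    r"\((?P<code>\d+) vs\. 0\)\s+(?P<message>.+?)\s*$"
)

# Messages seen in this thread. Matching on the text rather than the number
# is deliberate, since numeric codes can differ between CUDA releases.
_ARCH_MISMATCH_MESSAGES = {
    "invalid device function",
    "no kernel image is available for execution on the device",
}


def parse_check_failure(log_line: str) -> Optional[CudaCheckFailure]:
    """Return the parsed failure, or None if the line is not a CUDA check failure."""
    match = _PATTERN.search(log_line)
    if match is None:
        return None
    return CudaCheckFailure(
        match["file"], int(match["line"]), int(match["code"]), match["message"]
    )


def looks_like_arch_mismatch(message: str) -> bool:
    """Heuristic: does the message usually point at a missing GPU architecture?"""
    return message in _ARCH_MISMATCH_MESSAGES


if __name__ == "__main__":
    sample = (
        "F0123 10:55:27.467897 13141 pooling_layer.cu:212] Check failed: "
        "error == cudaSuccess (48 vs. 0) no kernel image is available for "
        "execution on the device"
    )
    failure = parse_check_failure(sample)
    print(failure, looks_like_arch_mismatch(failure.message))
```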