I hit an out-of-memory error even though OpenPose was compiled with cuDNN 5.1 _(Check failed: error == cudaSuccess (2 vs. 0) out of memory.)_
Starting OpenPose demo...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
Starting thread(s)...
init done
opengl support available
F0606 10:19:02.473112 2049 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7f19e5079daa (unknown)
@ 0x7f19e5079ce4 (unknown)
@ 0x7f19e50796e6 (unknown)
@ 0x7f19e507c687 (unknown)
@ 0x7f19e4478c20 caffe::SyncedMemory::mutable_gpu_data()
@ 0x7f19e45ad6f2 caffe::Blob<>::mutable_gpu_data()
@ 0x7f19e4561228 caffe::BaseConvolutionLayer<>::forward_gpu_gemm()
@ 0x7f19e45ed176 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x7f19e45979c3 caffe::Net<>::ForwardFromTo()
@ 0x7f19e9a6053f op::NetCaffe::forwardPass()
@ 0x7f19e9a22bf7 op::PoseExtractorCaffe::forwardPass()
@ 0x7f19e9a1f8e0 op::PoseExtractor::forwardPass()
@ 0x7f19e9a1ca43 op::WPoseExtractor<>::work()
@ 0x7f19e9981670 op::Worker<>::checkAndWork()
@ 0x7f19e997a1e0 op::SubThread<>::workTWorkers()
@ 0x7f19e997ab55 op::SubThreadQueueInOut<>::work()
@ 0x7f19e997c250 op::Thread<>::threadFunction()
@ 0x7f19e9992aaf _ZNKSt7_Mem_fnIMN2op6ThreadISt10shared_ptrISt6vectorINS0_5DatumESaIS4_EEES2_INS0_6WorkerIS7_EEEEEFvvEEclIJEvEEvPSB_DpOT_
@ 0x7f19e999283b _ZNSt12_Bind_simpleIFSt7_Mem_fnIMN2op6ThreadISt10shared_ptrISt6vectorINS1_5DatumESaIS5_EEES3_INS1_6WorkerIS8_EEEEEFvvEEPSC_EE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE
@ 0x7f19e999253d std::_Bind_simple<>::operator()()
@ 0x7f19e99922d0 std::thread::_Impl<>::_M_run()
@ 0x7f19e8309a60 (unknown)
@ 0x7f19e40b8184 start_thread
@ 0x7f19e7d7703d (unknown)
@ (nil) (unknown)
Aborted (core dumped)
I later compiled it without cuDNN, and the memory usage was the same.
I can confirm the cuDNN version installed on my system with the following command; the version is also displayed in cmake-gui.
Output of cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2:
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
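For reference, the three defines in that output combine into a single version number through the CUDNN_VERSION formula shown in the header. The following is a minimal sketch of that arithmetic; the function name `cudnn_version` and the `sample` string are illustrative, not part of any library. Note that this only reports which cuDNN is installed; it does not show whether Caffe was built with cuDNN enabled.

```python
import re

def cudnn_version(header_text):
    """Return (major, minor, patch, combined) parsed from cudnn.h-style text."""
    parts = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 5"
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % name, header_text)
        if match is None:
            raise ValueError("CUDNN_%s not found" % name)
        parts[name] = int(match.group(1))
    # Same formula as the CUDNN_VERSION macro in the header
    combined = parts["MAJOR"] * 1000 + parts["MINOR"] * 100 + parts["PATCHLEVEL"]
    return parts["MAJOR"], parts["MINOR"], parts["PATCHLEVEL"], combined

# The grep output quoted above, pasted as a string
sample = """#define CUDNN_MAJOR 5
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 10
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
print(cudnn_version(sample))  # (5, 1, 10, 5110)
```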
My GPU is a GTX 1050 Ti (4 GB) and there is no other program running on it. According to the linked OpenPose 1.1.0 benchmark, my GPU should not run out of memory while running openpose.bin.
It works with a net_resolution of 160x160, but even then it consumes almost all of my GPU memory (~4 GB).
Is there any specific reason for this?
If you are running body only (no --face and no --hand), it takes about 1.6 GB; I run it on my laptop with a GTX 860 (2 GB). So if it runs out of memory, in 99% of cases there is some issue with the CUDA / cuDNN installation, as Caffe (the one using cuDNN) is simply linked against them.
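One way to compare actual usage against the figure above is to query nvidia-smi while the demo is running. This is a rough sketch, assuming nvidia-smi is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the example input is illustrative rather than a measurement from this thread.

```python
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(output):
    """Parse 'used, total' lines (MiB) into a list of (used, total) ints."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows

def query_gpu_memory():
    """Run nvidia-smi and return one (used_mib, total_mib) pair per GPU."""
    output = subprocess.check_output(QUERY).decode()
    return parse_gpu_memory(output)

# Illustrative input only, not a measurement from this thread.
example = "1600, 4096\n"
print(parse_gpu_memory(example))
```

On a machine with an NVIDIA driver installed, calling `query_gpu_memory()` in a second terminal while openpose.bin is running should give the live numbers.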
The error was in the Caffe installation: I had not uncommented the USE_CUDNN := 1 flag in Makefile.config.
It works fine now with around 1.5 GB usage.
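For anyone hitting the same problem, a quick check of Makefile.config before rebuilding Caffe can catch this. The sketch below only looks for an uncommented USE_CUDNN line; the function name `cudnn_enabled` and the two sample strings are made up for illustration, and it assumes the Makefile-based Caffe build rather than CMake.

```python
import re

def cudnn_enabled(makefile_config_text):
    """Return True if an uncommented 'USE_CUDNN := 1' line is present."""
    # A leading '#' on the line means the flag is still commented out
    pattern = re.compile(r"^\s*USE_CUDNN\s*:?=\s*1\s*(#.*)?$", re.MULTILINE)
    return pattern.search(makefile_config_text) is not None

# Illustrative snippets, not copied from a real Makefile.config
commented = "# USE_CUDNN := 1\n"
enabled = "USE_CUDNN := 1\n"
print(cudnn_enabled(commented))  # False
print(cudnn_enabled(enabled))    # True
```

To use it on a real file, read the file contents first, for example `cudnn_enabled(open("Makefile.config").read())`, then rebuild Caffe after changing the flag.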
Thanks for the help Gines.