Py-faster-rcnn: demo.py on AWS g2.8xlarge GPU: Check failed: error == cudaSuccess (2 vs. 0) out of memory

Created on 29 Dec 2015 · 17 comments · Source: rbgirshick/py-faster-rcnn

I1229 13:48:13.383297 3136 net.cpp:283] This network produces output cls_prob
I1229 13:48:13.383335 3136 net.cpp:297] Network initialization done.
I1229 13:48:13.383350 3136 net.cpp:298] Memory required for data: 117093580
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 548317115

Loaded network /opt/practice/py-faster-rcnn/data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel
F1229 13:48:14.865103 3136 syncedmem.cpp:58] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
ubuntu@ip-172-30-0-107:/opt/practice/py-faster-rcnn/tools$

zdx3578
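The log above contains two byte counts that put a floor under the GPU memory the demo needs: Caffe's "Memory required for data" (the top blobs at the shapes the net is initialized with) and the number of bytes protobuf read from the VGG16 caffemodel. Below is a minimal Python sketch that extracts both and converts them to MiB; the function name is hypothetical, and the result is only a rough lower bound, since py-faster-rcnn reshapes the input blob to the actual test image at run time and allocations such as cuDNN workspaces are not counted.

```python
import re

MIB = 1024 * 1024


def memory_floor_mib(log_text):
    """Return (blob_mib, weights_mib) parsed from a Caffe + protobuf log."""
    # Caffe prints the summed size of the top blobs after Net::Init.
    data = re.search(r"Memory required for data: (\d+)", log_text)
    # protobuf reports how many bytes of the .caffemodel it read.
    weights = re.search(r"total number of bytes read was (\d+)", log_text)
    if data is None or weights is None:
        raise ValueError("log does not contain both byte counts")
    return int(data.group(1)) / MIB, int(weights.group(1)) / MIB


def report(log_text):
    """Format the two figures and their sum as a one-line summary."""
    blobs, weights = memory_floor_mib(log_text)
    return (f"blobs ~{blobs:.1f} MiB, weights ~{weights:.1f} MiB, "
            f"total ~{blobs + weights:.1f} MiB")
```

Applied to the two log lines above (117093580 and 548317115 bytes), `report` gives "blobs ~111.7 MiB, weights ~522.9 MiB, total ~634.6 MiB". That is well under the 4 GB of one GRID K520, so the failure comes from allocations these counts do not cover, or from other processes sharing the GPU.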

Most helpful comment

@zdx3578
1. If you're using a GPU instance on AWS, change the CUDA architecture setting in Makefile.config to:

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

This is because the GPU in AWS G2 instances (NVIDIA GRID K520, compute capability 3.0) does not support compute_35.
2. Change sm_35 to sm_30 in the lib/setup.py file.
3. cd lib and remove these files if they exist: utils/bbox.c, nms/cpu_nms.c, nms/gpu_nms.cpp.
Then run make && cd ../caffe-fast-rcnn/ && make clean && make -j8 && make pycaffe -j8

twtygqyy on 4 Jan 2016
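The three steps in this comment can be scripted. Below is a minimal Python sketch, assuming `lib_dir` points at py-faster-rcnn/lib; all helper names are hypothetical, and the make commands from step 3 still have to be run afterwards. `cuda_arch_flags` follows the pattern of the CUDA_ARCH value above: one `-gencode` entry with native code (`sm_XX`) per compute capability, plus PTX (`compute_XX`) for the newest one so the driver can JIT-compile for later GPUs.

```python
from pathlib import Path

# Generated sources named in step 3; rebuilding lib/ regenerates them.
STALE_FILES = ["utils/bbox.c", "nms/cpu_nms.c", "nms/gpu_nms.cpp"]


def cuda_arch_flags(capabilities):
    """Step 1: native code (sm_XX) for every capability, plus PTX for the newest."""
    flags = [f"-gencode arch=compute_{cc},code=sm_{cc}" for cc in capabilities]
    newest = max(capabilities)
    flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return " ".join(flags)


def patch_setup_py(lib_dir, old="sm_35", new="sm_30"):
    """Step 2: replace the nvcc target in lib/setup.py; returns the number of hits."""
    setup_py = Path(lib_dir) / "setup.py"
    text = setup_py.read_text()
    setup_py.write_text(text.replace(old, new))
    return text.count(old)


def remove_stale_sources(lib_dir):
    """Step 3: delete the generated sources that exist; returns what was removed."""
    removed = []
    for rel in STALE_FILES:
        path = Path(lib_dir) / rel
        if path.exists():
            path.unlink()
            removed.append(rel)
    return removed
```

From the repository root, `"CUDA_ARCH := " + cuda_arch_flags([30, 50])` reproduces the flags above on a single line, and `patch_setup_py("lib")` followed by `remove_stale_sources("lib")` covers steps 2 and 3 before rebuilding.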

All 17 comments

@zdx3578 It seems you don't have enough GPU memory. You can try the smaller ZF model.

MenglaiWang on 29 Dec 2015
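Before falling back to ZF, it can help to check how much memory is actually free on each GPU (a g2.8xlarge exposes four). Below is a minimal Python sketch using nvidia-smi's CSV query mode (`--query-gpu` with `--format=csv,noheader,nounits`); the function names are hypothetical. The chosen index can then be passed to demo.py through its `--gpu` option.

```python
import subprocess

# nvidia-smi prints one CSV row per GPU; memory values are in MiB, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows like '0, 4095, 4033' into dicts (values in MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, total, free = (int(field) for field in line.split(","))
        gpus.append({"index": index, "total_mib": total, "free_mib": free})
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its output (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


def emptiest_gpu(gpus):
    """Return the GPU entry with the most free memory."""
    return max(gpus, key=lambda gpu: gpu["free_mib"])
```

For example, `best = emptiest_gpu(query_gpus())` followed by `./tools/demo.py --gpu <best index>` avoids a device that another process is already filling.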

@twtygqyy: Implementing your suggestion and using --net=zf allowed me to run the demo, but the default run of ./tools/demo.py still crashes because there is not enough memory. Is the g2.8xlarge instance really not powerful enough to run this? I had the same error output as @zdx3578.

aaronpolhamus on 26 May 2016

@aaronpolhamus The memory of the AWS instance should be enough to run or even train the model, provided cuDNN is installed (the most recent version is cuDNN v5).

twtygqyy on 29 May 2016

@aaronpolhamus I forgot to mention that the Caffe version bundled with this faster-rcnn repo only supports cuDNN v4 or earlier.

twtygqyy on 29 May 2016

@twtygqyy: the default run ./tools/demo.py still fails. Here's the error output:

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 548317115
I0529 20:02:37.205232  3760 net.cpp:816] Ignoring source layer data
I0529 20:02:37.314342  3760 net.cpp:816] Ignoring source layer drop6
I0529 20:02:37.329881  3760 net.cpp:816] Ignoring source layer drop7
I0529 20:02:37.329957  3760 net.cpp:816] Ignoring source layer fc7_drop7_0_split
I0529 20:02:37.330417  3760 net.cpp:816] Ignoring source layer loss_cls
I0529 20:02:37.330440  3760 net.cpp:816] Ignoring source layer loss_bbox
I0529 20:02:37.332777  3760 net.cpp:816] Ignoring source layer silence_rpn_cls_score
I0529 20:02:37.332852  3760 net.cpp:816] Ignoring source layer silence_rpn_bbox_pred


Loaded network /home/ubuntu/py-faster-rcnn/data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel
F0529 20:02:37.756431  3760 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***

./tools/demo.py --net=zf succeeds, however. Regarding your comments: which version of cuDNN should I use for the Caffe build in py-faster-rcnn? I'm fairly sure I'm using the most recent version now (cudnn-7.5-linux-x64-v5.0-ga.tgz), but the build failed when I tried to compile with USE_CUDNN := 1. I was only able to build Caffe inside py-faster-rcnn and run the demo as ./tools/demo.py --net=zf after commenting this flag out.

So two questions for you:

Can I expect a successful install if I use a version of cuDNN that is <= 4?
Once I configure with cuDNN, can I expect to run the default demo example without running into a memory error?

aaronpolhamus on 29 May 2016

@aaronpolhamus Based on the problem you describe, using cuDNN v4 should solve the memory issue: the Caffe fork in faster-rcnn dates from February 2016, before cuDNN v5 was released. If you want to use v5, you have to update the Caffe fork.

twtygqyy on 30 May 2016
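To see which cuDNN release the Caffe build will pick up before setting USE_CUDNN := 1, you can read the version macros from cudnn.h. Below is a minimal Python sketch, assuming the header sits at /usr/local/cuda/include/cudnn.h (where it usually ends up when the cuDNN archive is unpacked into the CUDA directory) and defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the v4 and v5 headers do. The v4 ceiling encodes the advice in this thread, not a check that exists in the repo; the names are hypothetical.

```python
import re
from pathlib import Path

# Assumed install location; adjust if cuDNN was unpacked elsewhere.
DEFAULT_HEADER = Path("/usr/local/cuda/include/cudnn.h")

# Advice from this thread: the bundled Caffe fork predates cuDNN v5.
MAX_SUPPORTED_MAJOR = 4


def cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines in cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"{name} not found in header")
        parts.append(int(match.group(1)))
    return tuple(parts)


def supported_by_fork(version):
    """True if the major version is within the limit suggested above."""
    return version[0] <= MAX_SUPPORTED_MAJOR
```

Usage: `supported_by_fork(cudnn_version(DEFAULT_HEADER.read_text()))` returns False for a v5 header, which is the situation described in the previous comments.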

@twtygqyy: Finally worked it out. Not only do you need cuDNN v3 or v4, you also need to be running CUDA 7.0 rather than 7.5.

aaronpolhamus on 6 Jun 2016

@aaronpolhamus: I use CUDA 7.5 + cuDNN v4 and it looks fine.

ricepot100 on 8 Jun 2016

@ricepot100: that's really interesting. To get everything running I had to revert to the earlier versions I mention above. Are you on the g2.8xlarge instance?

aaronpolhamus on 12 Jun 2016

@aaronpolhamus: No, I don't use EC2; I use a local machine with a 980GT 6G DDR.

ricepot100 on 28 Jun 2016

When I try to install Fast R-CNN, I get this error. How can I solve it?

I use a GTX 970, so I think I have enough memory, don't I?

Loaded network /home/rvlab/Music/fast-rcnn/data/fast_rcnn_models/vgg16_fast_rcnn_iter_40000.caffemodel

Demo for data/demo/000004.jpg
F0718 22:09:35.547049 13693 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Aborted (core dumped)

cervantes-loves-ai on 18 Jul 2016

@ricepot100 You wrote that "I use CUDA v7.5 + cudnn v4 looks fine".

However, on NVIDIA's web site, the download for cuDNN is called cuDNN-7.0-linux-x64-v4.0-prod.tgz. In other words, version 4 of cuDNN is meant to work with version 7.0 of CUDA.

BadWindshield on 12 Sep 2016
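The archive names quoted in this thread encode both versions: the number after `cudnn-` is the CUDA release the library was built for, and the number after `v` is the cuDNN release. Below is a small Python sketch that splits such a name, assuming other archives follow the same `cudnn-<cuda>-<os>-<arch>-v<cudnn>-<tag>.tgz` pattern as the two files mentioned here; the function name is hypothetical.

```python
import re

# Pattern of the two archive names quoted in this thread; other releases may differ.
TARBALL = re.compile(
    r"cudnn-(?P<cuda>\d+\.\d+)-(?P<os>\w+)-(?P<arch>\w+)"
    r"-v(?P<cudnn>\d+(?:\.\d+)*)-(?P<tag>\w+)\.tgz",
    re.IGNORECASE,
)


def split_cudnn_tarball(name):
    """Return (cuda_version, cudnn_version) encoded in an archive name."""
    match = TARBALL.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised archive name: {name}")
    return match.group("cuda"), match.group("cudnn")
```

For cudnn-7.0-linux-x64-v4.0-prod.tgz this yields ("7.0", "4.0"), and for cudnn-7.5-linux-x64-v5.0-ga.tgz ("7.5", "5.0"). The name records which CUDA release the archive targets; ricepot100's report above of cuDNN v4 working with CUDA 7.5 suggests this is not necessarily a hard limit.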

@isarker As the message says, you are out of memory: VGG16 takes more than 3 GB of GPU memory for training. I suggest using a GTX Titan X with 12 GB. :)

https://github.com/rbgirshick/py-faster-rcnn#requirements-hardware

zhouphd on 25 Oct 2016

@zhouphd Thanks for your reply. I changed my GPU and it works.

cervantes-loves-ai on 25 Oct 2016

I had the same problem, but for me downgrading cuDNN to v4 was enough to solve it.

My configuration is Ubuntu 16.04, CUDA Toolkit 8.0 and cuDNN 4.0. Apparently you don't need to downgrade the CUDA toolkit and drivers to release 7.

demo.py completed successfully on a 2 GB GeForce GTX 950.

Vandertic on 14 Apr 2017

Hi @Vandertic

I changed my cuDNN version to 4, but I still get the same error. Was there something else you did? I'm now using an AWS instance; could that be a problem?