Caffe: error==cudaSuccess (77 vs. 0) an illegal memory access was encountered

Created on 18 May 2016  ·  11 Comments  ·  Source: BVLC/caffe

Hello everyone~
I can run make and make test without problems, but when I run make runtest or train the MNIST demo, I always get this error:
math_functions.cu:79] cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
@ 0x7fcd7a848daa (unknown)
@ 0x7fcd7a848ce4 (unknown)
@ 0x7fcd7a8486e6 (unknown)
@ 0x7fcd7a84b687 (unknown)
@ 0x7fcd7affb498 caffe::caffe_gpu_memcpy()
@ 0x7fcd7afc242e caffe::SyncedMemory::to_gpu()
@ 0x7fcd7afc18d9 caffe::SyncedMemory::gpu_data()
@ 0x7fcd7af46bc2 caffe::Blob<>::gpu_data()
@ 0x7fcd7afd9393 caffe::CuDNNConvolutionLayer<>::Forward_gpu()
@ 0x7fcd7af837b5 caffe::Net<>::ForwardFromTo()
@ 0x7fcd7af83b27 caffe::Net<>::Forward()
@ 0x7fcd7af7913c caffe::Solver<>::Test()
@ 0x7fcd7af79a6e caffe::Solver<>::TestAll()
@ 0x7fcd7af79b60 caffe::Solver<>::Step()
@ 0x7fcd7af7a529 caffe::Solver<>::Solve()
@ 0x40810e train()
@ 0x4059ec main
@ 0x7fcd79b55ec5 (unknown)
@ 0x406121 (unknown)
@ (nil) (unknown)
Aborted (core dumped)

I am using CUDA 7.5. The error is the same with or without cuDNN.
I don't know how to fix it. Has anyone solved this bug?
Thanks!
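
For readers unsure what the numbers in "(77 vs. 0)" mean: the check compares the returned CUDA error code against cudaSuccess (0). A minimal Python sketch for decoding such a log line follows. The function name decode_check_failure is hypothetical, and the small code table assumes the error numbering of the CUDA 7.5-era runtime used in this thread (newer toolkits renumbered some codes).

```python
import re

# Error codes as numbered in the CUDA 7.5-era runtime used in this thread.
# Assumption: newer toolkits renumbered some codes, so this table is
# illustrative rather than complete.
CUDA_ERROR_NAMES = {
    0: "cudaSuccess",
    2: "cudaErrorMemoryAllocation",
    77: "cudaErrorIllegalAddress",
}

# Matches the "(77 vs. 0) message" tail printed for a failed equality check.
CHECK_PATTERN = re.compile(r"\((\d+) vs\. (\d+)\)\s*(.*)$")


def decode_check_failure(line):
    """Return (code, name, message) for a failed CUDA check line, or None."""
    match = CHECK_PATTERN.search(line.strip())
    if match is None:
        return None
    code = int(match.group(1))
    name = CUDA_ERROR_NAMES.get(code, "unknown")
    return code, name, match.group(3)


if __name__ == "__main__":
    line = ("math_functions.cu:79] Check failed: error == cudaSuccess "
            "(77 vs. 0) an illegal memory access was encountered")
    print(decode_check_failure(line))
```

The same helper can be pointed at the "(2 vs. 0) out of memory" line that appears later in the thread.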

All 11 comments

From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

When reporting a bug, it's most helpful to provide the following information, where applicable:

  • What steps reproduce the bug?
  • Can you reproduce the bug using the latest master, compiled with the DEBUG make option?
  • What hardware and operating system/distribution are you running?
  • If the bug is a crash, provide the backtrace (usually printed by Caffe; always obtainable with gdb).

I encountered exactly the same problem.
The following is the terminal output during "make runtest".
Randomizing tests' orders with a seed of 65452 .

[ RUN ] ConvolutionLayerTest/2.Test1x1Gradient
F0522 13:03:41.964560 31682 math_functions.cu:79] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
@ 0x2adfd553edaa (unknown)
@ 0x2adfd553ece4 (unknown)
@ 0x2adfd553e6e6 (unknown)
@ 0x2adfd5541687 (unknown)
@ 0x2adfd6b13c38 caffe::caffe_gpu_memcpy()
@ 0x2adfd6b0b550 caffe::SyncedMemory::cpu_data()
@ 0x2adfd69c76b2 caffe::Blob<>::cpu_data()
@ 0x475b42 caffe::GradientChecker<>::GetObjAndGradient()
@ 0x47b0d7 caffe::GradientChecker<>::CheckGradientSingle()
@ 0x47b69b caffe::GradientChecker<>::CheckGradientExhaustive()
@ 0x4bd7f6 caffe::ConvolutionLayerTest_Test1x1Gradient_Test<>::TestBody()
@ 0x861733 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x858417 testing::Test::Run()
@ 0x8584be testing::TestInfo::Run()
@ 0x8585c5 testing::TestCase::Run()
@ 0x85b908 testing::internal::UnitTestImpl::RunAllTests()
@ 0x85bb97 testing::UnitTest::Run()
@ 0x462f9f main
@ 0x2adfd798aec5 (unknown)
@ 0x469a59 (unknown)
@ (nil) (unknown)
make: *** [runtest] Aborted (core dumped)

The problem went away after I did two things:

  1. Replaced cuDNN 5 with cuDNN 4
  2. Rebooted
    Either or both may have helped.

@Olwn Thanks for your answer. I tried that before, but it didn't work.
I think there may be a problem with my CUDA installation, because some of the more complex CUDA samples (e.g. the convolution-related ones) do not work either and also end with CUDA error 77.

When you run lspci | grep VGA, how many graphics cards do you see?
I get two: an Intel card and an NVIDIA card.
When I run CUDA programs, I suspect my PC uses the Intel graphics card for the computation, which would cause the illegal memory access.
But when I run CUDA programs or Caffe on a server or cluster, only the NVIDIA card is used.
So I think this error may be related to the NVIDIA driver and CUDA installation.
That's just my guess. I plan to reinstall them all.

Thx~
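
The lspci check above can be scripted. The following is a rough Python sketch; the function names display_controllers and vendors are hypothetical. It also matches "3D controller" lines, on the assumption that some machines list a secondary NVIDIA GPU under that class, in which case grep VGA alone would not show it.

```python
import shutil
import subprocess


def display_controllers(lspci_text):
    """Return the lspci lines that describe VGA or 3D controllers."""
    keywords = ("VGA compatible controller", "3D controller")
    return [line.strip() for line in lspci_text.splitlines()
            if any(keyword in line for keyword in keywords)]


def vendors(lines):
    """Roughly classify each controller line by vendor name."""
    found = []
    for line in lines:
        lowered = line.lower()
        if "nvidia" in lowered:
            found.append("NVIDIA")
        elif "intel" in lowered:
            found.append("Intel")
        else:
            found.append("other")
    return found


if __name__ == "__main__":
    if shutil.which("lspci"):
        text = subprocess.run(["lspci"], capture_output=True, text=True).stdout
        for line in display_controllers(text):
            print(line)
    else:
        print("lspci not found on this machine")
```

This only lists the hardware that is present. It does not show which device a CUDA program actually runs on.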

@seanbell Thank you for reminding me.

I decreased the batch_size, and the error "error == cudaSuccess (77 vs. 0) an illegal memory access was encountered" went away. But then the error "error == cudaSuccess (2 vs. 0) out of memory" appeared. Next, I decreased the batch_size to 1, and then there was no error.
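
For anyone who wants to try the same batch_size change, here is a small Python sketch that rewrites the value in a network definition. It assumes the batch size appears as a "batch_size: N" field in the prototxt text; the helper name set_batch_size and the sample layer fragment are hypothetical.

```python
import re

# Captures the "batch_size:" prefix and its integer value separately.
BATCH_SIZE = re.compile(r"(\bbatch_size\s*:\s*)(\d+)")


def set_batch_size(prototxt_text, new_size):
    """Replace every batch_size value in a prototxt string with new_size."""
    if new_size < 1:
        raise ValueError("batch_size must be at least 1")
    # A function replacement avoids ambiguity between the group reference
    # and the digits of the new value.
    return BATCH_SIZE.sub(lambda m: m.group(1) + str(new_size), prototxt_text)


if __name__ == "__main__":
    # Hypothetical fragment of a data layer, for illustration only.
    fragment = 'data_param {\n  source: "train_lmdb"\n  batch_size: 64\n}\n'
    print(set_batch_size(fragment, 1))
```

Note that this changes every batch_size field it finds, including the one in a TEST-phase data layer, which may or may not be what you want.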

I am using DIGITS 6.

I am trying to do segmentation and have run the examples (semantic segmentation and medical imaging) successfully. When I load my own images, I get this error. When I reduce the image size to 100x100, the error goes away, but with anything larger it comes back. Why?
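
A back-of-the-envelope Python sketch of why image size matters: a dense N x C x H x W array of 32-bit floats needs N*C*H*W*4 bytes, so doubling the side length of a square image quadruples the memory for every such array in the network. The function names are hypothetical, and the estimate covers a single array only, not a whole network.

```python
def blob_bytes(batch, channels, height, width, bytes_per_value=4):
    """Bytes for one dense N x C x H x W array of float32 values."""
    return batch * channels * height * width * bytes_per_value


def largest_batch_that_fits(channels, height, width, budget_bytes, start=64):
    """Halve the batch size from start until one array fits; 0 if none does."""
    batch = start
    while batch >= 1:
        if blob_bytes(batch, channels, height, width) <= budget_bytes:
            return batch
        batch //= 2
    return 0


if __name__ == "__main__":
    # Memory for a single 3-channel image at growing side lengths.
    for side in (100, 200, 400):
        print(side, blob_bytes(1, 3, side, side))
```

Real usage is higher than this, since every layer stores its own outputs and gradients, but the scaling with batch size and image area is the same.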

Because your GPU runs out of memory.... that's why.
Check nvidia-smi.
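
If you want the nvidia-smi check in script form, the following Python sketch queries used and total memory per GPU using the tool's CSV query mode. The helper name parse_memory_csv is hypothetical, and the parser assumes every field is a plain integer in MiB, which may not hold on GPUs that report a value as unsupported.

```python
import shutil
import subprocess

# CSV query mode of nvidia-smi: one "used, total" row per GPU, in MiB.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text):
    """Parse 'used, total' rows into a list of (used, total) integers."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: both fields are integers; int() raises otherwise.
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(QUERY, capture_output=True, text=True).stdout
        for index, (used, total) in enumerate(parse_memory_csv(out)):
            print(f"GPU {index}: {used} / {total} MiB used")
    else:
        print("nvidia-smi not found on this machine")
```

Running it in a second terminal while training starts shows whether memory use climbs toward the card's total before the crash.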

@happyzhouch I have the same problem as you. Did you solve it? It's really frustrating.

I encountered the same problem when training MobileNet-v2 after adding a DepthwiseConvolution layer to Caffe. How can I solve it?

Reducing the batch size solved my problem.
