Caffe: make runtest segfaulting

Created on 25 Sep 2016  ·  15 Comments  ·  Source: BVLC/caffe

I'm not sure what might be causing this, but here's what I'm seeing when I run make runtest on a checkout of master. I'm running on Debian Jessie, with GCC 4.9 and CUDA 8 RC. The only interesting thing about this machine is that it has 4x GTX 1080.

[----------] 2 tests from HingeLossLayerTest/2, where TypeParam = caffe::GPUDevice<float>
[ RUN      ] HingeLossLayerTest/2.TestGradientL2
[       OK ] HingeLossLayerTest/2.TestGradientL2 (6 ms)
[ RUN      ] HingeLossLayerTest/2.TestGradientL1
[       OK ] HingeLossLayerTest/2.TestGradientL1 (6 ms)
[----------] 2 tests from HingeLossLayerTest/2 (12 ms total)

[----------] 9 tests from AdaGradSolverTest/2, where TypeParam = caffe::GPUDevice<float>
[ RUN      ] AdaGradSolverTest/2.TestLeastSquaresUpdateWithEverythingAccumShare
[       OK ] AdaGradSolverTest/2.TestLeastSquaresUpdateWithEverythingAccumShare (12 ms)
[ RUN      ] AdaGradSolverTest/2.TestAdaGradLeastSquaresUpdateWithEverythingShare
*** Aborted at 1474827886 (unix time) try "date -d @1474827886" if you are using GNU date ***
PC: @     0x7fbbf4951e2d (unknown)
*** SIGSEGV (@0x1451f000) received by PID 23925 (TID 0x7fbc03491a00) from PID 340914176; stack trace: ***
    @     0x7fbbf4bd38d0 (unknown)
    @     0x7fbbf4951e2d (unknown)
    @     0x7fbbf5496350 std::vector<>::_M_erase()
    @     0x7fbbf549427d caffe::DevicePair::compute()
    @     0x7fbbf5499123 caffe::P2PSync<>::Prepare()
    @     0x7fbbf54997a0 caffe::P2PSync<>::Run()
    @           0x6af00e caffe::GradientBasedSolverTest<>::RunLeastSquaresSolver()
    @           0x6c2d2f caffe::GradientBasedSolverTest<>::TestLeastSquaresUpdate()
    @           0x6c31b0 caffe::AdaGradSolverTest_TestAdaGradLeastSquaresUpdateWithEverythingShare_Test<>::TestBody()
    @           0x8ff553 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x8f7eca testing::Test::Run()
    @           0x8f8018 testing::TestInfo::Run()
    @           0x8f80f5 testing::TestCase::Run()
    @           0x8f8a28 testing::internal::UnitTestImpl::RunAllTests()
    @           0x8f8d03 testing::UnitTest::Run()
    @           0x46e9df main
    @     0x7fbbf483ab45 (unknown)
    @           0x4764e9 (unknown)
    @                0x0 (unknown)
Makefile:526: recipe for target 'runtest' failed
make: *** [runtest] Segmentation fault

EDIT: I'm compiling with cuDNN enabled, but turning it off doesn't seem to make a difference.

multi-GPU testing

All 15 comments

I ran into exactly the same problem today with Ubuntu 16.04, 4x K80, CUDA 8 RC, and GCC 5.3.
Advice highly appreciated!

This may be unrelated, but as an extra datapoint, I also get a segfault if
I import pycaffe and theano in the same file and then try to do anything
with caffe. Let me know if I can provide any extra info!

An extra observation: the segfaults appear in several "SolverTest" cases, and all of them share the same stack trace:
std::vector<>::_M_erase()
caffe::DevicePair::compute()
caffe::P2PSync<>::Prepare()
caffe::P2PSync<>::Run()
caffe::GradientBasedSolverTest<>::TestLeastSquaresUpdate()

Same here.
Titan X (Pascal) x6 + K80 x2 + GTX 1080 x1 + Ubuntu 16.04 + cuDNN v5.1 + CUDA 8 + GCC 5.4.

[----------] 12 tests from SGDSolverTest/2, where TypeParam = caffe::GPUDevice
[ RUN ] SGDSolverTest/2.TestLeastSquaresUpdateWithWeightDecay
*** Aborted at 1475986823 (unix time) try "date -d @1475986823" if you are using GNU date ***
PC: @ 0x7f13e92fd512 (unknown)
*** SIGSEGV (@0x19ae2000) received by PID 14082 (TID 0x7f13f0ac7ac0) from PID 430841856; stack trace: ***
@ 0x7f13e958a3d0 (unknown)
@ 0x7f13e92fd512 (unknown)
@ 0x7f13e9eae280 std::vector<>::_M_erase()
@ 0x7f13e9eac494 caffe::DevicePair::compute()
@ 0x7f13e9eb1d50 caffe::P2PSync<>::Prepare()
@ 0x7f13e9eb285e caffe::P2PSync<>::Run()
@ 0x5b409e caffe::GradientBasedSolverTest<>::TestLeastSquaresUpdate()
@ 0x5b49ff caffe::SGDSolverTest_TestLeastSquaresUpdateWithWeightDecay_Test<>::TestBody()
@ 0x91ad53 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x91436a testing::Test::Run()
@ 0x9144b8 testing::TestInfo::Run()
@ 0x914595 testing::TestCase::Run()
@ 0x91586f testing::internal::UnitTestImpl::RunAllTests()
@ 0x915b93 testing::UnitTest::Run()
@ 0x46d9ed main
@ 0x7f13e91d0830 __libc_start_main
@ 0x475459 _start
@ 0x0 (unknown)
Makefile:526: recipe for target 'runtest' failed
make: *** [runtest] Segmentation fault (core dumped)

I suspect this is a bug in the multi-GPU support.
I used "export CUDA_VISIBLE_DEVICES=0" to make only one GPU visible to Caffe, and with that setting all the tests pass.

[==========] 2081 tests from 277 test cases ran. (353009 ms total)
[ PASSED ] 2081 tests.
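
The workaround above can also be applied to a single command rather than the whole shell session. The following is a minimal sketch, not part of Caffe: the function name run_on_single_gpu is hypothetical, and it assumes a POSIX shell with env available. It hides all but one GPU from the command it launches and leaves the rest of the session untouched.

```shell
# Hypothetical helper (not part of Caffe): run a command with only one GPU visible.
# Usage: run_on_single_gpu <gpu-index> <command> [args...]
run_on_single_gpu() {
  gpu_id="$1"
  shift
  # env scopes the variable to this one command instead of exporting it
  # into the current shell session.
  env CUDA_VISIBLE_DEVICES="$gpu_id" "$@"
}

# Example (run from the Caffe source tree):
#   run_on_single_gpu 0 make runtest
```

Scoping the variable this way avoids leaving CUDA_VISIBLE_DEVICES set for later multi-GPU jobs started from the same terminal.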

@nitbix

This may be unrelated, but as an extra datapoint, I also get a segfault if
I import pycaffe and theano in the same file and then try to do anything
with caffe. Let me know if I can provide any extra info!

This was fixed in Theano in commit bb170f4fb201109f88b95da282ed3a21b5021c13 (23 Sep 2016). Theano was calling cudaThreadExit on shutdown, which then caused a segfault when Caffe subsequently called cublasDestroy during cleanup.

Dear All,

Please advise how you solved this issue, as I have the same problem. Any answer is highly appreciated.

Hi all,
I have the same problem on Ubuntu 16.04.
[Screenshot from 2017-03-23 14-06-46]
Any answer is highly appreciated. Thank you.

@RuaYahya Did you solve the issue?

Hi,
The problem in my case was that my laptop does not have an NVIDIA card. Check whether your graphics processing unit is from NVIDIA. It worked fine when I tried another laptop.
Thanks

@RuaYahya
*** SIGABRT (@0x113c) received by PID 4412 (TID 0x7f64016a5b00) from PID 4412; stack trace: ***
@ 0x7f63ffd094b0 (unknown)
@ 0x7f63ffd09428 gsignal
@ 0x7f63ffd0b02a abort
@ 0x7f63ffd4b7ea (unknown)
@ 0x7f63ffd53e0a (unknown)
@ 0x7f63ffd5798c cfree
@ 0x7f64008878af google::protobuf::internal::DestroyDefaultRepeatedFields()
@ 0x7f6400886b3b google::protobuf::ShutdownProtobufLibrary()
@ 0x7f63e98c6329 (unknown)
@ 0x7f64015a2c17 (unknown)
@ 0x7f63ffd0dff8 (unknown)
@ 0x7f63ffd0e045 exit
@ 0x7f63ffcf4837 __libc_start_main
@ 0x4077c9 _start
@ 0x0 (unknown)
Makefile:532: recipe for target 'runtest' failed

I have the same problem on Ubuntu 16.04. Did you solve the issue?

@Mehuli-Ruh11
I believe he would simply set it before running the command, like this: export CUDA_VISIBLE_DEVICES=0 && make runtest. This fixed the error for me. It's related to this line in _Makefile.config_:
# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0
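
One detail worth keeping in mind when combining the two settings: CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0, so TEST_GPUID is an index into the visible set rather than the physical GPU number. As a rough sketch for checking which value a checkout is configured with, the helper below prints TEST_GPUID from a Makefile.config-style file. The function name get_test_gpuid is hypothetical and not part of Caffe.

```shell
# Hypothetical helper: print the TEST_GPUID value set in a Makefile.config-style file.
get_test_gpuid() {
  # Match lines such as "TEST_GPUID := 0" and print only the number.
  sed -n 's/^[[:space:]]*TEST_GPUID[[:space:]]*:=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1"
}

# Example (run from the Caffe source tree):
#   get_test_gpuid Makefile.config
```

If only one device is made visible, the value printed here should be 0, whichever physical GPU was selected.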

@denru01
I had a similar problem.
I had a Boost.Python package installed through conda, and its version differed from the one on my system. If you are using Anaconda, just uninstall the conda Boost package (conda uninstall boost).
That might fix the problem.

Did someone find a solution? I have the same problem, and I'm running Ubuntu 16.04 with only one GPU (GTX 1080) and CUDA 8.

@FangbRen
Hello, I ran into the same problem as you when installing Caffe and would like to ask how you solved it. Thank you.

I solved this issue by running export CUDA_VISIBLE_DEVICES=0 and then make runtest -j.
