Caffe: Segfault during caffe::init

Created on 9 Mar 2016 · 3 Comments · Source: BVLC/caffe

I'm using caffe-rc3 on Ubuntu. The Caffe tests pass and the mnist sample runs perfectly. I have a trained net with a net definition file and a weights file. Everything works perfectly in CPU mode, but GPU mode crashes. I've spent a few hours with gdb: the crash happens when caffe_rng_uniform() calls caffe_rng() and rng_stream() returns 0x1, a bad pointer.

16 inline rng_t* caffe_rng() {
17   return static_cast<caffe::rng_t*>(Caffe::rng_stream().generator());
18 }

The random_generator_ pointer is 0x1, which causes the crash when it is dereferenced:

(gdb) p *caffe::thread_instance_.get()
$49 = {cublas_handle_ = 0x4df9160, curand_generator_ = 0x4dfab10, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}

However, Caffe::Get() has a good pointer. It seems like the thread-specific data and the singleton data are different. I can't figure out why.

(gdb) p *caffe::Caffe::Get().random_generator_
$46 = (caffe::Caffe::RNG &) @0x4df9160: {generator_ = {px = 0x7fffffff00000200, pn = {pi_ = 0xffff0000ffff}}}

Backtrace:
(gdb) bt
#0  caffe::caffe_rng () at ./include/caffe/util/rng.hpp:17
#1  0x00007ffff723f833 in caffe::caffe_rng_uniform (n=81536, a=-0.0686263517, b=0.0686263517, r=0x201200000)
    at src/caffe/util/math_functions.cpp:252
#2  0x00007ffff716ae72 in caffe::XavierFiller::Fill (this=0x60a4fb0, blob=0x60a47f0) at ./include/caffe/filler.hpp:161
#3  0x00007ffff71f7d82 in caffe::BaseConvolutionLayer::LayerSetUp (this=0x60a0620,
    bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...})
    at src/caffe/layers/base_conv_layer.cpp:170
#4  0x00007ffff7195c33 in caffe::CuDNNConvolutionLayer::LayerSetUp (this=0x60a0620,
    bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...})
    at src/caffe/layers/cudnn_conv_layer.cpp:20
#5  0x00007ffff7155548 in caffe::Layer::SetUp (this=0x60a0620, bottom=std::vector of length 1, capacity 1 = {...},
    top=std::vector of length 1, capacity 1 = {...}) at ./include/caffe/layer.hpp:71
#6  0x00007ffff7295246 in caffe::Net::Init (this=0x4e4a890, in_param=...) at src/caffe/net.cpp:148
#7  0x00007ffff72939e0 in caffe::Net::Net (this=0x4e4a890, param_file="/home/ubuntu/linux/gtpmfgo//golast19.prototxt",
    phase=caffe::TEST, root_net=0x0) at src/caffe/net.cpp:36
#8  0x00000000004fb4c6 in caffe_init (path=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1) at ../src/caffecnn.cpp:63
#9  0x00000000004de674 in uct_init_all (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1)
    at ../src/uct.c:465
#10 0x000000000048cbbc in init_mfgo (cwd=0x7fffffffd330 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=1, use_gpu=1)
    at ../src/G2init.c:112
#11 0x00000000004fa301 in main (argc=6, argv=0x7fffffffe5d8) at gtpmfgo.cpp:1545

My code invoking Caffe (use_gpu is true):

int caffe_init(const char *path, int use_gpu) {

#ifdef HAVE_CAFFE

    int argc = 2;
    char *fake_args[] = { "gtpmfgo", "ManyFaces" };
    char **argv = fake_args;
    GlobalInit(&argc, &argv);
    if (use_gpu) {
            Caffe::set_mode(Caffe::GPU);
            Caffe::SetDevice(0);
            Caffe::DeviceQuery();
    }
    else {
            Caffe::set_mode(Caffe::CPU);
    }

    if (caffe_test_net != NULL) delete caffe_test_net;
    string file_path = path;
    file_path += "/";
    caffe_test_net = new Net<float>(file_path + filename_net, TEST);
    caffe_test_net->CopyTrainedLayersFrom(file_path + filename_parameters);

All 3 comments

I'm using CUDA 7.5.

It appears that during Caffe::set_mode, the value of mode_ is being written into random_generator_. I have gdb 4.8.4. The gdb output:

(gdb) bt
#0  boost::detail::shared_count::~shared_count (this=0x7fffffffb868, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/detail/shared_count.hpp:371
#1  0x00007ffff7222d56 in boost::shared_ptr<boost::detail::tss_cleanup_function>::~shared_ptr (this=0x7fffffffb860,
    __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:328
#2  0x00007ffff7222f69 in boost::thread_specific_ptr<caffe::Caffe>::reset (
    this=0x7ffff7bb9db0 <caffe::thread_instance_>, new_value=0x1173990) at /usr/include/boost/thread/tss.hpp:105
#3  0x00007ffff7221001 in caffe::Caffe::Get () at src/caffe/common.cpp:17
#4  0x00000000004fbe05 in caffe::Caffe::set_mode (mode=caffe::Caffe::GPU)
    at /home/ubuntu/linux/caffe-rc3/include/caffe/common.hpp:148
#5  0x00000000004fb334 in caffe_init (path=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1)
    at ../src/caffecnn.cpp:54
#6  0x00000000004de5b4 in uct_init_all (cwd=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/", max_memory=965,
    max_threads=64, use_gpu=1) at ../src/uct.c:465
#7  0x000000000048cafc in init_mfgo (cwd=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/", max_memory=965, max_threads=64,
    use_gpu=1) at ../src/G2init.c:112
#8  0x00000000004fa241 in main (argc=4, argv=0x7fffffffe5e8) at gtpmfgo.cpp:1545

(gdb) n
375 }
(gdb) s
boost::thread_specific_ptr<caffe::Caffe>::reset (this=0x7ffff7bb9db0 <caffe::thread_instance_>, new_value=0x1173990)
    at /usr/include/boost/thread/tss.hpp:107
107 }
(gdb) s
caffe::Caffe::Get () at src/caffe/common.cpp:19
19 return *(thread_instance_.get());
(gdb) p thread_instance_.get()
$34 = (caffe::Caffe *) 0x1173990
(gdb) d
(gdb) p thread_instance_.get()->random_generator_
$35 = {px = 0x0, pn = {pi_ = 0x0}}
(gdb) s
boost::thread_specific_ptr<caffe::Caffe>::get (this=0x7ffff7bb9db0 <caffe::thread_instance_>)
    at /usr/include/boost/thread/tss.hpp:84
84 return static_cast<T*>(detail::get_tss_data(this));
(gdb) p thread_instance_.get()->random_generator_
No symbol "thread_instance_" in current context.
(gdb) p caffe::thread_instance_.get()->random_generator_
$36 = {px = 0x0, pn = {pi_ = 0x0}}
(gdb) s
85 }
(gdb) s
caffe::Caffe::Get () at src/caffe/common.cpp:20
20 }
(gdb) s
caffe_init (path=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/", use_gpu=1) at ../src/caffecnn.cpp:60
60 string file_path = path;
(gdb) p caffe::thread_instance_.get()->random_generator_
$37 = {px = 0x1, pn = {pi_ = 0x0}}
(gdb) p *caffe::thread_instance_.get()
$38 = {cublas_handle_ = 0x4840a30, curand_generator_ = 0x53064e0, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}
(gdb) s
61 file_path += "/";
(gdb) p *caffe::thread_instance_.get()
$39 = {cublas_handle_ = 0x4840a30, curand_generator_ = 0x53064e0, random_generator_ = {px = 0x1, pn = {pi_ = 0x0}},
mode_ = caffe::Caffe::CPU, solver_count_ = 1, root_solver_ = true}
(gdb) info threads
  Id   Target Id         Frame
  3    Thread 0x7fffcf3ff700 (LWP 1757) "gtpmfgo" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  2    Thread 0x7fffd490d700 (LWP 1756) "gtpmfgo" 0x00007ffff613a12d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x7ffff7fa5a40 (LWP 1752) "gtpmfgo" caffe_init (path=0x7fffffffd340 "/home/ubuntu/linux/gtpmfgo/",
    use_gpu=1) at ../src/caffecnn.cpp:61
(gdb) p caffe::Caffe::MODE_GPU
There is no field named MODE_GPU
(gdb) p caffe::Caffe::GPU
$40 = caffe::Caffe::GPU
(gdb) p/x caffe::Caffe::GPU
$41 = 0x1

Found the problem. I had CPU_ONLY defined in my application header, so my application and the library were compiled with different definitions of the Caffe class.
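To illustrate why such a mismatch can produce exactly px = 0x1, here is a minimal sketch. It assumes the two CUDA handles are compiled out of the class when CPU_ONLY is defined, which is consistent with the member order gdb prints above. All type and function names below (PtrPair, GpuBuildView, CpuOnlyView, and the helpers) are hypothetical stand-ins, not Caffe's real declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for illustration only; not Caffe's real types.
struct PtrPair { void* px; void* pn; };  // two words, like the px/pn pair gdb prints
enum Brew { CPU = 0, GPU = 1 };

// Member order as code built without CPU_ONLY would see it (assumption).
struct GpuBuildView {
  void* cublas_handle_;
  void* curand_generator_;
  PtrPair random_generator_;
  Brew mode_;
};

// Member order as code built with CPU_ONLY would see it (assumption).
struct CpuOnlyView {
  PtrPair random_generator_;
  Brew mode_;
};

// Offset at which CPU_ONLY code stores mode_.
constexpr std::size_t cpu_only_mode_offset() {
  return offsetof(CpuOnlyView, mode_);
}

// Offset at which the GPU build expects random_generator_.px.
constexpr std::size_t gpu_build_px_offset() {
  return offsetof(GpuBuildView, random_generator_) + offsetof(PtrPair, px);
}

// Stores `mode` at the offset CPU_ONLY code would use, then reads px
// the way the GPU build would read it from the same object.
inline std::uintptr_t px_after_mismatched_set_mode(Brew mode) {
  GpuBuildView object = {};  // zero-initialised object owned by the "library"
  assert(cpu_only_mode_offset() + sizeof(mode) <= sizeof(GpuBuildView));
  unsigned char* raw = reinterpret_cast<unsigned char*>(&object);
  std::memcpy(raw + cpu_only_mode_offset(), &mode, sizeof(mode));
  return reinterpret_cast<std::uintptr_t>(object.random_generator_.px);
}
```

In this sketch the two offsets coincide, so on a little-endian machine px_after_mismatched_set_mode(GPU) returns 1 while mode_ in the GPU-build view stays at CPU, matching the gdb output. The likely remedy, stated as an assumption rather than verified here, is to compile the application with the same CPU_ONLY setting that was used to build the library.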
