Using "group" parameter in any convolution layer, with CUDNN, I get "misaligned address" error when the training phase starts. The (first?) test phase is not affected. The error disappears when I build caffe with CUDA but without CUDNN. However such a training is 2x slower...
Check out the repo, build with cuDNN, use the "group" parameter of a Convolution layer in some net, and run training.
Operating system: Ubuntu 16.04
Compiler: gcc 5.4
CUDA version (if applicable): 8
CUDNN version (if applicable): 5.1
BLAS: open
Could you please post the error log?
F0630 15:37:53.939421 12138 benchmark.cpp:92] Check failed: error == cudaSuccess (74 vs. 0) misaligned address
*** Check failure stack trace: ***
F0630 15:37:53.939426 12256 math_functions.cu:79] Check failed: error == cudaSuccess (74 vs. 0) misaligned address
*** Check failure stack trace: ***
@ 0x7f71ff1c55cd google::LogMessage::Fail()
@ 0x7f71ff1c55cd google::LogMessage::Fail()
@ 0x7f71ff1c7433 google::LogMessage::SendToLog()
@ 0x7f71ff1c7433 google::LogMessage::SendToLog()
@ 0x7f71ff1c515b google::LogMessage::Flush()
@ 0x7f71ff1c515b google::LogMessage::Flush()
@ 0x7f71ff1c7e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f71ff1c7e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f71ff8101da caffe::Timer::MilliSeconds()
@ 0x7f71ff9bdc0a caffe::caffe_gpu_memcpy()
@ 0x7f71ff82eb1d caffe::SyncedMemory::mutable_cpu_data()
@ 0x7f71ff80f73a caffe::Timer::Seconds()
@ 0x7f71ff8161f2 caffe::Blob<>::mutable_cpu_data()
@ 0x7f71ff98e85d caffe::Solver<>::Step()
@ 0x7f71ff916144 caffe::ImageDataLayer<>::load_batch()
@ 0x7f71ff98f26a caffe::Solver<>::Solve()
@ 0x40e0ba train()
@ 0x40a687 main
@ 0x7f71ff8a1d37 caffe::BasePrefetchingDataLayer<>::InternalThreadEntry()
@ 0x7f71fe135830 __libc_start_main
@ 0x40b029 _start
@ (nil) (unknown)
I met the same issue: group size 2 is OK, but 3 or larger goes wrong.
I tried it and get the same error even after removing group. The network converges well without cuDNN, but slowly. Did you manage to fix the problem, @svobora?
What exactly is the root cause of the problem?
Same here. With group equal to the number of outputs it requires a huge amount of memory; even with the batch size reduced to the minimum, it crashes with an out-of-memory error after 2000-3000 iterations.
Consider using ConvolutionDepthwise (#5665) to replace convolutions that use the group parameter.
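For the depthwise case (group equal to both num_output and the number of input channels), the replacement might look roughly like the sketch below. The ConvolutionDepthwise type name and its reuse of convolution_param are my assumptions based on #5665; check that PR for the exact interface.

layer {
  name: "conv_dw"                # illustrative names
  type: "ConvolutionDepthwise"   # assumed layer type from #5665
  bottom: "conv_in"
  top: "conv_dw"
  convolution_param {
    num_output: 64               # equal to the number of input channels
    kernel_size: 3
    pad: 1
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}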
I got the same error with the following layer:

layer {
  name: "fc2_conv_b" type: "Convolution" bottom: "fc2_a" top: "fc2_b"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 64 pad: 1 kernel_size: 3 group: 4 stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
I don't know why, but with num_output: 128 or kernel_size: 5 there is no problem...
Can anyone fix this?
I'm unable to reproduce the problem; more specific instructions are needed.
@douzsh What does the fc2_a blob look like? Grouped convolution will work or fail depending on its input, so its shape is important. Your layer worked for me when I shaped the blob to 1x16x32x32.
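For reference, a blob of that shape can be fed to the layer with an Input layer like this (illustrative sketch; the layer name is arbitrary):

layer {
  name: "input"
  type: "Input"
  top: "fc2_a"
  input_param { shape: { dim: 1 dim: 16 dim: 32 dim: 32 } }
}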
@Noiredd
The input blob's shape is 1x64x64x64.
I hope you can help me solve this issue.
@douzsh I just ran this network with no problems - both in Python and with caffe time.
What's the output of caffe device_query -gpu=all on your machine? What are your CUDA and cuDNN versions?
@Noiredd
That's really good news. Can you tell me your CUDA and cuDNN versions too? Then I can upgrade my libraries for training...
BTW, my CUDA is 7.0 and cuDNN v6 is used while training.
@douzsh I'm pretty sure you need at least CUDA 7.5 to run cuDNN 6 - see the download page for Nvidia cuDNN for a list of compatible releases. I ran my test on CUDA 9.0.176 and cuDNN 7.0.5.
@svobora This is a bug in Caffe. I solved it by modifying cudnn_conv_layer.cpp and aligning the per-group workspace size to a multiple of 32 bytes.
You can insert two lines of code before size_t total_max_workspace = ... as follows:
size_t m = 32;
max_workspace = (max_workspace + m - 1) / m * m;  // align each group's workspace to a multiple of m bytes
BTW, I think there is another bug: these lines should be put in an else block, so the per-group workspace pointers are only set when the allocation succeeds:
for (int g = 0; g < (this->group_ * CUDNN_STREAMS_PER_GROUP); g++) {
  workspace[g] = reinterpret_cast<char *>(workspaceData) + g * max_workspace;
}
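Putting both changes together, the relevant part of CuDNNConvolutionLayer<Dtype>::Reshape() in src/caffe/layers/cudnn_conv_layer.cpp would look roughly like this (a sketch against the upstream code layout of that time; the error-handling branch is abbreviated):

// fix 1: round the per-group workspace up to a 32-byte boundary so that
// workspaceData + g * max_workspace is always a properly aligned pointer
size_t m = 32;
max_workspace = (max_workspace + m - 1) / m * m;

// ensure all groups have enough workspace (existing upstream line)
size_t total_max_workspace = max_workspace *
    (this->group_ * CUDNN_STREAMS_PER_GROUP);

if (total_max_workspace > workspaceSizeInBytes) {
  workspaceSizeInBytes = total_max_workspace;
  // free the existing workspace and allocate a new (larger) one
  cudaFree(this->workspaceData);
  cudaError_t err = cudaMalloc(&(this->workspaceData), workspaceSizeInBytes);
  if (err != cudaSuccess) {
    // allocation failed: fall back to the zero-workspace algorithms and
    // NULL out the pointers (unchanged upstream code, omitted here)
  } else {
    // fix 2: alias the per-group pointers only when the allocation succeeded
    for (int g = 0; g < (this->group_ * CUDNN_STREAMS_PER_GROUP); g++) {
      workspace[g] = reinterpret_cast<char *>(workspaceData) + g * max_workspace;
    }
  }
}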
@hoszbh Just wanted to confirm that your fix is working, thanks a lot. Do you know why this fix is not on master yet?
See also #6548
After fixing the code, do I need to recompile Caffe?