Using "group" parameter in any convolution layer, with CUDNN, I get "misaligned address" error when the training phase starts. The (first?) test phase is not affected. The error disappears when I build caffe with CUDA but without CUDNN. However such a training is 2x slower...
Check out the repo, build with cuDNN, use the "group" parameter of a Convolution layer in some net, and run training.
Operating system: Ubuntu 16.04
Compiler: gcc 5.4
CUDA version (if applicable): 8
CUDNN version (if applicable): 5.1
BLAS: open
Could you please post the error log?
F0630 15:37:53.939421 12138 benchmark.cpp:92] Check failed: error == cudaSuccess (74 vs. 0) misaligned address
*** Check failure stack trace: ***
F0630 15:37:53.939426 12256 math_functions.cu:79] Check failed: error == cudaSuccess (74 vs. 0) misaligned address
*** Check failure stack trace: ***
@ 0x7f71ff1c55cd google::LogMessage::Fail()
@ 0x7f71ff1c55cd google::LogMessage::Fail()
@ 0x7f71ff1c7433 google::LogMessage::SendToLog()
@ 0x7f71ff1c7433 google::LogMessage::SendToLog()
@ 0x7f71ff1c515b google::LogMessage::Flush()
@ 0x7f71ff1c515b google::LogMessage::Flush()
@ 0x7f71ff1c7e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f71ff1c7e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f71ff8101da caffe::Timer::MilliSeconds()
@ 0x7f71ff9bdc0a caffe::caffe_gpu_memcpy()
@ 0x7f71ff82eb1d caffe::SyncedMemory::mutable_cpu_data()
@ 0x7f71ff80f73a caffe::Timer::Seconds()
@ 0x7f71ff8161f2 caffe::Blob<>::mutable_cpu_data()
@ 0x7f71ff98e85d caffe::Solver<>::Step()
@ 0x7f71ff916144 caffe::ImageDataLayer<>::load_batch()
@ 0x7f71ff98f26a caffe::Solver<>::Solve()
@ 0x40e0ba train()
@ 0x40a687 main
@ 0x7f71ff8a1d37 caffe::BasePrefetchingDataLayer<>::InternalThreadEntry()
@ 0x7f71fe135830 __libc_start_main
@ 0x40b029 _start
@ (nil) (unknown)
I met the same issue: group size 2 is OK, but 3 or larger goes wrong.
I tried it and get the same error even after removing group. The network converges well without cuDNN, but slowly. Did you manage to fix the problem, @svobora?
What exactly is the root cause of the problem?
Same here. With group equal to the number of outputs it requires a huge amount of memory; even with the batch size reduced to the minimum, it crashes with an out-of-memory error after 2000-3000 iterations.
Consider using ConvolutionDepthwise (#5665) to replace convolutions that use the group parameter.
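For the depthwise case (group equal to both num_output and the number of input channels), the replacement might look roughly like the sketch below. The ConvolutionDepthwise type name and its reuse of convolution_param are my assumptions based on #5665; check that PR for the exact interface.

layer {
  name: "conv_dw"                # illustrative names
  type: "ConvolutionDepthwise"   # assumed layer type from #5665
  bottom: "conv_in"
  top: "conv_dw"
  convolution_param {
    num_output: 64               # equal to the number of input channels
    kernel_size: 3
    pad: 1
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}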
I got the same error with the following layer:

layer {
  name: "fc2_conv_b" type: "Convolution" bottom: "fc2_a" top: "fc2_b"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 64 pad: 1 kernel_size: 3 group: 4 stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
I don't know why, but with num_output: 128 or kernel_size: 5 there is no problem...
Can anyone fix this?
I'm unable to reproduce the problem; more specific instructions are needed.
@douzsh What does the fc2_a blob look like? Grouped convolution will work or fail depending on its input, so its shape is important. Your layer worked for me when I shaped the blob to 1x16x32x32.
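For reference, a blob of that shape can be fed to the layer with an Input layer like this (illustrative sketch; the layer name is arbitrary):

layer {
  name: "input"
  type: "Input"
  top: "fc2_a"
  input_param { shape: { dim: 1 dim: 16 dim: 32 dim: 32 } }
}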
@Noiredd
The input blob's shape is 1x64x64x64.
I hope you can help me solve this issue.
@douzsh I just ran this network with no problems - both in Python and with caffe time.
What's the output of caffe device_query -gpu=all on your machine? What are your CUDA and cuDNN versions?
@Noiredd
That's really good news. Can you tell me your CUDA and cuDNN versions too? Then I can upgrade my libraries for training...
BTW, my CUDA is 7.0 and cuDNN v6 is used while training.
@douzsh I'm pretty sure you need at least CUDA 7.5 to run cuDNN 6 - see the download page for Nvidia cuDNN for a list of compatible releases. I ran my test on CUDA 9.0.176 and cuDNN 7.0.5.
@svobora This is a bug in Caffe. I solved it by modifying cudnn_conv_layer.cpp and aligning the per-group workspace size to a multiple of 32 bytes.
You can insert two lines of code before size_t total_max_workspace = ... as follows:
size_t m = 32;
max_workspace = (max_workspace + m - 1) / m * m;  // align each group's workspace to a multiple of m bytes
BTW, I think there is another bug: these lines should be put in an else block, so the per-group workspace pointers are only set when the allocation succeeds:
for (int g = 0; g < (this->group_ * CUDNN_STREAMS_PER_GROUP); g++) {
  workspace[g] = reinterpret_cast<char *>(workspaceData) + g * max_workspace;
}
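Putting both changes together, the relevant part of CuDNNConvolutionLayer<Dtype>::Reshape() in src/caffe/layers/cudnn_conv_layer.cpp would look roughly like this (a sketch against the upstream code layout of that time; the error-handling branch is abbreviated):

// fix 1: round the per-group workspace up to a 32-byte boundary so that
// workspaceData + g * max_workspace is always a properly aligned pointer
size_t m = 32;
max_workspace = (max_workspace + m - 1) / m * m;

// ensure all groups have enough workspace (existing upstream line)
size_t total_max_workspace = max_workspace *
    (this->group_ * CUDNN_STREAMS_PER_GROUP);

if (total_max_workspace > workspaceSizeInBytes) {
  workspaceSizeInBytes = total_max_workspace;
  // free the existing workspace and allocate a new (larger) one
  cudaFree(this->workspaceData);
  cudaError_t err = cudaMalloc(&(this->workspaceData), workspaceSizeInBytes);
  if (err != cudaSuccess) {
    // allocation failed: fall back to the zero-workspace algorithms and
    // NULL out the pointers (unchanged upstream code, omitted here)
  } else {
    // fix 2: alias the per-group pointers only when the allocation succeeded
    for (int g = 0; g < (this->group_ * CUDNN_STREAMS_PER_GROUP); g++) {
      workspace[g] = reinterpret_cast<char *>(workspaceData) + g * max_workspace;
    }
  }
}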
@hoszbh Just wanted to confirm that your fix is working, thanks a lot. Do you know why this fix is not on master yet?
See also #6548
After fixing the code, do I need to recompile Caffe?