Darknet: CUDA Error: mapping of buffer object failed

Created on 30 Oct 2017 · 8 Comments · Source: pjreddie/darknet

Hi guys,

I am trying to follow the "Training YOLO on VOC" tutorial. I downloaded the dataset and compiled darknet with GPU=1.

When I run ./darknet detector train cfg/voc.data cfg/yolo-voc.cfg weights/darknet19_448.conv.23

I get the following error:

CUDA Error: mapping of buffer object failed
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)

The stack trace is:

1  __GI_raise                       raise.c                  54  0x7fffeee7f428 
2  __GI_abort                       abort.c                  89  0x7fffeee8102a 
3  __assert_fail_base               assert.c                 92  0x7fffeee77bd7 
4  __GI___assert_fail               assert.c                 101 0x7fffeee77c82 
5  check_error                      cuda.c                   36  0x442cd1       
6  gemm_gpu                         gemm.c                   181 0x4c57ff       
7  backward_convolutional_layer_gpu convolutional_kernels.cu 243 0x4bae57       
8  backward_network_gpu             network.c                759 0x47ac06       
9  backward_network                 network.c                263 0x47873a       
10 train_network_datum              network.c                290 0x478942       
11 train_network                    network.c                320 0x478b10       
12 train_detector                   detector.c               116 0x42fc9f       
13 run_detector                     detector.c               686 0x433d10       
14 main                             darknet.c                419 0x43d9be

The state of the memory on my GPU is:

Mon Oct 30 16:51:44 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K1100M       Off  | 00000000:01:00.0  On |                  N/A |
| N/A   49C    P0    N/A /  N/A |   1407MiB /  1998MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1487      G   /usr/lib/xorg/Xorg                           296MiB |
|    0      1724      C   ...rk/image_classification/darknet/darknet   763MiB |
|    0      2720      G   kwin_x11                                      31MiB |
|    0      2748      G   /usr/bin/krunner                              22MiB |
|    0      2773      G   /usr/bin/plasmashell                          65MiB |
|    0      3225      G   ...-token=392F0AC4D080865DA6AD009B6EED524C    22MiB |
|    0      3746      G   ...-token=ADF24504200311844A2F8BBAAD96A7E6    95MiB |
|    0     20293      G   .../distr/Qt/Tools/QtCreator/bin/qtcreator    45MiB |
|    0     26023      G   ...-token=681CC81C35A1DCB33439F4492CDEFEC2    56MiB |
+-----------------------------------------------------------------------------+

The first thing I tried to verify while debugging is that the pointers passed to cublasSgemm are valid. That seems to be the case, but I'm not sure.

It always fails on the same conv layer, the one at position 5 out of 31 layers in total. Backpropagation gets through all the later layers without problems and then fails on this one every time.

Any help to understand what is wrong would be appreciated.

lamerman
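The nvidia-smi table in the question can be reduced to a single headroom figure. The sketch below is only an illustration: `free_mib` is a hypothetical helper name, and the thread does not establish that memory was the cause of the error (the fix reported in the comments is a build flag).

```shell
# Sketch: compute free GPU memory in MiB from a "used, total" line.
# free_mib is a hypothetical helper, not part of darknet or nvidia-smi.
free_mib() {
    # Split on the comma (plus optional spaces) and subtract used from total.
    echo "$1" | awk -F', *' '{ print $2 - $1 }'
}

# With the figures from the table above:
#   free_mib "1407, 1998"    # prints 591
# On a single-GPU machine the input line can come from nvidia-smi itself:
#   free_mib "$(nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits)"
```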

All 8 comments

I tried it with both cuda-8.0 GA2 and cuda-9.0 with the same result.

lamerman on 30 Oct 2017

I get the same error on another machine with a GTX 1080.

lamerman on 30 Oct 2017

I had the same issue at first, but after I changed the Makefile to set CUDNN=1 (the default is 0) and compiled again, the error went away.

elit8888 on 30 Oct 2017

👍4
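The Makefile change described in the comment above can be sketched as a small shell helper. This assumes the Makefile contains a line reading exactly `CUDNN=0`, as the comment implies; `enable_cudnn` is a hypothetical name used only for illustration, and cuDNN itself must already be installed for the rebuild to succeed.

```shell
# Sketch: flip CUDNN=0 to CUDNN=1 in a darknet-style Makefile.
# enable_cudnn is a hypothetical helper, not something darknet ships.
enable_cudnn() {
    makefile="$1"
    tmp="${makefile}.tmp"
    # Rewrite only a line that is exactly "CUDNN=0" (trailing whitespace allowed);
    # every other line is copied through unchanged.
    sed 's/^CUDNN=0[[:space:]]*$/CUDNN=1/' "$makefile" > "$tmp" && mv "$tmp" "$makefile"
}

# Usage, from the darknet checkout (a full rebuild is needed for the flag to take effect):
#   enable_cudnn Makefile && make clean && make
```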

@elit8888 thank you, it helped.

lamerman on 2 Nov 2017

I have the same problem when I run:
`./darknet detector train cfg/voc.data cfg/yolo-voc.cfg weights/darknet19_448.conv.23`

But after I set CUDNN=1, a new error occurred:

./src/convolutional_layer.c:133:5: error: too many arguments to function ‘cudnnSetFilter4dDescriptor’
compilation terminated due to -Wfatal-errors.

I am following the tutorial with a K40c, CUDA 7.5, and cuDNN v4.
Can you help me? Thanks @elit8888 @lamerman

luowy1001 on 4 Nov 2017

@luowy1001 not sure, I'm using CUDA-8.0 and CUDNN-v5, didn't have the problem.
I found the same problem from google group: https://groups.google.com/forum/#!topic/darknet/uqxtX3m3gJc

Maybe try upgrade the version?

elit8888 on 4 Nov 2017

👍1

@elit8888 Thank you very much, it helped! I needed to replace cudnn-7.0-linux-x64-v4.0-rc.tgz with cudnn-7.5-linux-x64-v5.0-ga.tgz, and that solved it.

luowy1001 on 4 Nov 2017

👍1
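The compile error reported above means darknet's call passes more arguments than the installed cudnn.h declares, which fits a header older than the code expects. One way to check which cuDNN header the build will see is to read its version macros. This is a minimal sketch: `cudnn_header_version` is a hypothetical helper, and the header path in the usage line is an assumption about a typical tarball install (package installs may place it elsewhere).

```shell
# Sketch: print MAJOR.MINOR.PATCHLEVEL from a cudnn.h header.
# cudnn_header_version is a hypothetical helper, not part of cuDNN or darknet.
cudnn_header_version() {
    awk '
        # Pick up the three version macros defined in the header.
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

# Usage (the path is an assumption; adjust it to where cudnn.h actually lives):
#   cudnn_header_version /usr/local/cuda/include/cudnn.h
```

If the major version printed is 4, that matches the setup that produced the compile error in this thread; the commenters report success with v5 and v7.1.3.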

@luowy1001 Thanks for the hint about updating the cuDNN package. I ran into the same issue (with CUDA 8.0 on Ubuntu 14.04) and solved it by upgrading to "cuDNN v7.1.3 Runtime Library for Ubuntu14.04 (Deb)".