Hi guys,
I am trying to follow the "Training YOLO on VOC" tutorial. I downloaded the dataset and compiled darknet with GPU=1.
When I run `./darknet detector train cfg/voc.data cfg/yolo-voc.cfg weights/darknet19_448.conv.23`
I get the following error:
CUDA Error: mapping of buffer object failed
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)
The stack trace is:
1 __GI_raise raise.c 54 0x7fffeee7f428
2 __GI_abort abort.c 89 0x7fffeee8102a
3 __assert_fail_base assert.c 92 0x7fffeee77bd7
4 __GI___assert_fail assert.c 101 0x7fffeee77c82
5 check_error cuda.c 36 0x442cd1
6 gemm_gpu gemm.c 181 0x4c57ff
7 backward_convolutional_layer_gpu convolutional_kernels.cu 243 0x4bae57
8 backward_network_gpu network.c 759 0x47ac06
9 backward_network network.c 263 0x47873a
10 train_network_datum network.c 290 0x478942
11 train_network network.c 320 0x478b10
12 train_detector detector.c 116 0x42fc9f
13 run_detector detector.c 686 0x433d10
14 main darknet.c 419 0x43d9be
The state of the memory on my GPU is:
Mon Oct 30 16:51:44 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K1100M Off | 00000000:01:00.0 On | N/A |
| N/A 49C P0 N/A / N/A | 1407MiB / 1998MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1487 G /usr/lib/xorg/Xorg 296MiB |
| 0 1724 C ...rk/image_classification/darknet/darknet 763MiB |
| 0 2720 G kwin_x11 31MiB |
| 0 2748 G /usr/bin/krunner 22MiB |
| 0 2773 G /usr/bin/plasmashell 65MiB |
| 0 3225 G ...-token=392F0AC4D080865DA6AD009B6EED524C 22MiB |
| 0 3746 G ...-token=ADF24504200311844A2F8BBAAD96A7E6 95MiB |
| 0 20293 G .../distr/Qt/Tools/QtCreator/bin/qtcreator 45MiB |
| 0 26023 G ...-token=681CC81C35A1DCB33439F4492CDEFEC2 56MiB |
+-----------------------------------------------------------------------------+
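For anyone who wants to track free GPU memory while reproducing this: the snapshot above shows 1407 MiB of 1998 MiB in use, so about 591 MiB is free. A minimal sketch of getting that number from the command line is below. The helper name `gpu_free_mib` is made up for illustration; the `--query-gpu` and `--format` options are standard `nvidia-smi` options. Whether free memory is actually related to the crash is not established in this thread.

```shell
# Hypothetical helper: reads lines of "used, total" (in MiB) on stdin
# and prints total - used for each GPU.
gpu_free_mib() {
    awk -F', *' '{ print $2 - $1 }'
}

# Only query the driver if nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_free_mib
fi
```

Running it in a loop (for example with `watch`) while training starts would show whether free memory drops close to zero before the failure.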
The first thing I checked while debugging was whether the pointers passed to `cublasSgemm` are valid. They appear to be, but I'm not sure.
It always fails on the conv layer that is number 5 of 31 layers in total. So backpropagation gets through all the layers above it, but fails on this one every time.
Any help to understand what is wrong would be appreciated.
I tried both CUDA 8.0 GA2 and CUDA 9.0, with the same result.
The same error occurs on another machine with a GTX 1080.
I had the same issue at first, but after I changed the `Makefile` to set `CUDNN=1` (the default is 0) and recompiled, the error went away.
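The Makefile change described above can be sketched as a small shell helper. This is only an illustration: `enable_cudnn` is a made-up name, and it assumes GNU `sed` and that the Makefile contains a line reading exactly `CUDNN=0`. Editing the line by hand works just as well.

```shell
# Hypothetical helper: switch CUDNN=0 to CUDNN=1 in the Makefile given as $1.
# Assumes GNU sed (for -i) and a line that reads exactly "CUDNN=0".
enable_cudnn() {
    sed -i 's/^CUDNN=0$/CUDNN=1/' "$1"
}

# From the darknet source directory, one would then rebuild from scratch:
#   enable_cudnn Makefile && make clean && make
```

The full rebuild matters because object files compiled without cuDNN support would otherwise be reused.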
@elit8888 thank you, it helped.
I have the same problem when I run:
`./darknet detector train cfg/voc.data cfg/yolo-voc.cfg weights/darknet19_448.conv.23`
But after I set `CUDNN=1`, a new error occurred:
./src/convolutional_layer.c:133:5: error: too many arguments to function ‘cudnnSetFilter4dDescriptor’
compilation terminated due to -Wfatal-errors.
I am following the tutorial with a K40c, CUDA 7.5, and cuDNN v4.
Can you help me? Thx @elit8888 @lamerman
@luowy1001 Not sure. I'm using CUDA 8.0 and cuDNN v5 and didn't have this problem.
I found the same problem reported in the Google group: https://groups.google.com/forum/#!topic/darknet/uqxtX3m3gJc
Maybe try upgrading to a newer version?
@elit8888 Thank you very much, it helped! I needed to replace cudnn-7.0-linux-x64-v4.0-rc.tgz with cudnn-7.5-linux-x64-v5.0-ga.tgz. That solved it!
@luowy1001 Thanks for the hint about updating the cuDNN package. I ran into the same issue (with CUDA 8.0 on Ubuntu 14.04) and solved it by upgrading cuDNN to "cuDNN v7.1.3 Runtime Library for Ubuntu14.04 (Deb)".
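Since the compile error above went away after moving from cuDNN v4 to v5 or later, it can help to check which cuDNN version the compiler actually sees before rebuilding. The sketch below reads the version macros from `cudnn.h`. It assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` (true for the releases mentioned in this thread as far as I know) and that it lives under `/usr/local/cuda/include`, which is only a common default. The helper name `cudnn_version` is made up.

```shell
# Hypothetical helper: print "major.minor.patch" from the cudnn.h given as $1.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# Assumed default install location; adjust to wherever cudnn.h lives.
header=/usr/local/cuda/include/cudnn.h
if [ -f "$header" ]; then
    cudnn_version "$header"
fi
```

If this reports a 4.x version, the `too many arguments to function 'cudnnSetFilter4dDescriptor'` error is consistent with what was reported above, and upgrading cuDNN is the fix people in this thread used.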