Darknet: Compile error -- /usr/bin/ld: cannot find -lcuda

Created on 6 May 2020 · 7 comments · Source: AlexeyAB/darknet

Environment:
~ Linux
~ CUDA = 8.0.27
~ cuDNN = 5.1
~ OpenCV = 2.4.8

Makefile setup:
GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=1
AVX=0
OPENMP=0
LIBSO=0
ZED_CAMERA=0 # ZED SDK 3.0 and above
ZED_CAMERA_v2_8=0 # ZED SDK 2.X

I get the following error when building with make:
/usr/bin/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
make: *** [darknet] Error 1

My CUDA libraries are in /usr/local/cuda/lib64, but there is no "libcuda.so" in that directory.

Do I need to change the Makefile to fix this problem?
Thank you!


HaolyShiit

Most helpful comment

The library in the "stubs" folder is a stub, as the name says. It is only useful to silence the linker at link time. At run time it won't work, and you need the real libcuda, which comes with the NVIDIA driver and not with the CUDA SDK.
So, does your executable actually work?

cenit on 7 May 2020
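One quick way to check the point above is to ask the dynamic loader for the real driver library, just as a darknet binary linked against the stub will do at start-up. The sketch below is a hypothetical helper, not part of darknet. It uses Python's `ctypes` to load `libcuda.so.1` and call the CUDA driver API function `cuDriverGetVersion`. It assumes a Linux system where the NVIDIA driver installs `libcuda.so.1` on the loader path.

```python
import ctypes
from typing import Optional


def cuda_driver_version(libname: str = "libcuda.so.1") -> Optional[int]:
    """Return the CUDA version supported by the installed driver
    (encoded as 1000*major + 10*minor, e.g. 8000 for 8.0), or None if the
    driver library cannot be loaded or the call fails.

    Hypothetical helper: loads the library by its runtime name
    libcuda.so.1, which is what the loader looks for when a binary
    linked with -lcuda starts.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        # Not found by the dynamic loader: no driver library available.
        return None
    func = getattr(lib, "cuDriverGetVersion", None)
    if func is None:
        return None
    # CUresult cuDriverGetVersion(int *driverVersion); 0 is CUDA_SUCCESS.
    func.restype = ctypes.c_int
    func.argtypes = [ctypes.POINTER(ctypes.c_int)]
    version = ctypes.c_int(0)
    if func(ctypes.byref(version)) != 0:
        return None
    return version.value


if __name__ == "__main__":
    print(cuda_driver_version())
```

Suppose this returns `None`. A binary linked with `-L.../lib64/stubs -lcuda` will then typically fail at start-up because it cannot open `libcuda.so.1`, even though it linked cleanly. If it returns a number, that number should be at least the toolkit version (8000 for CUDA 8.0).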


All 7 comments

If you build with CMake (just launch ./build.sh), does it work?

cenit on 6 May 2020

Thanks for your reply.

I tried, but CMake is not installed on my machine. However, I solved the problem before installing it.
The library "libcuda.so" is not in "/usr/local/cuda/lib64" but in "/usr/local/cuda-8.0/lib64/stubs".

I modified this:

ifeq ($(GPU), 1)
    COMMON+= -DGPU -I/usr/local/cuda/include/
    CFLAGS+= -DGPU
    ifeq ($(OS),Darwin) #MAC
        LDFLAGS+= -L/usr/local/cuda/lib -lcuda -lcudart -lcublas -lcurand
    else
        LDFLAGS+= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand
    endif
endif

to:

ifeq ($(GPU), 1)
    COMMON+= -DGPU -I/usr/local/cuda-8.0/include/
    CFLAGS+= -DGPU
    LDFLAGS+= -L/usr/local/cuda-8.0/lib64 -lcudart -lcublas -lcurand
    LDFLAGS+= -L/usr/local/cuda-8.0/lib64/stubs -lcuda
endif

It works!

HaolyShiit on 7 May 2020
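The Makefile change above amounts to one rule: pass the linker a `-L` directory that actually contains `libcuda.so`, falling back to `lib64/stubs` if nothing better exists. Below is a minimal Python sketch of that search. `libcuda_link_flags` is a hypothetical helper for illustration only, not something darknet's Makefile provides. The candidate directories are assumptions you supply in priority order.

```python
import os
from typing import List


def libcuda_link_flags(candidates: List[str]) -> List[str]:
    """Return ["-L<dir>", "-lcuda"] for the first directory in `candidates`
    that contains libcuda.so (following symlinks), mirroring the extra
    LDFLAGS line in the modified Makefile.

    Hypothetical helper. If the directory found is a "stubs" directory,
    linking will succeed, but the real libcuda.so.1 from the NVIDIA driver
    is still required at run time.
    """
    for d in candidates:
        if os.path.isfile(os.path.join(d, "libcuda.so")):
            return ["-L" + d, "-lcuda"]
    raise FileNotFoundError("libcuda.so not found in: " + ", ".join(candidates))
```

Consider the layout described in this thread, where `libcuda.so` exists only in the stubs directory. There, `libcuda_link_flags(["/usr/local/cuda-8.0/lib64", "/usr/local/cuda-8.0/lib64/stubs"])` would pick the stubs directory, matching the second `LDFLAGS` line of the fix.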

cenit on 7 May 2020


@cenit

I can run the test command, and the results are normal:
./darknet detect ./cfg/yolov4.cfg ./reference_hao/yolov4.weights ./data/person.jpg

But when I train the model with my own dataset, another error occurs.

Training command:
./darknet detector train ./cfg/hao/RLD_hao.data ./cfg/hao/yolov4_hao.cfg ./reference_hao/yolov4.conv.137 -dont_show -map -i 3

The error occurs at iteration 6000, during the mAP calculation:
(next mAP calculation at 9236 iterations)
5999: 0.633929, 0.654455 avg loss, 0.000010 rate, 1.877512 seconds, 95984 images, 0.038086 hours left
Loaded: 0.000035 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.865682, GIOU: 0.862828), Class: 0.960477, Obj: 0.961623, No Obj: 0.000468, .5R: 1.000000, .75R: 1.000000, count: 17, class_loss = 0.127089, iou_loss = 51.345772, total_loss = 51.472862
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.889135, GIOU: 0.887272), Class: 0.991085, Obj: 0.993148, No Obj: 0.001230, .5R: 1.000000, .75R: 1.000000, count: 16, class_loss = 0.007091, iou_loss = 22.845778, total_loss = 22.852869
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.641530, GIOU: 0.600117), Class: 0.989874, Obj: 0.898132, No Obj: 0.000299, .5R: 1.000000, .75R: 0.000000, count: 1, class_loss = 0.002665, iou_loss = 0.064184, total_loss = 0.066849
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.844136, GIOU: 0.840075), Class: 0.998164, Obj: 0.824059, No Obj: 0.000533, .5R: 1.000000, .75R: 1.000000, count: 24, class_loss = 0.766716, iou_loss = 89.653999, total_loss = 90.420715
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.847136, GIOU: 0.845399), Class: 0.996766, Obj: 0.781560, No Obj: 0.001333, .5R: 1.000000, .75R: 0.928571, count: 14, class_loss = 0.862465, iou_loss = 17.476854, total_loss = 18.339319
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000001, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000000, iou_loss = 0.000000, total_loss = 0.000000
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.756514, GIOU: 0.753608), Class: 0.993219, Obj: 0.873042, No Obj: 0.000479, .5R: 1.000000, .75R: 0.555556, count: 18, class_loss = 0.753386, iou_loss = 75.371330, total_loss = 76.124718
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.891816, GIOU: 0.889111), Class: 0.977927, Obj: 0.992992, No Obj: 0.000796, .5R: 1.000000, .75R: 1.000000, count: 9, class_loss = 0.100123, iou_loss = 7.447656, total_loss = 7.547779
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.736795, GIOU: 0.708656), Class: 0.963782, Obj: 0.574549, No Obj: 0.000317, .5R: 1.000000, .75R: 0.500000, count: 2, class_loss = 0.111820, iou_loss = 0.421658, total_loss = 0.533478
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.862464, GIOU: 0.859188), Class: 0.975140, Obj: 0.962894, No Obj: 0.000415, .5R: 1.000000, .75R: 0.866667, count: 15, class_loss = 0.047956, iou_loss = 37.204109, total_loss = 37.252064
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.835306, GIOU: 0.832563), Class: 0.993731, Obj: 0.941625, No Obj: 0.002150, .5R: 1.000000, .75R: 0.869565, count: 23, class_loss = 0.549264, iou_loss = 21.456238, total_loss = 22.005503
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.624525, GIOU: 0.612480), Class: 0.912660, Obj: 0.558629, No Obj: 0.000962, .5R: 0.666667, .75R: 0.333333, count: 3, class_loss = 0.534840, iou_loss = 0.438115, total_loss = 0.972956

(next mAP calculation at 9236 iterations)
6000: 0.325167, 0.621526 avg loss, 0.000010 rate, 1.857895 seconds, 96000 images, 0.037710 hours left
Resizing to initial size: 416 x 416 try to allocate additional workspace_size = 33.55 MB
CUDA allocate done!

calculation mAP (mean average precision)...
4
cuDNN status Error in: file: ./src/convolutional_kernels.cu : () : line: 544 : build time: May 6 2020 - 19:37:32

cuDNN Error: CUDNN_STATUS_INVALID_VALUE
cuDNN Error: CUDNN_STATUS_INVALID_VALUE: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.
Aborted (core dumped)

I can't figure out why it happens.

HaolyShiit on 7 May 2020

Could it be a version mismatch between cuDNN and CUDA?

cenit on 7 May 2020
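One way to test the mismatch hypothesis is to read the version macros from the headers darknet was compiled against. Below is a minimal sketch, with hypothetical helper names, that rests on a few assumptions:

- It assumes the cuDNN 5.x layout, in which `cudnn.h` defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`. cuDNN 8 and later moved these into `cudnn_version.h`.
- It assumes `cuda.h` defines `CUDA_VERSION` as `1000*major + 10*minor`.
- The include directory is taken from the modified Makefile in this thread.

Matching headers do not prove that the same `libcudnn.so` is loaded at run time. `ldd ./darknet` shows which one the loader resolves.

```python
import re
from typing import Tuple


def header_define(header_text: str, name: str) -> int:
    """Return the integer value of a `#define <name> <integer>` line."""
    m = re.search(r"^\s*#\s*define\s+" + re.escape(name) + r"\s+(\d+)\b",
                  header_text, re.MULTILINE)
    if m is None:
        raise KeyError(name)
    return int(m.group(1))


def cudnn_version(cudnn_h: str) -> Tuple[int, int, int]:
    """(major, minor, patch) parsed from the text of cudnn.h (cuDNN < 8)."""
    return (header_define(cudnn_h, "CUDNN_MAJOR"),
            header_define(cudnn_h, "CUDNN_MINOR"),
            header_define(cudnn_h, "CUDNN_PATCHLEVEL"))


def cuda_version(cuda_h: str) -> Tuple[int, int]:
    """(major, minor) decoded from CUDA_VERSION = 1000*major + 10*minor."""
    v = header_define(cuda_h, "CUDA_VERSION")
    return (v // 1000, (v % 1000) // 10)


if __name__ == "__main__":
    # Assumed location: the -I directory used in the modified Makefile.
    inc = "/usr/local/cuda-8.0/include"
    try:
        with open(inc + "/cuda.h") as f:
            print("CUDA", cuda_version(f.read()))
        with open(inc + "/cudnn.h") as f:
            print("cuDNN", cudnn_version(f.read()))
    except (OSError, KeyError) as e:
        print("could not read version:", e)
```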

@cenit
Now I am training the model without "-map", hoping that it works:
./darknet detector train ./cfg/hao/RLD_hao.data ./cfg/hao/yolov4_hao.cfg ./reference_hao/yolov4.conv.137 -dont_show -i 3

If you have any suggestions, please let me know.
Thanks a lot!

HaolyShiit on 7 May 2020