Darknet: cuDNN "CUDNN_STATUS_EXECUTION_FAILED: Success" error after processing

Created on 27 Feb 2019 · 14 Comments · Source: AlexeyAB/darknet

I downloaded and compiled OpenCV 3.4.0 with CUDA support, set CUDA, CUDNN, and OPENCV all to 1 in the darknet Makefile, and built it without issues. When I run darknet against a video, e.g.
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights test.mp4 -dont_show -out_filename output.mp4
it runs through the whole video listing the detected objects, but once it gets to the end, it outputs:

Stream closed.

 cuDNN status Error in: file: ./src/convolutional_kernels.cu : () : line: 535 : build time: Feb 27 2019 - 13:14:21 
cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED
cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED: Success
darknet: ./src/utils.c:281: error: Assertion `0' failed.

This is on Ubuntu 18.04 with NVIDIA driver 410.48, CUDA 10, and cuDNN 7.4.2.24. The processing itself seems fine; it just seems to have a problem writing the file. The output file exists, but it won't open because it is not a valid video.
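The ": Success" suffix in these logs is misleading. A likely explanation, assuming darknet's error() in src/utils.c calls perror() before assert(0) (consistent with the "Assertion `0' failed" line): perror() appends the description of the current errno, and a failed CUDA/cuDNN call typically does not set errno, so it is still 0, which glibc describes as "Success". The real failure is the status name (CUDNN_STATUS_EXECUTION_FAILED), not "Success". The sketch below only reproduces that message format; format_like_perror and format_cudnn_failure are hypothetical names, not darknet functions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the same text that perror(prefix) writes to
   stderr, i.e. "<prefix>: <description of err>". */
void format_like_perror(char *out, size_t size, const char *prefix, int err) {
    snprintf(out, size, "%s: %s", prefix, strerror(err));
}

/* Hypothetical helper: the message a cuDNN status check would produce if it
   passed "cuDNN Error: <status>" to perror(). errno is captured first; the
   CUDA/cuDNN failure itself does not set it, so it is usually still 0. */
void format_cudnn_failure(char *out, size_t size, const char *status_name) {
    int saved_errno = errno;
    char prefix[256];
    snprintf(prefix, sizeof prefix, "cuDNN Error: %s", status_name);
    format_like_perror(out, size, prefix, saved_errno);
}
```

With errno == 0 on glibc, format_cudnn_failure(buf, sizeof buf, "CUDNN_STATUS_EXECUTION_FAILED") yields the same text as the second "cuDNN Error" line in the log above.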

Bug fixed


kgunnar


@kgunnar Hi,

Can you try it without -out_filename output.mp4?
Can you try with another input avi/mp4 file?
Can you try it with CUDNN=0 in the Makefile?

AlexeyAB on 27 Feb 2019

Hi @AlexeyAB,

I tried without specifying -out_filename, and it returned the same error.

I tried different MP4 files; they also ran through the classification fine but generated the error at the end.

I changed CUDNN=0 and rebuilt darknet. It processes at a slower FPS and then generates a similar error that references CUDA instead of cuDNN:

Stream closed.
CUDA status Error: file: ./src/cuda.c : () : line: 29 : build time: Mar  1 2019 - 07:21:48 
CUDA Error: driver shutting down
CUDA Error: driver shutting down: Success
darknet: ./src/utils.c:281: error: Assertion `0' failed.
Aborted (core dumped)

kgunnar on 1 Mar 2019

Yeah, I have had the same problem recently. When I tested an mp4, the video played very quickly; at the end of the video the stream closed and the same error occurred. Separately, while training my model, my computer's memory filled up step by step until it was full. I think that's another bug.

jiaozhentian on 5 Mar 2019

@jiaozhentian

Try to use the latest version of this repository.
What command do you use?
What parameters do you use in the Makefile?
What CUDA, cuDNN, OpenCV versions do you use?
What GPU do you use?
Can you attach your mp4-file?

AlexeyAB on 5 Mar 2019

@AlexeyAB Hi,

I did use the latest version of this repository; I cloned it 3 days ago.
My training command is ./darknet detector train ./cfg/obj.data ./cfg/yolo-tiny_3l_obj.cfg ./darknet/ -dont_show -map, and my mp4 test command is the same as the author's.
In the Makefile, I set GPU, CUDNN, CUDNN_HALF, OPENCV, AVX and LIBSO to 1 and used ARCH= -gencode arch=compute_75,code=[sm_75,compute_75] because my GPU is an RTX 2080. I also edited the cuDNN path because I put the cuDNN files in the CUDA folder. There were no errors, only warnings, when I ran make.
CUDA is 10.0, cuDNN is 7.4, and OpenCV is 3.4.5. My computer has an i7-8700K, 16 GB RAM, and an RTX 2080 (8 GB).
Sorry, I can't attach it because our mp4 files are used for my team's research, but the file is less than 1 minute long and its resolution is 480p.
Looking forward to your progress on this repository, thank you.

jiaozhentian on 5 Mar 2019

@jiaozhentian Thanks! I will try to find the issue.

AlexeyAB on 5 Mar 2019

@AlexeyAB Hi, I think I've found the problem: when I set random=1 in the .cfg file, the computer's memory fills up step by step, while there is no such bug if I set random=0.

jiaozhentian on 13 Mar 2019


Hi AlexeyAB, I have a similar problem: cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED.
My environment: Win10, CUDA 9.0, cuDNN 7.1.3, OpenCV 3.4.0, VS2015, RTX 2080 Ti.
I set it up following your latest version and then ran:
darknet.exe detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights -i 0 -thresh 0.25

....
106 yolo
Total BFLOPS 65.864
Allocate additional workspace_size = 1099.43 MB
Loading weights from yolov3.weights...
seen 64
Done!
Enter Image Path: data/dog.jpg

I entered data/dog.jpg and it output:

cuDNN status Error in: file: D:/Patterns/darknet-master-test/src/convolutional_kernels.cu : forward_convolutional_layer_gpu() : line: 544 : build time: Mar 15 2019 - 14:10:21
cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED

What could be the problem?
I have this problem with your latest version, but I can run your previous version https://github.com/AlexeyAB/darknet/tree/Yolo_v3 in the same environment.

ToNYLin66 on 15 Mar 2019

I solved this with the latest CUDA 10.1, cuDNN 7.5.0.56, and driver 419.35.

ToNYLin66 on 16 Mar 2019

Same error here while training with a custom dataset.

drapado on 16 Mar 2019

@drapado

What versions of CUDA, cuDNN and OpenCV do you use?
Do you train with random=1 in the cfg-file?

AlexeyAB on 16 Mar 2019

I was using CUDA 10 and cuDNN 7.5 (the latest versions) with OpenCV 3.4, and with random=1 enabled.
However, since updating the code from the repo, it hasn't happened again. I will let you know if the error appears again!

drapado on 17 Mar 2019


@jiaozhentian Hi,

Hi, I think I've found the problem: when I set random=1 in the .cfg file, the computer's memory fills up step by step, while there is no such bug if I set random=0.

So, does it happen only if both random=1 and the -map flag are used for training?

AlexeyAB on 17 Mar 2019

@jiaozhentian @kgunnar @ToNYLin66
I fixed a memory leak during training with the -map flag and random=1 in the cfg file.

AlexeyAB on 22 Mar 2019
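For context on what random=1 changes during training: when it is enabled, darknet periodically picks a new network input resolution and resizes the network, which reallocates per-layer buffers, so training with random=1 exercises allocation paths that random=0 never touches. The sketch below shows only the resolution choice, modeled on the original pjreddie darknet detector.c (a multiple of 32 from 320 to 608, re-drawn every 10 iterations). AlexeyAB's fork derives the range from the cfg width/height, so the constants are illustrative and choose_input_dim is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: multi-scale resolution choice for random=1.
   Constants follow the original pjreddie scheme and are illustrative. */
int choose_input_dim(int iteration, int current_dim) {
    /* Keep the current resolution except every 10th iteration. */
    if (iteration % 10 != 0) return current_dim;
    /* rand() % 10 is 0..9, so the result is one of 320, 352, ..., 608. */
    return (rand() % 10 + 10) * 32;
}
```

Each time the returned size differs from the current one, the real training loop resizes the network, so any buffer that is not freed on resize accumulates over training.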
