Darknet: cuDNN "CUDNN_STATUS_EXECUTION_FAILED: Success" error after processing

Created on 27 Feb 2019 · 14 Comments · Source: AlexeyAB/darknet

I have downloaded and compiled OpenCV 3.4.0 with CUDA support and also have set CUDA, CUDNN, and OPENCV all to 1 in the darknet Makefile and built it without issue. When I run darknet against a video, e.g. ./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights test.mp4 -dont_show -out_filename output.mp4 it runs through the whole video listing the detected objects but once it gets to the end, it outputs

Stream closed.

 cuDNN status Error in: file: ./src/convolutional_kernels.cu : () : line: 535 : build time: Feb 27 2019 - 13:14:21 
cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED
cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED: Success
darknet: ./src/utils.c:281: error: Assertion `0' failed.

This is on Ubuntu 18.04 with nvidia 410.48, CUDA 10, and cuDNN 7.4.2.24. It seems to do the processing fine, it just has a problem writing the file? The output file exists but it won't open as it's not a valid video.

Bug fixed

All 14 comments

@kgunnar Hi,

  • Can you try it without -out_filename output.mp4 ?

  • Can you try with another input avi/mp4-file?

  • Can you try it with CUDNN=0 in the Makefile?

Hi @AlexeyAB,

I tried without specifying an -out_filename and it returned the same error.

I tried different MP4 files; they also ran through the classification fine but generated the same error at the end.

I changed CUDNN=0 and rebuilt darknet. It processes at a slower FPS and then generates a similar error that references CUDA instead of cuDNN:

Stream closed.
CUDA status Error: file: ./src/cuda.c : () : line: 29 : build time: Mar  1 2019 - 07:21:48 
CUDA Error: driver shutting down
CUDA Error: driver shutting down: Success
darknet: ./src/utils.c:281: error: Assertion `0' failed.
Aborted (core dumped)
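
A side note on the confusing ": Success" suffix in both logs above. The assertion at ./src/utils.c:281 suggests the message passes through an error helper that most likely prints it with perror(), which appends the text for the current errno. CUDA and cuDNN failures do not set errno, so it is still 0 and glibc renders that as "Success". The suffix therefore says nothing about the GPU error itself. A minimal sketch of the effect, where format_like_perror is a hypothetical helper written for this illustration and not darknet code:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of darknet): builds the line that
 * perror(msg) would print for a given errno value, i.e.
 * "<msg>: <strerror(err)>". Returns the snprintf() result. */
static int format_like_perror(char *out, size_t n, const char *msg, int err)
{
    assert(out != NULL && msg != NULL && n > 0);
    return snprintf(out, n, "%s: %s", msg, strerror(err));
}
```

Calling format_like_perror(buf, sizeof buf, "cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED", 0) on glibc produces the same line as in the log, ending in ": Success". Other C libraries may word the errno 0 text differently.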

Yeah, I have had the same problem recently. When I tested an mp4, the video ran through very quickly, and at the end of the video the stream closed and the same error occurred. Besides, while I was training my model, my computer's memory filled up step by step until it was full. I thought that was a separate bug.

@jiaozhentian

  • Try to use the latest version of this repository.
  • What command do you use?
  • What parameters do you use in the Makefile?
  • What CUDA, cuDNN, OpenCV versions do you use?
  • What GPU do you use?
  • Can you attach your mp4-file?

@AlexeyAB Hi,

  1. I did use the latest version of this repository; I cloned it 3 days ago.
  2. My train command is ./darknet detector train ./cfg/obj.data ./cfg/yolo-tiny_3l_obj.cfg ./darknet/ -dont_show -map, and my mp4 test command is the same as the original poster's.
  3. In the Makefile, I set GPU, CUDNN, CUDNN_HALF, OPENCV, AVX and LIBSO to 1, and used ARCH= -gencode arch=compute_75,code=[sm_75,compute_75] because my GPU is an RTX 2080. I also edited the cuDNN path because I put the cuDNN files in the CUDA folder. Running make produced warnings but no errors.
  4. CUDA is 10.0, cuDNN is 7.4, and OpenCV is 3.4.5. My computer has an i7-8700K, 16 GB RAM, and an RTX 2080 with 8 GB.
  5. It's an RTX 2080 with 8 GB.
  6. Sorry, I can't attach the mp4 files because they are used for my team's research, but the file is less than 1 minute long and its resolution is 480p.
    Looking forward to your progress on this repository, thank you.

@jiaozhentian Thanks! I will try to find the issue.

@AlexeyAB Hi, I think I've found the problem: when I set random = 1 in the .cfg file, the computer's memory fills up step by step, while there is no such bug if I set random to 0.

Hi AlexeyAB, I have a similar problem: cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED.
My environment is Win10, CUDA 9.0, cuDNN 7.1.3, OpenCV 3.4.0, VS2015, RTX 2080 Ti.
I set it up following your latest version, and then ran
darknet.exe detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights -i 0 -thresh 0.25

....
106 yolo
Total BFLOPS 65.864
Allocate additional workspace_size = 1099.43 MB
Loading weights from yolov3.weights...
seen 64
Done!
Enter Image Path: data/dog.jpg

I enter data/dog.jpg and then it outputs

cuDNN status Error in: file: D:/Patterns/darknet-master-test/src/convolutional_kernels.cu : forward_convolutional_layer_gpu() : line: 544 : build time: Mar 15 2019 - 14:10:21
cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED

What could be the problem?
I have this problem with your latest version, but I can run your previous version https://github.com/AlexeyAB/darknet/tree/Yolo_v3 in the same environment.

I solved this by updating to the latest CUDA 10.1, cuDNN 7.5.0.56, and driver 419.35.

Same error here while training with custom dataset.

@drapado

  • What versions of CUDA, cuDNN and OpenCV do you use?
  • Do you train with random=1 in the cfg-file?

I was using CUDA 10 and cuDNN 7.5 (the latest versions) with OpenCV 3.4, and with random=1 enabled.
However, after updating the code from the repo it has not happened again. I will let you know if the error reappears!

@jiaozhentian Hi,

"Hi, I think I've found the problem: when I set random = 1 in the .cfg file, the computer's memory fills up step by step, while there is no such bug if I set random to 0."

So, does it happen only if both random=1 and the -map flag are used for training?

@jiaozhentian @kgunnar @ToNYLin66
I fixed the memory leak that occurred during training with the -map flag and random=1 in the cfg-file.
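
The thread does not show the actual patch, so the following is only a general illustration of this class of bug. With random=1, darknet resizes the network periodically during training, so any buffer that is reallocated on each resize without the old one being released makes memory use grow step by step, which matches the symptom reported above. A minimal C sketch, where buffer_t, resize_leaky, resize_fixed and the tracked_* helpers are all hypothetical names invented for this example and not darknet code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-layer buffer; not darknet's real struct. */
typedef struct {
    float *data;
    size_t count;
} buffer_t;

/* Number of blocks currently allocated and not yet freed. */
static long live_allocs = 0;

static void *tracked_alloc(size_t bytes)
{
    void *p = calloc(bytes, 1);
    if (p) live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

/* Buggy resize: overwrites the pointer, so the old block is never freed. */
static int resize_leaky(buffer_t *b, size_t count)
{
    float *p = tracked_alloc(count * sizeof *p);
    if (!p) return -1;
    b->data = p;            /* previous b->data is lost here */
    b->count = count;
    return 0;
}

/* Fixed resize: releases the old block before adopting the new one. */
static int resize_fixed(buffer_t *b, size_t count)
{
    float *p = tracked_alloc(count * sizeof *p);
    if (!p) return -1;
    tracked_free(b->data);  /* free(NULL) case is handled in tracked_free */
    b->data = p;
    b->count = count;
    return 0;
}
```

Calling resize_leaky in a loop leaves one extra live block per call, while resize_fixed keeps exactly one block alive however often it is called.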
