I am training on my own data on Windows 10 with CUDA 10 and cuDNN 7.4.
I am using an RTX 2080 Ti card.
Approximately 2,000 images were used for training.
It took approximately 60 hours for 100,000 iterations.
I found that the GPU usage is rather low: only ~1% most of the time, occasionally spiking to 30% for 1-2 seconds.
GPU memory usage was ~60%.
Could anyone help me figure out the reason?
How can I increase the GPU usage?
Please refer to the following screenshots of GPU usage.
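For reference, the figures above (100,000 iterations in ~60 hours) work out to a fairly modest throughput; a quick back-of-the-envelope check in the shell:

```shell
# Rough training throughput from the numbers reported above:
# 100,000 iterations in ~60 hours
awk 'BEGIN {
  iters = 100000; hours = 60
  printf "%.0f iterations/hour (%.2f s/iteration)\n", iters / hours, hours * 3600 / iters
}'
# -> 1667 iterations/hour (2.16 s/iteration)
```

At roughly 2 seconds per iteration on a 2080 Ti, the GPU spending most of its time near-idle suggests the pipeline is bottlenecked elsewhere (e.g. data loading on the CPU) rather than on compute.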


Try to check GPU usage with the GPU-Z utility (download link in the top-left corner): https://www.techpowerup.com/download/techpowerup-gpu-z/
Which GitHub repository of Darknet do you use?
> - Try to check GPU usage with the GPU-Z utility (download link in the top-left corner): https://www.techpowerup.com/download/techpowerup-gpu-z/
> - Which GitHub repository of Darknet do you use?
Thank you very much for your reply.
GPU-Z and nvidia-smi show more than 85% usage.
It seems the Windows Task Manager does not present the GPU usage correctly.
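For anyone else checking utilization from the command line, nvidia-smi can poll it directly (assuming the NVIDIA driver is installed and nvidia-smi is on the PATH):

```shell
# Poll GPU compute utilization and memory use once per second (Ctrl+C to stop).
nvidia-smi --query-gpu=index,utilization.gpu,utilization.memory,memory.used --format=csv -l 1
```

This avoids relying on Task Manager, which on older Windows builds does not show CUDA compute load at all.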
Sorry, feeling like a n00b here. I have CUDA and cuDNN installed as documented, with all environment variables set and all .dll and .lib files in the right places. I copied cudnn.lib into the darknet project, added it to the linker settings for compilation in Visual Studio 2019, and configured my darknet Makefile with:
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1
I also have OpenCV installed. My Python version is 3.7.4 and CUDA toolkit is 10.1.
I then run the darknet 'detector train' command with the option -gpus 0,1. I have two GTX 1060s installed in my machine, which is running Windows 7 64-bit.
However, utilization for both during training is 0-2%; it would seem they are not being used at all. Is there a step I am missing to get yolov3 to utilize my GPUs, or even one of them? I'm using yolov3.cfg. I have an i7-6900K processor, but it's my understanding that the 1060s should still be substantially faster.
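For comparison, a typical multi-GPU training invocation looks like the following. The file paths here are placeholders for your own data file, config, and pretrained weights; the AlexeyAB fork's README also suggests training the first ~1000 iterations on a single GPU before restarting with multiple GPUs:

```shell
# Placeholder paths -- substitute your own .data file, .cfg, and weights.
darknet.exe detector train data/obj.data cfg/yolov3.cfg darknet53.conv.74 -gpus 0,1
```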
I tracked my own issue with CUDA support down to my OpenCV build. Apparently it is very difficult to build with CUDA support. My CMake config is complaining about OpenBLAS, Lapack and VTK. I'm not familiar with these libraries and have no idea whether they are essential to CUDA support or not. I will attempt to hunt that down elsewhere, but if you have any pointers on that, it would be greatly appreciated. I'll review your readme again as well to see if there are any clues there.
I've tracked my CUDA issues down to my OpenCV install - apparently it hadn't compiled properly for CUDA support, which it seems is a bit of a white whale on Windows. It's complaining about missing BLAS, Lapack and VTK packages. I have no idea what Lapack or VTK are, BLAS is apparently a vector library (not sure why it doesn't just use Python's vectorization, which has SIMD support). I've had a bit of trouble tracking these down for Windows and getting them installed, so any pointers at all would be appreciated. In the meantime I have VMWare so I guess I can hack around with getting a Linux install running in a VM instance, although that will mean I can only use a max of 2 GB of VRAM. I finally got my network to train by switching to the tiny config though.
Okay, I got myself sorted. I forgot that since darknet uses C++, not Python, Python's vectorization is moot, which is why BLAS libs are necessary for vectorization. I got OpenCV compiled correctly with CUDA support on another machine with Win 10 and older GPUs (4 Titan Blacks) but better CPUs (dual Xeons). I first knew that GPU acceleration was working when my UPS started beeping to warn me I was overdrawing power from it :). I'll now try to replicate my GPU acceleration success on my main workstation, which only has a 6900K processor but slightly newer GPUs that are a little better than or equivalent to the Titan Blacks (a pair of 1060s).

This is all testing for a single class. For the dozens to hundreds of classes I intend to train, I will probably be upgrading to either a 2080 Ti or Titan RTX. Even the dual Xeon machine with the Titan Blacks, however, was able to train my 150 images for a single class in about an hour with tiny Yolo over 5,200 iterations, with a final loss of ~0.1. I'm testing regular Yolo now for the same setup, and it's looking like it will take around 4 hours to train the same. The loss curve looks much better, though; the last time I looked, loss was ~0.005, so I will probably be able to stop at about the halfway point, or ~2,600 iterations, and still get better results than tiny Yolo for the trade-off of double the training time. I may see even better results on a 2080 Ti or Titan RTX.

I may run into an issue on my main rig, as it still has Win 7 installed, but I think the only real potential issue there is the 260-character limit for the PATH variable.
I found this blog post to be very helpful for compiling OpenCV with CUDA support:
https://jamesbowley.co.uk/build-opencv-4-0-0-with-cuda-10-0-and-intel-mkl-tbb-in-windows/
It's a bit dated, so I couldn't get Ninja to work; I simply used Visual Studio 16 2019 as my compiler instead. I also didn't compile with Python bindings or Anaconda; I did install MKL and TBB, however. And I just used CMake-GUI rather than his recommendation of using the command line, as some of his parameters were out of date. That said, following the above basically gets all the dependencies required for darknet out of the way except for cuDNN.
HTH anyone else who ends up on this comment thread.
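For reference, a minimal CMake configure sketch for an OpenCV build with CUDA on Windows. The option names are standard OpenCV build flags, but the generator, architecture, and source path are assumptions you will need to adjust for your own setup:

```shell
rem Run from an empty build directory next to the opencv source checkout.
cmake -G "Visual Studio 16 2019" -A x64 ^
  -D WITH_CUDA=ON ^
  -D WITH_TBB=ON ^
  -D WITH_LAPACK=ON ^
  -D BUILD_opencv_python3=OFF ^
  -D CMAKE_BUILD_TYPE=Release ^
  ..\opencv
```

After configuring, check the generated summary output to confirm CUDA was actually found; a silently CPU-only build is exactly the failure mode described above.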
> https://jamesbowley.co.uk/build-opencv-4-0-0-with-cuda-10-0-and-intel-mkl-tbb-in-windows/
>
> It's a bit dated, so I couldn't get Ninja to work; I simply used Visual Studio 16 2019 as my compiler instead. I also didn't compile with Python bindings or Anaconda; I did install MKL and TBB, however. And I just used CMake-GUI rather than his recommendation of using the command line, as some of his parameters were out of date.
That guide should work without issues using Ninja and with CMake from the command line. I used the parameters from the newer guide, which that one links to, on the tip of the master branch the other day. Can you let me know what errors you were seeing?