Darknet: Model is training in CPU instead of GPU

Created on 23 Feb 2018 · 12 Comments · Source: pjreddie/darknet

I recently started training a YOLOv2 model with darknet on the Pascal VOC dataset, but when I check the processes running on my GPU, the darknet process is not listed. I suspect the code is running on the CPU instead of the GPU. I have already set the GPU and CUDNN fields to '1' in the _Makefile_.
Is there anything else I need to change? I am using an 8 GB NVIDIA Tesla P4.
Command I ran: ./darknet detector train cfg/coco.data cfg/yolo.cfg darknet19_448.conv.23
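
One way to make the "is darknet listed on the GPU?" check repeatable is to query nvidia-smi for compute processes and look for darknet in the result. The following is a minimal sketch, not part of darknet: the helper names are hypothetical, and it assumes nvidia-smi is on the PATH and that the process name contains the string "darknet".

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse `pid, process_name` rows (CSV without header) into (pid, name) tuples."""
    apps = []
    for line in csv_text.splitlines():
        pid, _, name = line.partition(",")
        pid = pid.strip()
        # Skip blank lines and anything that is not a `pid, name` row.
        if not pid.isdigit():
            continue
        apps.append((int(pid), name.strip()))
    return apps


def darknet_on_gpu(csv_text):
    """Return True if any listed compute process name contains 'darknet'."""
    return any("darknet" in name for _, name in parse_compute_apps(csv_text))


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        result = subprocess.run(
            ["nvidia-smi", "--query-compute-apps=pid,process_name",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
        print("darknet listed as a GPU compute process:",
              darknet_on_gpu(result.stdout))
```

Run it in a second terminal while training is in progress; a result of False is consistent with a CPU-only build, but it does not prove it.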

Most helpful comment

Okay, I finally found a solution for this if you're using Linux (Ubuntu 16.x-18.x)!!!

  1. Edit Makefile _GPU=1_
  2. Edit Makefile _CUDNN=1_
  3. Leave Makefile _OPENCV=0_
  4. Install (or re-install) CUDA using pip (my advice is to use version 10.1)
  5. Edit Makefile _NVCC=/usr/local/cuda/bin/nvcc_ (failing to do this results in a "recipe for target 'obj/convolutional_kernels.o'" error during compilation)
  6. Compile (or recompile, if you compiled previously) the Darknet binary with admin privileges (sudo)
  7. Train!

NOTE: You may get a warning about something being deprecated during compilation. That's okay to ignore.
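
The Makefile edits in steps 1, 2 and 5 can be sketched as a small script. This is only an illustration under assumptions: the helper name set_makefile_flags and the STEPS dictionary are hypothetical, and it assumes each variable is assigned on its own line in the form NAME=value, as the steps above imply.

```python
import re


def set_makefile_flags(makefile_text, settings):
    """Return makefile_text with each top-level `NAME=value` line rewritten.

    Raises KeyError if a name is never assigned at the start of a line.
    """
    for name, value in settings.items():
        # Match only assignments at the start of a line, e.g. `GPU=0`,
        # not uses such as `ifeq ($(GPU), 1)`.
        pattern = re.compile(rf"^{re.escape(name)}[ \t]*=.*$", re.MULTILINE)
        makefile_text, count = pattern.subn(
            lambda match: f"{name}={value}", makefile_text
        )
        if count == 0:
            raise KeyError(name)
    return makefile_text


# Values mirroring steps 1, 2 and 5; adjust the nvcc path to your install.
STEPS = {"GPU": "1", "CUDNN": "1", "NVCC": "/usr/local/cuda/bin/nvcc"}
```

A caller would read the Makefile's text, pass it through set_makefile_flags(text, STEPS), write the result back, and then recompile as in step 6.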

All 12 comments

Maybe try using the -gpus option and see if that does anything.

Like the following:
./darknet detector train cfg/coco.data cfg/yolo.cfg darknet19_448.conv.23 -gpus 0

Otherwise, I would suggest going into the detector.c file and checking whether the function cuda_set_device() is being called.

I tried using the -gpus option, but it gives the same result. Also, cuda_set_device() is a built-in function, right? All I can see is a call to the function, not its definition. How can I check whether it is being called?

Okay. I tried to store the return value of cuda_set_device() in a string and print it, but it doesn't give any output. Does that mean it is not being called?

cuda_set_device is darknet's wrapper around the CUDA runtime function cudaSetDevice. If it is being called, then darknet is trying to use your GPU.

See below:

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g159587909ffa0791bbe4b40187a4c6bb

Note that cudaSetDevice returns a cudaError_t, which is an enum, not a string. So check whether it returns anything other than 0 (cudaSuccess). If it does, look up what the value means here:

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038
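
To see what "anything other than 0" looks like in practice without touching darknet's source, the CUDA runtime can be called directly. The sketch below is an assumption-laden illustration, not darknet code: the function name try_set_device is hypothetical, and it only works if the CUDA runtime library (libcudart) can be found by the system's library search, which is often not the case unless the CUDA lib directory is registered with the loader.

```python
import ctypes
import ctypes.util


def try_set_device(device_index=0):
    """Call cudaSetDevice through ctypes.

    Returns (error_code, message), or None if libcudart cannot be loaded.
    An error_code of 0 corresponds to cudaSuccess.
    """
    lib_name = ctypes.util.find_library("cudart")
    if lib_name is None:
        return None
    try:
        cudart = ctypes.CDLL(lib_name)
    except OSError:
        return None

    # cudaError_t cudaSetDevice(int device)
    cudart.cudaSetDevice.argtypes = [ctypes.c_int]
    cudart.cudaSetDevice.restype = ctypes.c_int
    # const char* cudaGetErrorString(cudaError_t error)
    cudart.cudaGetErrorString.argtypes = [ctypes.c_int]
    cudart.cudaGetErrorString.restype = ctypes.c_char_p

    code = cudart.cudaSetDevice(device_index)
    message = cudart.cudaGetErrorString(code).decode()
    return code, message


if __name__ == "__main__":
    outcome = try_set_device(0)
    if outcome is None:
        print("libcudart could not be loaded from this environment")
    else:
        print("cudaSetDevice returned %d: %s" % outcome)
```

A nonzero code here points at the driver or CUDA installation rather than at darknet itself.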

In the Makefile, set

GPU=1

Then compile again.

And in the training command, add -gpus 0 at the end.
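
After recompiling, one rough way to confirm the rebuild actually picked up GPU=1 is to check whether the darknet binary is linked against CUDA libraries. This is a sketch under assumptions: the helper name cuda_libs_in_ldd is hypothetical, it assumes a Linux system with ldd, a binary at ./darknet, and a build that links CUDA libraries dynamically when GPU=1 is set.

```python
import shutil
import subprocess
from pathlib import Path

# Library name prefixes treated as "CUDA-related" for this check (an assumption).
CUDA_LIB_PREFIXES = ("libcuda", "libcublas", "libcurand", "libcudnn")


def cuda_libs_in_ldd(ldd_text):
    """Return the CUDA-related library names found in `ldd` output text."""
    found = []
    for line in ldd_text.splitlines():
        parts = line.split()
        # The first token on each ldd line is the library name.
        if parts and parts[0].startswith(CUDA_LIB_PREFIXES):
            found.append(parts[0])
    return found


if __name__ == "__main__":
    binary = Path("./darknet")
    if shutil.which("ldd") is None or not binary.exists():
        print("need ldd on PATH and a ./darknet binary in the current directory")
    else:
        result = subprocess.run(
            ["ldd", str(binary)], capture_output=True, text=True, check=False
        )
        libs = cuda_libs_in_ldd(result.stdout)
        print("CUDA libraries linked:", libs if libs else "none found")
```

If no CUDA libraries show up, the binary was most likely built without GPU support, and adding -gpus 0 to the command will not change that.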

Hello,
I seem to have a similar issue: when I train a model on the GPUs, my CPUs go to maximum load, but at the same time I can see that the GPUs are working too. Is this normal?
[Screenshot: train usage monitor]

Hi.
I have the same issue: my darknet training is running on the CPU. I also changed the fields GPU and CUDNN to '1' in the Makefile.
Is there anything else I need to change? I am using an NVIDIA GeForce GTX 1080 GPU.
Command I ran: ./darknet detector train cfg/coco.data cfg/yolo.cfg darknet19_448.conv.23

Did anyone solve this???

Other than GigaMatt thanks

I followed all of the steps, except for:
- downgrading CUDA
- OPENCV=1
- the 6th step

In my case, though, I used NVCC=/usr/local/cuda-10.2/bin/nvcc in the 5th step. The GPU is now being used.
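
Since the nvcc path depends on which CUDA version is installed (cuda, cuda-10.2, and so on), it can help to list the candidates before editing the NVCC line. A minimal sketch, assuming CUDA toolkits are installed under /usr/local in directories whose names start with "cuda"; the function name find_nvcc_candidates is hypothetical.

```python
from pathlib import Path


def find_nvcc_candidates(root="/usr/local"):
    """Return sorted paths matching <root>/cuda*/bin/nvcc, as strings."""
    return sorted(
        str(path) for path in Path(root).glob("cuda*/bin/nvcc") if path.is_file()
    )


if __name__ == "__main__":
    candidates = find_nvcc_candidates()
    if candidates:
        for candidate in candidates:
            # Any of these could be used as the NVCC= value in the Makefile.
            print(candidate)
    else:
        print("no nvcc found under /usr/local/cuda*/bin")
```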

Thanks a lot!

For me, just setting GPU=1 in the Makefile worked, after recompiling in Google Colab.
