Yolov5: Training very very slowly

Created on 29 Oct 2020  ·  11 Comments  ·  Source: ultralytics/yolov5

❔ Question

Training is very, very slow, and GPU-Util reported by nvidia-smi is always 0, although GPU memory usage is about 20 GB+.
Is this normal?

Additional context

Here is my environment:

  • YOLOv5 version: 83deec
  • Python: 3.8
  • CUDA: 10.1
  • cuDNN: 7.6.3
  • PyTorch: 1.6.0
  • GPU: Tesla V100 (32 GB memory)

I train yolov5m on 20k+ images, and GPU usage is always 0.

All 11 comments

Hello @mengban, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@mengban GPU utilization should be about 90% when running nvidia-smi. You may have environment problems. I would recommend the Docker Image as an easy way to reproduce our environment while fully exploiting your hardware.

Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt
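Once installed, a quick sanity check (a minimal sketch, not part of the repo) can confirm that the interpreter meets the Python 3.8 requirement and that PyTorch can actually see a CUDA device; if it cannot, training falls back to CPU, which matches the 0% GPU utilization described above:

```python
# Environment sanity check -- a minimal sketch, not part of the YOLOv5 repo.
# The torch import is guarded in case requirements are not yet installed.
import sys

assert sys.version_info >= (3, 8), "YOLOv5 requires Python 3.8 or later"

try:
    import torch
    print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not installed -- run: pip install -r requirements.txt")
```

If `torch.cuda.is_available()` prints False, PyTorch cannot see the GPU at all, and fixing the CUDA/driver install should come before any dataloader tuning.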

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

  • Google Colab Notebook with free GPU: Open In Colab
  • Kaggle Notebook with free GPU: https://www.kaggle.com/ultralytics/yolov5
  • Google Cloud Deep Learning VM. See GCP Quickstart Guide
  • Docker Image https://hub.docker.com/r/ultralytics/yolov5. See Docker Quickstart Guide

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

Thanks for your reply. I reinstalled the packages with pip install -r requirements.txt, and my problem still exists.
I also find that the 8 dataloader workers keep the CPU at nearly 100%, so I think it may be caused by my dataset. My images are about 3000×4000 pixels, some even 6000×4000, and a single image can contain 100+ boxes. I suspect the CPU can't feed data to the GPU in time, which slows down the whole training process. What do you think?
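One common mitigation for CPU-bound dataloading like this is to resize the oversized source images offline, once, before training. A sketch of such a step is below; `resize_dataset` is a hypothetical helper, not part of YOLOv5, and it assumes Pillow is installed and labels are in normalized YOLO format (coordinates in [0, 1]), so they stay valid after resizing and need no rewriting:

```python
# Offline pre-resize of oversized training images -- a hypothetical helper,
# not part of YOLOv5. Only .jpg files are handled in this sketch. YOLO labels
# are normalized to [0, 1], so they remain valid after resizing.
from pathlib import Path

from PIL import Image


def resize_dataset(src_dir: str, dst_dir: str, max_side: int = 1280) -> int:
    """Shrink every image whose longest side exceeds max_side; return count resized."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    n = 0
    for p in Path(src_dir).glob("*.jpg"):
        im = Image.open(p)
        scale = max_side / max(im.size)
        if scale < 1.0:  # only downscale, never upscale
            im = im.resize((round(im.width * scale), round(im.height * scale)))
            n += 1
        im.save(dst / p.name, quality=90)
    return n
```

Training then points at the resized copies. Since YOLOv5 letterboxes inputs down to --img-size anyway (640 by default), decoding a 1280-pixel JPEG is far cheaper for the dataloader workers than decoding a 6000×4000 one.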

@mengban both CPU and GPU utilization should be 90-100%. 8 --workers is the default; you're free to vary this as you see fit.

As I said, try the Docker image.

Docker usage link, https://github.com/ultralytics/yolov5/wiki/Docker-Quickstart

sudo docker run --ipc=host --gpus all -it -v "$(pwd)"/yourDirectory:/usr/src/yourDirectory ultralytics/yolov5:latest

Replace 'yourDirectory' with the directory you want to use inside the YOLOv5 Docker container.


thanks, bro. I'll have a try.

+1. In the Docker container, the yolov5 directory is placed at /usr/src/app.

So where do you see your GPU utilization? I don't see it while training.

@SiyangXie
Run one of these commands in a terminal:

$ nvidia-smi
$ watch nvidia-smi
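For scripted monitoring, nvidia-smi also has a machine-readable CSV query mode, e.g. `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader`. The parser below is a small sketch (not part of YOLOv5) for one line of that output:

```python
# Parse one line of `nvidia-smi --query-gpu=utilization.gpu,memory.used
# --format=csv,noheader` output -- a small sketch, not part of YOLOv5.
# A line looks like: "98 %, 20480 MiB"
def parse_gpu_stats(line: str) -> dict:
    util, mem = (field.strip() for field in line.split(","))
    return {
        "util_pct": int(util.rstrip(" %")),   # "98 %" -> 98
        "mem_mib": int(mem.rstrip(" MiB")),   # "20480 MiB" -> 20480
    }


print(parse_gpu_stats("98 %, 20480 MiB"))  # {'util_pct': 98, 'mem_mib': 20480}
```

A utilization stuck at 0 with high memory use, as in this thread, is the classic signature of a starved GPU waiting on the dataloader.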

@SiyangXie @dongjuns yes the nvidia-smi command is the best way to monitor GPU stats.

W&B logging is also a new option for monitoring GPU utilization: it plots utilization, temperature, and CUDA memory over your full training run. Here are stats for a COCO128 YOLOv5x training with a V100 on Colab Pro. We are putting together tutorials this week for our recent W&B integration.
[Screenshot, 2 Nov 2020: W&B charts of GPU utilization, temperature, and memory]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
