I am getting the following error when trying to train using my custom dataset.

Hello @krishnam3065, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:
For more information please visit https://www.ultralytics.com.
@krishnam3065 it appears you have environment problems. To run YOLOv5 correctly, your environment must meet the minimum version requirements for the dependencies described in https://github.com/ultralytics/yolov5#requirements. You can either update your local environment to bring it into compliance, or use one of our verified environment options below.
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
I ran into the same problem. It seems to be a PyTorch/torchvision issue; see, for example, their issues such as https://github.com/pytorch/vision/issues/1405 ...
The environment I have:
PyTorch version: 1.5.1
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 20.04 LTS
GCC version: (Ubuntu 9.3.0-10ubuntu2) 9.3.0
CMake version: version 3.16.3
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
Nvidia driver version: 440.100
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.17.0
[pip3] torch==1.5.1
[pip3] torchvision==0.6.1
[conda] blas 1.0 mkl
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.1.0 py37h23d657b_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] pytorch 1.5.1 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchvision 0.6.1 pypi_0 pypi
I also tried compiling PyTorch and torchvision from source, with no luck.
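One detail in the listing above: pytorch comes from the pytorch conda channel, while torchvision shows the pypi channel (build pypi_0), i.e. it was pip-installed into the conda environment. Mixing the two is a plausible source of the mismatch, though the thread never confirms it; the reinstall that eventually fixed things installed both through conda. The sketch below flags that situation. It assumes the whitespace-separated `[conda] name version build [channel]` layout shown above, and the function names are hypothetical.

```python
def parse_conda_lines(lines):
    """Map package name -> (version, channel) from '[conda] ...' lines.

    Assumed layout: [conda] <name> <version> <build> [<channel>]
    """
    packages = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) < 3 or tokens[0] != "[conda]":
            continue  # skip [pip3] lines and anything else
        name, version = tokens[1], tokens[2]
        # The channel column is absent for packages from the default channel.
        channel = tokens[4] if len(tokens) >= 5 else None
        packages[name] = (version, channel)
    return packages


def mixed_channels(packages, names=("pytorch", "torchvision")):
    """Return True if the named packages were installed from different channels."""
    channels = {packages[n][1] for n in names if n in packages}
    return len(channels) > 1


listing = [
    "[pip3] torch==1.5.1",
    "[conda] blas 1.0 mkl",
    "[conda] pytorch 1.5.1 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch",
    "[conda] torchvision 0.6.1 pypi_0 pypi",
]
pkgs = parse_conda_lines(listing)
print(pkgs["torchvision"])  # ('0.6.1', 'pypi')
print(mixed_channels(pkgs))  # True
```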
@chengjiun this seems like a conda issue. You can try one of the 4 working environments above; they are all verified to work correctly.
@glenn-jocher yes, Docker works fine. However, I need to run it locally, so the other options don't help. I'm also not used to developing projects in Docker, so I'd rather solve the issue in my native environment.
I'm not sure it is caused by conda: I compiled PyTorch and torchvision from source, and the problem is the same. Either way, it is related to some inconsistency among the installed versions of PyTorch, torchvision, CUDA, and perhaps cuDNN. I've already spent half a day tracking this down, so I'll stay with Docker for now.
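If the inconsistency is between torch and torchvision themselves, a quick first check is whether the two versions belong together. The sketch below assumes the pairing that held for torch 1.4 through 1.13, where torchvision 0.(N+1) was released alongside torch 1.N (e.g. torch 1.5.1 with torchvision 0.6.1, as in the environment above). It returns None outside that range rather than guessing, and it says nothing about CUDA or cuDNN. The helper names are hypothetical.

```python
def parse_version(text):
    """Turn '1.5.1' or '1.7.0+cu101' into a tuple of ints such as (1, 5, 1)."""
    public = text.split("+", 1)[0]  # drop local build tags such as +cu101
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def versions_paired(torch_version, torchvision_version):
    """True/False if the pair matches the assumed rule, None if the rule does not apply."""
    torch_v = parse_version(torch_version)
    if len(torch_v) < 2 or torch_v[0] != 1 or not 4 <= torch_v[1] <= 13:
        return None  # outside torch 1.4-1.13, where this sketch's rule is assumed
    expected = (0, torch_v[1] + 1)  # torch 1.N -> torchvision 0.(N+1)
    return parse_version(torchvision_version)[:2] == expected


print(versions_paired("1.5.1", "0.6.1"))  # True
print(versions_paired("1.6.0", "0.6.1"))  # False
```

For the environment above the check passes, which fits the observation that the problem persisted even with matching versions.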
[update]
It is strange. I ran YOLOv5 (train.py) successfully in Docker, but I got the identical error message when I called "torch.ops.torchvision.nms" directly inside the same YOLOv5 Docker container. Does the Docker version of YOLOv5 somehow bypass torch.ops.torchvision.nms?
P.S. torch.ops.torchvision.nms works fine in the Colab tutorial code.
@chengjiun @glenn-jocher
I found a solution.
First, run:
conda uninstall pytorch torchvision cudatoolkit=10.1 -c pytorch
pip uninstall numpy
Then run:
pip install numpy==1.17.0
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
Thanks everyone, this issue was resolved when torch and torchvision were reinstalled.
@xandyxor your solution works well.
I used numpy 1.17.3 and scipy 1.4.1 instead of the latest numpy 1.19.2 and scipy 1.6.1, and the problem was solved.
@xandyxor thanks!! Your solution works. It was probably caused by incompatible versions.
Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install, run:
$ pip install -r requirements.txt
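Before running train.py, one rough way to check a local environment against the requirements stated above (Python 3.8 or later, torch>=1.6) is to compare installed versions using only the standard library. This is a minimal sketch, assuming `importlib.metadata` (Python 3.8+) can see the torch distribution in the active environment; it understands only `>=` and `==` specifiers and is no substitute for `pip install -r requirements.txt`. The helper names are hypothetical.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version


def as_tuple(v):
    """Numeric release parts of a version string, e.g. '1.7.0+cu101' -> (1, 7, 0)."""
    return tuple(int(x) for x in re.findall(r"\d+", v.split("+", 1)[0])[:3])


def satisfies(installed, spec):
    """Check one '>=X' or '==X' specifier against an installed version string."""
    op, target = spec[:2], spec[2:]
    have, want = as_tuple(installed), as_tuple(target)
    # Pad with zeros so (1, 6) compares equal to (1, 6, 0).
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if op == ">=":
        return have >= want
    if op == "==":
        return have == want
    raise ValueError(f"unsupported specifier: {spec}")


def check(requirements):
    """requirements: dict like {'torch': '>=1.6'}; returns {name: True/False, or None if missing}."""
    results = {}
    for name, spec in requirements.items():
        try:
            results[name] = satisfies(version(name), spec)
        except PackageNotFoundError:
            results[name] = None  # not installed in this environment
    return results


if __name__ == "__main__":
    print("Python >= 3.8:", sys.version_info >= (3, 8))
    print(check({"torch": ">=1.6"}))
```

Comparing tuples rather than strings matters: as strings, "1.10.0" sorts before "1.6", while (1, 10, 0) >= (1, 6, 0) holds as expected. The `==` branch also covers exact pins such as the numpy 1.17.3 version mentioned in this thread.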
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu.
Using numpy 1.17.3 and scipy 1.4.1 instead of the latest versions, as suggested above, solved the issue!