I use this command to train the model:
python train.py --img-size 640 --batch-size 4 --epochs 300 --data ./data/garbage.yaml --cfg ./models/yolov5m.yaml --weights weights/yolov5m.pt
But the error message is:
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1050', total_memory=4096MB)
Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='./models/yolov5m.yaml', data='./data/garbage.yaml', device='', epochs=300, evolve=False, global_rank=-1, hyp='data/hyp.finetune.yaml', img_size=[640, 640], local_rank=-1, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=4, weights='weights/yolov5m.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Overriding ./models/yolov5m.yaml nc=80 with nc=13
from n params module arguments
0 -1 1 5280 models.common.Focus [3, 48, 3]
1 -1 1 41664 models.common.Conv [48, 96, 3, 2]
2 -1 1 67680 models.common.BottleneckCSP [96, 96, 2]
3 -1 1 166272 models.common.Conv [96, 192, 3, 2]
4 -1 1 639168 models.common.BottleneckCSP [192, 192, 6]
5 -1 1 664320 models.common.Conv [192, 384, 3, 2]
6 -1 1 2550144 models.common.BottleneckCSP [384, 384, 6]
7 -1 1 2655744 models.common.Conv [384, 768, 3, 2]
8 -1 1 1476864 models.common.SPP [768, 768, [5, 9, 13]]
9 -1 1 4283136 models.common.BottleneckCSP [768, 768, 2, False]
10 -1 1 295680 models.common.Conv [768, 384, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 1219968 models.common.BottleneckCSP [768, 384, 2, False]
14 -1 1 74112 models.common.Conv [384, 192, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 305856 models.common.BottleneckCSP [384, 192, 2, False]
18 -1 1 332160 models.common.Conv [192, 192, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 1072512 models.common.BottleneckCSP [384, 384, 2, False]
21 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 4283136 models.common.BottleneckCSP [768, 768, 2, False]
24 [17, 20, 23] 1 72738 models.yolo.Detect [13, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Model Summary: 263 layers, 2.15343e+07 parameters, 2.15343e+07 gradients
Transferred 506/514 items from weights/yolov5m.pt
Optimizer groups: 86 .bias, 94 conv.weight, 83 other
Scanning labels E:\test_opencv\yolov5-master\dataset\labels\train_small_image.cache (12752 found, 0 missing, 0 empty, 0 duplicate, for 12752 images): 12752it [00:00, 25988.17it/s]
Scanning labels E:\test_opencv\yolov5-master\dataset\labels\test_small_image.cache (3443 found, 0 missing, 0 empty, 134 duplicate, for 3443 images): 3443it [00:00, 23168.64it/s]
Analyzing anchors... anchors/target = 4.14, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 4 dataloader workers
Starting training for 300 epochs...
You have uninstalled pretty_errors but it is still present in your python startup. Please remove its section from file:
E:\Anaconda3\sitecustomize.py
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "E:\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "E:\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "E:\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "E:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\test_opencv\yolov5-master\train.py", line 10, in <module>
    import torch.distributed as dist
  File "E:\Anaconda3\lib\site-packages\torch\__init__.py", line 116, in <module>
    raise err
OSError: [WinError 1455] Error loading "E:\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.

Traceback (most recent call last):
  File "train.py", line 453, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 237, in train
    pbar = enumerate(dataloader)
  File "E:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "E:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 737, in __init__
    w.start()
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "E:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "E:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
Hello @dapsjj, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
Reduce the number of workers using --workers 2 or even --workers 0. You don't need 4 workers for a batch size of 4.
@glenn-jocher I was getting the same kind of error on Windows; maybe we could revisit the workers formula.
@dapsjj it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment (conda is not recommended), clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.
You are right, my GPU performance is not good; I can only run it by setting batch-size to 1.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.