
Before submitting a bug report, please be aware that your issue must be reproducible with all of the following; otherwise it is non-actionable and we cannot help you:
Run git fetch && git status -uno to check whether your repo is up to date, and git pull to update it. If this is a custom dataset/training question you must include your train*.jpg, test*.jpg and results.png figures, or we cannot help you. You can generate results.png with utils.plot_results().
A clear and concise description of what the bug is.
Input:
import torch
a = torch.tensor([5])
c = a / 0
Output:
Traceback (most recent call last):
File "/Users/glennjocher/opt/anaconda3/envs/env1/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-be04c762b799>", line 5, in <module>
c = a / 0
RuntimeError: ZeroDivisionError
A clear and concise description of what you expected to happen.
If applicable, add screenshots to help explain your problem.
Add any other context about the problem here.
Hello @jitunayak, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@jitunayak if you do not include exact code to reproduce, there is nothing for us to do, and your issue is non-actionable.
@jitunayak @glenn-jocher This problem is caused by TensorBoard. The PyTorch team has already fixed it upstream.
Run:
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl --user
For more details, see https://github.com/pytorch/pytorch/pull/16196 and https://github.com/pytorch/pytorch/commit/98e312cf96f6a5e23933cd8794097063ee3cbc8c
@jitunayak How big was your dataset relative to your batch size? I was able to reproduce your error with a dataset that contained fewer images than my batch size.
Decreasing the batch size to be smaller than your training dataset size may resolve your issue, e.g.:
!python train.py --img 416 --batch 2 --epochs 300 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --nosave --cache
@josephofiowa ah, interesting. There is actually existing code to check for this use case and adjust the batch size accordingly, but I have not tested it in a while. I'll try to reproduce.
https://github.com/ultralytics/yolov5/blob/d994ed25f1cc158c30b08cc19546e4ce5a9b32cc/train.py#L169-L170
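The idea behind such a guard can be sketched as follows. This is a hypothetical illustration, not the actual yolov5 code linked above; the function name safe_batch_size is an assumption for the example:

```python
# Hypothetical sketch: clamp the requested batch size to the dataset
# size so a dataset smaller than the batch never produces an empty or
# malformed batch.
def safe_batch_size(requested: int, dataset_len: int) -> int:
    # Never ask the DataLoader for more samples per batch than exist,
    # and never go below 1.
    return max(1, min(requested, dataset_len))

print(safe_batch_size(150, 128))  # dataset smaller than requested batch
print(safe_batch_size(16, 128))   # normal case: request is unchanged
```

With --batch 150 on coco128 (128 images), a check like this would silently train with batch size 128 instead of failing.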
@jitunayak @Lornatang @josephofiowa I was able to reproduce the error myself using the following notebook command:
!python train.py --data coco128.yaml --epochs 3 --img 64 --batch 150
I found the problem was caused by the plotting code for the first 3 batches, which was glitching because these small datasets consist of a single batch. Just pushed a fix; the problem should now be resolved. Please git pull and try again.
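The shape of the fix can be sketched like this. This is a simplified, hypothetical guard (should_plot is an invented name), not the actual patched code:

```python
# Hypothetical sketch: plot at most the first few batches per epoch,
# and tolerate datasets that fit in a single batch.
def should_plot(batch_index: int, max_plots: int = 3) -> bool:
    # A one-batch dataset simply plots once instead of the code
    # assuming that 3 distinct batches always exist.
    return batch_index < max_plots

# A dataset that fits in one batch yields exactly one plotted batch.
plotted = [i for i in range(1) if should_plot(i)]
```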
@glenn-jocher Confirming that resolved the issue.