Yolov5: How to save weights & resume training?

Created on 11 Oct 2020 · 9 Comments · Source: ultralytics/yolov5

❔Question

The question is straightforward: how do I save weights to Google Drive and later resume training from the previously trained weights (as was possible with YOLOv3/v4)?
Is this possible with YOLOv5?

Additional context

I can't find any information on this. Please point me to some resources. It would be very helpful.

question

Source

whoafridi

Most helpful comment

@whoafridi when you start training with any command your experiment is saved in yolov5/runs/exp.... If your training is interrupted for any reason, the following command will resume your partially completed training from the most recently updated experiment:

python train.py --resume

or from a specific experiment:

python train.py --resume runs/exp17/weights/last.pt

glenn-jocher on 11 Oct 2020

👍2 ❤1
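
For readers who want to see which checkpoint a bare --resume would most plausibly pick up, here is a minimal sketch of the "most recently updated experiment" idea. It is not taken from train.py: the function name find_latest_checkpoint, the last*.pt search pattern, and the use of file modification time are all assumptions made for illustration.

```python
import glob
import os
from typing import Optional


def find_latest_checkpoint(search_dir: str = "runs") -> Optional[str]:
    """Return the most recently modified last*.pt under search_dir, or None.

    Hypothetical helper: the name, pattern and mtime criterion are assumptions.
    """
    pattern = os.path.join(search_dir, "**", "last*.pt")
    candidates = glob.glob(pattern, recursive=True)
    if not candidates:
        return None
    # Pick the file whose contents were written most recently.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    ckpt = find_latest_checkpoint("runs")
    if ckpt is None:
        print("No checkpoint found under runs/")
    else:
        # Print the explicit form of the resume command for that checkpoint.
        print(f"python train.py --resume {ckpt}")
```

Passing the printed path explicitly removes any ambiguity when several experiments exist under runs/.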

All 9 comments

Hello @whoafridi, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

github-actions[bot] on 11 Oct 2020

Great. But I don't have a GPU, so is there any option to save the weights to Google Drive instead of that particular folder?
That would be very helpful. Thank you @glenn-jocher

whoafridi on 11 Oct 2020

Your hardware is irrelevant for logging. See train.py argparser for logging to arbitrary destinations:
https://github.com/ultralytics/yolov5/blob/10c85bf4ebf51cdf7d974ce0212bcb420e0a66bb/train.py#L403

glenn-jocher on 11 Oct 2020

❤1

Sure. Thanks for sharing this, @glenn-jocher. Thank you again.

whoafridi on 11 Oct 2020

@whoafridi see https://github.com/ultralytics/yolov5/issues/640#issuecomment-670317119 for a specific example of checkpointing to Google Drive from a Colab notebook.

glenn-jocher on 11 Oct 2020

❤1
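
As a rough illustration of the Drive checkpointing idea (not the code from the linked comment), the sketch below copies a checkpoint file into a backup folder. The helper name backup_checkpoint and the Drive mount path /content/drive/MyDrive/yolov5_backup are assumptions; the mount point depends on how Drive is mounted in your notebook.

```python
import shutil
from pathlib import Path


def backup_checkpoint(checkpoint: str, backup_dir: str) -> Path:
    """Copy a checkpoint file into backup_dir, creating the folder if needed.

    Hypothetical helper written for illustration only.
    """
    src = Path(checkpoint)
    dst_dir = Path(backup_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps, which is handy when comparing checkpoints.
    return Path(shutil.copy2(src, dst_dir / src.name))


if __name__ == "__main__":
    # In Colab, Drive has to be mounted before this runs, e.g.:
    #   from google.colab import drive
    #   drive.mount('/content/drive')
    src = Path("runs/exp17/weights/last.pt")  # path from the answer above
    if src.exists():
        # Assumed Drive location; adjust to your own mount path.
        print(backup_checkpoint(str(src), "/content/drive/MyDrive/yolov5_backup"))
```

After a runtime reset, the saved last.pt can be copied back into the experiment folder and passed to python train.py --resume as shown earlier in the thread.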

Sure.

whoafridi on 11 Oct 2020

How do I resume with "python -m torch.distributed.launch --nproc_per_node 2 train.py --resume runs/exp4/weights/last.pt"?

The log:
"""
23 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
24 [17, 20, 23] 1 16182 models.yolo.Detect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.25509e+06 parameters, 7.25509e+06 gradients

Transferred 370/370 items from runs/exp4/weights/last.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other

Traceback (most recent call last):
  File "train.py", line 460, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 138, in train
    shutil.copytree(wdir, wdir.parent / f'weights_backup_epoch{start_epoch - 1}') # save previous weights
  File "/opt/conda/lib/python3.6/shutil.py", line 321, in copytree
    os.makedirs(dst)
  File "/opt/conda/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'runs/exp4/weights_backup_epoch54'
"""

I deleted weights_backup_epoch54, but it gets created again and the error recurs.
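
One plausible reading of the traceback, offered as a hypothesis rather than a confirmed diagnosis: with --nproc_per_node 2, both launched processes run the same shutil.copytree call, so one creates the backup folder and the other fails because it already exists. That would also explain why the folder reappears after being deleted. The sketch below shows one way to guard such a backup step. The function backup_weights_once and its rank parameter are hypothetical and are not part of train.py; how the rank is obtained is left to the caller.

```python
import shutil
from pathlib import Path


def backup_weights_once(wdir: Path, start_epoch: int, rank: int = -1) -> bool:
    """Copy wdir to a sibling backup folder; return True if a copy was made.

    Assumed convention: rank is -1 for a single-process run, 0 for the main
    distributed process, and greater than 0 for the other processes.
    """
    if rank not in (-1, 0):
        return False  # only one process should write the backup
    dst = wdir.parent / f"weights_backup_epoch{start_epoch - 1}"
    if dst.exists():
        return False  # a backup from an earlier resume attempt already exists
    # dirs_exist_ok is avoided on purpose: it needs Python 3.8+, and the
    # traceback above comes from Python 3.6.
    shutil.copytree(wdir, dst)
    return True
```

Skipping the copy when the destination exists keeps an earlier backup intact instead of overwriting it, which seems the safer default for checkpoints.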