The question is straightforward: how do I save weights to Google Drive and later resume training from the previously trained weights (as with YOLOv3/v4)?
Is this possible with YOLOv5?
I can't find any clue. Please point me to some resources; it would be very helpful.
Hello @whoafridi, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.
If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@whoafridi when you start training with any command your experiment is saved in yolov5/runs/exp.... If your training is interrupted for any reason, the following command will resume your partially completed training from the most recently updated experiment:
python train.py --resume
or from a specific experiment:
python train.py --resume runs/exp17/weights/last.pt
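To illustrate what "the most recently updated experiment" could mean in practice, here is a minimal Python sketch that searches a runs directory for the newest last*.pt file. The function name find_latest_checkpoint and the use of file modification time are assumptions made for illustration; this is not necessarily how train.py selects the checkpoint.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(search_dir="runs") -> Optional[Path]:
    """Return the most recently modified last*.pt under search_dir, or None.

    Hypothetical helper: train.py may pick the checkpoint differently.
    """
    candidates = list(Path(search_dir).glob("**/last*.pt"))
    if not candidates:
        return None
    # Pick the checkpoint whose file was modified last.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    ckpt = find_latest_checkpoint("runs")
    if ckpt is None:
        print("No checkpoint found under runs/")
    else:
        # This path could then be passed explicitly to --resume.
        print(f"python train.py --resume {ckpt}")
```

Passing the path explicitly, as in the second command above, avoids any ambiguity when several experiments exist.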
Great. But I don't have a GPU, so is there any option to save the weights to Google Drive instead of that particular folder?
That is very much needed. Thank you @glenn-jocher
Your hardware is irrelevant for logging. See train.py argparser for logging to arbitrary destinations:
https://github.com/ultralytics/yolov5/blob/10c85bf4ebf51cdf7d974ce0212bcb420e0a66bb/train.py#L403
Sure. Thanks for sharing this, @glenn-jocher. Thank you again.
@whoafridi see https://github.com/ultralytics/yolov5/issues/640#issuecomment-670317119 for a specific example of checkpointing to Google Drive from a Colab notebook.
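As a rough sketch of the idea (not the code from the linked comment), checkpoints can also be copied to a mounted Drive folder with plain Python. The helper name backup_weights and both example paths are assumptions; in Colab, Drive is typically mounted first with google.colab's drive.mount, shown here only as a comment so the snippet runs anywhere.

```python
import shutil
from pathlib import Path


def backup_weights(weights_dir, backup_dir):
    """Copy every *.pt file from weights_dir into backup_dir.

    Hypothetical helper for illustration; returns the copied paths.
    """
    weights_dir, backup_dir = Path(weights_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for ckpt in sorted(weights_dir.glob("*.pt")):
        target = backup_dir / ckpt.name
        shutil.copy2(ckpt, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied


# In a Colab notebook, mount Drive before calling the helper:
#   from google.colab import drive
#   drive.mount('/content/drive')
# Example call (both paths are assumptions, adjust to your setup):
#   backup_weights('runs/exp17/weights', '/content/drive/My Drive/yolov5_backup')
```

To resume later, the saved last.pt can be copied back and passed to --resume as shown earlier in the thread.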
Sure.
How do I resume with "python -m torch.distributed.launch --nproc_per_node 2 train.py --resume runs/exp4/weights/last.pt"?
The log:
"""
23 -1 1 1248768 models.common.BottleneckCSP [512, 512, 1, False]
24 [17, 20, 23] 1 16182 models.yolo.Detect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.25509e+06 parameters, 7.25509e+06 gradients
Transferred 370/370 items from runs/exp4/weights/last.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Traceback (most recent call last):
File "train.py", line 460, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 138, in train
shutil.copytree(wdir, wdir.parent / f'weights_backup_epoch{start_epoch - 1}') # save previous weights
File "/opt/conda/lib/python3.6/shutil.py", line 321, in copytree
os.makedirs(dst)
File "/opt/conda/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'runs/exp4/weights_backup_epoch54'
"""
I deleted weights_backup_epoch54, but it is created again and the error comes back.
@alicera Hi, I have the same problem.
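One possible reading of the traceback above, offered as an assumption rather than a confirmed diagnosis: with --nproc_per_node 2, both processes reach the shutil.copytree call, the first one creates the backup folder, and the second one fails because the folder now exists. That would explain why deleting the folder by hand does not help. A minimal sketch of a guard follows; the function name backup_previous_weights and the rank convention (-1 for single-process, 0 for the main process) are assumptions for illustration and are not the fix applied in the repository.

```python
import shutil
from pathlib import Path


def backup_previous_weights(wdir, start_epoch, rank=-1):
    """Copy wdir to a sibling weights_backup_epoch<N> folder at most once.

    Hypothetical guard: only the main process copies, and an existing
    backup folder is left untouched instead of raising FileExistsError.
    """
    wdir = Path(wdir)
    if rank not in (-1, 0):
        return None  # non-main processes skip the backup
    dst = wdir.parent / f"weights_backup_epoch{start_epoch - 1}"
    if dst.exists():
        return None  # backup already made, e.g. by an earlier resume
    shutil.copytree(wdir, dst)
    return dst
```

The explicit exists() check is used instead of copytree's dirs_exist_ok argument because that argument requires Python 3.8+, while the log shows Python 3.6.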