Yolov3: KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

Created on 16 Mar 2020 · 6 Comments · Source: ultralytics/yolov3

Thank you very much for this amazing repo. I ran into some problems while training on my own dataset, and the results don't look very good. All parameter settings are the defaults. Could you give me some advice for getting better results? Thanks a lot again O(∩_∩)O

Here are my results:

[results plots not preserved]



Hello @limingcv, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Google Colab Notebook, Docker Image, and GCP Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

@limingcv looks pretty good! What seems to be the problem?

@glenn-jocher
Thank you for replying. I tried to fine-tune yesterday's model, so I ran the following command:

python train.py --batch-size 64 --accumulate 1 --data data/ocean.data --multi-scale --cache-images --epochs 80 --name 'anchor cluster multi-scale' --weights weights/best.pt

but it raised this error:

KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

The error is raised at line 162 of train.py during training. What should I do to fix this?
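The message comes from PyTorch's learning-rate scheduler base class: when a scheduler is constructed with `last_epoch` other than -1 (i.e. in "resume" mode), it expects every optimizer param group to already contain an `'initial_lr'` key, which a freshly constructed optimizer does not have. The minimal sketch below reproduces the same KeyError outside this repo; the toy `Linear` model, the SGD settings and the constant `lr_lambda` are arbitrary assumptions, not what train.py actually uses.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR


def reproduce_resume_error() -> str:
    """Build a fresh optimizer, then ask a scheduler to resume at epoch 5."""
    model = torch.nn.Linear(4, 2)  # toy model standing in for the detector
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    try:
        # last_epoch != -1 tells the scheduler it is resuming, so it looks for
        # 'initial_lr' in each param group; a fresh optimizer has none.
        LambdaLR(optimizer, lr_lambda=lambda epoch: 1.0, last_epoch=5)
    except KeyError as err:
        return str(err)
    return "no error"


if __name__ == "__main__":
    print(reproduce_resume_error())
```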

@limingcv I'm able to recreate this error with the following:

python3 train.py --epochs 3 --data coco16.data --weights ''
python3 train.py --epochs 6 --data coco16.data --weights weights/last.pt

It seems to be related to the order of optimizer instantiation. I don't have time to address this at the moment, so in the meantime I would simply train for the full number of epochs in one run instead of resuming.
https://discuss.pytorch.org/t/a-problem-occured-when-resuming-an-optimizer/28822
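The ordering issue can be illustrated with a sketch. `optimizer.state_dict()` stores each param group's extra keys, including the `'initial_lr'` that a scheduler adds when it is first constructed, so restoring the optimizer state *before* building the resumed scheduler satisfies the check. The checkpoint layout (`'epoch'`, `'optimizer'`), the `make_optimizer` helper and the decay schedule `lf` are hypothetical; this is not the repo's actual train.py.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR


def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Hypothetical optimizer settings; train.py uses its own hyperparameters.
    return torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)


def lf(epoch: int) -> float:
    # Hypothetical LR multiplier per epoch.
    return 0.95 ** epoch


def first_run(model: torch.nn.Module, epochs: int) -> dict:
    """Train for a few epochs and return a checkpoint-like dict."""
    optimizer = make_optimizer(model)
    scheduler = LambdaLR(optimizer, lr_lambda=lf)  # last_epoch=-1 adds 'initial_lr'
    for _ in range(epochs):
        optimizer.step()  # no gradients here, so parameters are left unchanged
        scheduler.step()
    return {"epoch": epochs - 1, "optimizer": optimizer.state_dict()}


def resume(model: torch.nn.Module, ckpt: dict):
    optimizer = make_optimizer(model)
    # Restore optimizer state FIRST: this brings back 'initial_lr' in every group.
    optimizer.load_state_dict(ckpt["optimizer"])
    start_epoch = ckpt["epoch"] + 1
    # Only now build the scheduler in "resume" mode.
    scheduler = LambdaLR(optimizer, lr_lambda=lf, last_epoch=start_epoch - 1)
    return optimizer, scheduler, start_epoch


if __name__ == "__main__":
    model = torch.nn.Linear(4, 2)
    ckpt = first_run(model, epochs=3)
    optimizer, scheduler, start_epoch = resume(model, ckpt)
    print(start_epoch, optimizer.param_groups[0]["lr"])
```

Swapping the order (building the scheduler with `last_epoch=start_epoch - 1` before `load_state_dict`, or with no saved optimizer state at all) brings back the KeyError.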

A temporary, less preferable solution is to convert last.pt into a backbone first. This strips the optimizer and resets last_epoch to -1, which avoids the problem, though you will also lose the optimizer state (e.g. momentum buffers) and your position in the LR schedule:

Python code:

from utils.utils import *; create_backbone('weights/last.pt')

And then train (bash code):

python3 train.py --weights weights/backbone.pt
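The backbone workaround can be sketched the same way. Going by the description above (strip the optimizer, reset the epoch to -1), a checkpoint that has no optimizer state but still records a positive epoch fails on resume, while the stripped copy starts a fresh schedule at epoch 0. The `strip_to_backbone` and `start_training` functions and the checkpoint keys are hypothetical stand-ins, not the repo's `create_backbone` implementation.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR


def strip_to_backbone(ckpt: dict) -> dict:
    """Hypothetical stand-in for create_backbone(): keep weights, drop resume state."""
    out = dict(ckpt)          # shallow copy; the original dict is left untouched
    out["optimizer"] = None   # discards momentum buffers and 'initial_lr'
    out["epoch"] = -1         # next run starts at epoch 0
    return out


def start_training(model: torch.nn.Module, ckpt: dict):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    if ckpt["optimizer"] is not None:
        optimizer.load_state_dict(ckpt["optimizer"])
    start_epoch = ckpt["epoch"] + 1
    # With epoch == -1 this is last_epoch=-1, i.e. a brand-new schedule.
    scheduler = LambdaLR(optimizer, lr_lambda=lambda e: 0.95 ** e,
                         last_epoch=start_epoch - 1)
    return optimizer, scheduler, start_epoch


if __name__ == "__main__":
    model = torch.nn.Linear(4, 2)
    # Weights plus a finished epoch count, but no optimizer state.
    ckpt = {"model": model.state_dict(), "epoch": 5, "optimizer": None}
    try:
        start_training(model, ckpt)
    except KeyError as err:
        print("resume failed:", err)
    _, _, start_epoch = start_training(model, strip_to_backbone(ckpt))
    print("restarted at epoch", start_epoch)
```

The trade-off mentioned in the thread is visible here: the restarted run begins again from the top of the LR schedule with empty optimizer state.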

Thanks a lot, I will try it!

It works with the code you suggested, thanks!

Python code:

from utils.utils import *; create_backbone('weights/last.pt')

Bash code:

python3 train.py --weights weights/backbone.pt