Yolov3: KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

Created on 16 Mar 2020 · 6 Comments · Source: ultralytics/yolov3

Thank you very much for your contribution, which lets us use this amazing repo. However, I ran into some problems while training on my own dataset, and the results don't look very good. All parameters are at their default settings. Could you give me some advice for getting better results? Thanks a lot again O(∩_∩)O

Here are my results:

[Image: results (two plots)]

Label: bug

Source: limingcv


Most helpful comment

@limingcv I'm able to recreate this error with the following:

python3 train.py --epochs 3 --data coco16.data --weights ''
python3 train.py --epochs 6 --data coco16.data --weights weights/last.pt

It seems to be related to the order in which the optimizer is instantiated. I don't have time to address this at the moment, so in the meantime I would simply train for the full number of epochs in a single run rather than resuming.
https://discuss.pytorch.org/t/a-problem-occured-when-resuming-an-optimizer/28822
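For context, the KeyError in the title comes from PyTorch's learning-rate scheduler. When a scheduler is constructed with `last_epoch` other than -1 (i.e. told that it is resuming), it requires every entry in `optimizer.param_groups` to already contain `'initial_lr'`, a key the scheduler only adds by itself when created with `last_epoch=-1`. That is likely what the comment about instantiation order refers to. The minimal sketch below reproduces the error and shows two ways around it: restoring the optimizer state before building the scheduler, or setting `initial_lr` by hand. The toy `torch.nn.Linear` model, the exponential `lr_lambda`, and the checkpoint dict with `epoch`/`model`/`optimizer` keys are assumptions for illustration, not necessarily what `train.py` uses.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR


def lr_lambda(epoch):
    # Hypothetical exponential decay; the 'initial_lr' check applies to any schedule.
    return 0.95 ** epoch


def build(lr=0.01):
    model = torch.nn.Linear(4, 2)
    optimizer = SGD(model.parameters(), lr=lr, momentum=0.9)
    return model, optimizer


# First run: a scheduler created with last_epoch=-1 stores 'initial_lr' in each group.
model, optimizer = build()
scheduler = LambdaLR(optimizer, lr_lambda)  # last_epoch defaults to -1
x = torch.randn(8, 4)
for epoch in range(3):
    optimizer.zero_grad()
    model(x).sum().backward()
    optimizer.step()   # creates momentum buffers
    scheduler.step()
ckpt = {'epoch': 2, 'model': model.state_dict(), 'optimizer': optimizer.state_dict()}
start_epoch = ckpt['epoch'] + 1  # assumed resume rule

# Failing resume: fresh optimizer, but the scheduler is told it is resuming.
_, fresh_opt = build()
try:
    LambdaLR(fresh_opt, lr_lambda, last_epoch=start_epoch - 1)
    raised = False
except KeyError as err:
    raised = True
    print('KeyError:', err)

# Working resume: load the optimizer state *before* building the scheduler.
model2, optimizer2 = build()
model2.load_state_dict(ckpt['model'])
optimizer2.load_state_dict(ckpt['optimizer'])  # restores 'initial_lr' and momentum buffers
scheduler2 = LambdaLR(optimizer2, lr_lambda, last_epoch=start_epoch - 1)

# Fallback when the checkpoint carries no optimizer state.
_, optimizer3 = build()
for group in optimizer3.param_groups:
    group.setdefault('initial_lr', group['lr'])
scheduler3 = LambdaLR(optimizer3, lr_lambda, last_epoch=start_epoch - 1)
```

The fallback keeps the position in the LR schedule but still starts from empty momentum buffers, so loading the saved optimizer state first is the more faithful resume.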

A temporary, less preferable solution is to first convert last.pt into a backbone, which strips the optimizer and resets last_epoch to -1 to avoid the problem. Note that you will also lose the optimizer state (e.g. momentum buffers) and your position in the LR schedule:

Python code:

from utils.utils import *; create_backbone('weights/last.pt')

And then train (bash code):

python3 train.py --weights weights/backbone.pt
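The backbone workaround plausibly avoids the error because, assuming the training script resumes at "checkpoint epoch + 1", an epoch of -1 makes the next run start at epoch 0, so the scheduler is built with `last_epoch=-1` and adds `initial_lr` itself. A rough sketch of that idea follows. `strip_optimizer_like_backbone` is a hypothetical helper and the checkpoint keys are assumptions; this is not the repository's `create_backbone()`.

```python
import os
import tempfile

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR


def strip_optimizer_like_backbone(src, dst):
    """Hypothetical helper (not the repo's create_backbone): drop the optimizer
    state and reset the stored epoch so the next run starts from scratch.
    Assumes the checkpoint is a dict with 'epoch', 'model' and 'optimizer' keys."""
    ckpt = torch.load(src, map_location='cpu')
    ckpt['optimizer'] = None  # momentum buffers and scheduled LRs are discarded
    ckpt['epoch'] = -1        # the assumed resume rule then starts at epoch 0
    torch.save(ckpt, dst)
    return dst


def lr_lambda(epoch):
    return 0.95 ** epoch  # hypothetical schedule


model = torch.nn.Linear(4, 2)
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

with tempfile.TemporaryDirectory() as tmp:
    last_pt = os.path.join(tmp, 'last.pt')
    torch.save({'epoch': 2, 'model': model.state_dict(),
                'optimizer': optimizer.state_dict()}, last_pt)
    backbone_pt = strip_optimizer_like_backbone(last_pt, os.path.join(tmp, 'backbone.pt'))
    ckpt = torch.load(backbone_pt, map_location='cpu')

# Resume from the stripped checkpoint with a brand-new optimizer.
model.load_state_dict(ckpt['model'])
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
start_epoch = ckpt['epoch'] + 1  # 0 here
scheduler = LambdaLR(optimizer, lr_lambda, last_epoch=start_epoch - 1)  # -1, so no KeyError
```

This also shows the trade-off mentioned above: the weights are kept, but the LR restarts from its base value at epoch 0 and the momentum buffers start empty.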

glenn-jocher on 17 Mar 2020


All 6 comments

Hello @limingcv, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Google Colab Notebook, Docker Image, and GCP Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

github-actions[bot] on 16 Mar 2020

@limingcv looks pretty good! What seems to be the problem?

glenn-jocher on 16 Mar 2020

@glenn-jocher
Thank you for replying. I tried to fine-tune yesterday's model, so I ran the following command:

python train.py --batch-size 64 --accumulate 1 --data data/ocean.data --multi-scale --cache-images --epochs 80 --name 'anchor cluster multi-scale' --weights weights/best.pt

but it raised the following error:

KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

The error is raised in "train.py", line 162, during training. What should I do about this?

limingcv on 17 Mar 2020


Thanks a lot, I will try it!

limingcv on 17 Mar 2020

It works with the code you mentioned, thanks:

Python code:

from utils.utils import *; create_backbone('weights/last.pt')

Then (bash code):

python3 train.py --weights weights/backbone.pt

ZurMaD on 25 Dec 2020
