When doing transfer learning we need to switch between phases.
Normally, the first phase is to freeze all but the head of the model and train only that.
After a predefined number of epochs, we unfreeze the rest of the model (or part of it) and start training again (possibly with the help of differential learning rates, described in #2005). We can repeat this phase as many times as we like.
We should implement a class that handles all of that for us. This includes:
- switching the lr_scheduler parameters between phases
- if LearningRateLogger is being used, registering the new lr_scheduler

This will take care of what I call "phase switches".
There are some ways of achieving this:
Use the `on_epoch_start` hook:

```python
def on_epoch_start(self):
    if self.current_epoch == 0:
        self.freeze()
        self.trainer.lr_schedulers = ...  # define new scheduler
    if self.current_epoch == N_FREEZE_EPOCHS:
        self.unfreeze()  # or partially unfreeze
        self.trainer.lr_schedulers = ...  # define new scheduler
```
We can keep adding as many milestones as we want this way, but it's important to note that they all have to be defined beforehand.
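As an aside, "partially unfreeze" can be as simple as toggling requires_grad on a chosen submodule. A minimal, framework-agnostic sketch (the set_trainable helper and the backbone/head split below are illustrative, not an existing API):

```python
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Enable or disable gradients for every parameter of `module`."""
    for p in module.parameters():
        p.requires_grad = trainable

# Example: freeze the whole backbone, keep the head trainable (phase 1)
backbone = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 20))
head = nn.Linear(20, 2)
set_trainable(backbone, False)
set_trainable(head, True)

# Later (e.g. at N_FREEZE_EPOCHS): partially unfreeze only the last backbone layer
set_trainable(backbone[-1], True)
```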
Call `Trainer.fit` (or some variation of it) multiple times:

```python
model.freeze()
trainer.fit_one_cycle(model, n_epochs=2, lr=1e-3, pct_start=0.9)
model.unfreeze()
trainer.fit_one_cycle(model, n_epochs=5, lr=slice(5e-6, 5e-4), pct_start=0.2)
```
This is exactly the flow in fastai; this way of training a model is excellent for iterative training, e.g. in a notebook or a REPL.
fit_one_cycle assumes that we are using the OneCycleLR scheduler, that each call is a continuation of the last, and that we want the schedule to be reset on each call.
When we pass a slice to lr we are asking for an interpolation of values across the trainable layer groups (differential learning rates).
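To illustrate the idea (this is not fastai's exact implementation, just a sketch that interpolates log-uniformly between the two ends of the slice):

```python
import numpy as np

def differential_lrs(lr_slice: slice, n_groups: int) -> list:
    """Spread learning rates across layer groups: lowest lr for the earliest
    (most pretrained) group, highest for the head."""
    return np.geomspace(lr_slice.start, lr_slice.stop, num=n_groups).tolist()

print(differential_lrs(slice(5e-6, 5e-4), 3))
# -> [5e-06, 5e-05, 0.0005]

# Each per-group lr would then go into its own optimizer param group, e.g.:
# torch.optim.SGD([{'params': g.parameters(), 'lr': lr} for g, lr in zip(groups, lrs)])
```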
The scheduler receives a list of dicts; each dict specifies the duration of the phase and its configuration (what layers to freeze, what lrs to use, ...):
```python
scheduler = FineTuneScheduler([
    {'params': [nn.Sequential(self.c_d1, self.c_d1_bn), self.c_d2], 'action': 'freeze', 'epoch': 0},
    {'params': [self.c_d2], 'action': 'unfreeze', 'epoch': 2},
])
```
Then we can just pass the scheduler to the Trainer.
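To make the proposal concrete, here is a minimal sketch of how such a scheduler could be written as a Lightning Callback (FineTuneScheduler is not an existing Lightning class, and the on_epoch_start(trainer, pl_module) hook signature is an assumption):

```python
from pytorch_lightning import Callback

class FineTuneScheduler(Callback):
    """Sketch of the dict-config proposal above (hypothetical class)."""

    def __init__(self, phases):
        # phases: list of {'params': [modules], 'action': 'freeze'/'unfreeze', 'epoch': int}
        self.phases = phases

    def on_epoch_start(self, trainer, pl_module):
        for phase in self.phases:
            if phase['epoch'] != trainer.current_epoch:
                continue
            trainable = phase['action'] == 'unfreeze'
            for module in phase['params']:
                for p in module.parameters():
                    p.requires_grad = trainable

# usage sketch: trainer = Trainer(callbacks=[FineTuneScheduler([...])])
```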
In all cases, the flow should be the same across the standard domains (vision, NLP, time series, ...).
The only things we assume are:
Personally, I prefer the approach of calling Trainer.fit (or some variation of it) multiple times.
It gives me more control over how to train my model. Transfer learning usually happens on small datasets, so the user can train a few epochs, see what happens, and only then decide whether it's time to unfreeze some layers or run some more epochs with the current configuration.
Added a new proposal to the OP: the scheduler interface suggested by @williamFalcon.
I think the main benefit of this approach is that it's easily reproducible: because we are using a list of dicts (configs), we could even store the scheduler as a config file in the future.
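For illustration, if the dicts referenced layer groups by name instead of by module object, the whole schedule could be dumped and reloaded as a plain config; a minimal sketch (the key names follow the proposal above, the file name is arbitrary):

```python
import json

phases = [
    {'params': ['backbone'], 'action': 'freeze', 'epoch': 0},
    {'params': ['backbone'], 'action': 'unfreeze', 'epoch': 2},
]

with open('finetune_schedule.json', 'w') as f:
    json.dump(phases, f, indent=2)

# Later: reload and resolve names back to modules, e.g. with getattr(model, name)
with open('finetune_schedule.json') as f:
    phases = json.load(f)
```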
Another option for the scheduler would be to pass a function to it instead of predefined actions. It would look something like this:
```python
def phase1(trainer, model):
    model.freeze()
    sched = OneCycleLR(...)
    trainer.new_schedule(sched)

def phase2(trainer, model):
    model.unfreeze()
    sched = OneCycleLR(...)  # differential LRs can be introduced here
    trainer.new_schedule(sched)

sched = FineTuneScheduler([
    {'func': phase1, 'epoch': 0},
    {'func': phase2, 'epoch': 5},
])
```
This gives the user full control over what happens in these phases.
If you think about it, this is not even specific to fine-tuning; it's more like a LambdaScheduler: you can inject any functionality you want with it, which is very powerful.
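A minimal sketch of what such a LambdaScheduler could look like as a Lightning Callback (hypothetical class; the on_epoch_start(trainer, pl_module) hook signature is an assumption):

```python
from pytorch_lightning import Callback

class LambdaScheduler(Callback):
    """Sketch: call a user-provided function at the start of the given epochs."""

    def __init__(self, phases):
        # phases: list of {'func': callable(trainer, model), 'epoch': int}
        self.phases = {p['epoch']: p['func'] for p in phases}

    def on_epoch_start(self, trainer, pl_module):
        func = self.phases.get(trainer.current_epoch)
        if func is not None:
            func(trainer, pl_module)
```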
We can then implement helper functions to make things like defining differential learning rates and resetting schedulers easier. But it would be up to the user to construct what they want =)
One thing I don't currently like about it, though, is that when creating a new scheduler I also need to know the duration of the phase. Maybe we can change its signature to:
```python
def phase(trainer, model, n_epochs):
    ...
```
And then, as @williamFalcon suggested again, we can implement a scheduler that is really specific to the standard transfer learning case:
```python
class FineTuneScheduler(Scheduler):
    def __init__(self, pretrained, head, head_unfreeze_epoch):
        ...

# unfreeze head after 1 epoch
sched = FineTuneScheduler(nn.Sequential(self.c_d1, self.c_d1_bn), self.c_d2, 1)

# unfreeze head after 10 epochs
sched = FineTuneScheduler(nn.Sequential(self.c_d1, self.c_d1_bn), self.c_d2, 10)
```
This can be easily built on top of LambdaScheduler
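A sketch of what that could look like on top of the LambdaScheduler sketched above (names are hypothetical; here the pretrained part is frozen at epoch 0 and unfrozen at the given epoch, while the head stays trainable throughout):

```python
class FineTuneScheduler(LambdaScheduler):
    """Standard transfer-learning case expressed with the LambdaScheduler above."""

    def __init__(self, pretrained, head, unfreeze_epoch):
        def set_trainable(module, flag):
            for p in module.parameters():
                p.requires_grad = flag

        super().__init__([
            {'func': lambda trainer, model: set_trainable(pretrained, False), 'epoch': 0},
            {'func': lambda trainer, model: set_trainable(pretrained, True), 'epoch': unfreeze_epoch},
        ])
```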
I would go the scheduler way with a dict config, as it can simply be stored, and even without loading/running it you can see what you did in the past, a kind of history notes.
@PyTorchLightning/core-contributors any other thoughts?
When restoring a checkpoint for finetuning a model, users still need a way to reset the current_epoch and global_step to 0.
Do we still need a GH issue to handle this, aside from the params_group and differential learning rate features?
A hack for this was described by @lgvaz:
```python
class MyTrainer(Trainer):
    def restore_weights(self, model: LightningModule):
        res = super().restore_weights(model)
        self.reset_lr_schedulers()
        return res

    def reset_lr_schedulers(self):
        for sched in self.lr_schedulers:
            sched['scheduler'].last_epoch = 0
```
Is there a better way? If we pass both resume_from_checkpoint and lr_schedulers params to the Trainer, will the new lr_schedulers override the ones saved in the checkpoint's training state, along with the scheduler's last_epoch?
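For completeness, when the goal is only to reuse the weights (not the full training state), one workaround is to restore just the weights and let a fresh Trainer start counting from zero. A minimal sketch (MyModel and the checkpoint path are placeholders):

```python
from pytorch_lightning import Trainer

# Restore weights only; current_epoch / global_step start at 0 in the new Trainer
model = MyModel.load_from_checkpoint('pretrained.ckpt')
model.freeze()  # then unfreeze the head / the parts you want to fine-tune

trainer = Trainer(max_epochs=5)  # no resume_from_checkpoint, so trainer state is fresh
trainer.fit(model)
```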