What is the most appropriate way to add learning rate warmup?
I am thinking about using the on_batch_end(self) hook, but I am not sure where to put this function. Thank you.
You can use a learning rate scheduler and return it in configure_optimizers.
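For example, a minimal sketch (using a standard torch.optim scheduler such as StepLR; the values are placeholders):

    from torch.optim.lr_scheduler import StepLR

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        # decay the LR by 10x every 10 epochs; schedulers returned this way
        # are stepped once per epoch by default
        scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
        return [optimizer], [scheduler]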
Well, learning rate warmup changes the learning rate every batch, while most learning rate schedulers only change it after each epoch. Can you explain how to use configure_optimizers to do LR warmup?
Same question here. In the Transformer, the LR is adjusted per training step, not per epoch. Is there a solution?
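(For reference, the per-step schedule from the original Transformer paper, "Attention Is All You Need", looks roughly like this; d_model and warmup_steps are the paper's symbols, not anything defined in this thread:)

    # Noam schedule: linear warmup for warmup_steps, then inverse square-root decay
    def transformer_lr(step, d_model=512, warmup_steps=4000):
        step = max(step, 1)
        return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)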
You can also override optimizer_step and do it there. Here's an example where the first 500 batches are used for warm-up.
    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i, opt_closure):
        # linearly ramp the LR from 0 up to the base LR over the first 500 steps
        if self.trainer.global_step < 500:
            lr_scale = min(1., float(self.trainer.global_step + 1) / 500.)
            for pg in optimizer.param_groups:
                pg['lr'] = lr_scale * self.hparams.learning_rate
        # update parameters and reset gradients as usual
        optimizer.step()
        optimizer.zero_grad()
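Note that the exact optimizer_step signature depends on the Lightning version; in more recent releases the hook also receives a closure that has to be passed to optimizer.step(), so check the documentation for the version you are running.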
So, I ended up with something like this:
    from torch.optim.lr_scheduler import LambdaLR

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

        def lr_foo(epoch):
            if epoch < self.hparams.warm_up_step:
                # warm up lr
                lr_scale = 0.1 ** (self.hparams.warm_up_step - epoch)
            else:
                lr_scale = 0.95 ** epoch
            return lr_scale

        scheduler = LambdaLR(optimizer, lr_lambda=lr_foo)
        return [optimizer], [scheduler]
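One thing to keep in mind: schedulers returned this way are stepped once per epoch by default, so lr_foo receives an epoch index. If you want to sanity-check the resulting schedule, recent Lightning versions ship a LearningRateMonitor callback (called LearningRateLogger in older releases) that can log the LR at every step, e.g.:

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import LearningRateMonitor

    # log the current LR each step so the warmup curve shows up in your logger
    trainer = Trainer(callbacks=[LearningRateMonitor(logging_interval='step')])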

PS to the PyTorch Lightning creators and contributors: thank you for contributing, I was searching for such an approach (defining loss/optim/etc. in the model class) for years!
I just stumbled upon this issue, as I was also looking for a way to make my LR scheduler update on each step instead of each epoch. After doing some additional research I found that there is a better way of doing this than overriding optimizer_step. I am guessing this feature wasn't available yet when this issue initially came up, but as of version 1.0.3 (I don't know the exact version it was added in) you can just do this:
    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.learning_rate)
        # InverseSquareRootLR is a custom scheduler, not part of torch.optim
        scheduler = InverseSquareRootLR(optimizer, self.lr_warmup_steps)
        return (
            [optimizer],
            [
                {
                    'scheduler': scheduler,
                    'interval': 'step',  # step the scheduler after every batch instead of every epoch
                    'frequency': 1,
                    'reduce_on_plateau': False,
                    'monitor': 'val_loss',
                }
            ],
        )
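InverseSquareRootLR above appears to be the poster's own class rather than a built-in torch.optim scheduler. If you don't have one handy, one common formulation of an inverse square-root warmup can be sketched with LambdaLR (the function and warmup value below are my own placeholders, not from this thread):

    from torch.optim.lr_scheduler import LambdaLR

    def inverse_sqrt(step, warmup_steps=4000):
        # linear warmup for warmup_steps, then decay proportional to 1/sqrt(step)
        step = max(step, 1)
        if step < warmup_steps:
            return step / warmup_steps
        return (warmup_steps / step) ** 0.5

    scheduler = LambdaLR(optimizer, lr_lambda=inverse_sqrt)

Returned with 'interval': 'step' as in the snippet above, this updates the LR after every batch.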