Pytorch-lightning: learning rate warmup

Created on 8 Oct 2019 · 6 comments · Source: PyTorchLightning/pytorch-lightning

What is the most appropriate way to add learning rate warm-up?
I am thinking about using the on_batch_end(self) hook, but I am not sure where to put this function. Thank you.

question

Most helpful comment

You can also override optimizer_step and do it there. Here's an example where the first 500 batches are used for warm-up.

    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i, opt_closure):
        # Linearly scale the learning rate from ~0 up to the configured value
        # over the first 500 optimizer steps.
        if self.trainer.global_step < 500:
            lr_scale = min(1., float(self.trainer.global_step + 1) / 500.)
            for pg in optimizer.param_groups:
                pg['lr'] = lr_scale * self.hparams.learning_rate

        optimizer.step()
        optimizer.zero_grad()

All 6 comments

You can use a learning rate scheduler and return it from configure_optimizers.
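
A minimal sketch of that suggestion, assuming an epoch-based StepLR (step_size and gamma are illustrative, not from the thread):

    def configure_optimizers(self):
        from torch.optim.lr_scheduler import StepLR

        optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
        # By default, Lightning steps this scheduler once per epoch.
        scheduler = StepLR(optimizer, step_size=10, gamma=0.5)
        return [optimizer], [scheduler]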

Well, learning rate warm-up changes the learning rate every batch, while most learning rate schedulers change it only once per epoch. Can you explain how to use configure_optimizers to do LR warm-up?

Same question here. In the Transformer, the LR is adjusted per training step, not per epoch. Is there a solution?

You can also override optimizer_step and do it there. Here's an example where the first 500 batches are used for warm-up.

    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i, opt_closure):
        # Linearly scale the learning rate from ~0 up to the configured value
        # over the first 500 optimizer steps.
        if self.trainer.global_step < 500:
            lr_scale = min(1., float(self.trainer.global_step + 1) / 500.)
            for pg in optimizer.param_groups:
                pg['lr'] = lr_scale * self.hparams.learning_rate

        optimizer.step()
        optimizer.zero_grad()

So, I ended up with something like this:

    def configure_optimizers(self):
        from torch.optim.lr_scheduler import LambdaLR

        optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

        def lr_foo(epoch):
            if epoch < self.hparams.warm_up_step:
                # warm up lr: ramp exponentially towards the base LR
                lr_scale = 0.1 ** (self.hparams.warm_up_step - epoch)
            else:
                # after warm-up, decay multiplicatively per epoch
                lr_scale = 0.95 ** epoch

            return lr_scale

        scheduler = LambdaLR(
            optimizer,
            lr_lambda=lr_foo
        )

        return [optimizer], [scheduler]
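
If a plain linear warm-up is wanted instead of the exponential ramp above, a lambda along these lines could be dropped in for lr_foo (a sketch reusing the same illustrative hyperparameters):

    def lr_foo(epoch):
        if epoch < self.hparams.warm_up_step:
            # linear ramp: 1/N, 2/N, ..., 1.0 over the warm-up epochs
            lr_scale = float(epoch + 1) / float(self.hparams.warm_up_step)
        else:
            # mild exponential decay once warm-up is over
            lr_scale = 0.95 ** (epoch - self.hparams.warm_up_step)
        return lr_scale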

PS: to the pytorch-lightning creators and contributors: thank you for your work. I had been searching for an approach like this (defining loss/optimizer/etc. in the model class) for years!

I just stumbled upon this issue, as I was also looking for a way to make my LR scheduler update on each step instead of each epoch. After doing some additional research, I found that there is a better way of doing this than overriding optimizer_step. I am guessing this feature wasn't available yet when this issue was first opened, but as of version 1.0.3 (I don't know the exact version in which it was added) you can just do this:

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.learning_rate)
        scheduler = InverseSquareRootLR(optimizer, self.lr_warmup_steps)
        return (
            [optimizer],
            [
                {
                    'scheduler': scheduler,
                    # 'step' makes Lightning call scheduler.step() after every
                    # optimizer step rather than once per epoch
                    'interval': 'step',
                    'frequency': 1,
                    'reduce_on_plateau': False,
                    'monitor': 'val_loss',
                }
            ]
        )
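
InverseSquareRootLR here appears to be a custom scheduler rather than something shipped with PyTorch. If nothing like it is at hand, roughly the same Transformer-style schedule (linear warm-up followed by inverse-square-root decay) can be sketched with LambdaLR; warmup_steps is an assumed hyperparameter:

    from torch.optim.lr_scheduler import LambdaLR

    def inverse_sqrt_lambda(warmup_steps):
        # Linear warm-up to the base LR, then decay proportional to 1/sqrt(step).
        def fn(step):
            step = max(step, 1)
            if step < warmup_steps:
                return step / warmup_steps
            return (warmup_steps / step) ** 0.5
        return fn

    scheduler = LambdaLR(optimizer, lr_lambda=inverse_sqrt_lambda(warmup_steps))

Combined with 'interval': 'step' in the scheduler dict above, this updates the learning rate after every optimizer step.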