Pytorch-lightning: learning rate scheduler does not appear to step

Created on 8 Nov 2020  ·  7 Comments  ·  Source: PyTorchLightning/pytorch-lightning

🐛 Bug


When using a learning rate scheduler from PyTorch, it appears that the scheduler does not step.

Please reproduce using the BoringModel and post here

Yes, I used the notebook above; the only thing I did was add a callback to the trainer. See the code below.

To Reproduce


import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor

def test_x(tmpdir):
    # init model
    model = BoringModel()

    # log the learning rate on every step
    lr_monitor = LearningRateMonitor(logging_interval='step')

    # Initialize a trainer
    trainer = pl.Trainer(
        max_epochs=1,
        progress_bar_refresh_rate=20,
        callbacks=[lr_monitor]
    )

    # Train the model ⚡
    # (train, val, test are the DataLoaders defined in the BoringModel notebook)
    trainer.fit(model, train, val)

    trainer.test(test_dataloaders=test)

After the code runs, open the logs with TensorBoard:

%load_ext tensorboard
%tensorboard --logdir lightning_logs --reload_interval 1

and look for the lr-SGD curve: I find it flat instead of dropping to 10% of its value (gamma=0.1 by default) every 10 steps.
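For reference, a runnable sketch of the scheduler setup presumed to be in the notebook (the module and layer shapes here are assumptions standing in for BoringModel, not copied from it); the scheduler is returned bare, without a configuration dict:

```python
import torch
import torch.nn as nn

class Sketch(nn.Module):
    """Hypothetical stand-in for BoringModel: a single linear layer."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.1)
        # StepLR with the default gamma=0.1: the lr is multiplied by 0.1
        # every 10 scheduler steps.
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
        return [optimizer], [lr_scheduler]

optimizers, schedulers = Sketch().configure_optimizers()
print(optimizers[0].param_groups[0]['lr'])
```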


Here is the link for my notebook
https://colab.research.google.com/drive/10l3Kz-9rOP7lu5qrLurJj-2m-lio8SmE?usp=sharing

Expected behavior

The learning rate scheduler should step.

Environment

Note: Bugs with code are solved faster! Colab notebooks should be made public!

You can get the script and run it with:

wget https://raw.githubusercontent.com/PyTorchLightning/pytorch-lightning/master/tests/collect_env_details.py
# For security purposes, please check the contents of collect_env_details.py before running it.
python collect_env_details.py
  • CUDA:
    • GPU: Tesla V100-SXM2-16GB
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: True
    • pyTorch_version: 1.7.0+cu101
    • pytorch-lightning: 1.0.5
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture: 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Additional context


Labels: logger, help wanted, with code


All 7 comments

When I print the learning rates to the console with
optimizer.param_groups[0]['lr']
I can see them change, so it must be a visualization issue.

@rohitgr7 have you worked on the LRMonitor? Any ideas what could be wrong with the visualization?

@awaelchli thanks for looking into it. Could you share the code you used to print the learning rate to the console so I can do more testing?

You need to update configure_optimizers:

def configure_optimizers(self):
    optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.1)
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)
    # wrap in a dict so the scheduler steps every training step instead of every epoch
    lr_scheduler = {'scheduler': lr_scheduler, 'interval': 'step'}
    #lr_scheduler = get_linear_schedule_with_warmup(optimizer, 100, 1000)
    return [optimizer], [lr_scheduler]

Here, 'interval': 'step' tells Lightning when to update the learning rate, and LearningRateMonitor(logging_interval='step') tells it when to log it.
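To see what the fixed configuration should produce, here is a plain-PyTorch sketch of StepLR with step_size=10 and the default gamma=0.1, stepped once per training step (the toy parameter is hypothetical, not the BoringModel):

```python
import torch

# Toy parameter just so we can construct an optimizer.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)  # gamma defaults to 0.1

lrs = []
for step in range(25):
    lrs.append(optimizer.param_groups[0]['lr'])  # lr in effect for this step
    optimizer.step()
    scheduler.step()

# The lr drops to 10% of its value every 10 steps: 0.1 -> 0.01 -> 0.001,
# which is the staircase the lr-SGD curve should show once interval='step' is set.
print(lrs[0], lrs[10], lrs[20])
```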

@junwen-austin I just added this line to training_step:
print(self.trainer.optimizers[0].param_groups[0]['lr'])

@junwen-austin as per @rohitgr7's answer, you need to run multiple epochs to see the updates, or change the scheduling interval from epoch to step using the dict key in the code sample above.

@awaelchli @rohitgr7 Thank you so much!
