Pytorch-lightning: Learning rate finder crashes if accumulate_grad_batches is not set to 1

Created on 4 May 2020 · 8 comments · Source: PyTorchLightning/pytorch-lightning

I'm not sure if it is expected behavior or a bug, but when I'm trying to find a learning rate like this:

trainer = pl.Trainer(gpus=[1], accumulate_grad_batches=8)
lr_finder = trainer2.lr_find(model, min_lr=1e-8, max_lr=1e-1, num_training=300)

It throws AttributeError: 'NoneType' object has no attribute 'item', which happens on line 335 of lr_finder.py: current_loss = trainer.running_loss.last().item()

When I remove accumulate_grad_batches=8, everything works as expected.
If this is expected behavior, I suggest adding a more descriptive error message.
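In case it helps others hitting the same crash: the simplest workaround for me was to run the finder on a trainer without gradient accumulation and only enable accumulate_grad_batches for the actual fit. A minimal sketch of that (the model and its dataloaders are assumed to be defined elsewhere):

import pytorch_lightning as pl

# trainer used only for the learning rate search, no gradient accumulation
finder_trainer = pl.Trainer(gpus=[1])
lr_finder = finder_trainer.lr_find(model, min_lr=1e-8, max_lr=1e-1, num_training=300)

# separate trainer with accumulation for the actual training run
trainer = pl.Trainer(gpus=[1], accumulate_grad_batches=8)
trainer.fit(model)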

question

All 8 comments

Just to be sure, is it a typo that the trainer that gets initialized is called trainer while the one used with the learning rate finder is called trainer2, or are those two different trainers?

@SkafteNicki yeah, sorry, I just tried different trainers and copied the wrong one.
Could you please check whether this error reproduces on your side?

This is very strange, because the accumulate_grad_batches variable is overridden by the learning rate finder's own num_accumulation_steps argument while it is running. I will look into what's causing this error.

Just to be sure, do you want to accumulate gradients during the learning rate finder or is it just for later fitting?

I want to accumulate batches in training, so I suppose I should set the accumulate_grad_batches parameter just as I do for the training phase. Am I understanding this wrong?

No, nothing wrong with your understanding of the code. I have found a solution to the problem and will create a PR soon.

I'm having the same error. Any solutions ready to be pulled in?

Just use the num_accumulation_steps option of the learning rate finder for now.

trainer = pl.Trainer(gpus=1, accumulate_grad_batches=1)
lr_finder = trainer.lr_find(model, num_accumulation_steps=8)
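If the finder runs through, reading out the result looks roughly like this; lr_finder.suggestion(), lr_finder.plot() and lr_finder.results come from the finder object returned by lr_find, while model.hparams.lr below is only a placeholder, since the hyperparameter name depends on your LightningModule:

new_lr = lr_finder.suggestion()      # learning rate at the steepest loss descent
fig = lr_finder.plot(suggest=True)   # loss-vs-lr curve with the suggestion marked
model.hparams.lr = new_lr            # placeholder: adapt to your module's hparam name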

[solution doesn't work]

@jopo666 @florisdf I do not think that will solve the problem if the goal is to accumulate gradients during the lr_find experiment. The trainer's global_step, which should only increment when the learning rate is updated (i.e. when the optimizer actually steps), increments on every batch during the lr_find run, regardless of num_accumulate_steps. The counter resets itself after the finder finishes, but adding a print statement at line 434 or line 471 of training_loop.py shows that the learning rate (and the gradients) are updated on every batch.

Tested on a nightly from last week.
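For anyone who wants to verify this on their own setup, here is a rough probe sketch, assuming a build where Callback hooks receive the trainer and module and where Trainer accepts a callbacks argument:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import Callback

class LrProbe(Callback):
    # print the current learning rate(s) after every batch
    def on_batch_end(self, trainer, pl_module):
        lrs = [group["lr"] for opt in trainer.optimizers for group in opt.param_groups]
        print(f"global_step={trainer.global_step} lrs={lrs}")

trainer = Trainer(gpus=1, accumulate_grad_batches=1, callbacks=[LrProbe()])
lr_finder = trainer.lr_find(model, num_accumulation_steps=8, num_training=50)
# if accumulation were respected, the printed lr would only change every 8th batch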

