Pytorch-lightning: Learning rate finder crashes if accumulate_grad_batches is not set to 1

Created on 4 May 2020 · 8 comments · Source: PyTorchLightning/pytorch-lightning

I'm not sure if it is expected behavior or a bug, but when I'm trying to find a learning rate like this:

trainer = pl.Trainer(gpus=[1], accumulate_grad_batches=8)
lr_finder = trainer2.lr_find(model, min_lr=1e-8, max_lr=1e-1, num_training=300)

It throws AttributeError: 'NoneType' object has no attribute 'item', which happens on line 335 of lr_finder.py: current_loss = trainer.running_loss.last().item()

When I remove accumulate_grad_batches=8, everything works as expected.
If this is expected behavior, I suggest adding a more descriptive error message.
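In case it helps others hitting the same crash: the simplest workaround for me was to run the finder on a trainer without gradient accumulation and only enable accumulate_grad_batches for the actual fit. A minimal sketch of that (the model and its dataloaders are assumed to be defined elsewhere):

import pytorch_lightning as pl

# trainer used only for the learning rate search, no gradient accumulation
finder_trainer = pl.Trainer(gpus=[1])
lr_finder = finder_trainer.lr_find(model, min_lr=1e-8, max_lr=1e-1, num_training=300)

# separate trainer with accumulation for the actual training run
trainer = pl.Trainer(gpus=[1], accumulate_grad_batches=8)
trainer.fit(model)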

question

All 8 comments

Just to be sure, is it a typo that the trainer that gets initialized is called trainer while the one used with the learning rate finder is called trainer2, or are those two different trainers?

@SkafteNicki yeah, sorry, I just tried different trainers and copied the wrong one.
Could you please check whether this error reproduces on your side?

This is very strange, because the accumulate_grad_batches variable is overridden by the learning rate finder's own num_accumulation_steps argument while it is running. I will look into what's causing this error.

Just to be sure, do you want to accumulate gradients during the learning rate finder or is it just for later fitting?

I want to accumulate batches in training, so I suppose I should set the accumulate_grad_batches parameter just as I do for the training phase. Am I understanding this wrong?

No, nothing wrong with your understanding of the code. I have found a solution to the problem and will create a PR soon.

I'm having the same error. Any solutions ready to be pulled in?

Just use the num_accumulation_steps option of the learning rate finder for now.

trainer = pl.Trainer(gpus=1, accumulate_grad_batches=1)
lr_finder = trainer.lr_find(model, num_accumulation_steps=8)
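If the finder runs through, reading out the result looks roughly like this; lr_finder.suggestion(), lr_finder.plot() and lr_finder.results come from the finder object returned by lr_find, while model.hparams.lr below is only a placeholder, since the hyperparameter name depends on your LightningModule:

new_lr = lr_finder.suggestion()      # learning rate at the steepest loss descent
fig = lr_finder.plot(suggest=True)   # loss-vs-lr curve with the suggestion marked
model.hparams.lr = new_lr            # placeholder: adapt to your module's hparam name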

[solution doesn't work]

@jopo666 @florisdf I do not think that will solve the problem if the goal is to accumulate gradients during the lr_find experiment. The trainer's global_step, which should only increment when the learning rate is updated (i.e. when the optimizer actually steps), increments on every batch during the lr_find run, regardless of num_accumulate_steps. The counter resets itself after the finder finishes, but adding a print statement at line 434 or line 471 of training_loop.py shows that the learning rate (and the gradients) are updated on every batch.

Tested on a nightly from last week.
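For anyone who wants to verify this on their own setup, here is a rough probe sketch, assuming a build where Callback hooks receive the trainer and module and where Trainer accepts a callbacks argument:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import Callback

class LrProbe(Callback):
    # print the current learning rate(s) after every batch
    def on_batch_end(self, trainer, pl_module):
        lrs = [group["lr"] for opt in trainer.optimizers for group in opt.param_groups]
        print(f"global_step={trainer.global_step} lrs={lrs}")

trainer = Trainer(gpus=1, accumulate_grad_batches=1, callbacks=[LrProbe()])
lr_finder = trainer.lr_find(model, num_accumulation_steps=8, num_training=50)
# if accumulation were respected, the printed lr would only change every 8th batch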

