I'm not sure if it is expected behavior or a bug, but when I'm trying to find a learning rate like this:
trainer = pl.Trainer(gpus = [1], accumulate_grad_batches=8)
lr_finder = trainer2.lr_find(model,min_lr = 1e-8, max_lr = 1e-1, num_training = 300)
It throws the error AttributeError: 'NoneType' object has no attribute 'item', which happens on line 335 of lr_finder.py: current_loss = trainer.running_loss.last().item()
When I remove accumulate_grad_batches=8, everything works as expected.
If this is expected behavior, I suggest adding a more descriptive error message.
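For what it is worth, here is a minimal sketch of the failure mode as I understand it. The RunningLoss class and the accumulation check below are purely illustrative stand-ins, not the actual Lightning internals:

import torch

class RunningLoss:
    # illustrative stand-in for the trainer's running-loss accumulator
    def __init__(self):
        self.values = []

    def append(self, value):
        self.values.append(value)

    def last(self):
        # returns None until something has been appended
        return self.values[-1] if self.values else None

running_loss = RunningLoss()
accumulate_grad_batches = 8

for batch_idx in range(3):
    loss = torch.tensor(0.5)
    # the loss is only recorded once per accumulation window ...
    if (batch_idx + 1) % accumulate_grad_batches == 0:
        running_loss.append(loss)
    # ... but the lr finder reads it after every batch, so last() is still None
    current_loss = running_loss.last().item()  # AttributeError: 'NoneType' object has no attribute 'item'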
Just to be sure, is it a typo that the trainer that gets initialized is called trainer while the one used with the learning rate finder is called trainer2, or are these two different trainers?
@SkafteNicki yeah, sorry, I just tried different trainers and copied the wrong one.
Can you please check on your side if this error exists?
This is very strange, because the accumulate_grad_batches variable is overridden by the learning rate finder's own num_accumulation_steps argument while it is running. I will look into what is causing this error.
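For context, the override is meant to follow roughly this pattern (a simplified sketch with a hypothetical run_lr_finder name and an elided body, not the actual lr_finder implementation):

def run_lr_finder(trainer, num_accumulation_steps=1):
    # remember whatever the user configured on the trainer
    original_accumulation = trainer.accumulate_grad_batches
    trainer.accumulate_grad_batches = num_accumulation_steps
    try:
        pass  # ... run the short lr range test here ...
    finally:
        # restore the user's setting so the later fit() is unaffected
        trainer.accumulate_grad_batches = original_accumulation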
Just to be sure, do you want to accumulate gradients during the learning rate finder or is it just for later fitting?
I want to accumulate batches in training, so I suppose I should set the accumulate_grad_batches parameter just as in the training phase. Am I understanding this wrong?
No, nothing wrong with your understanding of the code. I have found a solution to the problem and will create a PR soon.
I'm having the same error. Any solutions ready to be pulled in?
Just use the num_accumulation_steps option of the learning rate finder for now:
trainer = pl.Trainer(gpus=1, accumulate_grad_batches=1)
lr_finder = trainer.lr_find(model, num_accumulation_steps=8)
That solution doesn't work for me.
@jopo666 @florisdf I do not think that will solve the problem if the goal is to accumulate gradients during the lr_find experiment. The trainer's global_step, which should only increment when the learning rate is updated, increments on every batch during the lr_find experiment, regardless of num_accumulation_steps. This number resets itself after the finder is done running, but adding a print statement to line 434 or line 471 of training_loop.py will show that the learning rate (and the gradients) are updated every batch.
Tested on a nightly from last week.
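A quick way to check this without editing training_loop.py is to print the step and learning rate from a callback. This is a rough sketch that assumes a single optimizer; the exact hook names and signatures differ between Lightning versions:

import pytorch_lightning as pl

class LRStepLogger(pl.Callback):
    # prints the global step and current learning rate after every batch,
    # so you can see whether updates respect gradient accumulation
    def on_batch_end(self, trainer, pl_module):
        lr = trainer.optimizers[0].param_groups[0]["lr"]
        print(f"global_step={trainer.global_step} lr={lr:.3e}")

trainer = pl.Trainer(gpus=1, accumulate_grad_batches=8, callbacks=[LRStepLogger()])
# then run trainer.lr_find(model) as above and watch how often the lr changes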