Making a finetuning model where the backbone isn't training breaks 16-bit.
Hi! Thanks for your contribution, great first issue!
Just to keep all the details here: this seems to be a side effect of AMP. When we call `self.trainer.scaler.step(optimizer)`, the scaler internally runs an inf check on the gradients of the optimizer's parameters, and that check is the assertion being thrown. The check needs to coincide with ensuring that those parameters are actually being updated in this step.
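For context, here is a minimal sketch of how this can surface outside Lightning (the module names and shapes are illustrative, not from the issue, and it assumes a CUDA device since native AMP's `GradScaler` targets the GPU): if every parameter handed to an optimizer is frozen, no gradients are produced for it, so `scaler.step(optimizer)` has nothing to inf-check and asserts.

```python
import torch
import torch.nn as nn

# Illustrative modules, not the ones from the issue: a frozen "backbone"
# and a trainable "head".
backbone = nn.Linear(8, 8).cuda()
head = nn.Linear(8, 2).cuda()
for p in backbone.parameters():
    p.requires_grad = False

# This optimizer only holds the frozen backbone's parameters, so after
# backward() none of them have a .grad for the scaler to inspect.
opt_backbone = torch.optim.SGD(backbone.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 8, device="cuda")
with torch.cuda.amp.autocast():
    loss = head(backbone(x)).sum()

scaler.scale(loss).backward()
# No inf checks are recorded because no parameter in this optimizer has a
# gradient, so the scaler asserts ("No inf checks were recorded for this
# optimizer.").
scaler.step(opt_backbone)
scaler.update()
```

One common way around this in plain PyTorch is to construct the optimizer only from trainable parameters, e.g. `torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.1)`, or to skip `scaler.step` for any optimizer whose parameters received no gradients in that step.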
@SeanNaren to follow up with the PyTorch team.