Apex: NaNs after torch.nn.Softmax with Amp O2 model?

Created on 27 Mar 2019 · 3 Comments · Source: NVIDIA/apex

Hey,

The line of code in question is: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L305

After running the attention scores through softmax, some (but not all) scores get set to NaN. I followed the Amp guide almost to the letter when setting up the model, but I can post any code you think is relevant. This code trained successfully (on a different machine) with the old API, so maybe this has something to do with how Softmax is patched in the new implementation?

Thanks!


dave-epstein

Most helpful comment

@dave-epstein I ran into a similar problem. How did you solve the NaN issue?

kkjh0723 on 24 Aug 2019


All 3 comments

O2 doesn't do any patching at all, and it does not correspond to what was called "amp" in the old API. O1 is what corresponds to the old "amp," and O1 should be patching softmax to run in FP32. Do you still see NaNs with O1?

The softmax forward pass should be benign, since it produces values between 0 and 1 and the underlying PyTorch implementation is pretty smart. The backward pass may have dynamic range issues, because the gradient with respect to each input element requires a reduction over all elements.
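To make the forward/backward distinction concrete, here is a minimal pure-Python sketch. It is not PyTorch's actual kernel: the function names `softmax` and `softmax_backward` are made up for illustration, and the max-subtraction trick is assumed to be representative of what a "smart" implementation does. It runs in ordinary Python floats, so it shows the structure of the computation rather than FP16 behavior.

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # every exponent is <= 0, so no overflow
    total = sum(exps)                     # >= 1, because exp(0) is included
    return [e / total for e in exps]

def softmax_backward(ys, grad_out):
    """Gradient w.r.t. the inputs, given outputs ys and upstream gradient grad_out.

    dx_i = y_i * (g_i - sum_j g_j * y_j); the sum over j is the reduction.
    """
    dot = sum(g * y for g, y in zip(grad_out, ys))
    return [y * (g - dot) for y, g in zip(ys, grad_out)]

# Illustrative values only: large logits plus one heavily masked position.
logits = [1000.0, 999.0, -10000.0]
probs = softmax(logits)
grads = softmax_backward(probs, [1.0, 0.0, 0.0])

print(probs)
print(grads)
```

The forward outputs stay in [0, 1] regardless of how large the logits are. The backward pass multiplies and sums upstream gradients, whose magnitude is not bounded in the same way, which is where a reduced-precision format could plausibly run out of range.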