Addons: Resolve discrepancy in python and custom op gelu implementations

Created on 5 Mar 2020 · 4 Comments · Source: tensorflow/addons

Per #1172, we have only been able to get the Python and custom-op gelu implementations to agree within 1e-5. We need to determine why this is and whether the tolerance can be lowered.

activations custom-ops help wanted

seanpmorgan

All 4 comments

That is expected, because the C++ gradient implementation is hand-crafted and [most likely] analytically simplified, so it doesn't accumulate round-off errors. That is why tf.custom_gradient was introduced:
https://www.tensorflow.org/api_docs/python/tf/custom_gradient

This decorator allows fine grained control over the gradients of a sequence for operations. This may be useful for multiple reasons, including providing a more efficient or numerically stable gradient for a sequence of operations.
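For reference, here is the kind of closed form a hand-written gradient can evaluate directly. This assumes the exact (erf-based) GELU; the thread does not say whether the exact or the tanh-approximate variant is the one being compared.

```latex
\mathrm{GELU}(x) = x\,\Phi(x), \qquad
\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)

\frac{d}{dx}\,\mathrm{GELU}(x) = \Phi(x) + x\,\phi(x), \qquad
\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}
```

Evaluating this expression takes only a handful of floating-point operations, whereas differentiating the forward pass op by op typically chains more of them, each contributing its own rounding.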

But... Why 1e-6? I see a lot of 1e-4 out there. Isn't that enough?

failure-to-thrive on 5 Mar 2020

Another interesting finding - it fails only on CPU and float32.

failure-to-thrive on 6 Mar 2020

So, my preliminary conclusion after hours of research (I can be wrong): it's quite natural to observe such a discrepancy. float32 has about 7 decimal digits of precision, so accumulated round-off errors can easily climb into the 1e-6 range.
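A minimal sketch of the round-off argument, using only the Python standard library. It is not the Addons code: it assumes the exact (erf-based) GELU gradient, and the helper names `f32`, `gelu_grad_f64` and `gelu_grad_f32` are hypothetical. float32 is emulated by rounding every intermediate result through `struct`, then the result is compared against the same formula evaluated in float64.

```python
import math
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def gelu_grad_f64(x):
    """Exact-GELU gradient Phi(x) + x * phi(x), evaluated in float64."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def gelu_grad_f32(x):
    """Same formula, with every intermediate result rounded to float32."""
    x = f32(x)
    inv_sqrt2 = f32(1.0 / math.sqrt(2.0))
    inv_sqrt2pi = f32(1.0 / math.sqrt(2.0 * math.pi))
    erf_term = f32(math.erf(f32(x * inv_sqrt2)))
    cdf = f32(0.5 * f32(1.0 + erf_term))
    x_sq = f32(x * x)
    pdf = f32(inv_sqrt2pi * f32(math.exp(f32(-0.5 * x_sq))))
    return f32(cdf + f32(x * pdf))


# Largest absolute disagreement on a grid over [-4, 4]; both versions
# receive the same float32-representable input.
xs = [i / 20.0 for i in range(-80, 81)]
max_err = max(abs(gelu_grad_f32(x) - gelu_grad_f64(f32(x))) for x in xs)
print(f"max abs error, float32 vs float64: {max_err:.3e}")
```

Even this short expression does not reproduce the float64 result exactly, and a longer chain of float32 operations has more rounding steps to accumulate. That is consistent with a tolerance somewhat looser than float32's nominal precision being needed when two differently structured implementations are compared.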

failure-to-thrive on 6 Mar 2020

Thanks a lot @failure-to-thrive for your investigation. Using 10e-6 was @Squadrick's suggestion here: https://github.com/tensorflow/addons/pull/1137#issuecomment-592125603. @Squadrick, if you agree with @failure-to-thrive, should we close this issue?