Just a note for deprecation.
- gelu has been migrated to core TensorFlow.
- data_format argument under tensorflow_addons/image.
- sequential_update for AveragedOptimizerWrapper.

Should we maintain backward compatibility? /cc @tensorflow/sig-addons-maintainers.
I think maintaining backwards compatibility is the preferred route when possible. It may require the function to re-arrange some parameters or fill in defaults, but that can be part of the deprecation warning advising the user to use the core functionality.
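To make the idea concrete, here is a rough sketch of what such a shim could look like. It is plain Python with no TensorFlow dependency: `core_gelu` is a hypothetical scalar stand-in for the core function, `legacy_gelu` is a hypothetical stand-in for the deprecated one, and the differing parameter name (`x` vs. `features`) and differing `approximate` default are assumptions made for illustration.

```python
import math
import warnings


def core_gelu(features, approximate=False):
    """Hypothetical scalar stand-in for the core implementation."""
    if approximate:
        # tanh-based approximation of GELU
        inner = math.sqrt(2.0 / math.pi) * (features + 0.044715 * features ** 3)
        return 0.5 * features * (1.0 + math.tanh(inner))
    # exact GELU via the Gaussian error function
    return 0.5 * features * (1.0 + math.erf(features / math.sqrt(2.0)))


def legacy_gelu(x, approximate=True):
    """Hypothetical deprecated wrapper that keeps the old signature and default."""
    warnings.warn(
        "legacy_gelu is deprecated; use core_gelu instead. "
        "Note that the default value of `approximate` differs.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this wrapper
    )
    # Re-arrange the old argument name onto the new one and keep the old default,
    # so existing callers get the same numerical result as before.
    return core_gelu(features=x, approximate=approximate)
```

Existing call sites such as `legacy_gelu(1.0)` keep working and keep their old behavior, while the warning message points users at the replacement and flags the changed default.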
Can I take this up?
Of course! But probably not now. tf.nn.gelu and tf.keras.layers.MultiHeadAttention will be shipped in TF2.4. We will pin to TF2.4 rc version once it releases so we can deprecate gelu and MultiHeadAttention at that time.
Another thought is that we can bump our major version to 1.0.0. We can finalize our optimizer API consistency in the next major version as well.
@WindQAQ IMO we should refactor the optimizers to be consistent with the new API