System information
Describe the feature and the current behavior/state.
LayerNormalization has been moved to core here: https://github.com/tensorflow/tensorflow/blob/43a255d44035f2a68585288876e2ccbb77cb70de/tensorflow/python/keras/layers/normalization.py#L807
Can we graduate it from Addons?
Will this change the current api? How?
Yes, this will delete the LayerNormalization API in Addons.
Who will benefit with this feature?
Any Other info.
Per the discussions in https://github.com/tensorflow/addons/pull/14, we're going to request that LayerNormalization be removed from core. Although now it looks like the experimental tag has been removed from core? cc @karmel as this was not the roadmap we discussed.
If it does stay in core, I'd recommend that we use a more general implementation of LayerNorm that includes GroupNorm and InstanceNorm, as we have done in Addons.
@seanpmorgan After some discussion, it seemed best to graduate the implementation from addons to core, as we were seeing increased requests and usage. The Addons implementations of Group and InstanceNorm would then remain, and could build off the core implementation of LayerNorm. We figured this would be a win for all, as it is the first example of moving from Addons to core, and the more specialized implementations (Instance + GroupNorm) still have a proper home here. Does that work for you?
@karmel thanks for the follow up... yes that works for us. My only hesitation would be the comparisons in the GN paper that show LN to be a bit less flexible than GN and worse performing. Perhaps LN is being used because of legacy examples... but that's not really our battle to fight.
Relation to Layer Normalization [3]. GN becomes LN if we set the group number as G = 1. LN assumes all channels in a layer make “similar contributions” [3]. Unlike the case of fully-connected layers studied in [3], this assumption can be less valid with the presence of convolutions, as discussed in [3]. GN is less restricted than LN, because each group of channels (instead of all of them) are assumed to subject to the shared mean and variance; the model still has flexibility of learning a different distribution for each group. This leads to improved representational power of GN over LN, as shown by the lower training and validation error in experiments (Figure 4).

https://arxiv.org/abs/1803.08494
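To make the "GN becomes LN if G = 1" point from the quote concrete, here is a minimal pure-Python sketch. It is not the Addons or core implementation: it normalizes a single sample's channel vector, ignores spatial dimensions and the learned scale/offset, and the function names (`normalize`, `group_norm`, `layer_norm`) are made up for illustration.

```python
import math


def normalize(values, eps):
    """Shift to zero mean and scale to unit variance (biased estimate)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def group_norm(channels, groups, eps=1e-5):
    """Normalize one sample's channels within `groups` equal-sized groups."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    size = len(channels) // groups
    out = []
    for start in range(0, len(channels), size):
        # Each group gets its own mean and variance.
        out.extend(normalize(channels[start:start + size], eps))
    return out


def layer_norm(channels, eps=1e-5):
    """Normalize one sample over all of its channels at once."""
    return normalize(channels, eps)


# Hypothetical activations for one sample with 8 channels.
x = [0.5, -1.0, 2.0, 3.5, -0.25, 1.0, 4.0, -2.0]
ln = layer_norm(x)
gn_one_group = group_norm(x, groups=1)    # a single group spans every channel
gn_four_groups = group_norm(x, groups=4)  # statistics computed per pair of channels
```

With `groups=1` the single group covers every channel, so the result matches `layer_norm`; with more groups each group is normalized with its own statistics and the outputs diverge.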
@Smokrow Would you mind submitting a PR to remove LayerNormalization so we can keep our APIs in sync with TF core? Thanks.
@seanpmorgan Thanks! One noticeable difference between core LN and Addons LN is the default value of epsilon. TF core uses 1e-3 to keep it consistent with that of Keras BN, while the Addons implementation uses 1e-5.
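As a rough illustration of why the default matters, the sketch below applies a bare-bones layer normalization (no learned scale/offset) with both epsilon values. The constants `CORE_EPSILON` and `ADDONS_EPSILON` simply name the two defaults mentioned above, and the input vectors are made up. The assumption being illustrated is that epsilon only has a visible effect when the variance of the activations is comparable to it.

```python
import math

CORE_EPSILON = 1e-3    # default reported for TF core / Keras BN in this thread
ADDONS_EPSILON = 1e-5  # default reported for the Addons implementation


def layer_norm(values, eps):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Hypothetical low-variance activations: variance is on the order of 1e-4,
# so epsilon dominates the denominator when it is 1e-3.
narrow = [1.00, 1.01, 1.02, 1.03]
narrow_core = layer_norm(narrow, CORE_EPSILON)
narrow_addons = layer_norm(narrow, ADDONS_EPSILON)

# Hypothetical high-variance activations: epsilon is negligible either way.
wide = [-3.0, -1.0, 1.0, 3.0]
wide_core = layer_norm(wide, CORE_EPSILON)
wide_addons = layer_norm(wide, ADDONS_EPSILON)

# Largest per-element gap between the two epsilon settings.
narrow_gap = max(abs(a - b) for a, b in zip(narrow_core, narrow_addons))
wide_gap = max(abs(a - b) for a, b in zip(wide_core, wide_addons))
```

For the low-variance input the two settings give clearly different outputs, while for the high-variance input the gap is tiny, which is why a model saved with one default could behave differently when loaded with the other.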
@seanpmorgan I will open a PR tomorrow morning.
@karmel if I can help out with anything during the move please let me know.
@karmel Layer Normalization is a more specialized case of group normalization.
I can just remove this special case but this would of course kinda break the inheritance. Maybe it would make more sense to move all 3 layers in one go since the specialization is just 7 lines of code.
For example the current implementation of LayerNorm looks like this:
@keras_utils.register_keras_custom_object
class LayerNormalization(GroupNormalization):
    def __init__(self, **kwargs):
        if "groups" in kwargs:
            logging.warning("The given value for groups will be overwritten.")
        kwargs["groups"] = 1
        super(LayerNormalization, self).__init__(**kwargs)
@Smokrow I think the most pressing thing to do is to align our APIs and remove LayerNormalization. Also, @yhliang2018's suggestion to use 1e-3 is better implemented sooner rather than later, before we acquire too many users.
I do agree with you, but I imagine the process of adding two new layers will take a while and I'd prefer not to have duplicate functionality now that LayerNormalization is not experimental.