Addons: Graduate LayerNormalization layer to TF core.

Created on 13 Apr 2019 · 7 Comments · Source: tensorflow/addons

System information

TensorFlow version (you are using):
TensorFlow Addons version:
Is it in the tf.contrib (if so, where):
Are you willing to contribute it (yes/no):
Are you willing to maintain it going forward? (yes/no):

Describe the feature and the current behavior/state.
LayerNormalization has been moved to core here: https://github.com/tensorflow/tensorflow/blob/43a255d44035f2a68585288876e2ccbb77cb70de/tensorflow/python/keras/layers/normalization.py#L807

Can we graduate it from Addons?

Will this change the current api? How?
Yes, this will delete the LayerNormalization API in Addons.

Who will benefit with this feature?

Any Other info.

layers

yhliang2018

Most helpful comment

Per the discussions in https://github.com/tensorflow/addons/pull/14, we're going to request that LayerNormalization be removed from core. Although now it looks like the experimental tag has been removed from core? cc @karmel as this was not the roadmap we discussed.

If it does stay in core, I'd recommend that we use a more general implementation of LayerNorm that includes GroupNorm and InstanceNorm, as we have done in Addons.

seanpmorgan on 13 Apr 2019

👍2

All 7 comments

@seanpmorgan After some discussion, it seemed best to graduate the implementation from addons to core, as we were seeing increased requests and usage. The Addons implementations of Group and InstanceNorm would then remain, and could build off the core implementation of LayerNorm. We figured this would be a win for all, as it is the first example of moving from Addons to core, and the more specialized implementations (Instance + GroupNorm) still have a proper home here. Does that work for you?

karmel on 15 Apr 2019

@karmel thanks for the follow up... yes that works for us. My only hesitation would be the comparisons in the GN paper that show LN to be a bit less flexible than GN and worse performing. Perhaps LN is being used because of legacy examples... but that's not really our battle to fight.

Relation to Layer Normalization [3]. GN becomes LN if we set the group number as G = 1. LN assumes all channels in a layer make “similar contributions” [3]. Unlike the case of fully-connected layers studied in [3], this assumption can be less valid with the presence of convolutions, as discussed in [3]. GN is less restricted than LN, because each group of channels (instead of all of them) are assumed to subject to the shared mean and variance; the model still has flexibility of learning a different distribution for each group. This leads to improved representational power of GN over LN, as shown by the lower training and validation error in experiments (Figure 4).

https://arxiv.org/abs/1803.08494
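
The "GN becomes LN if G = 1" point in the quote above can be checked with a small sketch. This is plain Python over a flat list of channel values, not the Addons or core implementation: it leaves out the learnable scale and offset as well as batch and spatial dimensions, and the function names and the sample input are made up for illustration.

```python
import math


def normalize(values, eps=1e-5):
    """Shift and scale a list of floats to (roughly) zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def layer_norm(channels, eps=1e-5):
    """Layer normalization: one mean and variance shared by all channels."""
    return normalize(channels, eps)


def group_norm(channels, groups, eps=1e-5):
    """Group normalization: split channels into equal groups, normalize each."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    size = len(channels) // groups
    out = []
    for start in range(0, len(channels), size):
        out.extend(normalize(channels[start:start + size], eps))
    return out


# Hypothetical input: two sets of channels on very different scales.
x = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
ln = layer_norm(x)
gn1 = group_norm(x, groups=1)  # a single group covers every channel
gn2 = group_norm(x, groups=2)  # each half gets its own statistics
```

With `groups=1` the single group spans every channel, so `gn1` matches `ln` exactly. With `groups=2` each half is normalized with its own mean and variance, which is the extra flexibility the paper refers to.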

@Smokrow Would you mind submitting a PR to remove LayerNormalization so we can keep our APIs in sync with TF core? Thanks.

seanpmorgan on 15 Apr 2019

@seanpmorgan Thanks! One noticeable difference between core LN and Addons LN is the default value of epsilon. TF core uses 1e-3 to keep it consistent with that of Keras BN, while the Addons implementation uses 1e-5.
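
To illustrate why the epsilon default matters, here is a minimal sketch in plain Python (not either library's code, and without the learnable scale and offset). The input values are hypothetical, chosen so their variance is small enough for epsilon to dominate the denominator.

```python
import math


def layer_norm(values, eps):
    """Normalize a list of floats using a shared mean and variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Hypothetical low-variance activations (variance is about 1.25e-4).
x = [0.00, 0.01, 0.02, 0.03]

core_like = layer_norm(x, eps=1e-3)    # default reported for TF core
addons_like = layer_norm(x, eps=1e-5)  # default reported for Addons

# The larger epsilon damps the output more strongly on this input.
ratio = addons_like[-1] / core_like[-1]
```

On inputs whose variance is far above 1e-3 the two settings give nearly identical results; the gap only shows up for low-variance activations like the ones above.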

yhliang2018 on 15 Apr 2019

👍1

@seanpmorgan I will open a PR tomorrow morning.

@karmel if I can help out with anything during the move please let me know.

Smokrow on 15 Apr 2019

@karmel Layer Normalization is a more specialized case of group normalization.
I can just remove this special case, but this would of course kinda break the inheritance. Maybe it would make more sense to move all three layers in one go, since the specialization is just 7 lines of code.

For example the current implementation of LayerNorm looks like this:

@keras_utils.register_keras_custom_object
class LayerNormalization(GroupNormalization):
    def __init__(self, **kwargs):
        if "groups" in kwargs:
            logging.warning("The given value for groups will be overwritten.")
        kwargs["groups"] = 1
        super(LayerNormalization, self).__init__(**kwargs)
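
The snippet above depends on Addons internals (`keras_utils`, `GroupNormalization`). As a self-contained sketch of the same pattern, the version below swaps in a hypothetical stand-in base class that only stores its arguments; the stand-in and its defaults are assumptions for illustration, not the real Addons layer.

```python
import logging


class GroupNormalization:
    """Hypothetical stand-in for the Addons layer; it only stores its config."""

    def __init__(self, groups=2, epsilon=1e-5):
        self.groups = groups
        self.epsilon = epsilon


class LayerNormalization(GroupNormalization):
    """Layer normalization expressed as group normalization with one group."""

    def __init__(self, **kwargs):
        if "groups" in kwargs:
            logging.warning("The given value for groups will be overwritten.")
        # Force a single group regardless of what the caller passed.
        kwargs["groups"] = 1
        super().__init__(**kwargs)


# A caller-supplied `groups` is replaced; other arguments pass through.
layer = LayerNormalization(groups=4, epsilon=1e-3)
```

Here `layer.groups` ends up as 1 even though 4 was requested, while `epsilon` is forwarded unchanged. This is why removing only the subclass would leave the special case without a home in the hierarchy.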

Smokrow on 16 Apr 2019

@Smokrow I think the most pressing thing to do is to align our APIs and remove LayerNormalization. Also, @yhliang2018's suggestion to use 1e-3 is better implemented sooner rather than later, before we acquire too many users.

I do agree with you, but I imagine the process of adding two new layers will take a while and I'd prefer not to have duplicate functionality now that LayerNormalization is not experimental.

seanpmorgan on 16 Apr 2019
