Addons: Add Adafactor optimizer

Created on 18 Sep 2019 · 13 comments · Source: tensorflow/addons

Relevant information

Which API type would this fall under (layer, metric, optimizer, etc.)?

This is a new optimizer.

Who will benefit from this feature?

Users of optimizers with adaptive learning rates who want to reduce memory usage.

Labels: Feature Request, help wanted, optimizers
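For context on the memory claim: in the Adafactor paper (Shazeer & Stern, 2018), the second-moment estimate for an n × m weight matrix is kept in factored form, as a length-n vector of row sums R_t and a length-m vector of column sums C_t of the elementwise-squared gradient G_t^2 (with decay rate beta2_t and a small constant eps1). The full n × m estimate is rebuilt on the fly rather than stored, so with momentum disabled the optimizer state for that matrix drops from nm values (as in Adam's second moment) to n + m. This is summarized from the paper; the T2T code may differ in details, and the 1024 × 4096 layer in the worked line is a hypothetical example.

$$
\begin{aligned}
R_t &= \hat\beta_{2t}\,R_{t-1} + (1-\hat\beta_{2t})\,\big(G_t^2 + \epsilon_1 \mathbf{1}_n \mathbf{1}_m^\top\big)\,\mathbf{1}_m && (n \text{ values})\\
C_t &= \hat\beta_{2t}\,C_{t-1} + (1-\hat\beta_{2t})\,\mathbf{1}_n^\top \big(G_t^2 + \epsilon_1 \mathbf{1}_n \mathbf{1}_m^\top\big) && (m \text{ values})\\
\hat V_t &= \frac{R_t\,C_t}{\mathbf{1}_n^\top R_t} && (n \times m,\ \text{recomputed each step})
\end{aligned}
$$

$$
1024 \cdot 4096 = 4{,}194{,}304 \text{ stored values} \quad\text{vs.}\quad 1024 + 4096 = 5{,}120 \text{ stored values}
$$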

I will give this a try tomorrow

I talked with @guillaumekln, and we decided to drop the experimental quantization flag of the Tensor2Tensor (T2T) implementation for now, to avoid porting all of its dependencies.

It looks like Keras will add quantization functionality, as discussed here, so we can add the flag back into the function once that is released. @seanpmorgan @WindQAQ are you okay with that approach?

I am fine with this :smiley:

Checking in on this issue; it looks like something that would be useful for the external community. Has there been any progress on adding Adafactor as an optimizer?

@Smokrow I remember you were not far from completing the development. Are you still interested in opening a PR with this optimizer?

Hey there. It is basically finished, but I got stuck on the testing part since I am not quite sure what a valid way to test it would be. Sorry for the delay; work got a little busier over the last few months :grimacing:
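On the testing question, one possible approach (a sketch, not what the eventual PR did) is to feed the TF optimizer and a small NumPy reference the same gradients, compare the resulting variables within a tolerance, and additionally assert properties that should hold for any correct implementation: the factored estimate is exact when the squared gradient is rank 1, the clipped update has RMS at most d, and a simple quadratic loss goes down. The reference below assumes the factored, momentum-free variant described in the Adafactor paper (relative step size, beta2_t = 1 - t^-0.8, update clipping with d = 1); `reference_step` and `check_properties` are hypothetical helper names.

```python
import numpy as np


def rms(x):
    """Root mean square of an array."""
    return float(np.sqrt(np.mean(np.square(x))))


def reference_step(w, grad, r, c, t, eps1=1e-30, eps2=1e-3, d=1.0):
    """One factored, momentum-free Adafactor step on a 2-D weight (NumPy reference).

    w, grad: (n, m) arrays; r: (n,) row statistics; c: (m,) column statistics;
    t: 1-indexed step. Returns (new_w, new_r, new_c, clipped_update).
    """
    beta2 = 1.0 - t ** -0.8                  # decay rate beta2_t = 1 - t^-0.8
    rho = min(1e-2, 1.0 / np.sqrt(t))        # relative step size rho_t
    alpha = max(eps2, rms(w)) * rho          # step size scaled by parameter RMS
    sq = np.square(grad) + eps1
    r = beta2 * r + (1.0 - beta2) * sq.sum(axis=1)
    c = beta2 * c + (1.0 - beta2) * sq.sum(axis=0)
    v_hat = np.outer(r, c) / r.sum()         # factored second-moment estimate
    u = grad / np.sqrt(v_hat)
    u_hat = u / max(1.0, rms(u) / d)         # update clipping: RMS(u_hat) <= d
    return w - alpha * u_hat, r, c, u_hat


def check_properties(seed=0, n=8, m=16, steps=50):
    """Implementation-independent checks; raises AssertionError on failure."""
    rng = np.random.default_rng(seed)

    # 1) A rank-1 gradient has a rank-1 square, so the factored estimate is exact at t=1.
    g = np.outer(rng.normal(size=n), rng.normal(size=m))
    _, r, c, _ = reference_step(np.ones((n, m)), g, np.zeros(n), np.zeros(m), t=1)
    assert np.allclose(np.outer(r, c) / r.sum(), np.square(g))

    # 2) and 3) On f(w) = 0.5 * ||w||^2 (gradient = w), updates stay clipped
    # and the loss decreases.
    w = rng.normal(size=(n, m))
    r, c = np.zeros(n), np.zeros(m)
    initial_loss = 0.5 * np.sum(np.square(w))
    for t in range(1, steps + 1):
        w, r, c, u_hat = reference_step(w, w, r, c, t)
        assert rms(u_hat) <= 1.0 + 1e-9
    assert 0.5 * np.sum(np.square(w)) < initial_loss
    return True
```

In an actual unit test, the TF optimizer's variables after each step would be compared against `reference_step`'s output with something like `np.testing.assert_allclose`, which catches discrepancies that the property checks alone would miss.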

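Could the Adafactor implementation support turning off weight decay for certain types of parameters (e.g. biases and layer-norm weights)? This is common practice when fine-tuning language models.

In PyTorch, this is usually done with parameter groups, where only the first group is decayed:

import torch

no_decay = ["bias", "LayerNorm.weight"]  # name substrings excluded from weight decay
optimizer_grouped_parameters = [
    # Parameters that do get weight decay (0.01 is an example value).
    {"params": [p for n, p in self.model.named_parameters() if not any(
        nd in n for nd in no_decay)], "weight_decay": 0.01},
    # Biases and LayerNorm weights: no weight decay.
    {"params": [p for n, p in self.model.named_parameters() if any(
        nd in n for nd in no_decay)], "weight_decay": 0.0},
]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters)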
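A framework-agnostic sketch of the same idea for a TF-side optimizer: decoupled weight decay (w <- w - lr * weight_decay * w) applied only to parameters whose names do not match an exclusion pattern. It assumes parameters are held in a dict of NumPy arrays keyed by variable name; the helper names, the default patterns, and the example variable names are hypothetical, not part of T2T or TF Addons.

```python
import numpy as np


def exclude_from_weight_decay(name, exclude_patterns):
    """True if the parameter name contains any of the exclusion substrings."""
    return any(pattern in name for pattern in exclude_patterns)


def apply_weight_decay(params, lr, weight_decay,
                       exclude_patterns=("bias", "LayerNorm", "layer_norm")):
    """Decoupled weight decay, w <- w - lr * weight_decay * w, skipping excluded names.

    params: dict mapping parameter names to float NumPy arrays (updated in place).
    """
    for name, w in params.items():
        if weight_decay and not exclude_from_weight_decay(name, exclude_patterns):
            w -= lr * weight_decay * w
    return params


# Hypothetical parameter names, loosely following TF variable naming.
params = {
    "dense/kernel": np.ones((2, 2)),
    "dense/bias": np.ones(2),
    "encoder/layer_norm/gamma": np.ones(2),
}
apply_weight_decay(params, lr=0.1, weight_decay=0.01)
# dense/kernel is now about 0.999 everywhere; the bias and layer-norm gamma stay at 1.0.
```

Substring matching keeps the sketch short; a regular-expression match would allow more precise patterns at the cost of a slightly more complex configuration.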

Hi! Checking in on whether anyone has updates on getting Adafactor into a TF release (or into Addons). I've been using a version adapted from the original Tensor2Tensor repo, but it's a bit quirky and doesn't play well with tf.keras LR schedules. (I know Adafactor has an internal adaptive learning rate and the paper says an external LR schedule shouldn't be needed, but empirical evidence seems to suggest one would help.)
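In the Adafactor paper, the step size is alpha_t = max(eps2, RMS(X_{t-1})) * rho_t, with the relative step size rho_t = min(1e-2, 1/sqrt(t)). One way to reconcile this with an external schedule (a sketch, not how T2T or any particular library does it) is to let the schedule supply rho_t while optionally keeping the parameter-scale factor. The function and schedule names are hypothetical, and steps are assumed to be 1-indexed because 1/sqrt(t) is undefined at t = 0; Keras optimizers count iterations from 0, so a Keras schedule would need the step shifted by one.

```python
import math
from typing import Callable, Optional


def adafactor_step_size(step: int, param_rms: float,
                        schedule: Optional[Callable[[int], float]] = None,
                        eps2: float = 1e-3,
                        scale_parameter: bool = True) -> float:
    """Step size alpha_t for a 1-indexed step.

    Without a schedule, uses the paper's relative step rho_t = min(1e-2, 1/sqrt(t)).
    With a schedule, schedule(step) replaces rho_t.
    """
    if schedule is None:
        rho = min(1e-2, 1.0 / math.sqrt(step))
    else:
        rho = schedule(step)
    # Optionally scale by the parameter's RMS, floored at eps2, as in the paper.
    scale = max(eps2, param_rms) if scale_parameter else 1.0
    return scale * rho


def warmup_then_rsqrt(step: int, warmup: int = 1000, peak: float = 1e-2) -> float:
    """Hypothetical schedule: linear warmup to `peak`, then inverse-sqrt decay."""
    if step < warmup:
        return peak * step / warmup
    return peak * math.sqrt(warmup / step)


print(adafactor_step_size(500, param_rms=0.5, schedule=warmup_then_rsqrt))  # approximately 0.0025
```

Keeping the `max(eps2, RMS)` factor means the external schedule still produces updates relative to each parameter's scale; setting `scale_parameter=False` makes the schedule an absolute learning rate instead.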

@mathemakitten Hi! It does not seem that anyone is working on this at the moment, which mostly means the community is not that interested in this optimizer. Are you interested in making a PR?

I have a TF Adafactor implementation that I had to modify to work well with Keras's LR schedulers, and I know that it works well on TPUs. It probably needs to be cleaned up, but the code is here if anyone's interested:

It would be very nice to have Adafactor; it would save a lot of memory for NLP models, and there is currently no common TF2 implementation.
Is Adafactor planned for a future Addons release?

Hey @guillaumekln, happy to make a PR; or @bkkaggle, would you like to do it? My implementation is adapted from the T2T implementation, much like yours. I've added support for external LR schedule objects and for weight decay (including excluding weights from weight decay by name).

My implementation is pretty messy, so go ahead and merge yours if you want!
