transformers version: 3.0.2
Who can help: @sgugger @jplu
Model I am using (Bert, XLNet ...): Roberta
The problem arises when using:
The task I am working on is:
Steps to reproduce the behavior:
import tensorflow as tf
from transformers import RobertaConfig, TFRobertaForMaskedLM, create_optimizer

config = RobertaConfig()
# create_optimizer returns the optimizer and the learning rate schedule
optimizer, lr_schedule = create_optimizer(1e-4, 1000000, 10000, 0.1, 1e-6, 0.01)
training_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model = TFRobertaForMaskedLM(config)
model.compile(optimizer=optimizer, loss=training_loss)

# Random token ids, used as both inputs and labels, just to trigger a training step
input_ids = tf.random.uniform(shape=[1, 25], maxval=100, dtype=tf.int32)
hist = model.fit(input_ids, input_ids, epochs=1, steps_per_epoch=1, verbose=0)
TypeError: apply_gradients() got an unexpected keyword argument 'experimental_aggregate_gradients'
Expected behavior: the optimizer returned by create_optimizer should work with model.fit, i.e. the training step should run instead of raising this TypeError.
It seems that Keras is passing experimental_aggregate_gradients to apply_gradients, but apply_gradients on the transformers TF2 optimizer does not accept this argument (see https://github.com/huggingface/transformers/blob/master/src/transformers/optimization_tf.py#L224).
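For illustration, Keras training loops in TF >= 2.2 expect apply_gradients to take this keyword and forward it to the base optimizer. A minimal sketch of such a signature, using a plain tf.keras Adam subclass rather than the actual transformers class, could look like this:

```python
import tensorflow as tf

# Illustrative sketch only, not the transformers implementation: an optimizer
# whose apply_gradients accepts and forwards the extra keyword arguments that
# Keras passes in TF >= 2.2 (such as experimental_aggregate_gradients).
class ForwardingAdam(tf.keras.optimizers.Adam):
    def apply_gradients(self, grads_and_vars, name=None, **kwargs):
        # **kwargs keeps the signature compatible with newer Keras training loops.
        return super().apply_gradients(grads_and_vars, name=name, **kwargs)
```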
One workaround right now is to set optimizer._HAS_AGGREGATE_GRAD = False, which prevents Keras from passing this argument.
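Applied to the reproduction script above, the workaround would look roughly like this (_HAS_AGGREGATE_GRAD is a private Keras attribute, so treat this as a temporary hack rather than a supported API):

```python
# Continues the reproduction script above (model and training_loss already defined).
optimizer, lr_schedule = create_optimizer(1e-4, 1000000, 10000, 0.1, 1e-6, 0.01)
# Private flag: when False, Keras does not pass experimental_aggregate_gradients
# to apply_gradients, so the custom optimizer's signature is no longer a problem.
optimizer._HAS_AGGREGATE_GRAD = False
model.compile(optimizer=optimizer, loss=training_loss)
```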
Thanks for the analysis @volker42maru. @jplu when you're back from vacation, we should fix this optimizer to accept this argument.
Hello!
I was aware of this, and it was on purpose, to keep the trainer compliant with all TF 2.x versions. Now that the trainer requires at least TF 2.2, I will modify the method accordingly. Thanks @volker42maru for raising this and reminding me that I had to update it.
The PR #6717 should fix the problem.