transformers version: 3.0.2
Who can help: @sgugger @jplu
Model I am using (Bert, XLNet ...): Roberta
The problem arises when using:
The task I am working on is:
Steps to reproduce the behavior:
import tensorflow as tf
from transformers import RobertaConfig, TFRobertaForMaskedLM, create_optimizer

config = RobertaConfig()
# create_optimizer returns the optimizer and the learning rate schedule
optimizer, lr_schedule = create_optimizer(1e-4, 1000000, 10000, 0.1, 1e-6, 0.01)
training_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model = TFRobertaForMaskedLM(config)
model.compile(optimizer=optimizer, loss=training_loss)

# Random token ids, used as both inputs and labels, just to trigger a training step
input_ids = tf.random.uniform(shape=[1, 25], maxval=100, dtype=tf.int32)
hist = model.fit(input_ids, input_ids, epochs=1, steps_per_epoch=1, verbose=0)
TypeError: apply_gradients() got an unexpected keyword argument 'experimental_aggregate_gradients'
Expected behavior: the optimizer returned by create_optimizer should work with model.fit, i.e. the training step should run instead of raising this TypeError.
It seems that Keras is passing experimental_aggregate_gradients to apply_gradients, but apply_gradients on the transformers TF2 optimizer does not accept this argument (see https://github.com/huggingface/transformers/blob/master/src/transformers/optimization_tf.py#L224).
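For illustration, Keras training loops in TF >= 2.2 expect apply_gradients to take this keyword and forward it to the base optimizer. A minimal sketch of such a signature, using a plain tf.keras Adam subclass rather than the actual transformers class, could look like this:

```python
import tensorflow as tf

# Illustrative sketch only, not the transformers implementation: an optimizer
# whose apply_gradients accepts and forwards the extra keyword arguments that
# Keras passes in TF >= 2.2 (such as experimental_aggregate_gradients).
class ForwardingAdam(tf.keras.optimizers.Adam):
    def apply_gradients(self, grads_and_vars, name=None, **kwargs):
        # **kwargs keeps the signature compatible with newer Keras training loops.
        return super().apply_gradients(grads_and_vars, name=name, **kwargs)
```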
One workaround right now is to set optimizer._HAS_AGGREGATE_GRAD = False, which prevents Keras from passing this argument.
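Applied to the reproduction script above, the workaround would look roughly like this (_HAS_AGGREGATE_GRAD is a private Keras attribute, so treat this as a temporary hack rather than a supported API):

```python
# Continues the reproduction script above (model and training_loss already defined).
optimizer, lr_schedule = create_optimizer(1e-4, 1000000, 10000, 0.1, 1e-6, 0.01)
# Private flag: when False, Keras does not pass experimental_aggregate_gradients
# to apply_gradients, so the custom optimizer's signature is no longer a problem.
optimizer._HAS_AGGREGATE_GRAD = False
model.compile(optimizer=optimizer, loss=training_loss)
```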
Thanks for the analysis @volker42maru. @jplu when you're back from vacation, we should fix this optimizer to accept this argument.
Hello!
I was aware of this, and it was on purpose, to keep the trainer compliant with all TF 2.x versions. Now that the trainer requires at least TF 2.2, I will modify the method accordingly. Thanks @volker42maru for raising this and reminding me that I had to update it.
The PR #6717 should fix the problem.