Transformers: 🐛 TFTrainer not working on TPU (TF2.2)

Created on 15 Jun 2020 · 3 comments · Source: huggingface/transformers

🐛 Bug

Information

The problem arises when using:

  • [ ] the official example scripts
  • [x] my own modified scripts

The task I am working on is:

  • [x] an official GLUE/SQuAD task: CNN/DM
  • [ ] my own task or dataset

To reproduce

Steps to reproduce the behavior:

  1. Install transformers from master
  2. Run TPU training using TFTrainer

I get the following error:

    TypeError: Failed to convert object of type to Tensor. Contents: . Consider casting elements to a supported type.


Here:
https://github.com/huggingface/transformers/blob/9931f817b75ecb2c8bb08b6e9d4cbec4b0933935/src/transformers/trainer_tf.py#L324

we pass the optimizer as an argument.

But according to the TensorFlow documentation:
https://github.com/tensorflow/tensorflow/blob/2b96f3662bd776e277f86997659e61046b56c315/tensorflow/python/distribute/distribute_lib.py#L890-L891

    "All arguments in args or kwargs should either be nest of tensors or tf.distribute.DistributedValues containing tensors or composite tensors."
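
To make the constraint concrete, here is a minimal, self-contained sketch (illustrative names and dummy data, not the trainer code). Under the default strategy both calls run; under a TPUStrategy the second one fails because the optimizer object cannot be converted to a tensor:

    import tensorflow as tf

    strategy = tf.distribute.get_strategy()  # a TPUStrategy in the failing setup
    optimizer = tf.keras.optimizers.Adam()   # a plain Python object, not a tensor
    features = tf.constant([1.0, 2.0])

    def train_step(x, opt):
        return x * 2.0  # stand-in for the real forward/backward pass

    # Only tensors in `args`: fine under every strategy.
    strategy.experimental_run_v2(lambda x: train_step(x, optimizer), args=(features,))

    # A non-tensor in `args`: a TPU strategy tries to convert every argument
    # to a tensor and raises the TypeError quoted above.
    strategy.experimental_run_v2(train_step, args=(features, optimizer))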

Environment info

  • transformers version: 2.11.0
  • Platform: Linux-4.9.0-9-amd64-x86_64-with-debian-9.12
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.5.0 (False)
  • Tensorflow version (GPU?): 2.2.0 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: TPU training
Labels: TensorFlow, wontfix

All 3 comments

Currently, as a workaround, I set the optimizer as an attribute and remove the argument:

After this line:
https://github.com/huggingface/transformers/blob/9931f817b75ecb2c8bb08b6e9d4cbec4b0933935/src/transformers/trainer_tf.py#L235

I add:

    self.optimizer = optimizer

And rewrite the method so it no longer takes the optimizer as an argument:

https://github.com/huggingface/transformers/blob/9931f817b75ecb2c8bb08b6e9d4cbec4b0933935/src/transformers/trainer_tf.py#L326-L335

    def _step(self):
        """Applies gradients and resets accumulation."""
        gradient_scale = self.gradient_accumulator.step * self.args.strategy.num_replicas_in_sync
        gradients = [
            gradient / tf.cast(gradient_scale, gradient.dtype) for gradient in self.gradient_accumulator.gradients
        ]
        gradients = [(tf.clip_by_value(grad, -self.args.max_grad_norm, self.args.max_grad_norm)) for grad in gradients]

        self.optimizer.apply_gradients(list(zip(gradients, self.model.trainable_variables)))
        self.gradient_accumulator.reset()

And finally replace the call at:

https://github.com/huggingface/transformers/blob/9931f817b75ecb2c8bb08b6e9d4cbec4b0933935/src/transformers/trainer_tf.py#L324

with:

    self.args.strategy.experimental_run_v2(self._step)
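
In isolation, the pattern boils down to something like this toy sketch (made-up class and names, not the actual trainer): keep the optimizer as instance state so that nothing non-tensor goes through experimental_run_v2:

    import tensorflow as tf

    class TinyTrainer:
        """Toy illustration: the optimizer is reached through `self`,
        so `experimental_run_v2` is called without non-tensor args."""

        def __init__(self):
            self.strategy = tf.distribute.get_strategy()  # TPUStrategy on TPU
            self.optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
            self.var = tf.Variable(1.0)

        def _step(self):
            with tf.GradientTape() as tape:
                loss = tf.square(self.var)
            grads = tape.gradient(loss, [self.var])
            # Optimizer accessed as an attribute, not passed as an argument.
            self.optimizer.apply_gradients(zip(grads, [self.var]))

        def train_one_step(self):
            # No `args` at all, so nothing has to be converted to a tensor.
            self.strategy.experimental_run_v2(self._step)

    trainer = TinyTrainer()
    trainer.train_one_step()
    print(trainer.var.numpy())  # var moved by one SGD step: 1.0 -> 0.8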

Not closing this, as it's only a workaround. Any cleaner solution that could go in a PR?

Hello!

Nice finding! TPU support in the TF Trainer is currently under development and does not work in several cases. If you really need to train your model on TPUs, I suggest you use the PyTorch version of the trainer. Full TPU support for the TF Trainer should, I hope, arrive this month.

But if you are ready to make PRs, you are welcome to do so :)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
