Transformers: Trainer.evaluate does not support seq2seq models

Created on 12 Jun 2020 · 6 comments · Source: huggingface/transformers

🐛 Bug

Information

Hi! I can't thank you enough for Transformers. I know that the Trainer is still under development, but I would like to report this just to check on its current status.

Currently, Trainer._prediction_loop assumes that all batches of data have the same shape.
Specifically, this line:

preds = torch.cat((preds, logits.detach()), dim=0)

This makes it impossible to use Trainer.evaluate with models whose outputs vary in length (e.g. seq2seq models). One possible solution is to pad all batches to the same length, but that is quite inefficient.
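
For illustration, here is a minimal sketch in plain PyTorch (shapes taken from the traceback below; the vocabulary size is made up) of why the concatenation fails and what padding to a common length would look like:

import torch
import torch.nn.functional as F

# Two batches of logits whose sequence lengths differ in dim 1: (batch, seq_len, vocab)
batch_a = torch.randn(8, 29, 100)
batch_b = torch.randn(8, 22, 100)

# This is what _prediction_loop does, and it raises:
# RuntimeError: Sizes of tensors must match except in dimension 0.
# preds = torch.cat((batch_a, batch_b), dim=0)

# The inefficient workaround: right-pad the shorter batch along dim 1 first.
max_len = max(batch_a.size(1), batch_b.size(1))
batch_b = F.pad(batch_b, (0, 0, 0, max_len - batch_b.size(1)))
preds = torch.cat((batch_a, batch_b), dim=0)  # shape (16, 29, 100)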

The problem arises when using:

  • [ ] the official example scripts: (give details below)
  • [x] my own modified scripts: (give details below)

The task I am working on is:

  • [ ] an official GLUE/SQuAD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. create a seq2seq model
  2. pad batches so that each batch is padded only to the maximum sequence length within that batch (a sketch of such a collate function follows the traceback)
  3. create a Trainer for the model and call .evaluate()

Traceback (most recent call last):
  File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 509, in train
    self.evaluate()
  File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 696, in evaluate
    output = self._prediction_loop(eval_dataloader, description="Evaluation")
  File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 767, in _prediction_loop
    preds = torch.cat((preds, logits.detach()), dim=0)
RuntimeError: Sizes of tensors must match except in dimension 0. Got 29 and 22 in dimension 1
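
For concreteness, the per-batch padding in step 2 might come from a collate function like this (a hedged sketch; it assumes each example is a dict of 1-D tensors, and pad_sequence pads only to the longest sequence in the given batch):

from torch.nn.utils.rnn import pad_sequence

def collate_fn(examples):
    # Pad only to the longest sequence *within this batch*, so different
    # batches end up with different sequence lengths (29 vs. 22 above).
    return {
        "input_ids": pad_sequence([e["input_ids"] for e in examples], batch_first=True),
        "labels": pad_sequence([e["labels"] for e in examples], batch_first=True, padding_value=-100),
    }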

Expected behavior

Trainer is able to evaluate seq2seq models.

Environment info

  • transformers version: 2.11
  • Platform: Linux
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.5.0
  • Tensorflow version (GPU?): 2.2.0
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No
Labels: wontfix

All 6 comments

Hi @Guitaricet, if you only want to evaluate the loss (AFAIK this is the case for seq2seq models), you can set prediction_loss_only to True.
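
For example (in transformers 2.11, prediction_loss_only is a constructor argument of Trainer; in later releases it moved to TrainingArguments; model, training_args, and eval_dataset are assumed to exist):

trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=eval_dataset,
    prediction_loss_only=True,  # skip accumulating logits; evaluate() only reports the loss
)
metrics = trainer.evaluate()  # {'eval_loss': ...}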

Hi! Thank you, but I need the metrics too. My workaround was to inherit from Trainer and override _prediction_loop.
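
A rough sketch of that workaround, written against the 2.x Trainer internals (tuple-style model outputs, _prediction_loop returning a PredictionOutput); the pad_and_concat helper is illustrative, and a full override would also need the prediction_loss_only handling and distributed gathering of the original method:

import torch
import torch.nn.functional as F
from transformers import EvalPrediction, Trainer
from transformers.trainer_utils import PredictionOutput

def pad_and_concat(tensors, pad_value=-100):
    # Right-pad dim 1 (the sequence dimension) of every tensor to the global
    # maximum length, leaving any trailing dims (e.g. vocab) untouched, then
    # concatenate along the batch dimension.
    max_len = max(t.size(1) for t in tensors)
    padded = [
        F.pad(t, [0, 0] * (t.dim() - 2) + [0, max_len - t.size(1)], value=pad_value)
        for t in tensors
    ]
    return torch.cat(padded, dim=0)

class Seq2SeqEvalTrainer(Trainer):
    def _prediction_loop(self, dataloader, description, prediction_loss_only=None):
        # Accumulate per-batch logits/labels in Python lists and pad once at
        # the end, instead of torch.cat-ing tensors of unequal length inside
        # the loop as the parent method does.
        self.model.eval()
        losses, all_logits, all_labels = [], [], []
        for inputs in dataloader:
            inputs = {k: v.to(self.args.device) for k, v in inputs.items()}
            with torch.no_grad():
                loss, logits = self.model(**inputs)[:2]  # 2.x models return tuples
            losses.append(loss.item())
            all_logits.append(logits.detach().cpu())
            all_labels.append(inputs["labels"].detach().cpu())
        preds = pad_and_concat(all_logits).numpy()
        label_ids = pad_and_concat(all_labels).numpy()
        metrics = {"eval_loss": sum(losses) / len(losses)}
        if self.compute_metrics is not None:
            metrics.update(self.compute_metrics(EvalPrediction(predictions=preds, label_ids=label_ids)))
        return PredictionOutput(predictions=preds, label_ids=label_ids, metrics=metrics)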

That sounds like a reasonable solution, but we should document it somewhere. Pinging @sgugger on this :)

Yes, documentation about the Trainer would be awesome! I'd love to contribute.

Still no updates on this issue?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
