Transformers: Trainer.evaluate does not support seq2seq models

Created on 12 Jun 2020 · 6 comments · Source: huggingface/transformers

🐛 Bug

Information

Hi! I can't thank you enough for Transformers. I know that the Trainer is still under development, but I would like to report this just to check on its current status.

Currently, Trainer._prediction_loop assumes that all batches of data have the same shape.
Specifically, this line:

preds = torch.cat((preds, logits.detach()), dim=0)

This makes it impossible to use Trainer.evaluate with models whose outputs vary in length (e.g. seq2seq models). One possible solution is to pad all batches to the same length, but that is quite inefficient.
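
For illustration, here is a minimal sketch in plain PyTorch (shapes taken from the traceback below; the vocabulary size is made up) of why the concatenation fails and what padding to a common length would look like:

import torch
import torch.nn.functional as F

# Two batches of logits whose sequence lengths differ in dim 1: (batch, seq_len, vocab)
batch_a = torch.randn(8, 29, 100)
batch_b = torch.randn(8, 22, 100)

# This is what _prediction_loop does, and it raises:
# RuntimeError: Sizes of tensors must match except in dimension 0.
# preds = torch.cat((batch_a, batch_b), dim=0)

# The inefficient workaround: right-pad the shorter batch along dim 1 first.
max_len = max(batch_a.size(1), batch_b.size(1))
batch_b = F.pad(batch_b, (0, 0, 0, max_len - batch_b.size(1)))
preds = torch.cat((batch_a, batch_b), dim=0)  # shape (16, 29, 100)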

The problem arises when using:

  • [ ] the official example scripts: (give details below)
  • [x] my own modified scripts: (give details below)

The task I am working on is:

  • [ ] an official GLUE/SQuAD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. create a seq2seq model
  2. pad batches so that each batch is padded only to the maximum sequence length within that batch (a sketch of such a collate function follows the traceback)
  3. create a Trainer for the model and call .evaluate()

Traceback (most recent call last):
  File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 509, in train
    self.evaluate()
  File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 696, in evaluate
    output = self._prediction_loop(eval_dataloader, description="Evaluation")
  File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 767, in _prediction_loop
    preds = torch.cat((preds, logits.detach()), dim=0)
RuntimeError: Sizes of tensors must match except in dimension 0. Got 29 and 22 in dimension 1
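
For concreteness, the per-batch padding in step 2 might come from a collate function like this (a hedged sketch; it assumes each example is a dict of 1-D tensors, and pad_sequence pads only to the longest sequence in the given batch):

from torch.nn.utils.rnn import pad_sequence

def collate_fn(examples):
    # Pad only to the longest sequence *within this batch*, so different
    # batches end up with different sequence lengths (29 vs. 22 above).
    return {
        "input_ids": pad_sequence([e["input_ids"] for e in examples], batch_first=True),
        "labels": pad_sequence([e["labels"] for e in examples], batch_first=True, padding_value=-100),
    }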

Expected behavior

Trainer is able to evaluate seq2seq models.

Environment info

  • transformers version: 2.11
  • Platform: Linux
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.5.0
  • Tensorflow version (GPU?): 2.2.0
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No
Labels: wontfix

All 6 comments

Hi @Guitaricet, if you only want to evaluate the loss (AFAIK this is the case for seq2seq models), you can set prediction_loss_only to True.
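
For example (in transformers 2.11, prediction_loss_only is a constructor argument of Trainer; in later releases it moved to TrainingArguments; model, training_args, and eval_dataset are assumed to exist):

trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=eval_dataset,
    prediction_loss_only=True,  # skip accumulating logits; evaluate() only reports the loss
)
metrics = trainer.evaluate()  # {'eval_loss': ...}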

Hi! Thank you, but I need the metrics too. My workaround was to inherit from Trainer and override _prediction_loop.
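
A rough sketch of that workaround, written against the 2.x Trainer internals (tuple-style model outputs, _prediction_loop returning a PredictionOutput); the pad_and_concat helper is illustrative, and a full override would also need the prediction_loss_only handling and distributed gathering of the original method:

import torch
import torch.nn.functional as F
from transformers import EvalPrediction, Trainer
from transformers.trainer_utils import PredictionOutput

def pad_and_concat(tensors, pad_value=-100):
    # Right-pad dim 1 (the sequence dimension) of every tensor to the global
    # maximum length, leaving any trailing dims (e.g. vocab) untouched, then
    # concatenate along the batch dimension.
    max_len = max(t.size(1) for t in tensors)
    padded = [
        F.pad(t, [0, 0] * (t.dim() - 2) + [0, max_len - t.size(1)], value=pad_value)
        for t in tensors
    ]
    return torch.cat(padded, dim=0)

class Seq2SeqEvalTrainer(Trainer):
    def _prediction_loop(self, dataloader, description, prediction_loss_only=None):
        # Accumulate per-batch logits/labels in Python lists and pad once at
        # the end, instead of torch.cat-ing tensors of unequal length inside
        # the loop as the parent method does.
        self.model.eval()
        losses, all_logits, all_labels = [], [], []
        for inputs in dataloader:
            inputs = {k: v.to(self.args.device) for k, v in inputs.items()}
            with torch.no_grad():
                loss, logits = self.model(**inputs)[:2]  # 2.x models return tuples
            losses.append(loss.item())
            all_logits.append(logits.detach().cpu())
            all_labels.append(inputs["labels"].detach().cpu())
        preds = pad_and_concat(all_logits).numpy()
        label_ids = pad_and_concat(all_labels).numpy()
        metrics = {"eval_loss": sum(losses) / len(losses)}
        if self.compute_metrics is not None:
            metrics.update(self.compute_metrics(EvalPrediction(predictions=preds, label_ids=label_ids)))
        return PredictionOutput(predictions=preds, label_ids=label_ids, metrics=metrics)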

That sounds like a reasonable solution, but we should document it somewhere. Pinging @sgugger on this :)

Yes, documentation about the Trainer would be awesome! I'd love to contribute.

Still no updates on this issue?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
