Hi! I can't thank you enough for Transformers. I know that the Trainer is still under development, but I'd like to report this issue to learn its current status.
Currently, Trainer._prediction_loop assumes that different batches of data have the same shape. Specifically, this line:
```python
preds = torch.cat((preds, logits.detach()), dim=0)
```
makes it impossible to use Trainer.evaluate for models with variable-length outputs (e.g. seq2seq models). One possible solution is to pad all batches to the same length, but that is pretty inefficient.
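A minimal standalone illustration of the failure (pure PyTorch; the sequence lengths 22 and 29 match the traceback below, the vocabulary size is arbitrary):
```python
import torch

# Two batches of seq2seq logits whose sequence lengths differ
a = torch.randn(8, 22, 32000)  # (batch, seq_len=22, vocab)
b = torch.randn(8, 29, 32000)  # next batch came out with seq_len=29

# This is exactly what _prediction_loop does, and it fails:
# RuntimeError: Sizes of tensors must match except in dimension 0
torch.cat((a, b), dim=0)
```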
Steps to reproduce the behavior:
Traceback (most recent call last):
File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 509, in train
self.evaluate()
File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 696, in evaluate
output = self._prediction_loop(eval_dataloader, description="Evaluation")
File "/home/vlialin/miniconda3/lib/python3.7/site-packages/transformers/trainer.py", line 767, in _prediction_loop
preds = torch.cat((preds, logits.detach()), dim=0)
RuntimeError: Sizes of tensors must match except in dimension 0. Got 29 and 22 in dimension 1
Expected behavior: Trainer is able to evaluate seq2seq models.
transformers version: 2.11

Hi @Guitaricet, if you only want to evaluate the loss (AFAIK this is the case for seq2seq models), then you can set prediction_loss_only to True.
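For example, roughly like this (a sketch against the 2.x-era Trainer, where prediction_loss_only is a constructor argument; in later versions it lives on TrainingArguments instead, so check your installed version; `model` and `eval_dataset` stand for objects you already have):
```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="out", do_eval=True)

trainer = Trainer(
    model=model,                    # your seq2seq model (assumed already built)
    args=args,
    eval_dataset=eval_dataset,      # your evaluation dataset
    prediction_loss_only=True,      # skip accumulating logits; only the eval loss is computed
)
metrics = trainer.evaluate()        # {'eval_loss': ...}
```
Since the predictions tensor is never built, the shape mismatch never occurs, but compute_metrics is not run either.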
Hi! Thank you, but I need the metrics too. My workaround was to inherit from Trainer and override _prediction_loop.
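For anyone who needs the same thing, here is a rough sketch of the idea (simplified and written against the 2.x-era Trainer internals; pad_and_cat and Seq2SeqEvalTrainer are illustrative names I chose, and the real _prediction_loop also handles distributed gathering, progress bars, etc.):
```python
import numpy as np
import torch
import torch.nn.functional as F
from transformers import Trainer
from transformers.trainer_utils import EvalPrediction, PredictionOutput


def pad_and_cat(stored, new, pad_value=-100):
    """Right-pad dim 1 (the sequence dimension) until shapes match, then concatenate
    on dim 0. Works for 2-D (batch, seq) and 3-D (batch, seq, vocab) tensors."""
    if stored is None:
        return new
    max_len = max(stored.shape[1], new.shape[1])

    def pad_to(t):
        # F.pad takes pairs starting from the last dimension, so this pads only dim 1
        pad = [0, 0] * (t.dim() - 2) + [0, max_len - t.shape[1]]
        return F.pad(t, pad, value=pad_value)

    return torch.cat((pad_to(stored), pad_to(new)), dim=0)


class Seq2SeqEvalTrainer(Trainer):
    def _prediction_loop(self, dataloader, description, prediction_loss_only=None):
        # Simplified: no multi-GPU gathering, no progress bar
        prediction_loss_only = (
            prediction_loss_only if prediction_loss_only is not None else self.prediction_loss_only
        )
        model = self.model
        model.eval()
        eval_losses, preds, label_ids = [], None, None

        for inputs in dataloader:
            label_key = next(
                (k for k in ("labels", "lm_labels", "masked_lm_labels") if inputs.get(k) is not None),
                None,
            )
            inputs = {k: v.to(self.args.device) for k, v in inputs.items()}
            with torch.no_grad():
                outputs = model(**inputs)
                if label_key is not None:
                    loss, logits = outputs[:2]
                    eval_losses.append(loss.mean().item())
                else:
                    logits = outputs[0]
            if not prediction_loss_only:
                # Pad the sequence dimension before concatenating, so batches with
                # different lengths no longer break torch.cat
                preds = pad_and_cat(preds, logits.detach())
                if label_key is not None:
                    label_ids = pad_and_cat(label_ids, inputs[label_key].detach())

        metrics = {}
        if self.compute_metrics is not None and preds is not None and label_ids is not None:
            metrics = self.compute_metrics(
                EvalPrediction(predictions=preds.cpu().numpy(), label_ids=label_ids.cpu().numpy())
            )
        if eval_losses:
            metrics["eval_loss"] = float(np.mean(eval_losses))
        return PredictionOutput(predictions=preds, label_ids=label_ids, metrics=metrics)
```
Note that predictions and labels are padded with -100 along the sequence dimension, so compute_metrics should ignore that value.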
That sounds like a reasonable solution, but we should document this somewhere. Pinging @sgugger on this :)
Yes, documentation about the Trainer would be awesome! I would love to contribute.
Still no updates on this issue?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.