Transformers: Reformer model crashes during causal LM evaluation

Created on 13 Nov 2020 · 1 comment · Source: huggingface/transformers

Environment info

  • transformers version: 3.4.0
  • Platform: Linux-5.4.0-47-generic-x86_64-with-debian-bullseye-sid
  • Python version: 3.6.12
  • PyTorch version (GPU?): 1.7.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: -

Who can help

I tried to dig into the code but could not find out why this is happening, so I am tagging @sgugger, since this might be a Trainer-related issue, as well as @patrickvonplaten, as I am using ReformerWithLMHead.

Information

I am using ReformerWithLMHead with a custom dataset. I already had the masked language modeling task working, so I moved on to causal LM, but something odd happened. My setup is based on the official notebook from @patrickvonplaten and it works fine for masked LM.

import numpy as np

from transformers import DataCollatorForLanguageModeling, Trainer

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=False  # mlm=False switches the collator to causal LM labels
)

def compute_metrics(pred):
    """
        pred.label_ids = (prediction_set_size, sequence_length)
        pred.predictions = (prediction_set_size, sequence_length, vocab_size)
            prob. dist. along vocab size
        Positions labeled -100 are ignored by the loss (everything but the
        masked tokens in MLM, the padding in causal LM), so only the
        remaining positions should be scored. :)
    """
    non_masked_indices = (pred.label_ids != -100)
    predictions = np.argmax(pred.predictions, axis=-1)
    labels = pred.label_ids[non_masked_indices]
    predictions = predictions[non_masked_indices]
    return {"accuracy": np.mean(np.asarray(predictions == labels), dtype=float)}

trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    data_collator=data_collator,
    train_dataset=dataset,
    eval_dataset=eval_dataset,
    prediction_loss_only=False)

trainer.train()
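
For context, with mlm=False the collator builds causal LM labels by copying input_ids and replacing pad positions with -100 (the model shifts the labels internally for next-token prediction). A quick sanity check, as a hypothetical snippet reusing the tokenizer and data_collator above:

import torch

# Hypothetical check: with mlm=False, labels are a copy of input_ids,
# with any padding positions replaced by -100.
example = torch.tensor(tokenizer("hello world")["input_ids"])
batch = data_collator([example])
print(batch["input_ids"])
print(batch["labels"])  # same ids, except -100 wherever the input is padding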

I set up the collator for the non-MLM task but kept the custom metric (also based on the official notebook) to compute accuracy, since it should work the same way as before (IMO). The tricky part: if I explicitly set prediction_loss_only=False, I get an error indicating that the logits could not be nested-detached:

  File "src/lm/reformer_casual_lm.py", line 146, in <module>
    trainer.train()
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer.py", line 786, in train
    self._maybe_log_save_evalute(tr_loss, model, trial, epoch)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer.py", line 843, in _maybe_log_save_evalute
    metrics = self.evaluate()
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer.py", line 1251, in evaluate
    output = self.prediction_loop(eval_dataloader, description="Evaluation")
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer.py", line 1348, in prediction_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer.py", line 1452, in prediction_step
    logits = nested_detach(logits)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer_pt_utils.py", line 66, in nested_detach
    return type(tensors)(nested_detach(t) for t in tensors)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer_pt_utils.py", line 66, in <genexpr>
    return type(tensors)(nested_detach(t) for t in tensors)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer_pt_utils.py", line 66, in nested_detach
    return type(tensors)(nested_detach(t) for t in tensors)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer_pt_utils.py", line 66, in <genexpr>
    return type(tensors)(nested_detach(t) for t in tensors)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer_pt_utils.py", line 66, in nested_detach
    return type(tensors)(nested_detach(t) for t in tensors)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer_pt_utils.py", line 66, in <genexpr>
    return type(tensors)(nested_detach(t) for t in tensors)
  File "/home/qbeer/miniconda3/envs/nlp/lib/python3.6/site-packages/transformers/trainer_pt_utils.py", line 67, in nested_detach
    return tensors.detach()
AttributeError: 'NoneType' object has no attribute 'detach'
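
For reference, the crash happens because one element of the nested logits tuple is None. A None-tolerant variant of nested_detach would avoid it; this is only a sketch of one possible Trainer-side fix, not an actual patch from the repo:

def nested_detach(tensors):
    """Detach tensors, even when nested in lists/tuples, passing None through."""
    if isinstance(tensors, (list, tuple)):
        return type(tensors)(nested_detach(t) for t in tensors)
    if tensors is None:  # Reformer outputs can contain None entries
        return None
    return tensors.detach()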

If I simply delete the prediction_loss_only=False line, training runs, but my custom metric is never evaluated, since in the Trainer's prediction loop the gathered labels and predictions are non-None only when this value is set to False:

eval_loss = eval_losses_gatherer.finalize()
preds = preds_gatherer.finalize() if not prediction_loss_only else None
label_ids = labels_gatherer.finalize() if not prediction_loss_only else None

if self.compute_metrics is not None and preds is not None and label_ids is not None:
    metrics = self.compute_metrics(EvalPrediction(predictions=preds, label_ids=label_ids))
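
Until that is fixed, one user-side workaround is to subclass Trainer so that only the loss and the actual LM logits ever reach nested_detach. This is a sketch written against the 3.4.0 prediction_step signature visible in the traceback above, assuming tuple outputs of the form (loss, logits, ...); the details are assumptions and may differ in other versions:

import torch
from transformers import Trainer

class ReformerTrainer(Trainer):
    # Hypothetical workaround: keep only the loss and the LM logits from
    # the model output tuple, so the None entries Reformer returns never
    # reach nested_detach inside the evaluation loop.
    def prediction_step(self, model, inputs, prediction_loss_only):
        inputs = self._prepare_inputs(inputs)
        with torch.no_grad():
            outputs = model(**inputs)          # (loss, logits, ...) when labels are present
            loss = outputs[0].mean().detach()  # reduce in case of multi-GPU
            logits = outputs[1]                # skip the trailing None/aux slots
        if prediction_loss_only:
            return (loss, None, None)
        labels = inputs.get("labels")
        if labels is not None:
            labels = labels.detach()
        return (loss, logits.detach(), labels)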

Expected behavior

I expect my custom metric to be evaluated and the training not to crash.

Thanks in advance.

Most helpful comment

Mmm, looks like the Reformer model is outputting some Nones, which it shouldn't do. I can make a fix for that in Trainer, but the model itself should not do that. Looks like there is work for both of us @patrickvonplaten :-)


