I'm having an issue with the pretraining of a BERT-like model. I used the following function twice: first with bert-base-multilingual-cased, and then with a similar version that is more efficient for long documents, which uses the LongformerSelfAttention class to turn the standard BERT into a "long" BERT.
def pretrain_and_evaluate(args, model, tokenizer, eval_only, model_path):
    val_dataset = TextDataset(tokenizer=tokenizer,
                              file_path=args.val_datapath,
                              block_size=tokenizer.max_len)
    if eval_only:
        train_dataset = val_dataset
    else:
        logger.info(f'Loading and tokenizing training data is usually slow: {args.train_datapath}')
        train_dataset = TextDataset(tokenizer=tokenizer,
                                    file_path=args.train_datapath,
                                    block_size=tokenizer.max_len)

    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
    trainer = Trainer(model=model, args=args, data_collator=data_collator,
                      train_dataset=train_dataset, eval_dataset=val_dataset, prediction_loss_only=True,)

    eval_loss = trainer.evaluate()
    eval_loss = eval_loss['eval_loss']
    logger.info(f'Initial eval bpc: {eval_loss/math.log(2)}')

    if not eval_only:
        trainer.train(model_path=model_path)
        trainer.save_model()

        eval_loss = trainer.evaluate()
        eval_loss = eval_loss['eval_loss']
        logger.info(f'Eval bpc after pretraining: {eval_loss/math.log(2)}')
With bert-base-multilingual-cased it works well; the model and tokenizer passed as arguments to the function are, respectively:
model = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-multilingual-cased')
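The training arguments are built with the standard Trainer API, roughly like the sketch below (the dataset paths, output directory, and hyperparameter values here are placeholders, not my exact configuration; pretrain_and_evaluate() only needs the extra val_datapath/train_datapath attributes on the args object):

from transformers import TrainingArguments

# a minimal sketch; values are placeholders
training_args = TrainingArguments(
    output_dir='tmp',
    do_train=True,
    do_eval=True,
    max_steps=3000,
    logging_steps=500,
    save_steps=500,
)
# pretrain_and_evaluate() reads these two extra attributes from the args object
training_args.val_datapath = 'wikitext-103-raw/wiki.valid.raw'
training_args.train_datapath = 'wikitext-103-raw/wiki.train.raw'

pretrain_and_evaluate(training_args, model, tokenizer,
                      eval_only=False, model_path=training_args.output_dir)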
But with the modified version of BERT this error occurs:
Traceback (most recent call last):
  File "convert_bert_to_long_bert.py", line 172, in <module>
    pretrain_and_evaluate(training_args, model, tokenizer, eval_only=False, model_path=training_args.output_dir)
  File "convert_bert_to_long_bert.py", line 86, in pretrain_and_evaluate
    eval_loss = trainer.evaluate()
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/trainer.py", line 748, in evaluate
    output = self._prediction_loop(eval_dataloader, description="Evaluation")
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/trainer.py", line 829, in _prediction_loop
    outputs = model(**inputs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 1098, in forward
    return_tuple=return_tuple,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 799, in forward
    return_tuple=return_tuple,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 460, in forward
    output_attentions,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 391, in forward
    hidden_states, attention_mask, head_mask, output_attentions=output_attentions,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 335, in forward
    hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() takes from 2 to 4 positional arguments but 7 were given
I made only a few modifications to a working script that builds a Long version of RoBERTa from the RoBERTa base model. What could the mistake be?
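For context, the conversion step is modeled on that RoBERTa-to-Longformer notebook and looks roughly like the sketch below (function and variable names are illustrative, not my exact code): it extends the position embeddings to the new maximum length and swaps each layer's self-attention for LongformerSelfAttention.

import copy
from transformers import BertForMaskedLM, BertTokenizerFast
from transformers.modeling_longformer import LongformerSelfAttention

def create_long_model(save_model_to, attention_window=512, max_pos=4096):
    model = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
    tokenizer = BertTokenizerFast.from_pretrained('bert-base-multilingual-cased',
                                                  model_max_length=max_pos)
    config = model.config

    # extend the position embeddings from 512 to max_pos by tiling the
    # pretrained embeddings (max_pos is assumed to be a multiple of 512)
    current_max_pos, embed_size = model.bert.embeddings.position_embeddings.weight.shape
    config.max_position_embeddings = max_pos
    new_pos_embed = model.bert.embeddings.position_embeddings.weight.new_empty(max_pos, embed_size)
    k = 0
    while k < max_pos:
        new_pos_embed[k:k + current_max_pos] = model.bert.embeddings.position_embeddings.weight
        k += current_max_pos
    model.bert.embeddings.position_embeddings.weight.data = new_pos_embed

    # replace each layer's self-attention with LongformerSelfAttention,
    # reusing the pretrained query/key/value projections
    config.attention_window = [attention_window] * config.num_hidden_layers
    for i, layer in enumerate(model.bert.encoder.layer):
        longformer_self_attn = LongformerSelfAttention(config, layer_id=i)
        longformer_self_attn.query = layer.attention.self.query
        longformer_self_attn.key = layer.attention.self.key
        longformer_self_attn.value = layer.attention.self.value
        # some transformers versions use separate projections for global attention
        if hasattr(longformer_self_attn, 'query_global'):
            longformer_self_attn.query_global = copy.deepcopy(layer.attention.self.query)
            longformer_self_attn.key_global = copy.deepcopy(layer.attention.self.key)
            longformer_self_attn.value_global = copy.deepcopy(layer.attention.self.value)
        layer.attention.self = longformer_self_attn

    model.save_pretrained(save_model_to)
    tokenizer.save_pretrained(save_model_to)
    return model, tokenizer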
Update: I have downgraded transformers to version 2.11.0 and it seems to work, although for now I have only tested on small datasets. I will update this issue if anyone is interested.
The code in Longformer has changed quite a bit. I think a simple remedy to make your code work with the current version of Longformer is to add **kwargs to every forward function in modeling_longformer.py that you copied into your notebook. This way it can handle an arbitrary number of input arguments, and the above error should not occur.
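For example, instead of editing the library file you can absorb the extra arguments in a thin wrapper. Here is a sketch assuming transformers 3.0.x, where LongformerSelfAttention.forward only accepts hidden_states, attention_mask and output_attentions (the BertLong* class names are just illustrative, mirroring the RoBERTa conversion notebook):

from transformers import BertForMaskedLM
from transformers.modeling_longformer import LongformerSelfAttention

class BertLongSelfAttention(LongformerSelfAttention):
    # accept (and ignore) the extra arguments that newer versions of BertLayer
    # pass positionally: head_mask, encoder_hidden_states,
    # encoder_attention_mask, output_attentions
    def forward(self, hidden_states, attention_mask=None, head_mask=None,
                encoder_hidden_states=None, encoder_attention_mask=None,
                output_attentions=False, **kwargs):
        return super().forward(hidden_states,
                               attention_mask=attention_mask,
                               output_attentions=output_attentions)

class BertLongForMaskedLM(BertForMaskedLM):
    # same as BertForMaskedLM, but every layer's self-attention is the
    # long-attention wrapper above
    def __init__(self, config):
        super().__init__(config)
        for i, layer in enumerate(self.bert.encoder.layer):
            layer.attention.self = BertLongSelfAttention(config, layer_id=i)

In the conversion step you would then instantiate BertLongSelfAttention instead of LongformerSelfAttention, and reload the converted checkpoint through BertLongForMaskedLM.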
EDIT: To begin pre-training, make sure you LOAD the saved model exactly the way the notebook does BEFORE pre-training! Don't try and use the model straightaway!
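Roughly like this sketch (the directory name is an assumption, and BertLongForMaskedLM is the wrapper class sketched above):

from transformers import BertTokenizerFast

model_path = 'bert-base-multilingual-cased-4096'  # assumed output directory of the conversion step

# reload the converted checkpoint through the long-model class before pre-training
tokenizer = BertTokenizerFast.from_pretrained(model_path)
model = BertLongForMaskedLM.from_pretrained(model_path)

pretrain_and_evaluate(training_args, model, tokenizer,
                      eval_only=False, model_path=training_args.output_dir)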