I'm having an issue with the pretraining of a BERT-like model. I used the following function twice: first with bert-base-multilingual-cased, and then with a similar version that is more efficient for long documents, which uses the LongformerSelfAttention class to turn the standard BERT into a "long" BERT.
def pretrain_and_evaluate(args, model, tokenizer, eval_only, model_path):
    val_dataset = TextDataset(tokenizer=tokenizer,
                              file_path=args.val_datapath,
                              block_size=tokenizer.max_len)
    if eval_only:
        train_dataset = val_dataset
    else:
        logger.info(f'Loading and tokenizing training data is usually slow: {args.train_datapath}')
        train_dataset = TextDataset(tokenizer=tokenizer,
                                    file_path=args.train_datapath,
                                    block_size=tokenizer.max_len)

    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
    trainer = Trainer(model=model, args=args, data_collator=data_collator,
                      train_dataset=train_dataset, eval_dataset=val_dataset, prediction_loss_only=True,)

    eval_loss = trainer.evaluate()
    eval_loss = eval_loss['eval_loss']
    logger.info(f'Initial eval bpc: {eval_loss/math.log(2)}')

    if not eval_only:
        trainer.train(model_path=model_path)
        trainer.save_model()

        eval_loss = trainer.evaluate()
        eval_loss = eval_loss['eval_loss']
        logger.info(f'Eval bpc after pretraining: {eval_loss/math.log(2)}')
With bert-base-multilingual-cased it works well; the model and tokenizer passed as arguments to the function are, respectively:
model = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-multilingual-cased')
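The training arguments are built with the standard Trainer API, roughly like the sketch below (the dataset paths, output directory, and hyperparameter values here are placeholders, not my exact configuration; pretrain_and_evaluate() only needs the extra val_datapath/train_datapath attributes on the args object):

from transformers import TrainingArguments

# a minimal sketch; values are placeholders
training_args = TrainingArguments(
    output_dir='tmp',
    do_train=True,
    do_eval=True,
    max_steps=3000,
    logging_steps=500,
    save_steps=500,
)
# pretrain_and_evaluate() reads these two extra attributes from the args object
training_args.val_datapath = 'wikitext-103-raw/wiki.valid.raw'
training_args.train_datapath = 'wikitext-103-raw/wiki.train.raw'

pretrain_and_evaluate(training_args, model, tokenizer,
                      eval_only=False, model_path=training_args.output_dir)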
But with the modified version of BERT this error occurs:
Traceback (most recent call last):
  File "convert_bert_to_long_bert.py", line 172, in <module>
    pretrain_and_evaluate(training_args, model, tokenizer, eval_only=False, model_path=training_args.output_dir)
  File "convert_bert_to_long_bert.py", line 86, in pretrain_and_evaluate
    eval_loss = trainer.evaluate()
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/trainer.py", line 748, in evaluate
    output = self._prediction_loop(eval_dataloader, description="Evaluation")
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/trainer.py", line 829, in _prediction_loop
    outputs = model(**inputs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 1098, in forward
    return_tuple=return_tuple,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 799, in forward
    return_tuple=return_tuple,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 460, in forward
    output_attentions,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 391, in forward
    hidden_states, attention_mask, head_mask, output_attentions=output_attentions,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/transformers/modeling_bert.py", line 335, in forward
    hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, output_attentions,
  File "/Users/user/Library/Python/3.7/lib/python/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() takes from 2 to 4 positional arguments but 7 were given
I made only a few modifications to a working script that builds a Long version of RoBERTa from the RoBERTa base model. What could the mistake be?
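For context, the conversion step is modeled on that RoBERTa-to-Longformer notebook and looks roughly like the sketch below (function and variable names are illustrative, not my exact code): it extends the position embeddings to the new maximum length and swaps each layer's self-attention for LongformerSelfAttention.

import copy
from transformers import BertForMaskedLM, BertTokenizerFast
from transformers.modeling_longformer import LongformerSelfAttention

def create_long_model(save_model_to, attention_window=512, max_pos=4096):
    model = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
    tokenizer = BertTokenizerFast.from_pretrained('bert-base-multilingual-cased',
                                                  model_max_length=max_pos)
    config = model.config

    # extend the position embeddings from 512 to max_pos by tiling the
    # pretrained embeddings (max_pos is assumed to be a multiple of 512)
    current_max_pos, embed_size = model.bert.embeddings.position_embeddings.weight.shape
    config.max_position_embeddings = max_pos
    new_pos_embed = model.bert.embeddings.position_embeddings.weight.new_empty(max_pos, embed_size)
    k = 0
    while k < max_pos:
        new_pos_embed[k:k + current_max_pos] = model.bert.embeddings.position_embeddings.weight
        k += current_max_pos
    model.bert.embeddings.position_embeddings.weight.data = new_pos_embed

    # replace each layer's self-attention with LongformerSelfAttention,
    # reusing the pretrained query/key/value projections
    config.attention_window = [attention_window] * config.num_hidden_layers
    for i, layer in enumerate(model.bert.encoder.layer):
        longformer_self_attn = LongformerSelfAttention(config, layer_id=i)
        longformer_self_attn.query = layer.attention.self.query
        longformer_self_attn.key = layer.attention.self.key
        longformer_self_attn.value = layer.attention.self.value
        # some transformers versions use separate projections for global attention
        if hasattr(longformer_self_attn, 'query_global'):
            longformer_self_attn.query_global = copy.deepcopy(layer.attention.self.query)
            longformer_self_attn.key_global = copy.deepcopy(layer.attention.self.key)
            longformer_self_attn.value_global = copy.deepcopy(layer.attention.self.value)
        layer.attention.self = longformer_self_attn

    model.save_pretrained(save_model_to)
    tokenizer.save_pretrained(save_model_to)
    return model, tokenizer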
Update: I have downgraded transformers to version 2.11.0 and it seems to work, although for now I have only tested on small datasets. I will update this issue if anyone is interested.
The code in Longformer has changed quite a bit. I think a simple remedy to make your code work with the current version of Longformer is to add **kwargs to every forward function in modeling_longformer.py that you copied into your notebook. This way it can handle an arbitrary number of input arguments, and the above error should not occur.
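For example, instead of editing the library file you can absorb the extra arguments in a thin wrapper. Here is a sketch assuming transformers 3.0.x, where LongformerSelfAttention.forward only accepts hidden_states, attention_mask and output_attentions (the BertLong* class names are just illustrative, mirroring the RoBERTa conversion notebook):

from transformers import BertForMaskedLM
from transformers.modeling_longformer import LongformerSelfAttention

class BertLongSelfAttention(LongformerSelfAttention):
    # accept (and ignore) the extra arguments that newer versions of BertLayer
    # pass positionally: head_mask, encoder_hidden_states,
    # encoder_attention_mask, output_attentions
    def forward(self, hidden_states, attention_mask=None, head_mask=None,
                encoder_hidden_states=None, encoder_attention_mask=None,
                output_attentions=False, **kwargs):
        return super().forward(hidden_states,
                               attention_mask=attention_mask,
                               output_attentions=output_attentions)

class BertLongForMaskedLM(BertForMaskedLM):
    # same as BertForMaskedLM, but every layer's self-attention is the
    # long-attention wrapper above
    def __init__(self, config):
        super().__init__(config)
        for i, layer in enumerate(self.bert.encoder.layer):
            layer.attention.self = BertLongSelfAttention(config, layer_id=i)

In the conversion step you would then instantiate BertLongSelfAttention instead of LongformerSelfAttention, and reload the converted checkpoint through BertLongForMaskedLM.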
EDIT: To begin pre-training, make sure you LOAD the saved model exactly the way the notebook does BEFORE pre-training! Don't try and use the model straightaway!
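Roughly like this sketch (the directory name is an assumption, and BertLongForMaskedLM is the wrapper class sketched above):

from transformers import BertTokenizerFast

model_path = 'bert-base-multilingual-cased-4096'  # assumed output directory of the conversion step

# reload the converted checkpoint through the long-model class before pre-training
tokenizer = BertTokenizerFast.from_pretrained(model_path)
model = BertLongForMaskedLM.from_pretrained(model_path)

pretrain_and_evaluate(training_args, model, tokenizer,
                      eval_only=False, model_path=training_args.output_dir)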