Transformers: Loading pretrained RobertaForSequenceClassification fails, size mismatch error

Created on 24 Oct 2019  ·  2 comments  ·  Source: huggingface/transformers

🐛 Bug

Model I am using: RobertaForSequenceClassification. When I try to load the 'roberta-base' model using this code on Google Colab:

```
from transformers import RobertaForSequenceClassification, RobertaConfig

config = RobertaConfig()
model = RobertaForSequenceClassification.from_pretrained("roberta-base", config=config)
model
```

I get the following error:

```
RuntimeError: Error(s) in loading state_dict for RobertaForSequenceClassification:
size mismatch for roberta.embeddings.word_embeddings.weight: copying a param with shape torch.Size([50265, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
size mismatch for roberta.embeddings.position_embeddings.weight: copying a param with shape torch.Size([514, 768]) from checkpoint, the shape in current model is torch.Size([512, 768]).
size mismatch for roberta.embeddings.token_type_embeddings.weight: copying a param with shape torch.Size([1, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
```

Maybe related to #1340

Environment

  • Platform: Google Colab, Linux-4.14.137+-x86_64-with-Ubuntu-18.04-bionic
  • Python: 3.6.8 (default, Oct 7 2019, 12:59:55) [GCC 8.3.0]
  • PyTorch: 1.3.0+cu100
  • Transformers: 2.1.1

All 2 comments

Hi! You're initializing RoBERTa with a blank configuration, which results in a very BERT-like configuration. BERT has different attributes than RoBERTa (different vocabulary size, maximum positional embedding size, etc.), so this indeed results in an error.
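
To see the mismatch directly, here is a minimal sketch comparing the blank configuration with the one bundled with roberta-base (it assumes the standard `vocab_size`, `max_position_embeddings` and `type_vocab_size` config attributes):

```
from transformers import RobertaConfig

# A blank RobertaConfig falls back to BERT-style defaults, while the
# pretrained roberta-base checkpoint expects RoBERTa's own sizes.
blank = RobertaConfig()
pretrained = RobertaConfig.from_pretrained("roberta-base")

for attr in ("vocab_size", "max_position_embeddings", "type_vocab_size"):
    print(attr, getattr(blank, attr), "vs", getattr(pretrained, attr))

# Expected output, matching the shapes reported in the error above:
# vocab_size 30522 vs 50265
# max_position_embeddings 512 vs 514
# type_vocab_size 2 vs 1
```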

To instantiate RoBERTa you can simply do:

```
model = RobertaForSequenceClassification.from_pretrained("roberta-base")
```

If you wish to have a configuration object so that you can change attributes like outputting the hidden states, you could do it like this:

```
config = RobertaConfig.from_pretrained("roberta-base", output_hidden_states=True)
model = RobertaForSequenceClassification.from_pretrained("roberta-base", config=config)
```
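
If it helps, a rough end-to-end sketch of using that config, assuming the tuple-style outputs of transformers 2.x (where the hidden states come back as the second element when `output_hidden_states=True`):

```
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification, RobertaTokenizer

# Request the per-layer hidden states in addition to the classification logits.
config = RobertaConfig.from_pretrained("roberta-base", output_hidden_states=True)
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", config=config)
model.eval()

input_ids = torch.tensor([tokenizer.encode("Hello world", add_special_tokens=True)])
with torch.no_grad():
    outputs = model(input_ids)

logits = outputs[0]         # classification logits
hidden_states = outputs[1]  # tuple of per-layer hidden states (because output_hidden_states=True)
print(logits.shape, len(hidden_states))
```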

Hi @LysandreJik ,

Thanks a lot for the clarification, this is indeed much clearer. I tried the code again and it is working.
