Transformers: Vocab size mismatch on EncoderDecoder model from_pretrained

Created on 16 Jul 2020 · 3 Comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using (Bert, XLNet ...): EncoderDecoder with bert-base-multilingual-cased as both encoder and decoder

Language I am using the model on (English, Chinese ...): not relevant for the bug

The problem arises when using:

  • [ ] the official example scripts: (give details below)
  • [x] my own modified scripts: (give details below)

The task I am working on is: not relevant for the bug

To reproduce

Steps to reproduce the behavior:

I am trying to load a training checkpoint using the save_pretrained and from_pretrained API with the EncoderDecoder model. EncoderDecoderModel.from_pretrained fails to load the model when the configuration is loaded from the previously checkpointed model. I believe this is because it loads the default vocab size (30522) instead of the one defined in the saved config (119547 in my case). To reproduce, run:

from transformers import EncoderDecoderModel, BertTokenizer, BertConfig, EncoderDecoderConfig

# Loading encoder-decoder model and saving it
load_dir = 'bert-base-multilingual-cased'
encoder_config = BertConfig.from_pretrained(load_dir)
decoder_config = BertConfig.from_pretrained(load_dir, is_decoder=True)
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
tokenizer = BertTokenizer.from_pretrained(load_dir)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(load_dir, load_dir, encoder_config=encoder_config, decoder_config=decoder_config) # initialize Bert2Bert
model.save_pretrained('ok')

# Loading saved model and its configuration
encoder_config = BertConfig.from_pretrained('ok')
decoder_config = BertConfig.from_pretrained('ok')
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
encoder_decoder_config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config) # This throws

The exception is the following:

File "/home/ancruzsa/.local/lib/python3.6/site-packages/transformers/modeling_utils.py", line 781, in from_pretrained
    model.__class__.__name__, "\n\t".join(error_msgs)
RuntimeError: Error(s) in loading state_dict for EncoderDecoderModel:
        size mismatch for encoder.embeddings.word_embeddings.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
        size mismatch for decoder.bert.embeddings.word_embeddings.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
        size mismatch for decoder.cls.predictions.bias: copying a param with shape torch.Size([119547]) from checkpoint, the shape in current model is torch.Size([30522]).
        size mismatch for decoder.cls.predictions.decoder.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
        size mismatch for decoder.cls.predictions.decoder.bias: copying a param with shape torch.Size([119547]) from checkpoint, the shape in current model is torch.Size([30522]).

Expected behavior

from_pretrained(path) should load the model without issues, using the provided configuration.

Edit: I was expecting from_pretrained with a single path as argument to work as explained in the comment on #4595. However, EncoderDecoderModel.from_encoder_decoder_pretrained('ok', 'ok', encoder_config=encoder_config, decoder_config=decoder_config) does not throw an exception, but it gives different text-generation results than EncoderDecoderModel.from_pretrained(path). It would be great to confirm whether both are supported and load the model weights correctly. The sketch below shows roughly how I compare the two loading paths.
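This is only a minimal comparison sketch, not my exact script: the max_length value and the use of the CLS token as decoder_start_token_id are illustrative choices, and greedy decoding (the generate default) is assumed so that the outputs are deterministic.

from transformers import EncoderDecoderModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
input_ids = tokenizer.encode("A short test sentence.", return_tensors="pt")

# Path 1: load the combined checkpoint saved by save_pretrained('ok')
model_a = EncoderDecoderModel.from_pretrained('ok')

# Path 2: point both encoder and decoder at the same saved directory
model_b = EncoderDecoderModel.from_encoder_decoder_pretrained('ok', 'ok')

# Greedy generation from both models; the decoded strings should match
# if both loading paths restore the same weights
out_a = model_a.generate(input_ids, decoder_start_token_id=tokenizer.cls_token_id, max_length=32)
out_b = model_b.generate(input_ids, decoder_start_token_id=tokenizer.cls_token_id, max_length=32)
print(tokenizer.decode(out_a[0], skip_special_tokens=True))
print(tokenizer.decode(out_b[0], skip_special_tokens=True))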

Environment info

  • transformers version: 3.0.2
  • Platform: Linux
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.5.0 / Yes with GPU
  • Tensorflow version (GPU?): None
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

All 3 comments

Hey @afcruzs,

I think these lines are the problem:

# Loading saved model and its configuration
encoder_config = BertConfig.from_pretrained('ok')
decoder_config = BertConfig.from_pretrained('ok')
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
encoder_decoder_config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config) # This throws

If you replace these lines with

# Loading saved model and its configuration
encoder_decoder_config = EncoderDecoderConfig.from_pretrained("ok")
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config)

no error should be thrown.

This line here:

encoder_config = BertConfig.from_pretrained('ok')

loads an EncoderDecoderConfig as a BERT encoder config, which should not be done IMO.
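For completeness, a small sketch (reusing the 'ok' directory from the snippet above) that inspects the saved vocab sizes through the nested sub-configs of the EncoderDecoderConfig, instead of re-reading the directory as a BertConfig:

from transformers import EncoderDecoderConfig

# The saved directory holds one combined EncoderDecoderConfig; the
# per-model settings live in its nested .encoder and .decoder sub-configs
config = EncoderDecoderConfig.from_pretrained('ok')
print(config.encoder.vocab_size)  # 119547 for bert-base-multilingual-cased
print(config.decoder.vocab_size)  # 119547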

Thanks @patrickvonplaten! That is indeed much clearer. My actual use case is to load the pretrained Hugging Face model, possibly modify the config, train, save with save_pretrained, and later load with from_pretrained. So this is my final code:

load_dir = 'bert-base-multilingual-cased'
encoder_config = BertConfig.from_pretrained(load_dir)
decoder_config = BertConfig.from_pretrained(load_dir, is_decoder=True)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(load_dir, load_dir, encoder_config=encoder_config, decoder_config=decoder_config)

# Train for some time...

# Save model!
model.save_pretrained('ok')

# Loading saved model and its configuration
encoder_decoder_config = EncoderDecoderConfig.from_pretrained("ok")
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config)

I think it would be a good idea to add similar examples to the docs for clarity, especially for EncoderDecoderConfig.from_pretrained("ok") and .from_pretrained(load_dir, is_decoder=True), since, as you pointed out, doing this carelessly can lead to loading the decoder config as the encoder config. I'm happy to help with the examples if you agree!

Hey @afcruzs,

I very much agree that the EncoderDecoderModel should have better documentation.

My plan was to release a notebook soon that explains in detail how to use the EncoderDecoderModel and then also to update the docs.

I won't be able to start on this until 3/08, so feel free to open a PR :-)
