Transformers: Vocab size mismatch on EncoderDecoder model from_pretrained

Created on 16 Jul 2020 · 3 Comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using (Bert, XLNet ...): EncoderDecoder with bert-base-multilingual-cased as both encoder and decoder

Language I am using the model on (English, Chinese ...): not relevant for the bug

The problem arises when using:

  • [ ] the official example scripts: (give details below)
  • [x] my own modified scripts: (give details below)

The task I am working on is: not relevant for the bug

To reproduce

Steps to reproduce the behavior:

I am trying to load a training checkpoint using the save_pretrained and from_pretrained API with the EncoderDecoder model. EncoderDecoderModel.from_pretrained fails to load the model when the configuration is loaded from the previously checkpointed model. I believe this is because it loads the default vocab size (30522) instead of the one defined in the saved config (119547 in my case). To reproduce, run:

from transformers import EncoderDecoderModel, BertTokenizer, BertConfig, EncoderDecoderConfig

# Loading encoder-decoder model and saving it
load_dir = 'bert-base-multilingual-cased'
encoder_config = BertConfig.from_pretrained(load_dir)
decoder_config = BertConfig.from_pretrained(load_dir, is_decoder=True)
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
tokenizer = BertTokenizer.from_pretrained(load_dir)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(load_dir, load_dir, encoder_config=encoder_config, decoder_config=decoder_config) # initialize Bert2Bert
model.save_pretrained('ok')

# Loading saved model and its configuration
encoder_config = BertConfig.from_pretrained('ok')
decoder_config = BertConfig.from_pretrained('ok')
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
encoder_decoder_config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config) # This throws

The exception is the following:

File "/home/ancruzsa/.local/lib/python3.6/site-packages/transformers/modeling_utils.py", line 781, in from_pretrained
    model.__class__.__name__, "\n\t".join(error_msgs)
RuntimeError: Error(s) in loading state_dict for EncoderDecoderModel:
        size mismatch for encoder.embeddings.word_embeddings.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
        size mismatch for decoder.bert.embeddings.word_embeddings.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
        size mismatch for decoder.cls.predictions.bias: copying a param with shape torch.Size([119547]) from checkpoint, the shape in current model is torch.Size([30522]).
        size mismatch for decoder.cls.predictions.decoder.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
        size mismatch for decoder.cls.predictions.decoder.bias: copying a param with shape torch.Size([119547]) from checkpoint, the shape in current model is torch.Size([30522]).

Expected behavior

from_pretrained(path) should load the model without issues, using the provided configuration.

Edit: I was expecting from_pretrained with a single path as argument to work as explained in the comment on #4595. However, EncoderDecoderModel.from_encoder_decoder_pretrained('ok', 'ok', encoder_config=encoder_config, decoder_config=decoder_config) does not throw an exception, but it gives different text-generation results than EncoderDecoderModel.from_pretrained(path). It would be great to confirm whether both are supported and load the model weights correctly. The sketch below shows roughly how I compare the two loading paths.
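This is only a minimal comparison sketch, not my exact script: the max_length value and the use of the CLS token as decoder_start_token_id are illustrative choices, and greedy decoding (the generate default) is assumed so that the outputs are deterministic.

from transformers import EncoderDecoderModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
input_ids = tokenizer.encode("A short test sentence.", return_tensors="pt")

# Path 1: load the combined checkpoint saved by save_pretrained('ok')
model_a = EncoderDecoderModel.from_pretrained('ok')

# Path 2: point both encoder and decoder at the same saved directory
model_b = EncoderDecoderModel.from_encoder_decoder_pretrained('ok', 'ok')

# Greedy generation from both models; the decoded strings should match
# if both loading paths restore the same weights
out_a = model_a.generate(input_ids, decoder_start_token_id=tokenizer.cls_token_id, max_length=32)
out_b = model_b.generate(input_ids, decoder_start_token_id=tokenizer.cls_token_id, max_length=32)
print(tokenizer.decode(out_a[0], skip_special_tokens=True))
print(tokenizer.decode(out_b[0], skip_special_tokens=True))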

Environment info

  • transformers version: 3.0.2
  • Platform: Linux
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.5.0 / Yes with GPU
  • Tensorflow version (GPU?): None
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

All 3 comments

Hey @afcruzs,

I think these lines are the problem:

# Loading saved model and its configuration
encoder_config = BertConfig.from_pretrained('ok')
decoder_config = BertConfig.from_pretrained('ok')
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
encoder_decoder_config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config) # This throws

If you replace these lines with

# Loading saved model and its configuration
encoder_decoder_config = EncoderDecoderConfig.from_pretrained("ok")
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config)

no error should be thrown.

This line here:

encoder_config = BertConfig.from_pretrained('ok')

loads an EncoderDecoderConfig as a BERT encoder config, which should not be done IMO.
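For completeness, a small sketch (reusing the 'ok' directory from the snippet above) that inspects the saved vocab sizes through the nested sub-configs of the EncoderDecoderConfig, instead of re-reading the directory as a BertConfig:

from transformers import EncoderDecoderConfig

# The saved directory holds one combined EncoderDecoderConfig; the
# per-model settings live in its nested .encoder and .decoder sub-configs
config = EncoderDecoderConfig.from_pretrained('ok')
print(config.encoder.vocab_size)  # 119547 for bert-base-multilingual-cased
print(config.decoder.vocab_size)  # 119547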

Thanks @patrickvonplaten! That is indeed much clearer. My actual use case is to load the pretrained Hugging Face model, possibly modify the config, train, save with save_pretrained, and later load with from_pretrained. So this is my final code:

load_dir = 'bert-base-multilingual-cased'
encoder_config = BertConfig.from_pretrained(load_dir)
decoder_config = BertConfig.from_pretrained(load_dir, is_decoder=True)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(load_dir, load_dir, encoder_config=encoder_config, decoder_config=decoder_config)

# Train for some time...

# Save model!
model.save_pretrained('ok')

# Loading saved model and its configuration
encoder_decoder_config = EncoderDecoderConfig.from_pretrained("ok")
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config)

I think it would be a good idea to add similar examples to the docs for clarity, especially for EncoderDecoderConfig.from_pretrained("ok") and .from_pretrained(load_dir, is_decoder=True), since, as you pointed out, doing this carelessly can lead to loading the decoder config as the encoder config. I'm happy to help with the examples if you agree!

Hey @afcruzs,

I very much agree that the EncoderDecoderModel should have better documentation.

My plan was to release a notebook soon that explains in detail how to use the EncoderDecoderModel and then also to update the docs.

I won't be able to start on this until 3/08, so feel free to open a PR :-)
