Model I am using (Bert, XLNet ...): EncoderDecoder with bert-base-multilingual-cased as both encoder and decoder
Language I am using the model on (English, Chinese ...): not relevant for the bug
The problem arises when using: my own scripts (repro below)
The task I am working on is: not relevant for the bug
Steps to reproduce the behavior:
I am trying to load a training checkpoint using the save_pretrained and from_pretrained API with the EncoderDecoder model. EncoderDecoderModel.from_pretrained fails to load the model when the configuration is loaded from the previously checkpointed model. I believe this is because it loads a default vocab size (30522) instead of the one defined in the saved config (119547 in my case). To reproduce, run:
from transformers import EncoderDecoderModel, BertTokenizer, BertConfig, EncoderDecoderConfig
# Loading encoder-decoder model and saving it
load_dir = 'bert-base-multilingual-cased'
encoder_config = BertConfig.from_pretrained(load_dir)
decoder_config = BertConfig.from_pretrained(load_dir, is_decoder=True)
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
tokenizer = BertTokenizer.from_pretrained(load_dir)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(load_dir, load_dir, encoder_config=encoder_config, decoder_config=decoder_config) # initialize Bert2Bert
model.save_pretrained('ok')
# Loading saved model and its configuration
encoder_config = BertConfig.from_pretrained('ok')
decoder_config = BertConfig.from_pretrained('ok')
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
encoder_decoder_config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config) # This throws
The exception is the following:
File "/home/ancruzsa/.local/lib/python3.6/site-packages/transformers/modeling_utils.py", line 781, in from_pretrained
model.__class__.__name__, "\n\t".join(error_msgs)
RuntimeError: Error(s) in loading state_dict for EncoderDecoderModel:
size mismatch for encoder.embeddings.word_embeddings.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
size mismatch for decoder.bert.embeddings.word_embeddings.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
size mismatch for decoder.cls.predictions.bias: copying a param with shape torch.Size([119547]) from checkpoint, the shape in current model is torch.Size([30522]).
size mismatch for decoder.cls.predictions.decoder.weight: copying a param with shape torch.Size([119547, 768]) from checkpoint, the shape in current model is torch.Size([30522, 768]).
size mismatch for decoder.cls.predictions.decoder.bias: copying a param with shape torch.Size([119547]) from checkpoint, the shape in current model is torch.Size([30522]).
from_pretrained(path) should load the model without issues, using the provided configuration.
Edit: I was expecting from_pretrained with a single path argument to work as explained in the comment on #4595. However, EncoderDecoderModel.from_encoder_decoder_pretrained('ok', 'ok', encoder_config=encoder_config, decoder_config=decoder_config) does not throw an exception, yet it gives different results in text generation than EncoderDecoderModel.from_pretrained(path). It would be great to confirm whether both are supported and load the model weights correctly.
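For what it's worth, here is a minimal sketch (my addition, not from the original report) of how one could check whether the two loading paths end up with the same weights, assuming the 'ok' checkpoint saved above:
import torch
from transformers import BertConfig, EncoderDecoderModel

# Path 1: single-directory load
m1 = EncoderDecoderModel.from_pretrained('ok')

# Path 2: the two-directory load from the edit above (no exception, but different generations)
encoder_config = BertConfig.from_pretrained('ok')
decoder_config = BertConfig.from_pretrained('ok', is_decoder=True)
m2 = EncoderDecoderModel.from_encoder_decoder_pretrained('ok', 'ok', encoder_config=encoder_config, decoder_config=decoder_config)

# Diff the state dicts: keys present in only one model, or shared tensors with
# different values, would explain the divergent generation results.
sd1, sd2 = m1.state_dict(), m2.state_dict()
only_one = set(sd1).symmetric_difference(sd2)
differ = [k for k in set(sd1) & set(sd2) if not torch.equal(sd1[k], sd2[k])]
print(len(only_one), 'keys in one model only;', len(differ), 'shared tensors differ')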
transformers version: 3.0.2
Hey @afcruzs,
These lines are the problem I think:
# Loading saved model and its configuration
encoder_config = BertConfig.from_pretrained('ok')
decoder_config = BertConfig.from_pretrained('ok')
print(encoder_config.vocab_size)
print(decoder_config.vocab_size)
encoder_decoder_config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config) # This throws
If you replace these lines with
# Loading saved model and its configuration
encoder_decoder_config = EncoderDecoderConfig.from_pretrained("ok")
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config)
no error should be thrown.
This line here:
encoder_config = BertConfig.from_pretrained('ok')
loads an EncoderDecoderConfig as a Bert encoder config, which should not be done IMO.
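A quick way to see this (my sketch, assuming the 'ok' directory from the repro above) is to read the saved config back both ways and compare:
from transformers import BertConfig, EncoderDecoderConfig

# 'ok/config.json' holds an EncoderDecoderConfig. Reading it back as a BertConfig
# ignores the nested encoder/decoder sections, so vocab_size silently falls back
# to the BertConfig default (30522) instead of the saved 119547.
as_bert = BertConfig.from_pretrained('ok')
as_enc_dec = EncoderDecoderConfig.from_pretrained('ok')
print(as_bert.vocab_size)             # 30522 (default)
print(as_enc_dec.encoder.vocab_size)  # 119547 (saved value)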
Thanks @patrickvonplaten! That is indeed much clearer. My actual use case is to load the HF pretrained model (possibly modifying the config), save it with save_pretrained, and later load it back with from_pretrained. This is my final code:
load_dir = 'bert-base-multilingual-cased'
encoder_config = BertConfig.from_pretrained(load_dir)
decoder_config = BertConfig.from_pretrained(load_dir, is_decoder=True)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(load_dir, load_dir, encoder_config=encoder_config, decoder_config=decoder_config)
# Train for some time...
# Save model!
model.save_pretrained('ok')
# Loading saved model and its configuration
encoder_decoder_config = EncoderDecoderConfig.from_pretrained("ok")
model2 = EncoderDecoderModel.from_pretrained('ok', config=encoder_decoder_config)
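As an extra sanity check (my addition, assuming the snippet above has run), one can confirm the round trip preserves both the config and the weights:
import torch

# The reloaded config keeps the saved multilingual vocab size, not the 30522 default
assert model2.config.encoder.vocab_size == 119547
assert model2.config.decoder.is_decoder

# And every tensor in the reloaded model matches the model that was saved
sd_saved, sd_loaded = model.state_dict(), model2.state_dict()
assert sd_saved.keys() == sd_loaded.keys()
assert all(torch.equal(sd_saved[k], sd_loaded[k]) for k in sd_saved)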
I think it would be a good idea to add similar examples to the docs for clarity, especially for EncoderDecoderConfig.from_pretrained("ok") and .from_pretrained(load_dir, is_decoder=True), since, as you pointed out, doing this carelessly can lead to loading the decoder config as the encoder one. I'm happy to help with the examples if you agree with them!
Hey @afcruzs,
I agree very much that the EncoderDecoderModel should have better documentation.
My plan was to release a notebook soon that explains in detail how to use the EncoderDecoderModel and then also to update the docs.
I won't be able to start on this until 3/08, so feel free to open a PR :-)