Transformers: unable to completely load T5 pretrained model; missing/unexpected keys

Created on 31 Mar 2020 · 8 comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using: T5

To reproduce

from transformers import T5ForConditionalGeneration

model, info = T5ForConditionalGeneration.from_pretrained('t5-small', output_loading_info=True)

info is
{'missing_keys': ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight'], 'unexpected_keys': ['encoder.block.0.layer.0.layer_norm.bias', 'encoder.block.0.layer.1.layer_norm.bias', 'encoder.block.1.layer.0.layer_norm.bias', 'encoder.block.1.layer.1.layer_norm.bias', 'encoder.block.2.layer.0.layer_norm.bias', 'encoder.block.2.layer.1.layer_norm.bias', 'encoder.block.3.layer.0.layer_norm.bias', 'encoder.block.3.layer.1.layer_norm.bias', 'encoder.block.4.layer.0.layer_norm.bias', 'encoder.block.4.layer.1.layer_norm.bias', 'encoder.block.5.layer.0.layer_norm.bias', 'encoder.block.5.layer.1.layer_norm.bias', 'encoder.final_layer_norm.bias', 'decoder.block.0.layer.0.layer_norm.bias', 'decoder.block.0.layer.1.layer_norm.bias', 'decoder.block.0.layer.2.layer_norm.bias', 'decoder.block.1.layer.0.layer_norm.bias', 'decoder.block.1.layer.1.layer_norm.bias', 'decoder.block.1.layer.2.layer_norm.bias', 'decoder.block.2.layer.0.layer_norm.bias', 'decoder.block.2.layer.1.layer_norm.bias', 'decoder.block.2.layer.2.layer_norm.bias', 'decoder.block.3.layer.0.layer_norm.bias', 'decoder.block.3.layer.1.layer_norm.bias', 'decoder.block.3.layer.2.layer_norm.bias', 'decoder.block.4.layer.0.layer_norm.bias', 'decoder.block.4.layer.1.layer_norm.bias', 'decoder.block.4.layer.2.layer_norm.bias', 'decoder.block.5.layer.0.layer_norm.bias', 'decoder.block.5.layer.1.layer_norm.bias', 'decoder.block.5.layer.2.layer_norm.bias', 'decoder.final_layer_norm.bias'], 'error_msgs': []}

Expected behavior

No keys should be missing or unexpected

Environment info

  • transformers version: 2.7.0
  • Platform: Ubuntu
  • Python version: 3.6
  • PyTorch version (GPU?): 1.2.0 (yes)
  • Tensorflow version (GPU?): nope
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: nope

Most helpful comment

Yeah, this should not be a problem. All of these weights ('encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight') are tied to the input embedding matrix and therefore don't need to be initialized.

All 8 comments

Hi @dhecloud,

Thanks for your issue :-)
Does the model still work fine?

Hi, thanks for your reply.
Using the examples provided in the docs, the model works fine.
Previously I used T5WithLMHeadModel in version 2.5.1, which did not raise this missing-keys warning. After I moved to T5ForConditionalGeneration in 2.7.0 there was this warning, and my training loss diverged, so I thought I would raise this issue in case there was some sort of change in the naming of the checkpoint.

I'm gonna take a look :-)

Hi guys,
Any news on this?
When I try to load t5-base I receive this:

INFO:transformers.modeling_utils:Weights of T5ForConditionalGeneration not initialized from pretrained model: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight']

I think it's mostly a harmless, misplaced warning. The model should still work fine. You can test it by trying out the examples.
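For instance, here is a minimal generation smoke test (a sketch; the translation prompt is my own choice, and any example from the docs works just as well):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

# T5 expects a task prefix in front of the input text.
input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
output_ids = model.generate(input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```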

Yeah, this should not be a problem. All of these weights ('encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight') are tied to the input embedding matrix and therefore don't need to be initialized.
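If you want to verify the tying yourself, here is a quick sanity check. This is a sketch, assuming the tied parameters literally share storage (it compares data_ptr() for the LM head in case the tensors are distinct views):

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained('t5-small')

# The encoder embeddings, decoder embeddings, and LM head should all
# share the input embedding matrix, so none of them needs its own
# entry in the checkpoint.
shared = model.get_input_embeddings().weight
print(shared is model.encoder.embed_tokens.weight)           # expected: True
print(shared is model.decoder.embed_tokens.weight)           # expected: True
print(shared.data_ptr() == model.lm_head.weight.data_ptr())  # expected: True
```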

How can we silence the error?

It should be enough to lower the verbosity of the CLI logger.
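For example, the message in question is emitted at INFO level by the transformers.modeling_utils logger (visible in the quoted log line above), so raising that logger's threshold with standard Python logging should silence it. A minimal sketch:

```python
import logging

# The message comes from transformers.modeling_utils at INFO level
# (see the "INFO:transformers.modeling_utils:..." line above), so
# raising the threshold to WARNING hides it.
logging.getLogger("transformers.modeling_utils").setLevel(logging.WARNING)
```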
