Transformers: unable to completely load T5 pretrained model; missing/unexpected keys

Created on 31 Mar 2020 · 8 comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using: T5

To reproduce

from transformers import T5ForConditionalGeneration

model, info = T5ForConditionalGeneration.from_pretrained('t5-small', output_loading_info=True)

info is
{'missing_keys': ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight'], 'unexpected_keys': ['encoder.block.0.layer.0.layer_norm.bias', 'encoder.block.0.layer.1.layer_norm.bias', 'encoder.block.1.layer.0.layer_norm.bias', 'encoder.block.1.layer.1.layer_norm.bias', 'encoder.block.2.layer.0.layer_norm.bias', 'encoder.block.2.layer.1.layer_norm.bias', 'encoder.block.3.layer.0.layer_norm.bias', 'encoder.block.3.layer.1.layer_norm.bias', 'encoder.block.4.layer.0.layer_norm.bias', 'encoder.block.4.layer.1.layer_norm.bias', 'encoder.block.5.layer.0.layer_norm.bias', 'encoder.block.5.layer.1.layer_norm.bias', 'encoder.final_layer_norm.bias', 'decoder.block.0.layer.0.layer_norm.bias', 'decoder.block.0.layer.1.layer_norm.bias', 'decoder.block.0.layer.2.layer_norm.bias', 'decoder.block.1.layer.0.layer_norm.bias', 'decoder.block.1.layer.1.layer_norm.bias', 'decoder.block.1.layer.2.layer_norm.bias', 'decoder.block.2.layer.0.layer_norm.bias', 'decoder.block.2.layer.1.layer_norm.bias', 'decoder.block.2.layer.2.layer_norm.bias', 'decoder.block.3.layer.0.layer_norm.bias', 'decoder.block.3.layer.1.layer_norm.bias', 'decoder.block.3.layer.2.layer_norm.bias', 'decoder.block.4.layer.0.layer_norm.bias', 'decoder.block.4.layer.1.layer_norm.bias', 'decoder.block.4.layer.2.layer_norm.bias', 'decoder.block.5.layer.0.layer_norm.bias', 'decoder.block.5.layer.1.layer_norm.bias', 'decoder.block.5.layer.2.layer_norm.bias', 'decoder.final_layer_norm.bias'], 'error_msgs': []}

Expected behavior

No keys should be missing or unexpected

Environment info

  • transformers version: 2.7.0
  • Platform: Ubuntu
  • Python version: 3.6
  • PyTorch version (GPU?): 1.2.0 (yes)
  • Tensorflow version (GPU?): nope
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: nope

Most helpful comment

Yeah, this should not be a problem. All of these weights ('encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight') are tied to the input embedding matrix and therefore don't need to be initialized.

All 8 comments

Hi @dhecloud,

Thanks for your issue :-)
Does the model still work fine?

Hi, thanks for your reply.
Using the examples provided in the docs, the model works fine.
Previously I used T5WithLMHeadModel in version 2.5.1, which did not raise this missing-keys warning. After I moved to T5ForConditionalGeneration in 2.7.0 there was this warning, and my training loss diverged, so I thought I would raise this issue in case there was some sort of change in the naming of the checkpoint.

I'm gonna take a look :-)

Hi guys,
Any news on this?
When I try to load t5-base I receive this:

INFO:transformers.modeling_utils:Weights of T5ForConditionalGeneration not initialized from pretrained model: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight']

I think it's mostly a harmless, misplaced warning. The model should still work fine. You can test it by trying out the examples.
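For instance, here is a minimal generation smoke test (a sketch; the translation prompt is my own choice, and any example from the docs works just as well):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

# T5 expects a task prefix in front of the input text.
input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
output_ids = model.generate(input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```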

Yeah, this should not be a problem. All of these weights ('encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight') are tied to the input embedding matrix and therefore don't need to be initialized.
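If you want to verify the tying yourself, here is a quick sanity check. This is a sketch, assuming the tied parameters literally share storage (it compares data_ptr() for the LM head in case the tensors are distinct views):

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained('t5-small')

# The encoder embeddings, decoder embeddings, and LM head should all
# share the input embedding matrix, so none of them needs its own
# entry in the checkpoint.
shared = model.get_input_embeddings().weight
print(shared is model.encoder.embed_tokens.weight)           # expected: True
print(shared is model.decoder.embed_tokens.weight)           # expected: True
print(shared.data_ptr() == model.lm_head.weight.data_ptr())  # expected: True
```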

How can we silence the error?

It should be enough to lower the verbosity of the CLI logger.
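For example, the message in question is emitted at INFO level by the transformers.modeling_utils logger (visible in the quoted log line above), so raising that logger's threshold with standard Python logging should silence it. A minimal sketch:

```python
import logging

# The message comes from transformers.modeling_utils at INFO level
# (see the "INFO:transformers.modeling_utils:..." line above), so
# raising the threshold to WARNING hides it.
logging.getLogger("transformers.modeling_utils").setLevel(logging.WARNING)
```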
