Transformers: [ALBERT]: 'AlbertForMaskedLM' object has no attribute 'bias'

Created on 30 Nov 2019 · 17 comments · Source: huggingface/transformers

Hi,

I wanted to convert my own trained ALBERT model with the convert_albert_original_tf_checkpoint_to_pytorch.py script:

$ python3 convert_albert_original_tf_checkpoint_to_pytorch.py --tf_checkpoint_path /mnt/albert-base-secrect-language-cased/ --albert_config_file /mnt/albert-base-secrect-language-cased/config.json --pytorch_dump_path pytorch_model.bin

Unfortunately, the following error message is returned:

<--snip-->
bert/pooler/dense/bias
bert/pooler/dense/bias/adam_m
bert/pooler/dense/bias/adam_v
bert/pooler/dense/kernel
bert/pooler/dense/kernel/adam_m
bert/pooler/dense/kernel/adam_v
cls/predictions/output_bias
cls/predictions/output_bias/adam_m
cls/predictions/output_bias/adam_v
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/beta/adam_m
cls/predictions/transform/LayerNorm/beta/adam_v
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/LayerNorm/gamma/adam_m
cls/predictions/transform/LayerNorm/gamma/adam_v
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/bias/adam_m
cls/predictions/transform/dense/bias/adam_v
cls/predictions/transform/dense/kernel
cls/predictions/transform/dense/kernel/adam_m
cls/predictions/transform/dense/kernel/adam_v
cls/seq_relationship/output_bias
cls/seq_relationship/output_bias/adam_m
cls/seq_relationship/output_bias/adam_v
cls/seq_relationship/output_weights
cls/seq_relationship/output_weights/adam_m
cls/seq_relationship/output_weights/adam_v
global_step
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
Traceback (most recent call last):
  File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 66, in <module>
    args.pytorch_dump_path)
  File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 37, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_albert(model, config, tf_checkpoint_path)
  File "/mnt/transformers/transformers/modeling_albert.py", line 92, in load_tf_weights_in_albert
    pointer = getattr(pointer, 'bias')
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'AlbertForMaskedLM' object has no attribute 'bias'

I'm using the latest commit of the google-research ALBERT code for training the model. The configuration is:

{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "embedding_size": 128,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_hidden_groups": 1,
  "net_structure_type": 0,
  "gap_size": 0,
  "num_memory_blocks": 0,
  "inner_group_num": 1,
  "down_scale_factor": 1,
  "type_vocab_size": 2,
  "vocab_size": 32000
}


All 17 comments

Same issue here. I did slightly different steps, but same result.

model = AlbertModel(config=config)
model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')

Then I get,

AttributeError                            Traceback (most recent call last)
<ipython-input-5-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')

~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
     90                 pointer = getattr(pointer, 'weight')
     91             elif l[0] == 'output_bias' or l[0] == 'beta':
---> 92                 pointer = getattr(pointer, 'bias')
     93             elif l[0] == 'output_weights':
     94                 pointer = getattr(pointer, 'weight')

~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    589                 return modules[name]
    590         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591             type(self).__name__, name))
    592 
    593     def __setattr__(self, name, value):

AttributeError: 'AlbertModel' object has no attribute 'bias'

Miserably waiting for the solution :(
The pretrained TensorFlow checkpoints were generated using the code in https://github.com/google-research/google-research/tree/master/albert

It seems the latest code update was 3 days ago (Nov. 27). My training was initiated after that.

Please help us.
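For readers hitting the same wall: the loader splits each TensorFlow variable name on '/' and walks the PyTorch module tree with getattr, mapping kernel/gamma to weight and output_bias/beta to bias (the frames above show exactly that branch). A minimal sketch of that pattern, with a hypothetical module and variable names, shows why the walk raises AttributeError as soon as a name component has no matching attribute on the current pointer:

import torch.nn as nn

def walk_pointer(model, tf_name):
    # Simplified illustration of the name-walking pattern used by
    # load_tf_weights_in_albert: split the TF variable name on '/',
    # then descend into the PyTorch module via getattr.
    pointer = model
    for part in tf_name.split("/"):
        if part in ("kernel", "gamma"):
            pointer = getattr(pointer, "weight")
        elif part in ("output_bias", "beta"):
            pointer = getattr(pointer, "bias")  # fails if the current module has no 'bias'
        else:
            pointer = getattr(pointer, part)
    return pointer

layer = nn.Linear(4, 4)
print(walk_pointer(layer, "kernel").shape)   # torch.Size([4, 4])

# A name component that does not exist on the current module reproduces the error above:
# walk_pointer(nn.Embedding(10, 4), "output_bias")
# -> AttributeError: 'Embedding' object has no attribute 'bias'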

Same issue here.

You can try the convert Albert TF to PyTorch .py script in my repo.



Nice!

Hi, this should have been fixed with b3d834a; you can pick up the changes by installing from source.

Let me know if you still have an error.

@LysandreJik Thank you for your help. I am now getting a different error saying that an Embedding object doesn't have 'shape'.

It seems the loader is expecting a NumPy array with a shape, while what it actually has at that point is an Embedding object, which has no attribute "shape".

I am not sure how to correct it though.

Thank you again!

AttributeError                            Traceback (most recent call last)
<ipython-input-4-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')

~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
    130             array = np.transpose(array)
    131         try:
--> 132             assert pointer.shape == array.shape
    133         except AssertionError as e:
    134             e.args += (pointer.shape, array.shape)

~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    589                 return modules[name]
    590         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591             type(self).__name__, name))
    592 
    593     def __setattr__(self, name, value):

AttributeError: 'Embedding' object has no attribute 'shape'
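For what it's worth, this second failure is easy to reproduce in isolation: an nn.Embedding module itself has no shape attribute, only its weight tensor does, so a bare pointer.shape comparison blows up whenever the walk stops at the module rather than at its weight. A minimal sketch of the distinction (not the library's actual fix, which lands further down the thread):

import numpy as np
import torch.nn as nn

# Stand-ins: a position-embedding-sized module and a matching checkpoint array.
pointer = nn.Embedding(num_embeddings=512, embedding_dim=128)
array = np.zeros((512, 128), dtype=np.float32)

# pointer.shape              -> AttributeError: 'Embedding' object has no attribute 'shape'
assert pointer.weight.shape == array.shape   # comparing against the weight tensor works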

Hi @hansaimlim, what is the size of the model you are loading? Could you paste here the 5-10 lines output by the conversion before the error was raised?

I could also reproduce that error:

global_step
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta'] from bert/embeddings/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/beta/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_m'] from bert/embeddings/LayerNorm/beta/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_v'] from bert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma'] from bert/embeddings/LayerNorm/gamma
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/gamma/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_m'] from bert/embeddings/LayerNorm/gamma/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_v'] from bert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'position_embeddings'] from bert/embeddings/position_embeddings
INFO:transformers.modeling_albert:Skipping albert/embeddings/position_embeddings/adam_m
Traceback (most recent call last):
  File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 66, in <module>
    args.pytorch_dump_path)
  File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 37, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_albert(model, config, tf_checkpoint_path)
  File "/mnt/transformers/transformers/modeling_albert.py", line 134, in load_tf_weights_in_albert
    assert pointer.shape == array.shape
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'Embedding' object has no attribute 'shape'

@LysandreJik Sure. Thanks for prompt feedback!

my_albert_config.json

{
  "attention_probs_dropout_prob": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0,
  "embedding_size": 128,
  "hidden_size": 312,
  "initializer_range": 0.02,
  "intermediate_size": 1248,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 4,
  "num_hidden_groups": 1,
  "net_structure_type": 0,
  "gap_size": 0,
  "num_memory_blocks": 0,
  "inner_group_num": 1,
  "down_scale_factor": 1,
  "type_vocab_size": 2,
  "ln_type": "postln",
  "vocab_size": 19686
}

bert/embeddings/LayerNorm/beta
bert/embeddings/LayerNorm/beta/adam_m
bert/embeddings/LayerNorm/beta/adam_v
bert/embeddings/LayerNorm/gamma
bert/embeddings/LayerNorm/gamma/adam_m
bert/embeddings/LayerNorm/gamma/adam_v
bert/embeddings/position_embeddings
bert/embeddings/position_embeddings/adam_m
bert/embeddings/position_embeddings/adam_v
bert/embeddings/token_type_embeddings
bert/embeddings/token_type_embeddings/adam_m
bert/embeddings/token_type_embeddings/adam_v
bert/embeddings/word_embeddings
bert/embeddings/word_embeddings/adam_m
bert/embeddings/word_embeddings/adam_v
bert/encoder/embedding_hidden_mapping_in/bias
bert/encoder/embedding_hidden_mapping_in/bias/adam_m
bert/encoder/embedding_hidden_mapping_in/bias/adam_v
bert/encoder/embedding_hidden_mapping_in/kernel
bert/encoder/embedding_hidden_mapping_in/kernel/adam_m
bert/encoder/embedding_hidden_mapping_in/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_v
bert/pooler/dense/bias
bert/pooler/dense/bias/adam_m
bert/pooler/dense/bias/adam_v
bert/pooler/dense/kernel
bert/pooler/dense/kernel/adam_m
bert/pooler/dense/kernel/adam_v
cls/predictions/output_bias
cls/predictions/output_bias/adam_m
cls/predictions/output_bias/adam_v
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/beta/adam_m
cls/predictions/transform/LayerNorm/beta/adam_v
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/LayerNorm/gamma/adam_m
cls/predictions/transform/LayerNorm/gamma/adam_v
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/bias/adam_m
cls/predictions/transform/dense/bias/adam_v
cls/predictions/transform/dense/kernel
cls/predictions/transform/dense/kernel/adam_m
cls/predictions/transform/dense/kernel/adam_v
cls/seq_relationship/output_bias
cls/seq_relationship/output_bias/adam_m
cls/seq_relationship/output_bias/adam_v
cls/seq_relationship/output_weights
cls/seq_relationship/output_weights/adam_m
cls/seq_relationship/output_weights/adam_v
global_step
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta'] from bert/embeddings/LayerNorm/beta
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_m'] from bert/embeddings/LayerNorm/beta/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_v'] from bert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma'] from bert/embeddings/LayerNorm/gamma
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_m'] from bert/embeddings/LayerNorm/gamma/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_v'] from bert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'position_embeddings'] from bert/embeddings/position_embeddings
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')

~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
    130             array = np.transpose(array)
    131         try:
--> 132             assert pointer.shape == array.shape
    133         except AssertionError as e:
    134             e.args += (pointer.shape, array.shape)

~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    589                 return modules[name]
    590         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591             type(self).__name__, name))
    592 
    593     def __setattr__(self, name, value):

AttributeError: 'Embedding' object has no attribute 'shape'

Alright, I see where the issue stems from, I'm patching it and will get back to you soon.

Alright, please let me know if e85855f fixed it. I tested it with models saved from run_pretraining.py (with AlbertForMaskedLM as the host model) and run_classifier_sp.py (with AlbertForSequenceClassification) and both seem to work fine now.

Please keep in mind that we have no ALBERT model that can do next sentence prediction, so the weights from cls/seq_relationship are dropped.
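For anyone retracing the steps: after installing from source (for example pip install git+https://github.com/huggingface/transformers), the rough workflow is to re-run the conversion command from the top of the thread and then load the dump into AlbertForMaskedLM. The sketch below reuses the paths from that command; treating the dump as a plain state dict saved with torch.save is an assumption about the conversion script, so adjust if your version saves differently.

import torch
from transformers import AlbertConfig, AlbertForMaskedLM

# Re-run the conversion first, e.g.:
# python3 convert_albert_original_tf_checkpoint_to_pytorch.py \
#   --tf_checkpoint_path /mnt/albert-base-secrect-language-cased/ \
#   --albert_config_file /mnt/albert-base-secrect-language-cased/config.json \
#   --pytorch_dump_path pytorch_model.bin

config = AlbertConfig.from_json_file("/mnt/albert-base-secrect-language-cased/config.json")
model = AlbertForMaskedLM(config)

# Assumption: the dump is a plain state dict written with torch.save.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict, strict=False)  # tolerant loading; the exact key set depends on the conversion
model.eval()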

@LysandreJik

Works fine!! :)))) Thank you so much! 👍

Glad I could help!

Thanks @LysandreJik ! I can also confirm that the conversion script is working now :+1:

Short update: I used the converted ALBERT model to perform NER. F-score was ~0.1%. I've seen this strange behaviour for v2 ALBERT models but still have no solution for that.

@hansaimlim have you done some evaluations with your trained model? Would be great to know if this problem also occurs for non-NER tasks!

@stefan-it I'm working on drug activity prediction. In my case, I used v2 ALBERT as well, and its masked LM performance was fine, but I haven't done downstream prediction tasks yet. Assuming you're working on human language, I believe our tasks are very different. How was it when you used BERT?

I used my trained model for predicting a masked token, and the model always returns <unk> (which is not the case for the English v1 and v2 models), so I guess I did something wrong in the pre-training steps...
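A quick way to sanity-check a converted model on a single masked token is sketched below; the directory path and sample sentence are placeholders, and the tuple-style output indexing matches the transformers API of that era:

import torch
from transformers import AlbertTokenizer, AlbertForMaskedLM

# Placeholder path: point this at a directory holding the converted
# pytorch_model.bin, config.json and the SentencePiece model.
model_dir = "/path/to/converted-albert"
tokenizer = AlbertTokenizer.from_pretrained(model_dir)
model = AlbertForMaskedLM.from_pretrained(model_dir)
model.eval()

text = "The capital of France is [MASK]."          # placeholder sentence
input_ids = tokenizer.encode(text, return_tensors="pt")
mask_positions = (input_ids[0] == tokenizer.mask_token_id).nonzero()

with torch.no_grad():
    logits = model(input_ids)[0]                   # (batch, seq_len, vocab_size)

for pos in mask_positions.flatten().tolist():
    top_ids = logits[0, pos].topk(5).indices.tolist()
    print(tokenizer.convert_ids_to_tokens(top_ids))  # a healthy model should not rank <unk> first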

