Hi,
I wanted to convert my own trained ALBERT model with the convert_albert_original_tf_checkpoint_to_pytorch.py script:
$ python3 convert_albert_original_tf_checkpoint_to_pytorch.py --tf_checkpoint_path /mnt/albert-base-secrect-language-cased/ --albert_config_file /mnt/albert-base-secrect-language-cased/config.json --pytorch_dump_path pytorch_model.bin
Unfortunately, the following error message is returned:
<--snip-->
bert/pooler/dense/bias
bert/pooler/dense/bias/adam_m
bert/pooler/dense/bias/adam_v
bert/pooler/dense/kernel
bert/pooler/dense/kernel/adam_m
bert/pooler/dense/kernel/adam_v
cls/predictions/output_bias
cls/predictions/output_bias/adam_m
cls/predictions/output_bias/adam_v
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/beta/adam_m
cls/predictions/transform/LayerNorm/beta/adam_v
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/LayerNorm/gamma/adam_m
cls/predictions/transform/LayerNorm/gamma/adam_v
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/bias/adam_m
cls/predictions/transform/dense/bias/adam_v
cls/predictions/transform/dense/kernel
cls/predictions/transform/dense/kernel/adam_m
cls/predictions/transform/dense/kernel/adam_v
cls/seq_relationship/output_bias
cls/seq_relationship/output_bias/adam_m
cls/seq_relationship/output_bias/adam_v
cls/seq_relationship/output_weights
cls/seq_relationship/output_weights/adam_m
cls/seq_relationship/output_weights/adam_v
global_step
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
Traceback (most recent call last):
File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 66, in <module>
args.pytorch_dump_path)
File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 37, in convert_tf_checkpoint_to_pytorch
load_tf_weights_in_albert(model, config, tf_checkpoint_path)
File "/mnt/transformers/transformers/modeling_albert.py", line 92, in load_tf_weights_in_albert
pointer = getattr(pointer, 'bias')
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 585, in __getattr__
type(self).__name__, name))
AttributeError: 'AlbertForMaskedLM' object has no attribute 'bias'
I'm using the latest commit of the google-research code for training the ALBERT model. The configuration is:
{
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"embedding_size": 128,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_hidden_groups": 1,
"net_structure_type": 0,
"gap_size": 0,
"num_memory_blocks": 0,
"inner_group_num": 1,
"down_scale_factor": 1,
"type_vocab_size": 2,
"vocab_size": 32000
}
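For context, the traceback above shows that the script builds an AlbertForMaskedLM from the config and then calls load_tf_weights_in_albert on it. A rough programmatic equivalent (a sketch only, not the verbatim script) looks like this:

import torch
from transformers import AlbertConfig, AlbertForMaskedLM
from transformers.modeling_albert import load_tf_weights_in_albert

# Paths mirror the command above; replace them with your own files.
tf_checkpoint_path = "/mnt/albert-base-secrect-language-cased/"
albert_config_file = "/mnt/albert-base-secrect-language-cased/config.json"
pytorch_dump_path = "pytorch_model.bin"

# Build the PyTorch model from the ALBERT config ...
config = AlbertConfig.from_json_file(albert_config_file)
model = AlbertForMaskedLM(config)

# ... copy the TensorFlow weights into it ...
load_tf_weights_in_albert(model, config, tf_checkpoint_path)

# ... and save the resulting state dict.
torch.save(model.state_dict(), pytorch_dump_path)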
Same issue here. I took slightly different steps, but got the same result.
model = AlbertModel(config=config)
model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')
Then I get,
AttributeError Traceback (most recent call last)
<ipython-input-5-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
90 pointer = getattr(pointer, 'weight')
91 elif l[0] == 'output_bias' or l[0] == 'beta':
---> 92 pointer = getattr(pointer, 'bias')
93 elif l[0] == 'output_weights':
94 pointer = getattr(pointer, 'weight')
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
589 return modules[name]
590 raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591 type(self).__name__, name))
592
593 def __setattr__(self, name, value):
AttributeError: 'AlbertModel' object has no attribute 'bias'
Miserably waiting for the solution :(
The pre-trained TensorFlow checkpoints were generated using the code in https://github.com/google-research/google-research/tree/master/albert
It seems the latest code update there was 3 days ago (Nov. 27); my training was started after that.
Please help us.
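(As an aside, the variable-name dump at the top of the conversion output can be reproduced directly from the checkpoint, which is handy for checking what the loader will try to map. A minimal sketch, assuming TensorFlow is installed and reusing the checkpoint path from the snippet above:)

import tensorflow as tf

# Lists (name, shape) for every variable stored in the checkpoint.
ckpt = "sample_tf_checkpoint/model.ckpt-100000"  # path from the snippet above
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)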
Same issue here.
You can try the ALBERT TF-to-PyTorch conversion script in my repo.
Brilliant!
Hi, this should have been fixed with b3d834a, you can load the changes by installing from source.
Let me know if you still have an error.
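(When installing from source, it is easy to end up still importing an older pip release. A quick sanity check, just a sketch, is to confirm which files Python actually picks up:)

import inspect
import transformers

print(transformers.__version__)                             # should reflect the source install
print(inspect.getsourcefile(transformers.modeling_albert))  # shows which modeling_albert.py is actually being imported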
@LysandreJik Thank you for your help. I am now getting a different error, saying that the Embedding object doesn't have 'shape'.
It seems the loader is comparing against a NumPy array's shape, but what it reaches is an Embedding object, which has no "shape" attribute.
I am not sure how to correct it, though.
Thank you again!
AttributeError Traceback (most recent call last)
<ipython-input-4-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
130 array = np.transpose(array)
131 try:
--> 132 assert pointer.shape == array.shape
133 except AssertionError as e:
134 e.args += (pointer.shape, array.shape)
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
589 return modules[name]
590 raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591 type(self).__name__, name))
592
593 def __setattr__(self, name, value):
AttributeError: 'Embedding' object has no attribute 'shape'
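(The error itself is consistent with the traceback: an nn.Embedding module has no shape of its own, only its weight tensor does, so the shape comparison fails when the loader's pointer stops at the module rather than at its weight. A tiny illustration:)

import torch.nn as nn

emb = nn.Embedding(19686, 128)   # sizes are purely illustrative
print(emb.weight.shape)          # torch.Size([19686, 128]) -- the weight tensor has a shape...
print(hasattr(emb, "shape"))     # False -- ...but the module itself does not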
Hi @hansaimlim, what is the size of the model you are loading? Could you paste here the 5-10 lines output by the conversion before the error was raised?
I could also reproduce that error:
global_step
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta'] from bert/embeddings/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/beta/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_m'] from bert/embeddings/LayerNorm/beta/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_v'] from bert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma'] from bert/embeddings/LayerNorm/gamma
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/gamma/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_m'] from bert/embeddings/LayerNorm/gamma/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_v'] from bert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'position_embeddings'] from bert/embeddings/position_embeddings
INFO:transformers.modeling_albert:Skipping albert/embeddings/position_embeddings/adam_m
Traceback (most recent call last):
File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 66, in <module>
args.pytorch_dump_path)
File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 37, in convert_tf_checkpoint_to_pytorch
load_tf_weights_in_albert(model, config, tf_checkpoint_path)
File "/mnt/transformers/transformers/modeling_albert.py", line 134, in load_tf_weights_in_albert
assert pointer.shape == array.shape
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 585, in __getattr__
type(self).__name__, name))
AttributeError: 'Embedding' object has no attribute 'shape'
@LysandreJik Sure. Thanks for prompt feedback!
my_albert_config.json:
{
"attention_probs_dropout_prob": 0,
"hidden_act": "gelu",
"hidden_dropout_prob": 0,
"embedding_size": 128,
"hidden_size": 312,
"initializer_range": 0.02,
"intermediate_size": 1248,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 4,
"num_hidden_groups": 1,
"net_structure_type": 0,
"gap_size": 0,
"num_memory_blocks": 0,
"inner_group_num": 1,
"down_scale_factor": 1,
"type_vocab_size": 2,
"ln_type": "postln",
"vocab_size": 19686
}
bert/embeddings/LayerNorm/beta
bert/embeddings/LayerNorm/beta/adam_m
bert/embeddings/LayerNorm/beta/adam_v
bert/embeddings/LayerNorm/gamma
bert/embeddings/LayerNorm/gamma/adam_m
bert/embeddings/LayerNorm/gamma/adam_v
bert/embeddings/position_embeddings
bert/embeddings/position_embeddings/adam_m
bert/embeddings/position_embeddings/adam_v
bert/embeddings/token_type_embeddings
bert/embeddings/token_type_embeddings/adam_m
bert/embeddings/token_type_embeddings/adam_v
bert/embeddings/word_embeddings
bert/embeddings/word_embeddings/adam_m
bert/embeddings/word_embeddings/adam_v
bert/encoder/embedding_hidden_mapping_in/bias
bert/encoder/embedding_hidden_mapping_in/bias/adam_m
bert/encoder/embedding_hidden_mapping_in/bias/adam_v
bert/encoder/embedding_hidden_mapping_in/kernel
bert/encoder/embedding_hidden_mapping_in/kernel/adam_m
bert/encoder/embedding_hidden_mapping_in/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_v
bert/pooler/dense/bias
bert/pooler/dense/bias/adam_m
bert/pooler/dense/bias/adam_v
bert/pooler/dense/kernel
bert/pooler/dense/kernel/adam_m
bert/pooler/dense/kernel/adam_v
cls/predictions/output_bias
cls/predictions/output_bias/adam_m
cls/predictions/output_bias/adam_v
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/beta/adam_m
cls/predictions/transform/LayerNorm/beta/adam_v
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/LayerNorm/gamma/adam_m
cls/predictions/transform/LayerNorm/gamma/adam_v
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/bias/adam_m
cls/predictions/transform/dense/bias/adam_v
cls/predictions/transform/dense/kernel
cls/predictions/transform/dense/kernel/adam_m
cls/predictions/transform/dense/kernel/adam_v
cls/seq_relationship/output_bias
cls/seq_relationship/output_bias/adam_m
cls/seq_relationship/output_bias/adam_v
cls/seq_relationship/output_weights
cls/seq_relationship/output_weights/adam_m
cls/seq_relationship/output_weights/adam_v
global_step
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta'] from bert/embeddings/LayerNorm/beta
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_m'] from bert/embeddings/LayerNorm/beta/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_v'] from bert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma'] from bert/embeddings/LayerNorm/gamma
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_m'] from bert/embeddings/LayerNorm/gamma/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_v'] from bert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'position_embeddings'] from bert/embeddings/position_embeddings
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
130 array = np.transpose(array)
131 try:
--> 132 assert pointer.shape == array.shape
133 except AssertionError as e:
134 e.args += (pointer.shape, array.shape)
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
589 return modules[name]
590 raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591 type(self).__name__, name))
592
593 def __setattr__(self, name, value):
AttributeError: 'Embedding' object has no attribute 'shape'
Alright, I see where the issue stems from; I'm patching it and will get back to you soon.
Alright, please let me know if e85855f fixed it. I tested it with models saved from run_pretraining.py (with AlbertForMaskedLM as the host model) and run_classifier_sp.py (with AlbertForSequenceClassification), and both seem to work fine now.
Please keep in mind that we have no ALBERT model that can do next sentence prediction, so the weights from cls/seq_relationship are dropped.
@LysandreJik
Works fine!! :)))) Thank you so much! 👍
Glad I could help!
Thanks @LysandreJik ! I can also confirm that the conversion script is working now :+1:
Short update: I used the converted ALBERT model to perform NER; the F-score was ~0.1%. I've seen this strange behaviour with v2 ALBERT models before, but still have no solution for it.
@hansaimlim have you done any evaluations with your trained model? It would be great to know whether this problem also occurs for non-NER tasks!
@stefan-it I'm working on drug activity prediction. In my case, I used v2 ALBERT as well; its masked-LM performance was fine, but I haven't done downstream prediction tasks yet. Assuming you're working on human language, I believe our tasks are very different. How was it when you used BERT?
I used my trained model to predict a masked token, and it always returned <unk> (which is not the case for the English v1 and v2 models), so I guess I did something wrong in the pre-training steps...
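For anyone who wants to run the same kind of sanity check on a converted checkpoint, a rough masked-token smoke test looks like this (a sketch only; the file paths and the SentencePiece model are placeholders for your own artifacts):

import torch
from transformers import AlbertConfig, AlbertForMaskedLM, AlbertTokenizer

config = AlbertConfig.from_json_file("config.json")        # config of the converted model
model = AlbertForMaskedLM(config)
model.load_state_dict(torch.load("pytorch_model.bin"))     # output of the conversion script
model.eval()

tokenizer = AlbertTokenizer("spiece.model")                # SentencePiece model used for pre-training

text = "the capital of france is [MASK] ."
input_ids = torch.tensor([tokenizer.encode(text)])
mask_id = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)
mask_pos = (input_ids[0] == mask_id).nonzero()[0].item()

with torch.no_grad():
    logits = model(input_ids)[0]                           # (batch, seq_len, vocab_size)
top_ids = logits[0, mask_pos].topk(5)[1].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))            # a healthy model should not answer <unk> every time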