Hi,
I wanted to convert my own trained ALBERT model with the convert_albert_original_tf_checkpoint_to_pytorch.py script:
$ python3 convert_albert_original_tf_checkpoint_to_pytorch.py --tf_checkpoint_path /mnt/albert-base-secrect-language-cased/ --albert_config_file /mnt/albert-base-secrect-language-cased/config.json --pytorch_dump_path pytorch_model.bin
Unfortunately, the following error message is returned:
<--snip-->
bert/pooler/dense/bias
bert/pooler/dense/bias/adam_m
bert/pooler/dense/bias/adam_v
bert/pooler/dense/kernel
bert/pooler/dense/kernel/adam_m
bert/pooler/dense/kernel/adam_v
cls/predictions/output_bias
cls/predictions/output_bias/adam_m
cls/predictions/output_bias/adam_v
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/beta/adam_m
cls/predictions/transform/LayerNorm/beta/adam_v
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/LayerNorm/gamma/adam_m
cls/predictions/transform/LayerNorm/gamma/adam_v
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/bias/adam_m
cls/predictions/transform/dense/bias/adam_v
cls/predictions/transform/dense/kernel
cls/predictions/transform/dense/kernel/adam_m
cls/predictions/transform/dense/kernel/adam_v
cls/seq_relationship/output_bias
cls/seq_relationship/output_bias/adam_m
cls/seq_relationship/output_bias/adam_v
cls/seq_relationship/output_weights
cls/seq_relationship/output_weights/adam_m
cls/seq_relationship/output_weights/adam_v
global_step
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping bert/embeddings/attention/LayerNorm/beta
Traceback (most recent call last):
File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 66, in <module>
args.pytorch_dump_path)
File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 37, in convert_tf_checkpoint_to_pytorch
load_tf_weights_in_albert(model, config, tf_checkpoint_path)
File "/mnt/transformers/transformers/modeling_albert.py", line 92, in load_tf_weights_in_albert
pointer = getattr(pointer, 'bias')
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 585, in __getattr__
type(self).__name__, name))
AttributeError: 'AlbertForMaskedLM' object has no attribute 'bias'
I'm using the latest commit of the google-research code for training the ALBERT model. The configuration is:
{
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"embedding_size": 128,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_hidden_groups": 1,
"net_structure_type": 0,
"gap_size": 0,
"num_memory_blocks": 0,
"inner_group_num": 1,
"down_scale_factor": 1,
"type_vocab_size": 2,
"vocab_size": 32000
}
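For context, the traceback above shows that the script builds an AlbertForMaskedLM from the config and then calls load_tf_weights_in_albert on it. A rough programmatic equivalent (a sketch only, not the verbatim script) looks like this:

import torch
from transformers import AlbertConfig, AlbertForMaskedLM
from transformers.modeling_albert import load_tf_weights_in_albert

# Paths mirror the command above; replace them with your own files.
tf_checkpoint_path = "/mnt/albert-base-secrect-language-cased/"
albert_config_file = "/mnt/albert-base-secrect-language-cased/config.json"
pytorch_dump_path = "pytorch_model.bin"

# Build the PyTorch model from the ALBERT config ...
config = AlbertConfig.from_json_file(albert_config_file)
model = AlbertForMaskedLM(config)

# ... copy the TensorFlow weights into it ...
load_tf_weights_in_albert(model, config, tf_checkpoint_path)

# ... and save the resulting state dict.
torch.save(model.state_dict(), pytorch_dump_path)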
Same issue here. I took slightly different steps, but got the same result.
model = AlbertModel(config=config)
model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')
Then I get,
AttributeError Traceback (most recent call last)
<ipython-input-5-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
90 pointer = getattr(pointer, 'weight')
91 elif l[0] == 'output_bias' or l[0] == 'beta':
---> 92 pointer = getattr(pointer, 'bias')
93 elif l[0] == 'output_weights':
94 pointer = getattr(pointer, 'weight')
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
589 return modules[name]
590 raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591 type(self).__name__, name))
592
593 def __setattr__(self, name, value):
AttributeError: 'AlbertModel' object has no attribute 'bias'
Miserably waiting for the solution :(
The pre-trained TensorFlow checkpoints were generated using the code in https://github.com/google-research/google-research/tree/master/albert
It seems the latest code update there was 3 days ago (Nov. 27); my training was started after that.
Please help us.
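(As an aside, the variable-name dump at the top of the conversion output can be reproduced directly from the checkpoint, which is handy for checking what the loader will try to map. A minimal sketch, assuming TensorFlow is installed and reusing the checkpoint path from the snippet above:)

import tensorflow as tf

# Lists (name, shape) for every variable stored in the checkpoint.
ckpt = "sample_tf_checkpoint/model.ckpt-100000"  # path from the snippet above
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)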
Same issue here.
You can try the ALBERT TF-to-PyTorch conversion script in my repo.
Brilliant!
Hi, this should have been fixed with b3d834a, you can load the changes by installing from source.
Let me know if you still have an error.
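(When installing from source, it is easy to end up still importing an older pip release. A quick sanity check, just a sketch, is to confirm which files Python actually picks up:)

import inspect
import transformers

print(transformers.__version__)                             # should reflect the source install
print(inspect.getsourcefile(transformers.modeling_albert))  # shows which modeling_albert.py is actually being imported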
@LysandreJik Thank you for your help. I am now getting a different error, saying that the Embedding object doesn't have 'shape'.
It seems the loader is comparing against a NumPy array's shape, but what it reaches is an Embedding object, which has no "shape" attribute.
I am not sure how to correct it, though.
Thank you again!
AttributeError Traceback (most recent call last)
<ipython-input-4-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
130 array = np.transpose(array)
131 try:
--> 132 assert pointer.shape == array.shape
133 except AssertionError as e:
134 e.args += (pointer.shape, array.shape)
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
589 return modules[name]
590 raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591 type(self).__name__, name))
592
593 def __setattr__(self, name, value):
AttributeError: 'Embedding' object has no attribute 'shape'
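(The error itself is consistent with the traceback: an nn.Embedding module has no shape of its own, only its weight tensor does, so the shape comparison fails when the loader's pointer stops at the module rather than at its weight. A tiny illustration:)

import torch.nn as nn

emb = nn.Embedding(19686, 128)   # sizes are purely illustrative
print(emb.weight.shape)          # torch.Size([19686, 128]) -- the weight tensor has a shape...
print(hasattr(emb, "shape"))     # False -- ...but the module itself does not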
Hi @hansaimlim, what is the size of the model you are loading? Could you paste here the 5-10 lines output by the conversion before the error was raised?
I could also reproduce that error:
global_step
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta'] from bert/embeddings/LayerNorm/beta
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/beta/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_m'] from bert/embeddings/LayerNorm/beta/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_v'] from bert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma'] from bert/embeddings/LayerNorm/gamma
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/gamma/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_m'] from bert/embeddings/LayerNorm/gamma/adam_m
INFO:transformers.modeling_albert:Skipping albert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_v'] from bert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'position_embeddings'] from bert/embeddings/position_embeddings
INFO:transformers.modeling_albert:Skipping albert/embeddings/position_embeddings/adam_m
Traceback (most recent call last):
File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 66, in <module>
args.pytorch_dump_path)
File "convert_albert_original_tf_checkpoint_to_pytorch.py", line 37, in convert_tf_checkpoint_to_pytorch
load_tf_weights_in_albert(model, config, tf_checkpoint_path)
File "/mnt/transformers/transformers/modeling_albert.py", line 134, in load_tf_weights_in_albert
assert pointer.shape == array.shape
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 585, in __getattr__
type(self).__name__, name))
AttributeError: 'Embedding' object has no attribute 'shape'
@LysandreJik Sure. Thanks for prompt feedback!
my_albert_config.json:
{
"attention_probs_dropout_prob": 0,
"hidden_act": "gelu",
"hidden_dropout_prob": 0,
"embedding_size": 128,
"hidden_size": 312,
"initializer_range": 0.02,
"intermediate_size": 1248,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 4,
"num_hidden_groups": 1,
"net_structure_type": 0,
"gap_size": 0,
"num_memory_blocks": 0,
"inner_group_num": 1,
"down_scale_factor": 1,
"type_vocab_size": 2,
"ln_type": "postln",
"vocab_size": 19686
}
bert/embeddings/LayerNorm/beta
bert/embeddings/LayerNorm/beta/adam_m
bert/embeddings/LayerNorm/beta/adam_v
bert/embeddings/LayerNorm/gamma
bert/embeddings/LayerNorm/gamma/adam_m
bert/embeddings/LayerNorm/gamma/adam_v
bert/embeddings/position_embeddings
bert/embeddings/position_embeddings/adam_m
bert/embeddings/position_embeddings/adam_v
bert/embeddings/token_type_embeddings
bert/embeddings/token_type_embeddings/adam_m
bert/embeddings/token_type_embeddings/adam_v
bert/embeddings/word_embeddings
bert/embeddings/word_embeddings/adam_m
bert/embeddings/word_embeddings/adam_v
bert/encoder/embedding_hidden_mapping_in/bias
bert/encoder/embedding_hidden_mapping_in/bias/adam_m
bert/encoder/embedding_hidden_mapping_in/bias/adam_v
bert/encoder/embedding_hidden_mapping_in/kernel
bert/encoder/embedding_hidden_mapping_in/kernel/adam_m
bert/encoder/embedding_hidden_mapping_in/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta/adam_v
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_m
bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias/adam_v
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_m
bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel/adam_v
bert/pooler/dense/bias
bert/pooler/dense/bias/adam_m
bert/pooler/dense/bias/adam_v
bert/pooler/dense/kernel
bert/pooler/dense/kernel/adam_m
bert/pooler/dense/kernel/adam_v
cls/predictions/output_bias
cls/predictions/output_bias/adam_m
cls/predictions/output_bias/adam_v
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/beta/adam_m
cls/predictions/transform/LayerNorm/beta/adam_v
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/LayerNorm/gamma/adam_m
cls/predictions/transform/LayerNorm/gamma/adam_v
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/bias/adam_m
cls/predictions/transform/dense/bias/adam_v
cls/predictions/transform/dense/kernel
cls/predictions/transform/dense/kernel/adam_m
cls/predictions/transform/dense/kernel/adam_v
cls/seq_relationship/output_bias
cls/seq_relationship/output_bias/adam_m
cls/seq_relationship/output_bias/adam_v
cls/seq_relationship/output_weights
cls/seq_relationship/output_weights/adam_m
cls/seq_relationship/output_weights/adam_v
global_step
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta'] from bert/embeddings/LayerNorm/beta
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_m'] from bert/embeddings/LayerNorm/beta/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'beta', 'adam_v'] from bert/embeddings/LayerNorm/beta/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma'] from bert/embeddings/LayerNorm/gamma
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_m'] from bert/embeddings/LayerNorm/gamma/adam_m
Initialize PyTorch weight ['albert', 'embeddings', 'LayerNorm', 'gamma', 'adam_v'] from bert/embeddings/LayerNorm/gamma/adam_v
Initialize PyTorch weight ['albert', 'embeddings', 'position_embeddings'] from bert/embeddings/position_embeddings
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-a47f5e7bff26> in <module>
----> 1 model = load_tf_weights_in_albert(model,config,'sample_tf_checkpoint/model.ckpt-100000')
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/transformers/modeling_albert.py in load_tf_weights_in_albert(model, config, tf_checkpoint_path)
130 array = np.transpose(array)
131 try:
--> 132 assert pointer.shape == array.shape
133 except AssertionError as e:
134 e.args += (pointer.shape, array.shape)
~/anaconda3/envs/pytorch_py37/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
589 return modules[name]
590 raise AttributeError("'{}' object has no attribute '{}'".format(
--> 591 type(self).__name__, name))
592
593 def __setattr__(self, name, value):
AttributeError: 'Embedding' object has no attribute 'shape'
Alright, I see where the issue stems from; I'm patching it and will get back to you soon.
Alright, please let me know if e85855f fixed it. I tested it with models saved from run_pretraining.py (with AlbertForMaskedLM as the host model) and run_classifier_sp.py (with AlbertForSequenceClassification), and both seem to work fine now.
Please keep in mind that we have no ALBERT model that can do next sentence prediction, so the weights from cls/seq_relationship are dropped.
@LysandreJik
Works fine!! :)))) Thank you so much! 👍
Glad I could help!
Thanks @LysandreJik ! I can also confirm that the conversion script is working now :+1:
Short update: I used the converted ALBERT model to perform NER; the F-score was ~0.1%. I've seen this strange behaviour with v2 ALBERT models before, but still have no solution for it.
@hansaimlim have you done any evaluations with your trained model? It would be great to know whether this problem also occurs for non-NER tasks!
@stefan-it I'm working on drug activity prediction. In my case, I used v2 ALBERT as well; its masked-LM performance was fine, but I haven't done downstream prediction tasks yet. Assuming you're working on human language, I believe our tasks are very different. How was it when you used BERT?
I used my trained model to predict a masked token, and it always returned <unk> (which is not the case for the English v1 and v2 models), so I guess I did something wrong in the pre-training steps...
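For anyone who wants to run the same kind of sanity check on a converted checkpoint, a rough masked-token smoke test looks like this (a sketch only; the file paths and the SentencePiece model are placeholders for your own artifacts):

import torch
from transformers import AlbertConfig, AlbertForMaskedLM, AlbertTokenizer

config = AlbertConfig.from_json_file("config.json")        # config of the converted model
model = AlbertForMaskedLM(config)
model.load_state_dict(torch.load("pytorch_model.bin"))     # output of the conversion script
model.eval()

tokenizer = AlbertTokenizer("spiece.model")                # SentencePiece model used for pre-training

text = "the capital of france is [MASK] ."
input_ids = torch.tensor([tokenizer.encode(text)])
mask_id = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)
mask_pos = (input_ids[0] == mask_id).nonzero()[0].item()

with torch.no_grad():
    logits = model(input_ids)[0]                           # (batch, seq_len, vocab_size)
top_ids = logits[0, mask_pos].topk(5)[1].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))            # a healthy model should not answer <unk> every time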