I'm trying to convert BERT TensorFlow checkpoints to PyTorch checkpoints. This worked for the default uncased `bert_model.ckpt`. However, after we did a custom training run with the TensorFlow version and then tried to convert the resulting TF checkpoint to PyTorch, it gives the error: `'BertPreTrainingHeads' object has no attribute 'squad'`
I added debug prints to the conversion loop:
```python
elif l[0] == 'output_bias' or l[0] == 'beta':
    pointer = getattr(pointer, 'bias')
elif l[0] == 'output_weights':
    pointer = getattr(pointer, 'weight')
else:
    print("--> ", str(l))        ############### printed this
    print("==> ", str(pointer))  ################# printed this
    pointer = getattr(pointer, l[0])
```
Output:
```
--> ['squad']
==> BertPreTrainingHeads(
  (predictions): BertLMPredictionHead(
    (transform): BertPredictionHeadTransform(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (LayerNorm): BertLayerNorm()
    )
    (decoder): Linear(in_features=768, out_features=30522, bias=False)
  )
  (seq_relationship): Linear(in_features=768, out_features=2, bias=True)
)
```
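For context on why the `else` branch fires: `load_tf_weights_in_bert` splits each TF variable name on `/` and walks the PyTorch module tree with `getattr`, one segment at a time. A minimal sketch of that traversal (simplified; the real loader also remaps names like `kernel` to `weight` and handles layer indices, and `walk_pointer` is an illustrative name, not the library's):

```python
import torch.nn as nn

def walk_pointer(model: nn.Module, tf_name: str):
    """Follow a slash-separated TF variable name through a PyTorch module tree."""
    pointer = model
    for segment in tf_name.split('/'):
        # Plain attribute lookup: raises AttributeError when the PyTorch model
        # has no submodule matching this segment of the TF variable name.
        pointer = getattr(pointer, segment)
    return pointer
```

So `cls/squad/output_weights` fails at the `squad` segment, because `BertPreTrainingHeads` only defines `predictions` and `seq_relationship`, as the printed module tree above shows.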
Hi @SandeepBhutani,
Can you point me to the script you used for fine-tuning in TensorFlow?
Hi @thomwolf, thanks for the reply.
Fine-tuning was done by passing `do_train=True` to `run_squad.py` (from the Google BERT release: https://github.com/google-research/bert).
Internally, it calls `estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)`.
The fine-tuning file was the standard `train-v1.1.json`: https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
A sample header of the train file:
```
{"data": [{"title": "University_of_Notre_Dame", "paragraphs": [{"context": "Architecturally
```
The following observation, in case it is useful:
While converting the checkpoint of the original uncased `bert_model.ckpt`, the following log is printed:
```
Loading TF weight bert/encoder/layer_9/output/LayerNorm/gamma with shape [768]
Loading TF weight bert/encoder/layer_9/output/dense/bias with shape [768]
Loading TF weight bert/encoder/layer_9/output/dense/kernel with shape [3072, 768]
Loading TF weight bert/pooler/dense/bias with shape [768]
Loading TF weight bert/pooler/dense/kernel with shape [768, 768]
Loading TF weight cls/predictions/output_bias with shape [30522]
Loading TF weight cls/predictions/transform/LayerNorm/beta with shape [768]
Loading TF weight cls/predictions/transform/LayerNorm/gamma with shape [768]
Loading TF weight cls/predictions/transform/dense/bias with shape [768]
Loading TF weight cls/predictions/transform/dense/kernel with shape [768, 768]
Loading TF weight cls/seq_relationship/output_bias with shape [2]
Loading TF weight cls/seq_relationship/output_weights with shape [2, 768]
Building PyTorch model from configuration: {
```
While converting the checkpoint after fine-tuning (model.ckpt-9000), the following log is printed:
```
Loading TF weight bert/encoder/layer_9/output/dense/kernel/adam_m with shape [3072, 768]
Loading TF weight bert/encoder/layer_9/output/dense/kernel/adam_v with shape [3072, 768]
Loading TF weight bert/pooler/dense/bias with shape [768]
Loading TF weight bert/pooler/dense/kernel with shape [768, 768]
Loading TF weight cls/squad/output_bias with shape [2]
Loading TF weight cls/squad/output_bias/adam_m with shape [2]
Loading TF weight cls/squad/output_bias/adam_v with shape [2]
Loading TF weight cls/squad/output_weights with shape [2, 768]
Loading TF weight cls/squad/output_weights/adam_m with shape [2, 768]
Loading TF weight cls/squad/output_weights/adam_v with shape [2, 768]
Loading TF weight global_step with shape []
Building PyTorch model from configuration: {
```
`cls/predictions` is gone and `cls/squad` has appeared.
After reading the code of both the TensorFlow and PyTorch versions, I figured out that the TensorFlow version refers to `squad` in `create_model`, like below (`cls/squad/output_weights`):
```python
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  final_hidden = model.get_sequence_output()

  final_hidden_shape = modeling.get_shape_list(final_hidden, expected_rank=3)
  batch_size = final_hidden_shape[0]
  seq_length = final_hidden_shape[1]
  hidden_size = final_hidden_shape[2]

  output_weights = tf.get_variable(
      "cls/squad/output_weights", [2, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "cls/squad/output_bias", [2], initializer=tf.zeros_initializer())
```
Any suggestion on what should be tweaked, and where? Should `create_model` in the TensorFlow version be changed, or `convert_tf_checkpoint_to_pytorch` in the PyTorch version?
It looks like the definition of the PyTorch model (`BertForPreTraining`, as used in the conversion script) differs from the TensorFlow version once fine-tuning is done. That is why `cls -> squad -> output_bias` is not found. Is my understanding correct? If so, is there already a correct class we can refer to during conversion?
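For reference: the TF variables above are just a `[2, hidden_size]` weight and a `[2]` bias, so whatever PyTorch module the loader lands on only needs matching `weight` and `bias` attributes. A minimal sketch of the shape correspondence, assuming `hidden_size=768` as in the logs (`squad_head` is an illustrative name):

```python
import torch.nn as nn

# nn.Linear(768, 2) stores its weight as [2, 768] and its bias as [2], exactly
# the shapes of cls/squad/output_weights and cls/squad/output_bias in the log,
# so the loader's getattr(pointer, 'weight') / getattr(pointer, 'bias') resolve.
squad_head = nn.Linear(768, 2)
print(squad_head.weight.shape)  # torch.Size([2, 768])
print(squad_head.bias.shape)    # torch.Size([2])
```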
Hi @thomwolf,
To make the conversion work, I added a class and one line of code to `BertPreTrainingHeads` in `modeling.py` of the PyTorch version (below). The conversion now runs, but I am not sure I have done the right thing (_being a beginner in both TF and PyTorch_).
Would you please validate/correct this?
```python
class SandeepSquadClass(nn.Module):  ######## this class sandeep added
    def __init__(self, config, bert_model_embedding_weights):
        super(SandeepSquadClass, self).__init__()
        self.weight = Variable(torch.ones(2, config.hidden_size), requires_grad=True)
        self.bias = Variable(torch.ones(2), requires_grad=True)

    def forward(self):
        print("What to do?")

class BertPreTrainingHeads(nn.Module):
    def __init__(self, config, bert_model_embedding_weights):
        super(BertPreTrainingHeads, self).__init__()
        self.predictions = BertLMPredictionHead(config, bert_model_embedding_weights)
        # sandeep code below 3 apr
        self.squad = SandeepSquadClass(config, bert_model_embedding_weights)  ### this line sandeep added
        self.seq_relationship = nn.Linear(config.hidden_size, 2)
```
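An aside, not part of the original post: `torch.autograd.Variable` is deprecated in modern PyTorch, and tensors assigned as plain attributes are not registered with the module, so they would not appear in `state_dict()` or be saved by `torch.save(model.state_dict(), ...)`. A sketch of the same placeholder head using `nn.Parameter` instead (the class name is hypothetical):

```python
import torch
import torch.nn as nn

class SquadPlaceholderHead(nn.Module):
    """Placeholder exposing .weight/.bias so the TF loader's getattr calls resolve."""
    def __init__(self, hidden_size):
        super().__init__()
        # nn.Parameter registers the tensors with the module, so they show up
        # in state_dict() and get saved along with the rest of the model.
        self.weight = nn.Parameter(torch.zeros(2, hidden_size))
        self.bias = nn.Parameter(torch.zeros(2))

    def forward(self, sequence_output):
        # The affine map a SQuAD span head applies to each token's hidden state.
        return nn.functional.linear(sequence_output, self.weight, self.bias)
```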
Hi @SandeepBhutani, I pushed a commit to master which should help you do this kind of thing.
First, switch to master by cloning the repo, then follow these instructions:
The `convert_tf_checkpoint_to_pytorch` conversion script is made to create a `BertForPreTraining` model, which is not your use case, but you can load another type of model by reproducing the behavior of this script as follows:
```python
import torch
from pytorch_pretrained_bert import BertConfig, BertForTokenClassification, load_tf_weights_in_bert

# Initialise a configuration according to your model
config = BertConfig.from_pretrained('bert-XXX-XXX')

# You will need to load a BertForTokenClassification model
model = BertForTokenClassification(config)

# Load weights from tf checkpoint
load_tf_weights_in_bert(model, tf_checkpoint_path)

# Save pytorch-model
print("Save PyTorch model to {}".format(pytorch_dump_path))
torch.save(model.state_dict(), pytorch_dump_path)
```
Thanks @thomwolf... After the following change, the checkpoint was generated smoothly:

```python
# model = BertForPreTraining(config)  ## commented this
model = BertForTokenClassification(config, 2)  ## added this
```
Let's give prediction a try using the new `.bin` file. Hopefully the results will be the same as with the TensorFlow version and the `.ckpt` file.
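As a quick sanity check before full prediction, the dump can be loaded back into a freshly constructed model. A minimal sketch, assuming the same `pytorch_pretrained_bert`-era constructor used above (paths are illustrative):

```python
import torch
from pytorch_pretrained_bert import BertConfig, BertForTokenClassification

config = BertConfig.from_json_file("bert_config.json")  # illustrative path
model = BertForTokenClassification(config, 2)           # same num_labels as during conversion
model.load_state_dict(torch.load("pytorch_model.bin"))  # the freshly converted dump
model.eval()                                            # disable dropout for inference
```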
Appreciate 👍
I downloaded TensorFlow checkpoints for a domain-specific BERT model and extracted the zip file into the folder `pretrained_bert`, which contains the following three files.
I used the following code to convert the TensorFlow checkpoints to PyTorch:

```python
import torch
from pytorch_transformers.modeling_bert import BertConfig, BertForPreTraining, load_tf_weights_in_bert

tf_checkpoint_path = "pretrained_bert/model.ckpt"
bert_config_file = "bert-base-cased-config.json"
pytorch_dump_path = "pytorch_bert"

config = BertConfig.from_json_file(bert_config_file)
print("Building PyTorch model from configuration: {}".format(str(config)))
model = BertForPreTraining(config)

# Load weights from tf checkpoint
load_tf_weights_in_bert(model, config, tf_checkpoint_path)

# Save pytorch-model
print("Save PyTorch model to {}".format(pytorch_dump_path))
torch.save(model.state_dict(), pytorch_dump_path)
```
I got this error when I ran the above code:

```
NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for pretrained_bert/model.ckpt
```

Any help is really appreciated.
It seems the script cannot find your checkpoint. Try giving it the full absolute path to the file.
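A small sketch of that fix (the path is illustrative). Note that a TF checkpoint "path" is a prefix for `model.ckpt.index` / `model.ckpt.data-*`, not a file that exists on its own, so checking it with `os.path.exists` would fail even when the checkpoint is present:

```python
import os

# Resolve the checkpoint prefix to an absolute path before handing it to the
# conversion script; the prefix itself is not an actual file on disk.
tf_checkpoint_path = os.path.abspath("pretrained_bert/model.ckpt")
print(tf_checkpoint_path)
```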
@thomwolf Thanks, I didn't get any error when I gave the absolute path of the file.
I was trying to convert my fine-tuned model to PyTorch using the following command:

```bash
tf_checkpoint_path='models/model.ckpt-21'
bert_config_file='PRETRAINED_MODELS/uncased_L-12_H-768_A-12/bert_config.json'
pytorch_dump_path='pytorch_models/pytorch_model.bin'
python convert_bert_original_tf_checkpoint_to_pytorch.py --tf_checkpoint_path=$tf_checkpoint_path --bert_config_file=$bert_config_file --pytorch_dump_path=$pytorch_dump_path
```

The issue that I face is given below. Any help would be appreciated.
```
Traceback (most recent call last):
  File "convert_bert_original_tf_checkpoint_to_pytorch.py", line 65, in <module>
    args.pytorch_dump_path)
  File "convert_bert_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_bert(model, config, tf_checkpoint_path)
  File "/home/cibin/virtual_envs/pytorch/lib/python3.7/site-packages/transformers/modeling_bert.py", line 98, in load_tf_weights_in_bert
    pointer = getattr(pointer, 'classifier')
  File "/home/cibin/virtual_envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'BertPreTrainingHeads' object has no attribute 'classifier'
```
A possible solution if you're copying a SQuAD-fine-tuned BERT from TF to PT.

Issue:

```
AttributeError: 'BertPreTrainingHeads' object has no attribute 'classifier'
```

It works for me by doing the following steps:

Step 1. In the script `convert_tf_checkpoint_to_pytorch.py` (or `convert_bert_original_tf_checkpoint_to_pytorch.py`), replace `BertForPreTraining` with `BertForQuestionAnswering`.

Step 2. Open the source code file `modeling_bert.py` in your package `site-packages\transformers`: in `load_tf_weights_in_bert`, replace

```python
elif l[0] == 'squad':
    pointer = getattr(pointer, 'classifier')
```

with

```python
elif l[0] == 'squad':
    pointer = getattr(pointer, 'qa_outputs')
```

It should work since `qa_outputs` is the attribute name for the output layer of `BertForQuestionAnswering`, instead of `classifier`.
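To see why `qa_outputs` is the right target: in `transformers`, the `BertForQuestionAnswering` head is a single linear layer whose weight has shape `[2, hidden_size]`, matching `cls/squad/output_weights` from the TF checkpoint. A minimal self-contained sketch of how such a head produces start/end logits (shapes as in BERT-Base; the forward details are paraphrased, not copied from the library):

```python
import torch
import torch.nn as nn

hidden_size, seq_len = 768, 384
qa_outputs = nn.Linear(hidden_size, 2)                  # weight: [2, 768], bias: [2]

sequence_output = torch.randn(1, seq_len, hidden_size)  # fake encoder output
logits = qa_outputs(sequence_output)                    # [1, 384, 2]
start_logits, end_logits = logits.split(1, dim=-1)
start_logits = start_logits.squeeze(-1)                 # [1, 384] span-start scores
end_logits = end_logits.squeeze(-1)                     # [1, 384] span-end scores
```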
Step 3. After copying, check your PyTorch model by evaluating `dev-v2.0.json` with a script like this:

```bash
python run_squad.py --model_type bert --model_name_or_path MODEL_PATH --do_eval --train_file None --predict_file dev-v2.0.json --max_seq_length 384 --doc_stride 128 --output_dir ./output/ --version_2_with_negative
```

where `output_dir` should contain a copy of the PyTorch model.
This will result in an evaluation like this:

```json
{
  "exact": 72.99755748336563,
  "f1": 76.24686988414918,
  "total": 11873,
  "HasAns_exact": 72.82388663967612,
  "HasAns_f1": 79.33182964482165,
  "HasAns_total": 5928,
  "NoAns_exact": 73.17073170731707,
  "NoAns_f1": 73.17073170731707,
  "NoAns_total": 5945,
  "best_exact": 74.3619978101575,
  "best_exact_thresh": -3.6369030475616455,
  "best_f1": 77.12234803941384,
  "best_f1_thresh": -3.6369030475616455
}
```

for a BERT-Base model.
However, if you use `BertForTokenClassification` instead, the model will not be copied correctly, since the structures of the classification layers are different. I tried this and got a model with an F1 score of 10%.
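To make the structural difference concrete, here is a small self-contained comparison of the two heads (shapes as in BERT-Base; variable names are illustrative). Both are linear layers, but the token-classification head produces per-token label logits consumed as-is, while the QA head's output is split into span-start and span-end scores, which is what `run_squad.py` expects:

```python
import torch
import torch.nn as nn

hidden_size = 768
sequence_output = torch.randn(1, 384, hidden_size)  # fake encoder output

# QA head (qa_outputs in BertForQuestionAnswering): split into start/end logits.
qa_outputs = nn.Linear(hidden_size, 2)
start_logits, end_logits = qa_outputs(sequence_output).split(1, dim=-1)

# Token-classification head (classifier in BertForTokenClassification):
# one label distribution per token, no span semantics.
classifier = nn.Linear(hidden_size, 2)
token_logits = classifier(sequence_output)
```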
```
AttributeError: 'BertForTokenClassification' object has no attribute 'predict'
```

How do I use the trained BERT model for prediction?
@rashibudati, please take a look at the docs, namely the Usage section, which shows how to use token classification models.
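For illustration (not taken from the linked docs), a minimal prediction sketch with a token classification model, assuming the `pytorch_transformers`-era API; the model path is hypothetical:

```python
import torch
from pytorch_transformers import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained("pytorch_models/")  # hypothetical dir
model.eval()  # disable dropout for inference

input_ids = torch.tensor([tokenizer.encode("Hello, my dog is cute")])
with torch.no_grad():
    logits = model(input_ids)[0]          # [1, seq_len, num_labels]
predicted_labels = logits.argmax(dim=-1)  # one label id per token
```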
@Hya-cinthus Thank you so much! This saved me a lot of headache!