Transformers: convert_tf_checkpoint_to_pytorch 'BertPreTrainingHeads' object has no attribute 'squad'

Created on 2 Apr 2019 · 14 Comments · Source: huggingface/transformers

I am trying to convert BERT TensorFlow checkpoints to PyTorch checkpoints. The conversion worked for the default uncased bert_model.ckpt. However, after custom training with the TensorFlow version, converting the resulting TF checkpoint to PyTorch fails with the error: 'BertPreTrainingHeads' object has no attribute 'squad'
I added two print statements to the loading loop:

            elif l[0] == 'output_bias' or l[0] == 'beta':
                pointer = getattr(pointer, 'bias')
            elif l[0] == 'output_weights':
                pointer = getattr(pointer, 'weight')
            else:
                print("--> ", str(l))  ############### printed this
                print("==> ", str(pointer)) ################# printed this
                pointer = getattr(pointer, l[0])

output:

--> ['squad']
==> BertPreTrainingHeads(
  (predictions): BertLMPredictionHead(
    (transform): BertPredictionHeadTransform(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (LayerNorm): BertLayerNorm()
    )
    (decoder): Linear(in_features=768, out_features=30522, bias=False)
  )
  (seq_relationship): Linear(in_features=768, out_features=2, bias=True)
)
  • Can you please tell us what is happening? Does TensorFlow add something during fine-tuning? I am not sure how the name squad got into the TensorFlow ckpt file.
  • What needs to be done to fix this?
  • Are you planning to fix this and release updated code?
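
For readers wondering where the error comes from: the loop quoted above splits each TensorFlow variable name on "/" and follows the PyTorch model one attribute at a time. The following is a minimal sketch of that walk. It uses stand-in objects (SimpleNamespace) instead of the real BertForPreTraining classes, and `walk` and `fake_model` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace

# Stand-in for the attribute tree of the pre-training head: it has
# `predictions` and `seq_relationship`, but nothing called `squad`.
fake_model = SimpleNamespace(
    cls=SimpleNamespace(
        predictions=SimpleNamespace(bias="lm bias"),
        seq_relationship=SimpleNamespace(weight="nsp weight", bias="nsp bias"),
    )
)

def walk(model, tf_name):
    """Follow a TF variable name through the model, one scope at a time."""
    pointer = model
    for part in tf_name.split("/"):
        if part in ("output_bias", "beta"):
            pointer = getattr(pointer, "bias")
        elif part == "output_weights":
            pointer = getattr(pointer, "weight")
        else:
            pointer = getattr(pointer, part)  # raises if the scope is unknown
    return pointer

print(walk(fake_model, "cls/seq_relationship/output_bias"))
try:
    walk(fake_model, "cls/squad/output_bias")
except AttributeError as err:
    print("conversion would stop here:", err)
```

The second call fails at the `squad` scope because the pre-training head has no attribute with that name, which mirrors the error reported above.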

Most helpful comment

A possible solution if you're copying a SQuAD-fine-tuned Bert from TF to PT

Issue:
AttributeError: 'BertPreTrainingHeads' object has no attribute 'classifier'

The following steps worked for me:

Step 1.
In the script convert_tf_checkpoint_to_pytorch.py (or convert_bert_original_tf_checkpoint_to_pytorch.py):

  • Replace every occurrence of BertForPreTraining with BertForQuestionAnswering.

Step 2.
Open the source code file modeling_bert.py in your package site-packages\transformers:

  • In the function load_tf_weights_in_bert, replace
        elif l[0] == 'squad':
            pointer = getattr(pointer, 'classifier')
    with
        elif l[0] == 'squad':
            pointer = getattr(pointer, 'qa_outputs')

This should work because the output layer of BertForQuestionAnswering is stored in the attribute qa_outputs, not classifier.
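
To illustrate why the attribute name matters, here is a small sketch of the same lookup with the head name made configurable. The stand-in models and the `resolve` helper are hypothetical; only the attribute names classifier and qa_outputs come from the steps above. The sketch also assumes that scopes the model does not have (such as `cls`) are simply skipped.

```python
from types import SimpleNamespace

# Stand-ins exposing only the attribute that holds the output layer.
qa_model = SimpleNamespace(qa_outputs=SimpleNamespace(weight="W", bias="b"))
token_model = SimpleNamespace(classifier=SimpleNamespace(weight="W", bias="b"))

def resolve(model, tf_name, head_attr):
    """Map a TF name such as 'cls/squad/output_weights' onto `model`."""
    pointer = model
    for part in tf_name.split("/"):
        if part == "squad":
            pointer = getattr(pointer, head_attr)
        elif part == "output_weights":
            pointer = pointer.weight
        elif part == "output_bias":
            pointer = pointer.bias
        elif hasattr(pointer, part):
            pointer = getattr(pointer, part)
        # Assumption: scopes missing from the model (here 'cls') are skipped.
    return pointer

print(resolve(qa_model, "cls/squad/output_weights", "qa_outputs"))
try:
    resolve(qa_model, "cls/squad/output_weights", "classifier")
except AttributeError as err:
    print("wrong head name:", err)
```

With the head name set to qa_outputs the lookup reaches the weight of the question-answering stand-in; with classifier it fails in the same way as the reported AttributeError.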

Step 3.
After converting, check your PyTorch model by evaluating it on dev-v2.0.json with a command like this:
python run_squad.py --model_type bert --model_name_or_path MODEL_PATH --do_eval --train_file None --predict_file dev-v2.0.json --max_seq_length 384 --doc_stride 128 --output_dir ./output/ --version_2_with_negative
where output_dir should contain a copy of the PyTorch model.

This will result in an evaluation like this:
{ "exact": 72.99755748336563, "f1": 76.24686988414918, "total": 11873, "HasAns_exact": 72.82388663967612, "HasAns_f1": 79.33182964482165, "HasAns_total": 5928, "NoAns_exact": 73.17073170731707, "NoAns_f1": 73.17073170731707, "NoAns_total": 5945, "best_exact": 74.3619978101575, "best_exact_thresh": -3.6369030475616455, "best_f1": 77.12234803941384, "best_f1_thresh": -3.6369030475616455 }
for a BERT-Base model.
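
One way to sanity-check a result like this is to confirm that the per-category numbers add up to the overall ones. The sketch below does that with the values reported above (thresholds and best_* entries left out). `check_squad_metrics` is a hypothetical helper, not part of run_squad.py.

```python
import json

# Metrics copied from the evaluation above.
report = json.loads("""{
  "exact": 72.99755748336563, "f1": 76.24686988414918, "total": 11873,
  "HasAns_exact": 72.82388663967612, "HasAns_f1": 79.33182964482165,
  "HasAns_total": 5928,
  "NoAns_exact": 73.17073170731707, "NoAns_f1": 73.17073170731707,
  "NoAns_total": 5945
}""")

def check_squad_metrics(m, tol=1e-6):
    """Return True if overall scores equal the weighted HasAns/NoAns scores."""
    if m["HasAns_total"] + m["NoAns_total"] != m["total"]:
        return False
    for key in ("exact", "f1"):
        weighted = (m["HasAns_" + key] * m["HasAns_total"]
                    + m["NoAns_" + key] * m["NoAns_total"]) / m["total"]
        if abs(weighted - m[key]) > tol:
            return False
    return True

print(check_squad_metrics(report))
```

A check like this only confirms that the report is internally consistent. Whether the conversion preserved the model still has to be judged by comparing the scores with those of the TensorFlow model.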

However, if you use BertForTokenClassification instead, the model will not be copied correctly, since the structure of the classification layer is different. I tried this and got a model with an F1 score of 10%.

All 14 comments

Hi @SandeepBhutani,
Can you point me to the script you use for finetuning in Tensorflow?

Hi @thomwolf, thanks for the reply.
Fine-tuning is done by setting do_train=True in run_squad.py (from the Google BERT release GitHub page: https://github.com/google-research/bert).
Internally, it calls estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
The fine-tuning file was also the same, train-v1.1.json: https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
A sample of the train file header:
{"data": [{"title": "University_of_Notre_Dame", "paragraphs": [{"context": "Architecturally

The following observation may be useful.
While converting the checkpoint of the original uncased bert_model.ckpt, the following log is printed:

Loading TF weight bert/encoder/layer_9/output/LayerNorm/gamma with shape [768]
Loading TF weight bert/encoder/layer_9/output/dense/bias with shape [768]
Loading TF weight bert/encoder/layer_9/output/dense/kernel with shape [3072, 768]
Loading TF weight bert/pooler/dense/bias with shape [768]
Loading TF weight bert/pooler/dense/kernel with shape [768, 768]
Loading TF weight cls/predictions/output_bias with shape [30522]
Loading TF weight cls/predictions/transform/LayerNorm/beta with shape [768]
Loading TF weight cls/predictions/transform/LayerNorm/gamma with shape [768]
Loading TF weight cls/predictions/transform/dense/bias with shape [768]
Loading TF weight cls/predictions/transform/dense/kernel with shape [768, 768]
Loading TF weight cls/seq_relationship/output_bias with shape [2]
Loading TF weight cls/seq_relationship/output_weights with shape [2, 768]
Building PyTorch model from configuration: {

While converting the checkpoint produced by fine-tuning (model.ckpt-9000), the following log is printed:

Loading TF weight bert/encoder/layer_9/output/dense/kernel/adam_m with shape [3072, 768]
Loading TF weight bert/encoder/layer_9/output/dense/kernel/adam_v with shape [3072, 768]
Loading TF weight bert/pooler/dense/bias with shape [768]
Loading TF weight bert/pooler/dense/kernel with shape [768, 768]
Loading TF weight cls/squad/output_bias with shape [2]
Loading TF weight cls/squad/output_bias/adam_m with shape [2]
Loading TF weight cls/squad/output_bias/adam_v with shape [2]
Loading TF weight cls/squad/output_weights with shape [2, 768]
Loading TF weight cls/squad/output_weights/adam_m with shape [2, 768]
Loading TF weight cls/squad/output_weights/adam_v with shape [2, 768]
Loading TF weight global_step with shape []
Building PyTorch model from configuration: {

cls/predictions is gone and cls/squad has appeared.
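
The difference between the two logs can be made explicit by comparing variable names. The sketch below uses a few names copied from the logs above and drops the optimizer slots (adam_m, adam_v) and global_step, which are training state rather than model weights. `model_variables` and `head_scopes` are hypothetical helpers written for this illustration.

```python
pretrained = [
    "bert/pooler/dense/kernel",
    "cls/predictions/output_bias",
    "cls/seq_relationship/output_weights",
]
finetuned = [
    "bert/pooler/dense/kernel",
    "bert/encoder/layer_9/output/dense/kernel/adam_m",
    "cls/squad/output_bias",
    "cls/squad/output_bias/adam_v",
    "cls/squad/output_weights",
    "global_step",
]

TRAINING_STATE = {"adam_m", "adam_v", "global_step"}

def model_variables(names):
    """Keep only names that do not contain an optimizer or step scope."""
    return [n for n in names if not TRAINING_STATE & set(n.split("/"))]

def head_scopes(names):
    """Collect the scope directly under 'cls/' for each head variable."""
    return {n.split("/")[1] for n in model_variables(names) if n.startswith("cls/")}

print(sorted(head_scopes(pretrained)))  # ['predictions', 'seq_relationship']
print(sorted(head_scopes(finetuned)))   # ['squad']
```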

After reading the code of both the TensorFlow and PyTorch versions, I figured out that the TensorFlow version refers to squad in create_model, as shown below (cls/squad/output_weights):

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 use_one_hot_embeddings):
    """Creates a classification model."""
    model = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids,
        use_one_hot_embeddings=use_one_hot_embeddings)

    final_hidden = model.get_sequence_output()

    final_hidden_shape = modeling.get_shape_list(final_hidden, expected_rank=3)
    batch_size = final_hidden_shape[0]
    seq_length = final_hidden_shape[1]
    hidden_size = final_hidden_shape[2]

    output_weights = tf.get_variable(
        "cls/squad/output_weights", [2, hidden_size],
        initializer=tf.truncated_normal_initializer(stddev=0.02))

    output_bias = tf.get_variable(
        "cls/squad/output_bias", [2], initializer=tf.zeros_initializer())

Any suggestion on what should be tweaked, and where? Should create_model in the TensorFlow version be changed, or convert_tf_checkpoint_to_pytorch in the PyTorch version?

It looks like the PyTorch model definition (BertForPreTraining, used in the conversion script) differs from the TensorFlow model once it has been fine-tuned. That is why cls -> squad -> output_bias is not found. Is my understanding correct? If so, is there already a suitable class that we can use during conversion?
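
On the question of a matching class: the shapes in the log suggest that cls/squad is a single linear layer from the hidden size to 2 outputs (the start and end logits). The sketch below checks shapes only. It assumes the usual conventions that a TF dense kernel is stored as [in, out] while a PyTorch nn.Linear weight is [out, in], and `expected_pt_shape` is a hypothetical helper.

```python
def expected_pt_shape(tf_name, tf_shape):
    """Shape the tensor should have once loaded into a PyTorch module."""
    leaf = tf_name.split("/")[-1]
    if leaf == "kernel":
        return list(reversed(tf_shape))  # [in, out] -> [out, in]
    return list(tf_shape)                # biases and output_weights as-is

hidden_size, num_outputs = 768, 2
linear_weight_shape = [num_outputs, hidden_size]  # shape of nn.Linear(768, 2).weight
linear_bias_shape = [num_outputs]

# Shapes taken from the fine-tuned checkpoint log above.
print(expected_pt_shape("cls/squad/output_weights", [2, 768]) == linear_weight_shape)
print(expected_pt_shape("cls/squad/output_bias", [2]) == linear_bias_shape)
print(expected_pt_shape("bert/encoder/layer_9/output/dense/kernel", [3072, 768]))
```

Under these assumptions, any PyTorch head that wraps a Linear(hidden_size, 2) layer has the right shapes for cls/squad. The attribute name under which that layer is stored still has to match what the loader looks up.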

Hi @thomwolf,
To make the conversion work, I added the class below and one line of code in BertPreTrainingHeads, in modeling.py of the PyTorch version. With this change the conversion runs, but I am not sure whether I have done the right thing (being a beginner in both TF and PyTorch).
Could you please validate or correct it?

class SandeepSquadClass(nn.Module):    ########this class sandeep added
    def __init__(self, config, bert_model_embedding_weights): 
        super(SandeepSquadClass, self).__init__()
        self.weight = Variable(torch.ones(2, config.hidden_size), requires_grad=True) 
        self.bias = Variable(torch.ones(2), requires_grad=True)

    def forward(self):
        print("What to do?")

class BertPreTrainingHeads(nn.Module):
    def __init__(self, config, bert_model_embedding_weights):
        super(BertPreTrainingHeads, self).__init__()
        self.predictions = BertLMPredictionHead(config, bert_model_embedding_weights)
        #sandeep code below 3 apr
        self.squad = SandeepSquadClass(config, bert_model_embedding_weights)    ###this line sandeep added
        self.seq_relationship = nn.Linear(config.hidden_size, 2)


Hi @SandeepBhutani, I pushed a commit to master which should help you do this kind of thing.

First, switch to master by cloning the repo, then follow these instructions:

The convert_tf_checkpoint_to_pytorch conversion script is made to create a BertForPreTraining model, which is not your use case, but you can load another type of model by reproducing the behavior of this script as follows:

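import torch
from pytorch_pretrained_bert import BertConfig, BertForTokenClassification, load_tf_weights_in_bert

# Placeholder paths (not in the original comment); set them for your setup
tf_checkpoint_path = 'path/to/model.ckpt'
pytorch_dump_path = 'path/to/pytorch_model.bin'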

# Initialise a configuration according to your model
config = BertConfig.from_pretrained('bert-XXX-XXX')

# You will need to load a BertForTokenClassification model
model = BertForTokenClassification(config)

# Load weights from tf checkpoint
load_tf_weights_in_bert(model, tf_checkpoint_path)

# Save pytorch-model
print("Save PyTorch model to {}".format(pytorch_dump_path))
torch.save(model.state_dict(), pytorch_dump_path)

Thanks @thomwolf. After the following change, the checkpoint was generated smoothly.

    #model = BertForPreTraining(config)   ##commented this
    model = BertForTokenClassification(config, 2)   ## Added this

Let us try running prediction with the new .bin file. Hopefully the results will be the same as with the TensorFlow version and the .ckpt file.
Appreciated 👍

I downloaded TensorFlow checkpoints for a domain-specific BERT model and extracted the zip file into the folder pretrained_bert, which contains the following three files:

  • model.ckpt.data-00000-of-00001
  • model.ckpt.index
  • model.ckpt.meta

I used the following code to convert the TensorFlow checkpoint to PyTorch:

import torch

from pytorch_transformers.modeling_bert import BertConfig, BertForPreTraining, load_tf_weights_in_bert


tf_checkpoint_path="pretrained_bert/model.ckpt"
bert_config_file = "bert-base-cased-config.json"
pytorch_dump_path="pytorch_bert"

config = BertConfig.from_json_file(bert_config_file)
print("Building PyTorch model from configuration: {}".format(str(config)))
model = BertForPreTraining(config)

# Load weights from tf checkpoint
load_tf_weights_in_bert(model, config, tf_checkpoint_path)

# Save pytorch-model
print("Save PyTorch model to {}".format(pytorch_dump_path))
torch.save(model.state_dict(), pytorch_dump_path)

I got this error when I ran the above code

NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for pretrained_bert/model.ckpt

Any help is really appreciated.

Seems like the script cannot find your checkpoint. Try giving it the full absolute path to the file.

@thomwolf
Thanks, I didn't get any error when I gave absolute path of the file.
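
A TensorFlow checkpoint path such as pretrained_bert/model.ckpt is a prefix rather than a single file, so a quick check before converting can save a confusing error. This is a minimal sketch using only the standard library. `resolve_checkpoint_prefix` is a hypothetical helper, and it assumes the .index file shown in the file listing above sits next to the data shards.

```python
import os

def resolve_checkpoint_prefix(prefix):
    """Return the absolute checkpoint prefix, or raise if no index file exists."""
    absolute = os.path.abspath(prefix)
    if not os.path.isfile(absolute + ".index"):
        raise FileNotFoundError(
            "No checkpoint index at {}.index (cwd: {})".format(absolute, os.getcwd())
        )
    return absolute

try:
    tf_checkpoint_path = resolve_checkpoint_prefix("pretrained_bert/model.ckpt")
    print("Using checkpoint prefix:", tf_checkpoint_path)
except FileNotFoundError as err:
    print(err)
```

Printing the current working directory in the error message helps when a relative path is resolved from a different directory than expected.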

I was trying to convert my fine tuned model to pytorch using the following command.

tf_checkpoint_path='models/model.ckpt-21'
bert_config_file='PRETRAINED_MODELS/uncased_L-12_H-768_A-12/bert_config.json'
pytorch_dump_path='pytorch_models/pytorch_model.bin'

python convert_bert_original_tf_checkpoint_to_pytorch.py --tf_checkpoint_path=$tf_checkpoint_path --bert_config_file=$bert_config_file --pytorch_dump_path=$pytorch_dump_path

The issue that I face is given below. Any help would be appreciated.

Traceback (most recent call last):
  File "convert_bert_original_tf_checkpoint_to_pytorch.py", line 65, in <module>
    args.pytorch_dump_path)
  File "convert_bert_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_bert(model, config, tf_checkpoint_path)
  File "/home/cibin/virtual_envs/pytorch/lib/python3.7/site-packages/transformers/modeling_bert.py", line 98, in load_tf_weights_in_bert
    pointer = getattr(pointer, 'classifier')
  File "/home/cibin/virtual_envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'BertPreTrainingHeads' object has no attribute 'classifier'

AttributeError: 'BertForTokenClassification' object has no attribute 'predict'
How do I use a trained BERT model for prediction?

@rashibudati, please take a look at the docs, namely the Usage section which shows how to use token classification models.

@Hya-cinthus Thank you so much! This saved me a lot of headache!
