Transformers: Bert for passage reranking

Created on 3 May 2019 · 14 comments · Source: huggingface/transformers

Hi, I am currently trying to implement BERT for passage re-ranking in PyTorch. Here are the paper and GitHub repo:
https://arxiv.org/abs/1901.04085
https://github.com/nyu-dl/dl4marco-bert

I've downloaded their BERT-large model checkpoint and BERT config for the task. The convert_tf_checkpoint_to_pytorch function seems to successfully extract the weights from TensorFlow.

Then, while initialising the PyTorch model, I get the following:

Initialize PyTorch weight ['bert', 'pooler', 'dense', 'kernel']
Skipping bert/pooler/dense/kernel/adam_m
Skipping bert/pooler/dense/kernel/adam_v
Skipping global_step

```~/anaconda3/envs/new_fast_ai/lib/python3.7/site-packages/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py in convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path)
35
36 # Load weights from tf checkpoint
---> 37 load_tf_weights_in_bert(model, tf_checkpoint_path)
38
39 # Save pytorch-model

~/anaconda3/envs/new_fast_ai/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py in load_tf_weights_in_bert(model, tf_checkpoint_path)
88 pointer = getattr(pointer, 'weight')
89 elif l[0] == 'output_bias' or l[0] == 'beta':
---> 90 pointer = getattr(pointer, 'bias')
91 elif l[0] == 'output_weights':
92 pointer = getattr(pointer, 'weight')

~/anaconda3/envs/new_fast_ai/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
533 return modules[name]
534 raise AttributeError("'{}' object has no attribute '{}'".format(
--> 535 type(self).__name__, name))
536
537 def __setattr__(self, name, value):

AttributeError: 'BertForPreTraining' object has no attribute 'bias'
```
I assume the issue is with the final layer.
What is the best way for me to go about resolving this?

Thanks in advance!

Most helpful comment

Update for the latest transformers: add the following at modeling_bert.py:78:

    for name, array in zip(names, arrays):
        if name in ['output_weights', 'output_bias']:
            name = 'classifier/' + name

and in convert_bert_original_tf_checkpoint_to_pytorch.py:

    config.num_labels = 2
    print("Building PyTorch model from configuration: {}".format(str(config)))
    model = BertForSequenceClassification(config)

All 14 comments

The convert_tf_checkpoint_to_pytorch script is made to convert the Google pre-trained weights into a BertForPreTraining model; you have to modify it to convert another type of model.

In your case, you want to load the passage re-ranking model in a BertForSequenceClassification model which has the same structure (BERT + a classifier on top of the pooled output) as the NYU model.

Here is a quick way to do that:

  • install pytorch-pretrained-bert from source so you can modify it
  • change https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py#L34 to initialize a BertForSequenceClassification model instead of the BertForPreTraining model in the conversion script.
  • the structure is not exactly identical, so you need to ADD a line that says pointer = getattr(pointer, 'cls') in the TWO if-conditions related to output_weights and output_bias (between L89 and L90 and between L91 and L92 in modeling.py here: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L90 and https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L92); see the sketch after this list.
  • this should let you convert the TensorFlow model into a PyTorch one using the scripts.
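
For reference, a rough sketch of what those two branches in load_tf_weights_in_bert would look like with the suggested lines added (an illustrative fragment only, not a tested patch):

```
# excerpt from load_tf_weights_in_bert in modeling.py (around L88-92),
# with the two suggested lines added
if l[0] == 'kernel' or l[0] == 'gamma':
    pointer = getattr(pointer, 'weight')
elif l[0] == 'output_bias' or l[0] == 'beta':
    pointer = getattr(pointer, 'cls')    # added, as suggested above
    pointer = getattr(pointer, 'bias')
elif l[0] == 'output_weights':
    pointer = getattr(pointer, 'cls')    # added, as suggested above
    pointer = getattr(pointer, 'weight')
```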

Thanks so much! Your comment saved me a lot of time. However, there was a small issue, which I got around by just changing the TF variable names.

For anyone else out there, the solution was:

Hello @oisin-dolphin and @thomwolf
I followed the above suggestions but am getting the following error:
tensorflow.python.framework.errors_impl.NotFoundError: Key classifier/output_bias not found in checkpoint

Also, what is the significance of the following line of code?
pointer = getattr(pointer, 'cls')

Please suggest.

Thanks
Mahesh

The convert_tf_checkpoint_to_pytorch script is made to convert the Google pre-trained weights into a BertForPreTraining model; you have to modify it to convert another type of model.

In your case, you want to load the passage re-ranking model in a BertForSequenceClassification model which has the same structure (BERT + a classifier on top of the pooled output) as the NYU model.

Here is a quick way to do that:

  • install pytorch-pretrained-bert from source so you can modify it
  • change https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py#L34 to initialize a BertForSequenceClassification model instead of the BertForPreTraining model in the conversion script.
  • the structure is not exactly identical, so you need to ADD a line that says pointer = getattr(pointer, 'cls') in the TWO if-conditions related to output_weights and output_bias (between L89 and L90 and between L91 and L92 in modeling.py here: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L90 and https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L92).
  • this should let you convert the TensorFlow model into a PyTorch one using the scripts.

I followed these instructions for the BertForSequenceClassification model, but I still end up getting the same error: 'BertForSequenceClassification' object has no attribute 'bias'.

Update for the latest transformers: add the following at modeling_bert.py:78:

    for name, array in zip(names, arrays):
        if name in ['output_weights', 'output_bias']:
            name = 'classifier/' + name

and in convert_bert_original_tf_checkpoint_to_pytorch.py:

    config.num_labels = 2
    print("Building PyTorch model from configuration: {}".format(str(config)))
    model = BertForSequenceClassification(config)
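
Putting the two pieces together, the conversion entry point would look roughly like this (a sketch assuming a transformers version where load_tf_weights_in_bert takes the config as its second argument; the paths are placeholders):

```
import torch
from transformers import BertConfig, BertForSequenceClassification
from transformers.modeling_bert import load_tf_weights_in_bert  # with the name-remapping patch above

# placeholder paths -- replace with your own checkpoint, config and output locations
tf_checkpoint_path = "msmarco/model.ckpt"
bert_config_file = "msmarco/bert_config.json"
pytorch_dump_path = "msmarco/pytorch_model.bin"

config = BertConfig.from_json_file(bert_config_file)
config.num_labels = 2  # relevant / not relevant, as in the passage re-ranking task

print("Building PyTorch model from configuration: {}".format(str(config)))
model = BertForSequenceClassification(config)

# copy the TF variables into the corresponding PyTorch parameters
load_tf_weights_in_bert(model, config, tf_checkpoint_path)

torch.save(model.state_dict(), pytorch_dump_path)
```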

You are my lifesaver, @pertschuk. Thank you for the instructions.

Glad they helped, @Soonhwan-Kwon.

I used a similar reranking model as part of a project I just released, which hooks into Elasticsearch and reranks search results out of the box. Check it out if this sounds like it would be useful! Repo: https://github.com/koursaros-ai/nboost

You can create a subclass of BertForSequenceClassification and add self.weight and self.bias to the __init__ method. Then instantiate your new class and it is ready to use:

import torch
from transformers import BertForSequenceClassification

class BertForPassageRanking(BertForSequenceClassification):
    def __init__(self, config):
        super().__init__(config)
        # expose top-level weight/bias attributes so the TF weight loader's
        # getattr(pointer, 'weight') / getattr(pointer, 'bias') calls succeed
        self.weight = torch.autograd.Variable(torch.ones(2, config.hidden_size),
                                              requires_grad=True)
        self.bias = torch.autograd.Variable(torch.ones(2), requires_grad=True)


bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)

BERT_PASSAGE_RANKING_PATH is the path where your TF checkpoint files and config JSON file are stored. You will need to rename the files as follows:

config.json
model.ckpt.index
model.ckpt.meta

Another option, if you do not want to change the file names, is to load the JSON config file with BertConfig.from_json_file() and then pass to BertForPassageRanking.from_pretrained() the path + ckpt file name and the configuration you have already loaded with BertConfig.from_json_file().
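
For example, the second option might look like this (a sketch using the BertForPassageRanking class defined above; the paths are placeholders for your own files):

```
from transformers import BertConfig

# load the config explicitly instead of renaming the files (placeholder paths)
config = BertConfig.from_json_file("/path/to/bert_config.json")

bert_ranking = BertForPassageRanking.from_pretrained("/path/to/model.ckpt.index",  # TF checkpoint index file
                                                      from_tf=True,
                                                      config=config)
```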

I added PyTorch MS MARCO passage reranking models to the huggingface / transformers bucket, so there is no need for subclassing or modifications.

https://huggingface.co/nboost

I added PyTorch MS MARCO passage reranking models to the huggingface / transformers bucket, so there is no need for subclassing or modifications.

https://huggingface.co/nboost

Hi, I have a question regarding the output of your models. In the transformers library, the BERT base model (transformers.BertModel class) outputs a tuple whose first element is the last hidden state and whose second element is the pooler output. The last hidden state is a tensor of size (batch_size, sequence_length, hidden_dim); for example, for a batch size of 64 and 512 tokens we obtain for BERT an output of size (64x512x768). The pooler output has size (batch_size, hidden_size). It is obtained by a linear layer with a tanh activation function applied to the CLS token hidden state (the last-layer hidden state of the first token of the sequence); those weights were trained on the next sentence prediction objective.

Your model follows a similar structure, at least nboost/pt-biobert-base-msmarco. However, a passage re-ranking model is a sequence classification model. Basically, the passage re-ranking model proposed by https://github.com/nyu-dl/dl4marco-bert is the BERT model fine-tuned with a dense layer on top that learns to classify a sequence as relevant or not relevant. The first element of its output tuple is a tensor of size (batch_size, num_classes), where num_classes is two (whether or not the sequence is a relevant document).

How should we use your model for passage re-ranking?
Thanks a lot

I added PyTorch MS MARCO passage reranking models to the huggingface / transformers bucket, so there is no need for subclassing or modifications.
https://huggingface.co/nboost

Hi, I have a question regarding the output of your models. In the transformers library, the BERT base model (transformers.BertModel class) outputs a tuple whose first element is the last hidden state and whose second element is the pooler output. The last hidden state is a tensor of size (batch_size, sequence_length, hidden_dim); for example, for a batch size of 64 and 512 tokens we obtain for BERT an output of size (64x512x768). The pooler output has size (batch_size, hidden_size). It is obtained by a linear layer with a tanh activation function applied to the CLS token hidden state (the last-layer hidden state of the first token of the sequence); those weights were trained on the next sentence prediction objective.

Your model follows a similar structure, at least nboost/pt-biobert-base-msmarco. However, a passage re-ranking model is a sequence classification model. Basically, the passage re-ranking model proposed by https://github.com/nyu-dl/dl4marco-bert is the BERT model fine-tuned with a dense layer on top that learns to classify a sequence as relevant or not relevant. The first element of its output tuple is a tensor of size (batch_size, num_classes), where num_classes is two (whether or not the sequence is a relevant document).

How should we use your model for passage re-ranking?
Thanks a lot

I found where the problem was. As pointed out on the model's page (https://huggingface.co/nboost/pt-biobert-base-msmarco#), to load the model you have to do the following:

model = AutoModel.from_pretrained("nboost/pt-biobert-base-msmarco")
This outputs a tuple whose first element is a tensor of size (64x512x768).

However, we should do the following, since our problem is sequence classification:

model = AutoModelForSequenceClassification.from_pretrained("nboost/pt-biobert-base-msmarco")
This creates the correct output: a tuple whose first element is a tensor of size (batch_size, num_classes).

I suggest that the authors change the model info and model card at https://huggingface.co/nboost/pt-biobert-base-msmarco#, since it is a little bit confusing.
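
To make the difference concrete, here is a small sketch comparing the two loading paths (output shapes as described above; the example query and passage are made up):

```
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

model_name = "nboost/pt-biobert-base-msmarco"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# a single (query, passage) pair encoded as one sequence pair
inputs = tokenizer.encode_plus("what is passage re-ranking?",
                               "Passage re-ranking reorders the passages returned by a retriever.",
                               add_special_tokens=True, return_tensors="pt")

# AutoModel: bare encoder, first output is the last hidden state
encoder = AutoModel.from_pretrained(model_name)
with torch.no_grad():
    hidden_states = encoder(**inputs)[0]   # (batch_size, sequence_length, hidden_dim)

# AutoModelForSequenceClassification: includes the classification head
reranker = AutoModelForSequenceClassification.from_pretrained(model_name)
with torch.no_grad():
    logits = reranker(**inputs)[0]         # (batch_size, num_classes) == (1, 2)
```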

You can create a subclass of BertForSequenceClassification and add self.weight and self.bias to the __init__ method. Then instantiate your new class and it is ready to use:

class BertForPassageRanking(BertForSequenceClassification):
    def __init__(self, config):
        super().__init__(config)
        self.weight = torch.autograd.Variable(torch.ones(2, config.hidden_size),
                                              requires_grad=True)
        self.bias = torch.autograd.Variable(torch.ones(2), requires_grad=True)


bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)

BERT_PASSAGE_RANKING_PATH is the path where your TF checkpoint files and config JSON file are stored. You will need to rename the files as follows:

config.json
model.ckpt.index
model.ckpt.meta

Another option, if you do not want to change the file names, is to load the JSON config file with BertConfig.from_json_file() and then pass to BertForPassageRanking.from_pretrained() the path + ckpt file name and the configuration you have already loaded with BertConfig.from_json_file().

Thanks a lot. I was having the same question about 'nboost' and was trying this method. However, the output seems to change when I run the same code multiple times, even though I am in eval mode. Do you have any hint about what I am doing wrong here?

bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)

dummy_query = [
    'Rutgers is a good university. I like my experience there.', 
    "Hello, my dog is cute. My cute dog is amazing.",
    'Florida is a nice place but tiger king may be better',
]

dummy_passage = [
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
]
bert_ranking.eval()
with torch.no_grad():
    for idx in range(len(dummy_query)):
        input_ids = torch.tensor(tokenizer.encode(text=dummy_query[idx], \
            text_pair=dummy_passage[idx], add_special_tokens=True)).unsqueeze(0)
        outputs = bert_ranking(input_ids)
        print(outputs)

Thanks a lot. I was having the same question about 'nboost' and was trying this method. However, the output seems to change when I run the same code multiple times, even though I am in eval mode. Do you have any hint about what I am doing wrong here?

bert_ranking = BertForPassageRanking.from_pretrained(BERT_PASSAGE_RANKING_PATH,
                                                     from_tf=True)

dummy_query = [
    'Rutgers is a good university. I like my experience there.', 
    "Hello, my dog is cute. My cute dog is amazing.",
    'Florida is a nice place but tiger king may be better',
]

dummy_passage = [
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
    'My cat is really cute but my dog is even better.',
]
bert_ranking.eval()
with torch.no_grad():
    for idx in range(len(dummy_query)):
        input_ids = torch.tensor(tokenizer.encode(text=dummy_query[idx], \
            text_pair=dummy_passage[idx], add_special_tokens=True)).unsqueeze(0)
        outputs = bert_ranking(input_ids)
        print(outputs)

Sorry, I have no idea. In the end I am not using this approach, as I did not achieve good results for my purpose. Instead, I am using the model provided by nboost (https://huggingface.co/nboost/pt-tinybert-msmarco) and it works fine for me. Remember to load the model as follows:

model = AutoModelForSequenceClassification.from_pretrained("nboost/pt-tinybert-msmarco")

I am using pt-tinybert-msmarco; however, you can use one of the following models:

nboost/pt-bert-base-uncased-msmarco
nboost/pt-bert-large-msmarco
nboost/pt-biobert-base-msmarco
nboost/pt-tinybert-msmarco
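
As a usage sketch, re-ranking a few candidate passages for a query could look like this (assuming, as in the MS MARCO setup, that label index 1 is the 'relevant' class; the query and passages are made up):

```
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "nboost/pt-tinybert-msmarco"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

query = "symptoms of the flu"
passages = [
    "Influenza commonly causes fever, cough, sore throat and muscle aches.",
    "The Eiffel Tower is located in Paris, France.",
]

scores = []
with torch.no_grad():
    for passage in passages:
        inputs = tokenizer.encode_plus(query, passage, add_special_tokens=True, return_tensors="pt")
        logits = model(**inputs)[0]                                 # (1, 2)
        scores.append(torch.softmax(logits, dim=-1)[0, 1].item())   # assumed P(relevant)

# sort passages by relevance score, highest first
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(round(score, 3), passage)
```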

Hi, I have fine-tuned a multilingual model from Hugging Face on the passage re-ranking task. Now I am facing difficulties with converting the TensorFlow checkpoint to a PyTorch model, so that I can use the model with BertForSequenceClassification.
I am using the following conversion function, but I get this error:

```
File "", line 1, in
convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path)

File "", line 63, in convert_tf_checkpoint_to_pytorch
assert pointer.shape == array.shape

File "/home/igli/anaconda3/envs/search-boost/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(

AttributeError: 'LayerNorm' object has no attribute 'shape'
```

The conversion method:

```
def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path):
    config_path = os.path.abspath(bert_config_file)
    tf_path = os.path.abspath(tf_checkpoint_path)
    print("Converting TensorFlow checkpoint from {} with config at {}".format(tf_path, config_path))

    # Load weights from TF model
    init_vars = tf.train.list_variables(tf_path)
    names = []
    arrays = []
    for name, shape in init_vars:
        print("Loading TF weight {} with shape {}".format(name, shape))
        array = tf.train.load_variable(tf_path, name)
        names.append(name)
        arrays.append(array)

    # Initialise PyTorch model
    config = BertConfig.from_json_file(bert_config_file)
    config.num_labels = 2

    print("Building PyTorch model from configuration: {}".format(str(config)))
    model = BertForSequenceClassification(config)

    for name, array in zip(names, arrays):
        if name in ['output_weights', 'output_bias']:
            name = 'classifier/' + name
        name = name.split('/')
        # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculated m and v
        # which are not required for using pretrained model
        if name[-1] in ["adam_v", "adam_m"]:
            print("Skipping {}".format("/".join(name)))
            continue
        pointer = model

        for m_name in name:  

            if re.fullmatch(r'[A-Za-z]+_\d+', m_name):
                l = re.split(r'_(\d+)', m_name)
            else:
                l = [m_name]
            if l[0] == 'kernel':
                pointer = getattr(pointer, 'weight')
            elif l[0] == 'output_bias':
                pointer = getattr(pointer, 'bias')
                pointer = getattr(pointer, 'cls')
            elif l[0] == 'output_weights':
                pointer = getattr(pointer, 'weight')
                pointer = getattr(pointer, 'cls')       
            else:
                try:
                    pointer = getattr(pointer, l[0])
                except:
                    pass

            if len(l) >= 2:
                num = int(l[1])
                pointer = pointer[num]
        if m_name[-11:] == '_embeddings':
            pointer = getattr(pointer, 'weight')
        elif m_name == 'kernel':
            array = np.transpose(array)
        try:
            assert pointer.shape == array.shape
        except AssertionError as e:
            e.args += (pointer.shape, array.shape)
            raise
            #pass

        print("Initialize PyTorch weight {}".format(name))
        array = np.array(array)
        print(array)
        print(type(array))
        pointer.data = torch.from_numpy(array)

    # Save pytorch-model
    print("Save PyTorch model to {}".format(pytorch_dump_path))
    torch.save(model.state_dict(), pytorch_dump_path)

```
I currently have no clue where the problem might be. Thanks in advance!
