Transformers: bert-large-uncased-whole-word-masking-finetuned-squad or BertForQuestionAnswering?

Created on 10 Oct 2019 · 11 comments · Source: huggingface/transformers

โ“ Questions & Help


I'm trying to use the pre-trained model bert-large-uncased-whole-word-masking-finetuned-squad to get an answer to a question from a text, and I'm able to run:

model = BertModel.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model.eval()

but what should I do next? There's some example code using BertForQuestionAnswering:

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)  # Batch size 1
start_positions = torch.tensor([1])
end_positions = torch.tensor([3])
outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
loss, start_scores, end_scores = outputs[:2]

But when I try the code above, I get the following error:

I1009 23:26:51.743415 4495961408 modeling_utils.py:337] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at /Users/ailabby/.cache/torch/transformers/aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
I1009 23:26:54.848274 4495961408 modeling_utils.py:405] Weights of BertForQuestionAnswering not initialized from pretrained model: ['qa_outputs.weight', 'qa_outputs.bias']
I1009 23:26:54.848431 4495961408 modeling_utils.py:408] Weights from pretrained model not used in BertForQuestionAnswering: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-48-0738102265a4> in <module>
      5 end_positions = torch.tensor([3])
      6 outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
----> 7 loss, start_scores, end_scores = outputs[:2]

ValueError: not enough values to unpack (expected 3, got 2)

Should I use the pre-trained model bert-large-uncased-whole-word-masking-finetuned-squad or the BertForQuestionAnswering class, or both, to input a text and question and get an answer? Thanks for the help!

wontfix

Most helpful comment

OK after a lot of reading and testing, I got my final complete little working program that ends up using bert-large-uncased-whole-word-masking-finetuned-squad with BertForQuestionAnswering:

import torch
from transformers import BertTokenizer, BertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
input_text = "[CLS] " + question + " [SEP] " + text + " [SEP]"
input_ids = tokenizer.encode(input_text)
token_type_ids = [0 if i <= input_ids.index(102) else 1 for i in range(len(input_ids))] 

start_scores, end_scores = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)  
print(' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1]))
# a nice puppet

Thanks huggingface for the cool stuff, although your documentation could be cooler :)

All 11 comments

Hey @jeffxtang in your last line you are asking for 3 outputs, but only index from [:2]. You need to change it to

loss, start_scores, end_scores = outputs[:3]

The documentation is off in that example. As for your last question, I don't entirely understand it; however, BertForQuestionAnswering is the architecture you are using, and bert-large-uncased-whole-word-masking-finetuned-squad is the weights (fine-tuned on SQuAD 1.1) you load into that architecture.
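To illustrate the architecture-vs-weights distinction, here is a toy analogue of what from_pretrained does (the TinyQAHead class and its sizes are made up for illustration; they are not part of transformers):

```python
import torch
import torch.nn as nn

# "Architecture" = the Python class defining the layers;
# "weights" = the tensor values loaded into those layers.
class TinyQAHead(nn.Module):
    def __init__(self, hidden=4):
        super().__init__()
        self.qa_outputs = nn.Linear(hidden, 2)  # start/end logits per token

saved = TinyQAHead()
checkpoint = saved.state_dict()       # roughly what a pretrained weights file holds

restored = TinyQAHead()               # same architecture, freshly initialized
restored.load_state_dict(checkpoint)  # now it carries the saved weights
assert torch.equal(restored.qa_outputs.weight, saved.qa_outputs.weight)
```

So BertForQuestionAnswering defines the layers, and the checkpoint name tells from_pretrained which saved tensors to load into them.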

Hope that helps!

Thanks @cformosa ! My bad, I should've checked the value of outputs instead of just asking for help :)

So my last question is how I can use the BERT model fine-tuned on SQuAD in Python the same way it's used on iOS, which takes a text and a question as input and outputs a possible answer from the text. From your answer, BertForQuestionAnswering uses the pre-trained finetuned-on-squad weights, so I should be able to just use the BertForQuestionAnswering class?

I think I'm getting closer to the solution - the code below returns predictions with shape [1, 14, 1024]:

model = BertModel.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model.eval()

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a nice puppet [SEP]"
tokenized_text = tokenizer.tokenize(text)

indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]

So the model with the pre-trained weights bert-large-uncased-whole-word-masking-finetuned-squad takes as input the question "Who was Jim Henson ?" and the text "Jim Henson was a nice puppet", and outputs information that can be used to find the answer "a nice puppet" at indexes 10 and 12 of the text value in the code. But why is there a 1024 in predictions' shape? (14 is the number of tokens.) I think I'd use argmax on predictions to find the begin and end indexes of the answer, but how exactly? Thanks!
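For what it's worth, the 1024 is bert-large's hidden size: BertModel returns one 1024-dimensional vector per token, and BertForQuestionAnswering adds a linear head that projects each vector to a start logit and an end logit. A rough sketch of that projection (random weights, shapes only):

```python
import torch
import torch.nn as nn

# bert-large emits one 1024-dim hidden vector per token, so a 14-token
# input yields hidden states of shape [1, 14, 1024].
hidden_states = torch.randn(1, 14, 1024)

# BertForQuestionAnswering's extra head maps each 1024-dim vector to
# 2 numbers: a start logit and an end logit.
qa_outputs = nn.Linear(1024, 2)
logits = qa_outputs(hidden_states)        # [1, 14, 2]
start_scores, end_scores = logits.split(1, dim=-1)
start_scores = start_scores.squeeze(-1)   # [1, 14], one start score per token
end_scores = end_scores.squeeze(-1)       # [1, 14], one end score per token
```

Taking argmax over those per-token scores gives the answer span's begin and end indexes, which is exactly what BertForQuestionAnswering does for you.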

OK after a lot of reading and testing, I got my final complete little working program that ends up using bert-large-uncased-whole-word-masking-finetuned-squad with BertForQuestionAnswering:

import torch
from transformers import BertTokenizer, BertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
input_text = "[CLS] " + question + " [SEP] " + text + " [SEP]"
input_ids = tokenizer.encode(input_text)
token_type_ids = [0 if i <= input_ids.index(102) else 1 for i in range(len(input_ids))] 

start_scores, end_scores = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)  
print(' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1]))
# a nice puppet

Thanks huggingface for the cool stuff, although your documentation could be cooler :)
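For readers puzzled by the token_type_ids line above: 102 is the id of the [SEP] token in the BERT-uncased vocabulary, so input_ids.index(102) locates the boundary between question and text. A toy sketch with made-up token ids:

```python
# Toy version of the token_type_ids construction. [SEP]'s id really is 102
# in the BERT-uncased vocab; the other ids below are placeholders.
SEP = 102
input_ids = [101, 2040, 2001, SEP, 3958, 27227, SEP]  # [CLS] question [SEP] text [SEP]

sep_pos = input_ids.index(SEP)  # position of the first [SEP], end of the question
token_type_ids = [0 if i <= sep_pos else 1 for i in range(len(input_ids))]
print(token_type_ids)  # [0, 0, 0, 0, 1, 1, 1]
```

Segment 0 covers the question (including its trailing [SEP]) and segment 1 covers the text, which is what BERT expects for sentence-pair inputs.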

Yes we are always a bit behind on documentation, just too many projects at the same time.

If you want to submit a PR fixing this part of the documentation that you noticed was wrong, that would be the most awesome thing!

Totally understandable :) and would love to do a PR, but first, I'd like to understand whether what I did is THE right way or one of the right ways to use the bert-large-uncased-whole-word-masking-finetuned-squad model.

To be more specific: can I also use model = BertModel.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad') to get the right start_score and end_score? Or do I have to use model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')?

Use BertForQuestionAnswering, otherwise your model will not initialize its final span classification layer.


@jeffxtang , thanks for sharing this.
There may be an issue with your output. For instance, with question, text = "Was Jim Henson a nice puppet?", "Jim Henson was a nice puppet", your answer text could be part of the question, because you take the argmax of start_scores/end_scores over all_tokens, and the highest score may fall within the question.

Thanks.
Luke

Thanks @luke4u but I think that's what the Squad-fine-tuned Bert model is supposed to do - its iOS version also returns "Jim Henson was a nice puppet" for the question "Was Jim Henson a nice puppet?", although ideally the answer should be simply "yes". My understanding is that answers returned by the model always have the highest start and end scores located in the text (not the question) - maybe @thomwolf or @julien-c can please verify this?
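One way to guarantee the extracted span comes from the text rather than the question (a sketch, not something proposed in the thread) is to mask the question positions, identified by token_type_ids == 0, before taking the argmax. Using small made-up score tensors:

```python
import torch

# Fake scores of shape [1, seq_len]; token_type_ids marks question tokens
# with 0 and text tokens with 1, as in the code above.
start_scores = torch.tensor([[5.0, 1.0, 0.5, 2.0, 0.1]])
end_scores   = torch.tensor([[4.0, 0.2, 0.3, 0.4, 3.0]])
token_type_ids = torch.tensor([[0, 0, 1, 1, 1]])

mask = token_type_ids == 0  # True at question positions
masked_start = start_scores.masked_fill(mask, float('-inf'))
masked_end = end_scores.masked_fill(mask, float('-inf'))

start = torch.argmax(masked_start)  # index 3, even though index 0 scored highest
end = torch.argmax(masked_end)      # index 4
```

Without the mask, argmax would pick index 0 here, i.e. a token inside the question.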

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

