Transformers: BertForNextSentencePrediction gives a high score for dissimilar sentences

Created on 11 Nov 2019 · 8 comments · Source: huggingface/transformers

โ“ Questions & Help


import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
BertNSP = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

text1 = "How old are you?"
text2 = "The Eiffel Tower is in Paris"

# Build the "[CLS] sentence1 [SEP] sentence2 [SEP]" sequence by hand
text1_toks = ["[CLS]"] + tokenizer.tokenize(text1) + ["[SEP]"]
text2_toks = tokenizer.tokenize(text2) + ["[SEP]"]
text = text1_toks + text2_toks
print(text)

indexed_tokens = tokenizer.convert_tokens_to_ids(text)
segments_ids = [0] * len(text1_toks) + [1] * len(text2_toks)

tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
print(indexed_tokens)
print(segments_ids)

BertNSP.eval()
prediction = BertNSP(tokens_tensor, segments_tensors)
prediction = prediction[0]  # first element of the output tuple: the NSP logits
print(prediction)

softmax = torch.nn.Softmax(dim=1)
prediction_sm = softmax(prediction)
print(prediction_sm)

Output of prediction:
tensor([[ 2.1772, -0.8097]], grad_fn=<AddmmBackward>)

Output of prediction_sm:
tensor([[0.9923, 0.0077]], grad_fn=<SoftmaxBackward>)

Why is the score still as high as 0.9923 even after applying softmax?

wontfix


All 8 comments

As explained in #1790, you're passing the token_type_ids as the attention mask. Change the model forward pass as such:

prediction = BertNSP(tokens_tensor, token_type_ids=segments_tensors)

Your results will be more accurate:

tensor([[-2.3808,  5.4018]], grad_fn=<AddmmBackward>)
tensor([[4.1673e-04, 9.9958e-01]], grad_fn=<SoftmaxBackward>)
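
For context, in this version of the library the forward signature orders the inputs as input_ids, attention_mask, token_type_ids, so a second positional argument binds to the attention mask. A minimal sketch of the difference, reusing the tensors from the snippet above:

# Positional: segments_tensors binds to attention_mask, so the first
# sentence's tokens (segment id 0) are treated as padding and ignored.
prediction = BertNSP(tokens_tensor, segments_tensors)

# Keyword: segments_tensors binds to token_type_ids, as intended.
prediction = BertNSP(tokens_tensor, token_type_ids=segments_tensors)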

@LysandreJik thanks for the information. Does this apply to all the BERT models, or only to next sentence prediction?

It would be better to use named arguments in all the models: positional orderings are bound to change as new versions introduce breaking changes.

I recommend specifying the arguments' names whenever possible.
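
One way to sidestep the problem entirely (a sketch, assuming the v2.x-era tokenizer API where encode_plus handles sentence pairs): let the tokenizer build both the input ids and the segment ids, then pass everything by name.

import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()

# encode_plus inserts [CLS]/[SEP] and computes the token type ids itself,
# so nothing has to be assembled (or passed positionally) by hand.
encoded = tokenizer.encode_plus("How old are you?",
                                "The Eiffel Tower is in Paris",
                                return_tensors='pt')
with torch.no_grad():
    logits = model(input_ids=encoded['input_ids'],
                   token_type_ids=encoded['token_type_ids'])[0]
print(torch.softmax(logits, dim=1))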


Hi @LysandreJik, can you explain what the scores are for?
What is the 4.1673e-04 for, and what is the 9.9958e-01 for?
Which one of them says "X% sure that sentence A is followed by sentence B", and what is the other one for?

Thanks!
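
A quick sketch of how to read the two numbers, going by the convention the documentation example later in this thread also uses (index 0 = "sentence B follows sentence A", index 1 = "sentence B is random"):

import torch

# The softmax output quoted above, reproduced from the raw logits.
logits = torch.tensor([[-2.3808, 5.4018]])
probs = torch.softmax(logits, dim=1)
print(probs)  # tensor([[4.1673e-04, 9.9958e-01]])
# probs[0][0]: P(sentence B follows A) ~ 0.04%
# probs[0][1]: P(sentence B is random) ~ 99.96%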

I'm having the same problem, and I think I've followed all the directions mentioned above.

import torch
import torch.nn.functional as F
from transformers import BertTokenizer, BertForNextSentencePrediction

def line_continues(model, tokenizer, line1, line2):
    line1 = [tokenizer.cls_token] + tokenizer.tokenize(line1) + [tokenizer.sep_token]
    line2 = tokenizer.tokenize(line2) + [tokenizer.sep_token]
    input_idx = tokenizer.convert_tokens_to_ids(line1 + line2)
    segment_idx = [0] * len(line1) + [1] * len(line2)
    tokens_tensor = torch.tensor([input_idx])
    segment_tensor = torch.tensor([segment_idx])
    predictions = model(tokens_tensor, token_type_ids=segment_tensor)
    probs = F.softmax(predictions[0], dim=1)
    return probs[0][0]  # probability that line2 follows line1

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-cased')
model.eval()

Random sentences:

line1 = 'these articles tell us about where leadership communication is going and where it'
line2 = 'issues gave us the chance to engage with many well-established and emerging experts'
prob = line_continues(model, tokenizer, line1, line2)  # 0.9993

Contiguous sentences:

line1 = 'these articles tell us about where leadership communication is going and where it'
line2 = 'needs to go in addition to using the model.'
prob = line_continues(model, tokenizer, line1, line2)  # 0.9991

Thanks!

I am also experiencing this kind of issue. After experimenting with some sequence pairs, I think the relation between the two sequences has to be zero (and also nonsensical) before the model calls them unrelated. For example:

Sent1: Paris is the capital of France.
Sent2: Cow is a domestic animal.

That level of nonsensical pairing. If there is even the slightest connection at the word level, the model outputs that it is 90% sure the two sequences are coherent. In your example, maybe "leadership" and "experts" are nearby words in the semantic space.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This is still a problem and should be re-opened. From the documentation: https://huggingface.co/transformers/model_doc/bert.html#bertfornextsentenceprediction

# documentation example - good
In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced.
The sky is blue due to the shorter wavelength of blue light.
logits[0, 0]: -3.072946548461914, logits[0, 1]: 5.905644416809082, is random: True

# my own example - ???
I took my money to the bank on 23rd street
My monkey was cake and cockroaches have radiation
logits[0, 0]: 3.0128183364868164, logits[0, 1]: -1.984398365020752, is random: False

I'm not sure how to interpret this.
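
Converting the quoted logits to probabilities (plain softmax arithmetic, under the convention from the documentation example that index 1 means "random") makes the two cases easier to compare:

import torch

doc_logits = torch.tensor([[-3.072946548461914, 5.905644416809082]])
own_logits = torch.tensor([[3.0128183364868164, -1.984398365020752]])
print(torch.softmax(doc_logits, dim=1))  # ~[1.3e-04, 0.9999] -> "is random"
print(torch.softmax(own_logits, dim=1))  # ~[0.993, 0.007]    -> "follows"

So the model is ~99.3% confident that the nonsense second sentence follows the first, which is exactly the complaint in this thread.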
