Transformers: BertForNextSentencePrediction gives a high score for dissimilar sentences

Created on 11 Nov 2019 · 8 comments · Source: huggingface/transformers

โ“ Questions & Help


import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
BertNSP = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

text1 = "How old are you?"
text2 = "The Eiffel Tower is in Paris"

# Build the "[CLS] sentence1 [SEP] sentence2 [SEP]" sequence by hand
text1_toks = ["[CLS]"] + tokenizer.tokenize(text1) + ["[SEP]"]
text2_toks = tokenizer.tokenize(text2) + ["[SEP]"]
text = text1_toks + text2_toks
print(text)

indexed_tokens = tokenizer.convert_tokens_to_ids(text)
segments_ids = [0] * len(text1_toks) + [1] * len(text2_toks)

tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
print(indexed_tokens)
print(segments_ids)

BertNSP.eval()
prediction = BertNSP(tokens_tensor, segments_tensors)
prediction = prediction[0]  # first element of the output tuple: the NSP logits
print(prediction)

softmax = torch.nn.Softmax(dim=1)
prediction_sm = softmax(prediction)
print(prediction_sm)

Output of prediction:
tensor([[ 2.1772, -0.8097]], grad_fn=<AddmmBackward>)

Output of prediction_sm:
tensor([[0.9923, 0.0077]], grad_fn=<SoftmaxBackward>)

Why is the score still as high as 0.9923 even after applying softmax?

wontfix


All 8 comments

As explained in #1790, you're passing the token_type_ids as the attention mask. Change the model forward pass as such:

prediction = BertNSP(tokens_tensor, token_type_ids=segments_tensors)

Your results will be more accurate:

tensor([[-2.3808,  5.4018]], grad_fn=<AddmmBackward>)
tensor([[4.1673e-04, 9.9958e-01]], grad_fn=<SoftmaxBackward>)
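
For context, in this version of the library the forward signature orders the inputs as input_ids, attention_mask, token_type_ids, so a second positional argument binds to the attention mask. A minimal sketch of the difference, reusing the tensors from the snippet above:

# Positional: segments_tensors binds to attention_mask, so the first
# sentence's tokens (segment id 0) are treated as padding and ignored.
prediction = BertNSP(tokens_tensor, segments_tensors)

# Keyword: segments_tensors binds to token_type_ids, as intended.
prediction = BertNSP(tokens_tensor, token_type_ids=segments_tensors)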

@LysandreJik thanks for the information. Does this apply to all the BERT models, or only to next sentence prediction?

It would be better to use named arguments in all the models: positional orderings are bound to change as new versions introduce breaking changes.

I recommend specifying the arguments' names whenever possible.
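
One way to sidestep the problem entirely (a sketch, assuming the v2.x-era tokenizer API where encode_plus handles sentence pairs): let the tokenizer build both the input ids and the segment ids, then pass everything by name.

import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()

# encode_plus inserts [CLS]/[SEP] and computes the token type ids itself,
# so nothing has to be assembled (or passed positionally) by hand.
encoded = tokenizer.encode_plus("How old are you?",
                                "The Eiffel Tower is in Paris",
                                return_tensors='pt')
with torch.no_grad():
    logits = model(input_ids=encoded['input_ids'],
                   token_type_ids=encoded['token_type_ids'])[0]
print(torch.softmax(logits, dim=1))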


Hi @LysandreJik, can you explain what the scores are for?
What is the 4.1673e-04 for, and what is the 9.9958e-01 for?
Which one of them says "X% sure that sentence A is followed by sentence B", and what is the other one for?

Thanks!
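
A quick sketch of how to read the two numbers, going by the convention the documentation example later in this thread also uses (index 0 = "sentence B follows sentence A", index 1 = "sentence B is random"):

import torch

# The softmax output quoted above, reproduced from the raw logits.
logits = torch.tensor([[-2.3808, 5.4018]])
probs = torch.softmax(logits, dim=1)
print(probs)  # tensor([[4.1673e-04, 9.9958e-01]])
# probs[0][0]: P(sentence B follows A) ~ 0.04%
# probs[0][1]: P(sentence B is random) ~ 99.96%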

I'm having the same problem, and I think I've followed all the directions mentioned above.

import torch
import torch.nn.functional as F
from transformers import BertTokenizer, BertForNextSentencePrediction

def line_continues(model, tokenizer, line1, line2):
    line1 = [tokenizer.cls_token] + tokenizer.tokenize(line1) + [tokenizer.sep_token]
    line2 = tokenizer.tokenize(line2) + [tokenizer.sep_token]
    input_idx = tokenizer.convert_tokens_to_ids(line1 + line2)
    segment_idx = [0] * len(line1) + [1] * len(line2)
    tokens_tensor = torch.tensor([input_idx])
    segment_tensor = torch.tensor([segment_idx])
    predictions = model(tokens_tensor, token_type_ids=segment_tensor)
    probs = F.softmax(predictions[0], dim=1)
    return probs[0][0]  # probability that line2 follows line1

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-cased')
model.eval()

Random sentences:

line1 = 'these articles tell us about where leadership communication is going and where it'
line2 = 'issues gave us the chance to engage with many well-established and emerging experts'
prob = line_continues(model, tokenizer, line1, line2)  # 0.9993

Contiguous sentences:

line1 = 'these articles tell us about where leadership communication is going and where it'
line2 = 'needs to go in addition to using the model.'
prob = line_continues(model, tokenizer, line1, line2)  # 0.9991

Thanks!

I am also experiencing this kind of issue. After experimenting with some sequence pairs, I think the relation between the two sequences has to be zero (and also nonsensical) before the model calls them unrelated. For example:

Sent1: Paris is the capital of France.
Sent2: Cow is a domestic animal.

That level of nonsensical pairing. If there is even the slightest connection at the word level, the model outputs that it is 90% sure the two sequences are coherent. In your example, maybe "leadership" and "experts" are nearby words in the semantic space.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This is still a problem and should be re-opened. From the documentation: https://huggingface.co/transformers/model_doc/bert.html#bertfornextsentenceprediction

# documentation example - good
In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced.
The sky is blue due to the shorter wavelength of blue light.
logits[0, 0]: -3.072946548461914, logits[0, 1]: 5.905644416809082, is random: True

# my own example - ???
I took my money to the bank on 23rd street
My monkey was cake and cockroaches have radiation
logits[0, 0]: 3.0128183364868164, logits[0, 1]: -1.984398365020752, is random: False

I'm not sure how to interpret this.
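
Converting the quoted logits to probabilities (plain softmax arithmetic, under the convention from the documentation example that index 1 means "random") makes the two cases easier to compare:

import torch

doc_logits = torch.tensor([[-3.072946548461914, 5.905644416809082]])
own_logits = torch.tensor([[3.0128183364868164, -1.984398365020752]])
print(torch.softmax(doc_logits, dim=1))  # ~[1.3e-04, 0.9999] -> "is random"
print(torch.softmax(own_logits, dim=1))  # ~[0.993, 0.007]    -> "follows"

So the model is ~99.3% confident that the nonsense second sentence follows the first, which is exactly the complaint in this thread.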
