I am trying to use BERT as a language model to assign a score (it could be a PPL score) to a given sentence. Something like:
P("He is go to school")=0.008
P("He is going to school")=0.08
This would indicate that the probability of the second sentence is higher than that of the first. Is there a way to get a score like this?
Thanks
I don't think you can do that with BERT. The masked LM loss is not a language modeling loss; it doesn't work nicely with the chain rule the way the usual language modeling loss does.
Please see the discussion on the TensorFlow repo on that here.
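For reference, a sketch of the distinction being referred to: a left-to-right LM gives an exact sentence probability via the chain rule, while each masked LM conditional sees both sides of the masked position, so multiplying them does not recover a proper joint probability.
```
% Causal LM: the chain rule gives an exact joint probability
P(w_1, \dots, w_N) = \prod_{i=1}^{N} P(w_i \mid w_1, \dots, w_{i-1})

% Masked LM: each conditional is of the form below, conditioning on both
% left and right context, so the product is not P(w_1, \dots, w_N)
P_{\mathrm{MLM}}(w_i \mid w_1, \dots, w_{i-1}, w_{i+1}, \dots, w_N)
```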
Hello @thomwolf, I can see it is possible to assign a score using BERT by masking each word sequentially and then scoring the sentence as the sum of the per-word scores. Here is how people were doing it for TensorFlow. I am trying to do the following:
```
import numpy as np
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

# Load pre-trained model (weights)
with torch.no_grad():
    model = BertForMaskedLM.from_pretrained('bert-large-cased')
    model.eval()

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-large-cased')

def score(sentence):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    sentence_loss = 0.
    for i, word in enumerate(tokenize_input):
        tokenize_input[i] = '[MASK]'
        mask_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
        word_loss = model(mask_input, masked_lm_labels=tensor_input).data.numpy()
        sentence_loss += word_loss
        # print("Word: %s : %f" % (word, np.exp(-word_loss)))
    return np.exp(sentence_loss / len(tokenize_input))

score("There is a book on the table")
# 88.899999
```
Is this the right way to assign a score using BERT?
No, you masked each word but never restored it afterwards.
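If it helps, here is a minimal sketch of the same loop with the restore step added (same pytorch_pretrained_bert model and tokenizer as above; I also moved torch.no_grad() around the forward pass and used .item() instead of .data.numpy()):
```
def score(sentence):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    sentence_loss = 0.
    for i, word in enumerate(tokenize_input):
        tokenize_input[i] = '[MASK]'   # mask only the current position
        mask_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
        with torch.no_grad():
            word_loss = model(mask_input, masked_lm_labels=tensor_input).item()
        sentence_loss += word_loss
        tokenize_input[i] = word       # restore the original token before the next iteration
    return np.exp(sentence_loss / len(tokenize_input))
```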
@mdasadul Did you manage to do it?
Yes, please check my tweet on this: @mdasaduluofa
@mdasadul Do you mean this one?
https://twitter.com/mdasaduluofa/status/1181917072999231489/photo/1
I see this is for GPT-2; do you have code for BERT?
It should be similar. The following code is for DistilBERT:
```
import math
from torch.multiprocessing import TimeoutError, Pool, set_start_method, Queue
import torch.multiprocessing as mp
import torch
from transformers import DistilBertTokenizer, DistilBertForMaskedLM
from flask import Flask, request
import json

try:
    set_start_method('spawn')
except RuntimeError:
    pass

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def load_model():
    model = DistilBertForMaskedLM.from_pretrained('distilbert-base-uncased').to(device)
    model.eval()
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    return tokenizer, model

tokenizer, model = load_model()

def score(sentence):
    if len(sentence.strip().split()) <= 1:
        return 10000
    tokenize_input = tokenizer.tokenize(sentence)
    if len(tokenize_input) > 512:
        return 10000
    input_ids = torch.tensor(tokenizer.encode(tokenize_input)).unsqueeze(0).to(device)
    with torch.no_grad():
        loss = model(input_ids, masked_lm_labels=input_ids)[0]
    return math.exp(loss.item() / len(tokenize_input))
```
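For anyone trying it out, a quick usage sketch with the sentences from the opening question, using the score function defined above:
```
# Lower score = lower pseudo-perplexity, i.e. the model finds the sentence more plausible.
for sent in ["He is go to school", "He is going to school"]:
    print(sent, score(sent))
```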
@mdasadul I get the error:
TypeError: forward() got an unexpected keyword argument 'masked_lm_labels'
Also, can you please explain why the following steps are necessary: unsqueeze(0), torch.no_grad(), and model.eval()?

The score is equivalent to perplexity, so the lower the score, the better the sentence, right?
Yes, that is right.
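As a reminder of the definition: perplexity is the exponential of the average per-token negative log-likelihood, so a lower value means the model assigns higher probability to the tokens on average.
```
\mathrm{PPL} = \exp\!\left( \frac{1}{N} \sum_{i=1}^{N} -\log P(w_i \mid \mathrm{context}_i) \right)
```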