I am trying to use BERT as a language model to assign a score (it could be a PPL score) to a given sentence. Something like:
P("He is go to school")=0.008
P("He is going to school")=0.08
This would indicate that the probability of the second sentence is higher than that of the first. Is there a way to get a score like this?
Thanks
I don't think you can do that with BERT. The masked LM loss is not a language modeling loss; it doesn't work nicely with the chain rule the way the usual language modeling loss does.
Please see the discussion on the TensorFlow repo on that here.
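For reference, a sketch of the distinction being referred to: a left-to-right LM gives an exact sentence probability via the chain rule, while each masked LM conditional sees both sides of the masked position, so multiplying them does not recover a proper joint probability.
```
% Causal LM: the chain rule gives an exact joint probability
P(w_1, \dots, w_N) = \prod_{i=1}^{N} P(w_i \mid w_1, \dots, w_{i-1})

% Masked LM: each conditional is of the form below, conditioning on both
% left and right context, so the product is not P(w_1, \dots, w_N)
P_{\mathrm{MLM}}(w_i \mid w_1, \dots, w_{i-1}, w_{i+1}, \dots, w_N)
```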
Hello @thomwolf, I can see it is possible to assign a score using BERT by masking each word sequentially and then scoring the sentence as the sum of the per-word scores. Here is how people were doing it for TensorFlow. I am trying to do the following:
```
import numpy as np
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

# Load pre-trained model (weights)
with torch.no_grad():
    model = BertForMaskedLM.from_pretrained('bert-large-cased')
    model.eval()

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-large-cased')

def score(sentence):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    sentence_loss = 0.
    for i, word in enumerate(tokenize_input):
        tokenize_input[i] = '[MASK]'
        mask_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
        word_loss = model(mask_input, masked_lm_labels=tensor_input).data.numpy()
        sentence_loss += word_loss
        # print("Word: %s : %f" % (word, np.exp(-word_loss)))
    return np.exp(sentence_loss / len(tokenize_input))

score("There is a book on the table")
# 88.899999
```
Is this the right way to assign a score using BERT?
No, you masked each word but never restored it afterwards.
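If it helps, here is a minimal sketch of the same loop with the restore step added (same pytorch_pretrained_bert model and tokenizer as above; I also moved torch.no_grad() around the forward pass and used .item() instead of .data.numpy()):
```
def score(sentence):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    sentence_loss = 0.
    for i, word in enumerate(tokenize_input):
        tokenize_input[i] = '[MASK]'   # mask only the current position
        mask_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
        with torch.no_grad():
            word_loss = model(mask_input, masked_lm_labels=tensor_input).item()
        sentence_loss += word_loss
        tokenize_input[i] = word       # restore the original token before the next iteration
    return np.exp(sentence_loss / len(tokenize_input))
```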
@mdasadul Did you manage to do it?
Yes, please check my tweet on this: @mdasaduluofa
@mdasadul Do you mean this one?
https://twitter.com/mdasaduluofa/status/1181917072999231489/photo/1
I see this is for GPT-2; do you have code for BERT?
It should be similar. The following code is for DistilBERT:
```
import math
from torch.multiprocessing import TimeoutError, Pool, set_start_method, Queue
import torch.multiprocessing as mp
import torch
from transformers import DistilBertTokenizer, DistilBertForMaskedLM
from flask import Flask, request
import json

try:
    set_start_method('spawn')
except RuntimeError:
    pass

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def load_model():
    model = DistilBertForMaskedLM.from_pretrained('distilbert-base-uncased').to(device)
    model.eval()
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    return tokenizer, model

tokenizer, model = load_model()

def score(sentence):
    if len(sentence.strip().split()) <= 1:
        return 10000
    tokenize_input = tokenizer.tokenize(sentence)
    if len(tokenize_input) > 512:
        return 10000
    input_ids = torch.tensor(tokenizer.encode(tokenize_input)).unsqueeze(0).to(device)
    with torch.no_grad():
        loss = model(input_ids, masked_lm_labels=input_ids)[0]
    return math.exp(loss.item() / len(tokenize_input))
```
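For anyone trying it out, a quick usage sketch with the sentences from the opening question, using the score function defined above:
```
# Lower score = lower pseudo-perplexity, i.e. the model finds the sentence more plausible.
for sent in ["He is go to school", "He is going to school"]:
    print(sent, score(sent))
```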
@mdasadul I get the error:
TypeError: forward() got an unexpected keyword argument 'masked_lm_labels'
Also, can you please explain why the following steps are necessary: unsqueeze(0), torch.no_grad(), and model.eval()?

The score is equivalent to perplexity, so the lower the score, the better the sentence, right?
Yes, that is right.
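As a reminder of the definition: perplexity is the exponential of the average per-token negative log-likelihood, so a lower value means the model assigns higher probability to the tokens on average.
```
\mathrm{PPL} = \exp\!\left( \frac{1}{N} \sum_{i=1}^{N} -\log P(w_i \mid \mathrm{context}_i) \right)
```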