Transformers: Using BERT as a language model

Created on 19 Nov 2018 · 10 comments · Source: huggingface/transformers

I was trying to use BERT as a language model to assign a score (e.g., a perplexity score) to a given sentence, something like:
P("He is go to school") = 0.008
P("He is going to school") = 0.08
indicating that the probability of the second sentence is higher than that of the first. Is there a way to get a score like this?

Thanks

All 10 comments

I don't think you can do that with BERT. The masked LM loss is not a language modeling loss: it doesn't decompose with the chain rule the way the usual language modeling loss does.
Please see the discussion on that in the TensorFlow repo here.
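For contrast, a causal language model does decompose with the chain rule, P(w_1, ..., w_n) = P(w_1) P(w_2 | w_1) ... P(w_n | w_1 ... w_{n-1}), so its average token loss exponentiates directly into a sentence score. A minimal sketch with GPT-2 via the transformers library (the model and API choice here are my assumption, not part of the original comment):

```
# Hypothetical sketch: chain-rule sentence scoring with a causal LM (GPT-2).
import math
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()  # disable dropout for deterministic scoring

def ppl(sentence):
    # With labels == input_ids the model shifts them internally, so the
    # loss is the mean negative log P(w_i | w_1 .. w_{i-1}).
    input_ids = torch.tensor([tokenizer.encode(sentence)])
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids)[0]
    return math.exp(loss.item())  # lower perplexity = more fluent

print(ppl("He is going to school"))  # should come out lower than...
print(ppl("He is go to school"))     # ...this one
```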

Hello @thomwolf, I can see it is possible to assign a score using BERT by masking each word sequentially and then scoring the sentence as the sum of the per-word scores. Here is how people were doing it for TensorFlow. I am trying to do the following:

```
import numpy as np
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

# Load pre-trained model (weights)
with torch.no_grad():
    model = BertForMaskedLM.from_pretrained('bert-large-cased')
    model.eval()
    # Load pre-trained model tokenizer (vocabulary)
    tokenizer = BertTokenizer.from_pretrained('bert-large-cased')

def score(sentence):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    sentence_loss = 0.
    for i, word in enumerate(tokenize_input):
        # mask the i-th token and score the sentence against the original ids
        tokenize_input[i] = '[MASK]'
        mask_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
        word_loss = model(mask_input, masked_lm_labels=tensor_input).data.numpy()
        sentence_loss += word_loss
        # print("Word: %s : %f" % (word, np.exp(-word_loss)))
    return np.exp(sentence_loss / len(tokenize_input))

score("There is a book on the table")
```

88.899999

Is this the right way to assign a score using BERT?


No, you masked each word but never restored it, so later iterations see the earlier [MASK] tokens instead of the original words.
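A minimal corrected sketch of that loop (my fix, not from the thread), restoring each token after it is scored so that exactly one word is masked per forward pass:

```
# Hypothetical fix, same pytorch_pretrained_bert setup as the snippet above.
import numpy as np
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-large-cased')
model.eval()
tokenizer = BertTokenizer.from_pretrained('bert-large-cased')

def score(sentence):
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    sentence_loss = 0.
    for i in range(len(tokenize_input)):
        original_word = tokenize_input[i]
        tokenize_input[i] = '[MASK]'       # mask only the i-th token
        mask_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
        with torch.no_grad():
            word_loss = model(mask_input, masked_lm_labels=tensor_input).item()
        sentence_loss += word_loss
        tokenize_input[i] = original_word  # restore it before the next pass
    return np.exp(sentence_loss / len(tokenize_input))
```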

@mdasadul Did you manage to do it?

Yes, please check my tweet on this: @mdasaduluofa


@mdasadul Do you mean this one?
https://twitter.com/mdasaduluofa/status/1181917072999231489/photo/1
I see this is for GPT-2; do you have code for BERT?

It should be similar. The following code is for DistilBERT:
```
import math
import torch
from transformers import DistilBertTokenizer, DistilBertForMaskedLM

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def load_model():
    model = DistilBertForMaskedLM.from_pretrained('distilbert-base-uncased').to(device)
    model.eval()
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    return tokenizer, model

tokenizer, model = load_model()

def score(sentence):
    # guard rails: trivially short or over-length inputs get a large sentinel score
    if len(sentence.strip().split()) <= 1:
        return 10000
    tokenize_input = tokenizer.tokenize(sentence)
    if len(tokenize_input) > 512:
        return 10000
    input_ids = torch.tensor(tokenizer.encode(tokenize_input)).unsqueeze(0).to(device)
    with torch.no_grad():
        loss = model(input_ids, masked_lm_labels=input_ids)[0]
    return math.exp(loss.item() / len(tokenize_input))
```
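It would be used the same way as the earlier BERT snippet, e.g. (the calls and the expected ordering are illustrative, my assumption):

```
print(score("There is a book on the table"))      # lower score = more fluent
print(score("There is a book on the the table"))  # should come out higher
```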

@mdasadul I get the error:
TypeError: forward() got an unexpected keyword argument 'masked_lm_labels' (see the note below the list).
Also, can you please explain why the following steps are necessary:

  1. unsqueeze(0)
  2. add torch.no_grad()
  3. add model.eval()
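For what it's worth, that TypeError typically means a newer transformers release, where the masked_lm_labels argument was renamed to labels; a minimal sketch of the adjusted call (the version diagnosis is my assumption, not from the thread):

```
# Hypothetical fix for recent transformers versions: `masked_lm_labels`
# was renamed to `labels`, so the loss call becomes:
with torch.no_grad():
    loss = model(input_ids, labels=input_ids)[0]
```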

The score is equivalent to perplexity. Hence, the lower the score, the better the sentence, right?

Yes, that is right.
Md Asadul Islam
Machine Learning Engineer
Scribendi Inc
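
For reference, the score above is the exponentiated average per-token negative log-likelihood, which is exactly the standard perplexity formula; a tiny illustrative helper (my gloss, not from the thread):

```
import math

# Perplexity = exp of the mean negative log-likelihood of the tokens.
# Lower perplexity means the model finds the sentence more probable.
def perplexity(token_nlls):
    return math.exp(sum(token_nlls) / len(token_nlls))
```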


