Bert: how to use the BERT language model to predict the next word

Created on 2 Jan 2019 · 13 Comments · Source: google-research/bert

Can someone provide an example of code that takes in a sequence of words and, using the BERT language model, predicts what the most likely next word is?

cvenour

👍14

All 13 comments

I'm also looking for something similar, as in #339.

CapitalZe on 7 Jan 2019

🎉1

hi @jacobdevlin-google - would you be able to answer the above? A bunch of us are wondering how to, given a sequence of tokens, get BERT to predict what the next token should be.

cvenour on 14 Jan 2019

👍9 🚀1

I would suggest adding a "target" (it's a word, literally "target") to the end of your sentence and masking it.

sueqian6 on 12 Mar 2019

👍3 😕2

@sueqian6 what do you think of this approach:

For a sentence like "the quick brown fox jumps over the lazy dog", we just input to the model "the quick brown fox jumps over the lazy" and have it predict "dog".

hsm207 on 12 Mar 2019

If you add a word after "lazy" and mask it, the model can predict it. It doesn't matter which word you choose, because it is masked.

sueqian6 on 13 Mar 2019

❤1 👍1
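
For readers who want to try the append-and-mask suggestion above, here is a minimal sketch. It is not code posted by anyone in this thread. It assumes the Hugging Face `transformers` package and the `bert-base-uncased` checkpoint; the helper names `append_mask` and `predict_next_word` are made up for illustration. Instead of appending a placeholder word and then masking it, the sketch appends the mask token directly, which amounts to the same input.

```python
def append_mask(prefix: str, mask_token: str = "[MASK]") -> str:
    """Return the prefix with one mask token appended as the word to predict."""
    return f"{prefix.rstrip()} {mask_token}"


def predict_next_word(prefix: str, top_k: int = 5):
    """Return (token, score) candidates for the masked slot after `prefix`.

    Assumes the `transformers` package is installed; the model is downloaded
    on first use.
    """
    # Imported lazily so append_mask works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    masked_text = append_mask(prefix, fill_mask.tokenizer.mask_token)
    results = fill_mask(masked_text, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]


if __name__ == "__main__":
    print(append_mask("the quick brown fox jumps over the lazy"))
    # Uncomment to run the model (requires transformers and a download):
    # print(predict_next_word("the quick brown fox jumps over the lazy"))
```

With the mask in sentence-final position, the top candidate can be a punctuation mark rather than a word, so it is worth inspecting several candidates rather than only the first.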

I guess this means that for any word other than the last one, you'd need to apply the same approach: provide a sufficient part of the sentence, which in the common case also includes its ending. I am not sure what percentage of the sentence counts as "sufficient".

mapto on 13 Mar 2019

👍1

Same question. Thank you.

guotong1988 on 14 Mar 2019

For anyone interested in the code (gender is the name of my dataset, a DataFrame; target is the word to predict; Seq is the sentence):

    import torch

    # Assumes tokenizer, model (BERT with a masked-LM head) and mask_tok
    # (the "[MASK]" token) have already been set up.
    for i in range(len(gender)):
        # Sentence followed by the word to predict
        text = gender.Seq[i] + " " + gender.target[i]
        gender.loc[i, "text_bert"] = text
        tokenized_text = tokenizer.tokenize(text)
        # Mask the last token (only the final piece if the target splits into several)
        mask_idx = len(tokenized_text) - 1
        tokenized_text[mask_idx] = mask_tok
        tok_idxs = tokenizer.convert_tokens_to_ids(tokenized_text)
        seg_idxs = [0] * len(tok_idxs)  # single segment
        tok_tensor = torch.tensor([tok_idxs])
        seg_tensor = torch.tensor([seg_idxs])
        preds = model(tok_tensor, seg_tensor)
        # Highest-scoring vocabulary entry at the masked position
        pred_idx = torch.argmax(preds[0, mask_idx]).item()
        pred_tok = tokenizer.convert_ids_to_tokens([pred_idx])[0]
        gender.loc[i, "label"] = pred_tok

sueqian6 on 14 Mar 2019

👍4
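
A token-level variant of the loop above, for a single sentence, could look like the sketch below. It is an illustration under assumptions, not the commenter's code: it assumes the current Hugging Face `transformers` API (where the model output exposes `.logits`) rather than whichever library version the comment used, it adds the `[CLS]` and `[SEP]` tokens that BERT inputs normally carry, and the helper names `mask_last_token` and `predict_masked` are hypothetical.

```python
def mask_last_token(tokens, mask_token="[MASK]", cls_token="[CLS]", sep_token="[SEP]"):
    """Replace the final token with the mask and add BERT's special tokens.

    Returns the new token list and the index of the masked position.
    """
    if not tokens:
        raise ValueError("need at least one token to mask")
    masked = [cls_token] + list(tokens[:-1]) + [mask_token, sep_token]
    mask_idx = len(masked) - 2  # the slot just before [SEP]
    return masked, mask_idx


def predict_masked(tokens, mask_idx, tokenizer, model):
    """Return the highest-scoring WordPiece for the masked position."""
    import torch  # imported lazily; only needed when a model is actually run

    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
    with torch.no_grad():
        logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)
    pred_id = int(torch.argmax(logits[0, mask_idx]))
    return tokenizer.convert_ids_to_tokens([pred_id])[0]


# Possible usage (assumes transformers is installed; downloads the model):
#   from transformers import BertTokenizer, BertForMaskedLM
#   tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
#   model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()
#   tokens = tokenizer.tokenize("the quick brown fox jumps over the lazy dog")
#   masked, idx = mask_last_token(tokens)
#   print(predict_masked(masked, idx, tokenizer, model))
```

Note that a single mask predicts a single WordPiece. If the target word is split into several pieces by the tokenizer, masking only the last position, as here, predicts just the final piece.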

BERT is not really a language model: https://arxiv.org/pdf/1904.09408.pdf or https://github.com/google-research/bert/issues/139

Shujian2015 on 22 Jun 2019

👍1

> I would suggest adding a "target" (it's a word, literally "target") to your sentence in the end, and mask it.

Seems good, but since BERT is bidirectional it will consider the complete sentence as input. In the autosuggestion case it should only consider the previous words, right?

Did it give good results? I'd be interested to know.

wickkiey on 24 Jun 2019

I'm not sure I understand the question correctly. If you want to predict the last word of a sequence, and have BERT take into consideration only the sequence before the target, you can use this sequence as the input.

sueqian6 on 24 Jun 2019

👍1

The question is about predicting the NEXT word in a sentence, which is not necessarily the LAST one.
Here is a similar discussion:
https://datascience.stackexchange.com/questions/46377/can-bert-do-the-next-word-predict-task
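
One way to read the "next word, not necessarily the last word" distinction is to truncate the sentence at each position and put the mask there, so the model only ever sees left context. The sketch below only builds those inputs; the function name `left_context_inputs` is hypothetical, and each resulting string would still have to be passed to a masked-LM as discussed earlier in the thread. Because BERT was trained with context on both sides of the mask, predictions from such truncated inputs may be weaker than those of a left-to-right language model.

```python
def left_context_inputs(words, mask_token="[MASK]"):
    """Yield (masked_input, target) pairs, one per word after the first.

    Each input holds only the words to the left of the target plus a mask,
    so no right-hand context is available for that prediction.
    """
    for i in range(1, len(words)):
        yield " ".join(list(words[:i]) + [mask_token]), words[i]


if __name__ == "__main__":
    for text, target in left_context_inputs("the quick brown fox".split()):
        print(f"{text!r} -> {target}")
```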
