Feature/Question: With GPT-2 is it possible to get previous word prediction?
Hi,
I ask this after seeing this article: https://towardsdatascience.com/deconstructing-bert-distilling-6-patterns-from-100-million-parameters-b49113672f77
I'm wondering how I could write a method that would allow me to predict the previous word (ideally with GPT-2)?
Many thanks,
Vince.
Hi! There is one big difference between BERT and GPT-2, in that BERT is trained using masked language modeling, whereas GPT-2 is trained using causal language modeling.
During pre-training, BERT learns to predict masked words given a bi-directional context. GPT-2, on the other hand, learns to predict a word given only its left context. This is why GPT-2 is very good at text generation (it only needs the left-hand side context), while BERT isn't.
Given this, GPT-2 won't be able to do previous word prediction, as it does not handle the right-hand side context.
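To make that concrete, here is a minimal sketch (using the `transformers` library and the base `gpt2` checkpoint) of what the model actually exposes: the logits at the last position score every candidate *next* token, and there is no analogous head for the token that comes *before* the input. The example text is just an illustration.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The capital of France is"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Logits at the final position rank candidate *next* tokens;
# GPT-2 has no comparable output for the token preceding the input.
next_token_logits = logits[0, -1]
top_ids = torch.topk(next_token_logits, k=5).indices
print([tokenizer.decode(i) for i in top_ids])
```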
If you want to train your own GPT-2 model to predict previous words, you could feed in your entire training set in reverse word order. GPT-2 would then learn to generate text backwards, and that model would be able to tell you which word should come before a given piece of text.
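Here is a rough sketch of that idea, with the caveat that it is not a ready-made solution: the `reverse_words` helper is my own, and the base `gpt2` checkpoint below is only a placeholder for a model you would first fine-tune on the word-reversed corpus (without that fine-tuning step, the predictions will not be meaningful).

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def reverse_words(text: str) -> str:
    """Reverse word order, e.g. 'the cat sat' -> 'sat cat the'."""
    return " ".join(reversed(text.split()))

# 1. Prepare the training data in reversed word order, then fine-tune
#    GPT-2 on it with the usual causal language modeling objective.
corpus = ["the cat sat on the mat", "GPT-2 reads text left to right"]
reversed_corpus = [reverse_words(line) for line in corpus]

# 2. Load the fine-tuned model (placeholder: base gpt2 shown here).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # replace with your fine-tuned checkpoint
model.eval()

# 3. To ask "what word comes before this phrase?", reverse the phrase
#    and read off the model's next-word prediction.
phrase = "on the mat"
inputs = tokenizer(reverse_words(phrase), return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
prev_word_ids = torch.topk(logits[0, -1], k=5).indices
print([tokenizer.decode(i) for i in prev_word_ids])
```

Note that this reverses word order rather than token order, since GPT-2's BPE tokenizer splits words into subword pieces that you would normally want to keep intact.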