Hello,
I keep hearing that I should use BERT for applications instead of word embeddings, but I don't understand what BERT is. Maybe that's because I don't know the Transformer architecture.
Could anyone explain what BERT is? How does it differ from word2vec, GloVe and, more importantly, ELMo?
How does BERT differ from OpenAI GPT?
How can I adapt BERT to a question-answering model or any classification task?
Any help is highly appreciated.
Thank you.
Most of your questions are explained in the original paper:
Guess you want this awesome post: illustrated-bert

