It's at line 185 in modeling.py
Leaving this issue open as someone (or me) might have some doubts about the implementation (which is not explained in detail in the paper)
yes me
@hsm207 would you care to explain the implementation of these embeddings in detail? I.e., what each embedding (token, segment, and position) is and how they are combined? Thanks :)
@JoaoLages I wrote a blog post explaining the embeddings. I hope it's detailed enough :)
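For anyone skimming the thread, here is a minimal sketch of how the three embeddings are combined. It assumes toy sizes and randomly initialised tables, and every name in it (`token_table`, `segment_table`, `position_table`, `embed`) is hypothetical, not taken from the repo. The idea is that each input position gets the element-wise sum of its token, segment, and position vectors. As far as I can tell, the actual TensorFlow code also applies layer normalization and dropout after the sum, which this sketch leaves out.

```python
import random

random.seed(0)

HIDDEN = 4          # toy hidden size (BERT-base uses 768)
VOCAB_SIZE = 10     # toy vocabulary size, assumption for illustration
NUM_SEGMENTS = 2    # sentence A / sentence B
MAX_POSITIONS = 8   # toy maximum sequence length


def make_table(rows, cols):
    """Build a randomly initialised lookup table with shape rows x cols."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Three separate lookup tables (hypothetical names).
token_table = make_table(VOCAB_SIZE, HIDDEN)
segment_table = make_table(NUM_SEGMENTS, HIDDEN)
position_table = make_table(MAX_POSITIONS, HIDDEN)


def embed(token_ids, segment_ids):
    """Return one vector per input position: token + segment + position."""
    assert len(token_ids) == len(segment_ids) <= MAX_POSITIONS
    output = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        # Element-wise sum of the three looked-up rows.
        vec = [
            t + s + p
            for t, s, p in zip(token_table[tok], segment_table[seg], position_table[pos])
        ]
        output.append(vec)
    return output


vectors = embed(token_ids=[1, 5, 2, 7, 2], segment_ids=[0, 0, 0, 1, 1])
print(len(vectors), len(vectors[0]))  # 5 4
```

Note that token id 2 appears at positions 2 and 4 but ends up with different vectors, because the position (and here also the segment) rows differ.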
@hsm207 Once again, thank you A LOT for the amazing blog post. I just have another question: BERT mentions BPE in their paper. But from what I can see, the embedding table is just a lookup table, which does not deal with OOV problems.
To my understanding, BPE is used to train the word piece embeddings. I don't know whether BERT retrained those embeddings beforehand or just used pretrained ones in the lookup table for the token embeddings.
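On the OOV point, a rough sketch may help: the lookup table itself never sees unknown words, because the tokenizer first splits each word into WordPiece units that are in the vocabulary. The code below is a simplified greedy longest-match-first splitter, which is my understanding of what the repo's WordPiece tokenizer does. The function name `wordpiece_tokenize` and the toy vocabulary are made up for illustration, and details such as lowercasing, punctuation splitting, and a maximum word length are left out.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into pieces using greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, assumption for illustration only.
toy_vocab = {"[UNK]", "play", "##ing", "##ed", "un", "##play", "##able"}

print(wordpiece_tokenize("playing", toy_vocab))     # ['play', '##ing']
print(wordpiece_tokenize("unplayable", toy_vocab))  # ['un', '##play', '##able']
print(wordpiece_tokenize("xyz", toy_vocab))         # ['[UNK]']
```

Each resulting piece has its own row in the token embedding table, so a word missing from the vocabulary is still represented as long as its pieces are present.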
@JoaoLages I'm glad it helped.
Where in the paper was BPE mentioned?
Oh sorry, I was confusing it with the OpenAI GPT-2 model. All good :)