BERT: Do feature vectors represent word embeddings?

Created on 7 Nov 2018 · 2 comments · Source: google-research/bert

I thought the feature vectors extracted from BERT represent word embeddings.

So I thought that, in order to use these embeddings, one just has to extract them (using extract_features.py), load the weights into an Embedding layer (yes, I'm a Keras person), and then build whatever we want on top of this Embedding layer.

But that is wrong, isn't it? Using extract_features.py, I got the weights of the last 4 layers for each word in each sentence fed as input!

So instead of having 4 * X weights (X being the size of a layer) as I expected, I have 4 * X * tokens_used_in_input_file weights!
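The count in the question follows directly from what extract_features.py writes: one X-sized vector per requested layer, for every WordPiece token of every input line (including the [CLS] and [SEP] tokens the script adds). A minimal sketch of that arithmetic, assuming BERT-Base (X = 768) and the script's default --layers=-1,-2,-3,-4; the helper names are hypothetical:

```python
# Hypothetical helpers: count how many numbers extract_features.py writes.
# Assumes BERT-Base (hidden size 768) and the default last 4 layers.
HIDDEN_SIZE = 768  # BERT-Base hidden size (BERT-Large uses 1024)
NUM_LAYERS = 4     # layers -1, -2, -3, -4 are written by default


def values_per_token(num_layers=NUM_LAYERS, hidden_size=HIDDEN_SIZE):
    """Numbers stored for one token: one hidden_size vector per requested layer."""
    return num_layers * hidden_size


def values_in_file(tokens_per_line, num_layers=NUM_LAYERS, hidden_size=HIDDEN_SIZE):
    """Total numbers for a whole input file, given the WordPiece token count
    of each input line (including the added [CLS] and [SEP] tokens)."""
    return values_per_token(num_layers, hidden_size) * sum(tokens_per_line)


print(values_per_token())       # 3072
print(values_in_file([7, 12]))  # 3072 * 19 = 58368
```

So the "4 * X" the question expected is the size of one token's concatenated vector, and the extra factor is simply the number of tokens in the input file.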

How do I use the feature vectors to build a task-specific model architecture on top of BERT?

astariul-colanim

The embedding table contains context-free WordPiece embeddings. These are not particularly useful; they will just be worse versions of what you would get from GloVe/word2vec/FastText etc.

extract_features.py gives you contextual representations, which are "embeddings" of each token in the context of the sentence. This is what you would want to build a model on. For this, you need to run your full training and test data through extract_features.py and use the input vector just like you would use an embedding (to handle the 4x, you can just concatenate the 4 vectors for each word).

jacobdevlin-google on 7 Nov 2018
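The concatenation step described in the answer above can be sketched in Python. This assumes the JSON-lines layout that extract_features.py writes: one object per input line with a "features" list, where each entry has a "token" and a "layers" list of objects with an "index" and "values". load_token_features is a hypothetical helper name, and the toy record uses 2-dimensional vectors instead of 768 for readability.

```python
import json


def load_token_features(jsonl_lines):
    """Parse extract_features.py output (one JSON object per input line) into,
    for each input line, a list of (token, vector) pairs, where vector is the
    concatenation of every layer written for that token."""
    sentences = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tokens = []
        for feature in record["features"]:
            # Fix the concatenation order (-1, -2, -3, -4) regardless of file order.
            layers = sorted(feature["layers"], key=lambda layer: layer["index"], reverse=True)
            vector = []
            for layer in layers:
                vector.extend(layer["values"])
            tokens.append((feature["token"], vector))
        sentences.append(tokens)
    return sentences


# Toy record in the assumed layout, with 2 layers of size 2.
example = json.dumps({
    "linex_index": 0,
    "features": [
        {"token": "[CLS]", "layers": [{"index": -1, "values": [0.1, 0.2]},
                                      {"index": -2, "values": [0.3, 0.4]}]},
        {"token": "hello", "layers": [{"index": -1, "values": [0.5, 0.6]},
                                      {"index": -2, "values": [0.7, 0.8]}]},
    ],
})
sentences = load_token_features([example])
print(sentences[0][1])  # ('hello', [0.5, 0.6, 0.7, 0.8])
```

For a real run you would pass an open file instead, e.g. load_token_features(open("output.jsonl")), where output.jsonl stands for whatever --output_file was set to. The vectors are per WordPiece token, not per whole word, so a word split into pieces such as "##ing" yields several vectors.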

"you need to run your full training and test data through extract_features.py and use the input vector just like you would use an embedding (to handle the 4x, you can just concatenate the 4 vectors for each word)."

Oh I see.

I thought extract_features.py was a script that processes the embeddings, and that we could then use them wherever we want.

But from what you said, extract_features.py IS the embedding layer.

That makes sense: having an embedding for each word independently would mean no context.

Thank you very much for your kind and clear explanations.

astariul-colanim on 7 Nov 2018
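To use the per-token vectors "just like an embedding" in Keras, they first have to be padded into a fixed-size tensor, which then takes the place of the Embedding layer's output. A minimal pure-Python sketch, assuming each sentence is already a list of (token, vector) pairs with vectors of size dim (4 * 768 = 3072 when the last 4 BERT-Base layers are concatenated); pad_features, max_len and dim are hypothetical names:

```python
def pad_features(sentences, max_len, dim):
    """Turn per-sentence lists of (token, vector) pairs into a fixed-size
    [num_sentences, max_len, dim] nested list plus a 0/1 mask.
    Long sentences are truncated; short ones are zero-padded."""
    batch, mask = [], []
    for tokens in sentences:
        vectors = [vector for _, vector in tokens[:max_len]]
        for v in vectors:
            if len(v) != dim:
                raise ValueError(f"expected vectors of size {dim}, got {len(v)}")
        pad = max_len - len(vectors)
        batch.append(vectors + [[0.0] * dim for _ in range(pad)])
        mask.append([1] * len(vectors) + [0] * pad)  # 1 = real token, 0 = padding
    return batch, mask


# Toy input with 2-dimensional vectors for readability.
sentences = [
    [("[CLS]", [0.1, 0.2]), ("hi", [0.3, 0.4]), ("[SEP]", [0.5, 0.6])],
    [("[CLS]", [0.7, 0.8]), ("[SEP]", [0.9, 1.0])],
]
batch, mask = pad_features(sentences, max_len=3, dim=2)
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

The nested lists can be converted with numpy.asarray(batch) into an array of shape (num_sentences, max_len, dim). A Keras model would then start from keras.Input(shape=(max_len, dim)) where the Embedding layer would otherwise sit, optionally followed by keras.layers.Masking(mask_value=0.0) so the zero-padded positions are ignored; the explicit mask is returned in case a layer needs it directly.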