Bert: How can I make word embeddings using BERT?

Created on 11 Jan 2019 · 5 Comments · Source: google-research/bert

Hi,

I want to make feature vectors from my documents using BERT.
I would like to get a vector for each word in my texts, average the word vectors for each document, and add the result as one of the features to my classifier.
I have read the extract_features.py script, but I couldn't work out how to use BERT to produce word embeddings and extract features from my text documents.
Would you please help me understand the step-by-step process for making this vector representation?
Do I need to customize BERT? If yes, would you please point me to the files that need to be changed?

Many thanks!
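
For readers following along, the averaging step described in the question can be sketched as below. This is only an illustration under stated assumptions: it takes the per-token vectors as already obtained from BERT (for example from the output of extract_features.py), and the function name `average_word_vectors`, the `SPECIAL_TOKENS` set, and the toy 3-dimensional vectors are hypothetical. Real BERT-Base vectors have 768 dimensions, and BERT tokenizes into WordPiece sub-tokens, so a "word" vector may really be a vector for a piece of a word.

```python
from typing import List, Sequence, Tuple

# Marker tokens that BERT adds around the text; assumed here to be
# excluded from the average (a design choice, not a requirement).
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def average_word_vectors(
    token_vectors: Sequence[Tuple[str, Sequence[float]]]
) -> List[float]:
    """Mean-pool per-token vectors into a single document vector."""
    kept = [vec for tok, vec in token_vectors if tok not in SPECIAL_TOKENS]
    if not kept:
        raise ValueError("no non-special tokens to average")
    dim = len(kept[0])
    # Average each dimension independently across the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy stand-in for BERT output: (token, vector) pairs for one document.
doc = [
    ("[CLS]", [9.0, 9.0, 9.0]),
    ("hello", [1.0, 2.0, 3.0]),
    ("world", [3.0, 4.0, 5.0]),
    ("[SEP]", [9.0, 9.0, 9.0]),
]

doc_vector = average_word_vectors(doc)
print(doc_vector)  # [2.0, 3.0, 4.0]
```

The resulting list has one fixed length per document regardless of how many tokens the document contains, so it can be appended to the other classifier features.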

Most helpful comment

@saeideh-sh
Here is another way to get word embeddings from BERT. Please check it out!
https://github.com/imgarylai/bert-embedding

All 5 comments

Many thanks!
I want to make sure I got the idea, so I have a question about the BERT word embedding result.
I am going to make a list of all my documents and then call bc.encode([doc_1,doc_2,...]). Does BERT return one array of weights per document rather than one per individual word?

Thanks again for your help!

Hi @saeideh-sh - did you ever find out whether each word has a unique embedding? :)

Hi. Have you found a solution to this?
