Transformers: how to use extracted features in extract_features.py?

Created on 16 Apr 2019  ·  17 comments  ·  Source: huggingface/transformers

I extract features like the examples in extract_features.py. But when I used these features (the last encoded_layers) as word embeddings in a text classification task, I got a worse result than with 300D GloVe (all other parameters were the same). I also used these features to compute the cosine similarity between words in sentences, and I found that all the values were around 0.6. So can these features be used like GloVe or word2vec embeddings? What exactly are these features?

Discussion wontfix

Most helpful comment

Without fine-tuning, BERT features are usually less useful than plain GloVe or word2vec indeed.
They start to be interesting when you fine-tune a classifier on top of BERT.

See the recent study by Matthew Peters, Sebastian Ruder, Noah A. Smith (To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks) for some practical tips on that.

All 17 comments

Without fine-tuning, BERT features are usually less useful than plain GloVe or word2vec indeed.
They start to be interesting when you fine-tune a classifier on top of BERT.

See the recent study by Matthew Peters, Sebastian Ruder, Noah A. Smith (To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks) for some practical tips on that.

thank you so much~

@heslowen could you please share the code for extracting features in order to use them for learning a classifier? Thanks.

@joistick11 you can find a demo in extract_features.py

Could you please help me?
I was using bert-as-service (https://github.com/hanxiao/bert-as-service), which has a model method encode that accepts a list and returns a list of the same size, each element containing a sentence embedding. All the elements are of the same size.

  1. When I use extract_features.py, it returns an embedding for each recognized token in the sentence from the specified layers. I mean, instead of a sentence embedding it returns token embeddings. How should I use it, for instance, to train an SVM? I am using bert-base-multilingual-cased.
  2. Which layer's output should I use? Is it the one with index -1?

Thank you very much!

@joistick11 you want to embed a sentence into a vector?
all_encoder_layers, pooled_output = model(input_ids, token_type_ids=None, attention_mask=input_mask)
pooled_output may help you.
I have no idea about using these features to train an SVM, although I know the theory behind SVMs.
For the second question, please refer to thomwolf's answer.
I used the top 4 encoder layers, but I did not get a better result than with GloVe.
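
For a concrete sentence vector, here is a minimal sketch with pytorch_pretrained_bert, assuming bert-base-uncased and a single example sentence; the variable names are just for illustration, not part of the library:

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("I have a dog.") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # all_encoder_layers: list of 12 tensors, each (batch, seq_len, hidden_size)
    # pooled_output: (batch, hidden_size), the [CLS] vector passed through a tanh layer
    all_encoder_layers, pooled_output = model(input_ids)

# Option 1: use pooled_output as the sentence vector
sentence_vector = pooled_output[0]

# Option 2: average the top 4 encoder layers, then mean-pool over tokens
top4 = torch.stack(all_encoder_layers[-4:]).mean(0)   # (batch, seq_len, hidden_size)
sentence_vector = top4.mean(1)[0]

To feed an SVM, such vectors can be converted with sentence_vector.numpy() and stacked into a feature matrix.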

@heslowen Hello, would you please help me? For a sequence like [CLS] I have a dog . [SEP], when I input this to BERT and take the last hidden layer of the sequence output, let's say the output is "vector", is vector[0] the embedding of [CLS], vector[1] the embedding of "I", and so on, with vector[-1] the embedding of [SEP]?

@heslowen How did you extract features after training a classifier on top of BERT? I've been trying to do the same, but I'm unable to do so.
Do I first follow run_classifier.py, and then extract the features from tf.Estimator?

@rvoak I use PyTorch. I did it as in the demo in extract_features.py. It is easy: you just need to load a tokenizer and a BERT model, tokenize your sentences, and then run the model to get the encoded_layers.
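
A minimal sketch of that workflow with pytorch_pretrained_bert, assuming bert-base-uncased and one example sentence (roughly the pattern from extract_features.py reduced to a single input):

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("I have a dog.") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    encoded_layers, _ = model(input_ids)

# encoded_layers[-1] is the last hidden layer: position 0 is [CLS],
# the last position is [SEP], and the rest line up with the tokens in between.
last_layer = encoded_layers[-1][0]          # (seq_len, hidden_size)
for token, vector in zip(tokens, last_layer):
    print(token, vector.shape)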

@RomanShen yes you're right

@heslowen Thanks for your reply!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@heslowen Sorry about my English. I am now working on a sentence embedding task. I fine-tuned on my corpus with this library and received config.json, vocab.txt and a model .bin file, but in BERT's original repo, feature extraction loads from a TensorFlow ckpt checkpoint. According to your answer, I must write the feature extraction for torch, is that right? Please help me.

@hungph-dev-ict Do you mind opening a new issue with your problem? I'll try and help you out.

@LysandreJik Thank you for your help. I will find a solution for my problem; it uses the last hidden layer of the BERT model, but if you have a better solution, can you help me?
I also have more concerns about my corpus: this library's code uses the tokenizer from the pretrained BERT model, but I want to use only the BasicTokenizer. Can you help me?
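
If it helps, here is a minimal sketch for loading a fine-tuned PyTorch checkpoint from a local directory and extracting the last hidden layer. The directory name is hypothetical, and the exact filenames the library expects for the weights and config depend on your version of the library:

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

model_dir = "./my_finetuned_bert"   # hypothetical directory holding the config, vocab and weight files

tokenizer = BertTokenizer.from_pretrained(model_dir)   # reads vocab.txt from the directory
model = BertModel.from_pretrained(model_dir)           # reads the config and weights
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("an example sentence") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    encoded_layers, pooled_output = model(input_ids)

last_hidden = encoded_layers[-1]    # the last hidden layer mentioned above

Note that the full BertTokenizer (BasicTokenizer plus WordPiece) is needed to map tokens to the model's vocabulary ids; BasicTokenizer alone only splits text and does not produce WordPiece ids.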

How long should extract_features.py take to complete?

When using 'bert-large-uncased' it takes seconds, however it writes a blank file.
When using 'bert-base-uncased' it's been running for over 30 minutes.

any advice?

the code I used:

!python extract_features.py \
--input_file data/src_train.txt \
--output_file data/output1.jsonl \
--bert_model bert-base-uncased \
--layers -1

You can look at what the BertForSequenceClassification model https://github.com/huggingface/transformers/blob/3ba5470eb85464df62f324bea88e20da234c423f/pytorch_pretrained_bert/modeling.py#L867 does in its forward method.
The pooled_output obtained from self.bert would seem to be the features you are looking for.
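
A minimal sketch of that, assuming pytorch_pretrained_bert and bert-base-uncased (num_labels and the example sentence are placeholders):

import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("a sentence to classify") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # model.bert is the underlying BertModel; its pooled_output is the feature
    # vector that the classification head consumes in forward()
    _, pooled_output = model.bert(input_ids, output_all_encoded_layers=False)

features = pooled_output[0]   # (hidden_size,) vector usable as classification features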

