Transformers: how to use extracted features in extract_features.py?

Created on 16 Apr 2019  ·  17 comments  ·  Source: huggingface/transformers

I extract features like the examples in extract_features.py. But when I used these features (the last encoded_layers) as word embeddings in a text classification task, I got a worse result than with 300D GloVe (all other parameters were the same). I also used these features to compute the cosine similarity between words in sentences, and I found that all the values were around 0.6. So can these features be used like GloVe or word2vec embeddings? What exactly are these features?

Discussion wontfix

Most helpful comment

Without fine-tuning, BERT features are usually less useful than plain GloVe or word2vec indeed.
They start to be interesting when you fine-tune a classifier on top of BERT.

See the recent study by Matthew Peters, Sebastian Ruder, Noah A. Smith (To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks) for some practical tips on that.

All 17 comments

Without fine-tuning, BERT features are usually less useful than plain GloVe or word2vec indeed.
They start to be interesting when you fine-tune a classifier on top of BERT.

See the recent study by Matthew Peters, Sebastian Ruder, Noah A. Smith (To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks) for some practical tips on that.

thank you so much~

@heslowen could you please share the code for extracting features in order to use them for learning a classifier? Thanks.

@joistick11 you can find a demo in extract_features.py

Could you please help me?
I was using bert-as-service (https://github.com/hanxiao/bert-as-service), which has a model method encode that accepts a list and returns a list of the same size, each element containing a sentence embedding. All the elements are of the same size.

  1. When I use extract_features.py, it returns an embedding for each recognized token in the sentence from the specified layers. I mean, instead of a sentence embedding it returns token embeddings. How should I use it, for instance, to train an SVM? I am using bert-base-multilingual-cased.
  2. Which layer's output should I use? Is it the one with index -1?

Thank you very much!

@joistick11 you want to embed a sentence into a vector?
all_encoder_layers, pooled_output = model(input_ids, token_type_ids=None, attention_mask=input_mask)
pooled_output may help you.
I have no idea about using these features to train an SVM, although I know the theory behind SVMs.
For the second question, please refer to thomwolf's answer.
I used the top 4 encoder layers, but I did not get a better result than with GloVe.
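
For a concrete sentence vector, here is a minimal sketch with pytorch_pretrained_bert, assuming bert-base-uncased and a single example sentence; the variable names are just for illustration, not part of the library:

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("I have a dog.") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # all_encoder_layers: list of 12 tensors, each (batch, seq_len, hidden_size)
    # pooled_output: (batch, hidden_size), the [CLS] vector passed through a tanh layer
    all_encoder_layers, pooled_output = model(input_ids)

# Option 1: use pooled_output as the sentence vector
sentence_vector = pooled_output[0]

# Option 2: average the top 4 encoder layers, then mean-pool over tokens
top4 = torch.stack(all_encoder_layers[-4:]).mean(0)   # (batch, seq_len, hidden_size)
sentence_vector = top4.mean(1)[0]

To feed an SVM, such vectors can be converted with sentence_vector.numpy() and stacked into a feature matrix.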

@heslowen Hello, would you please help me? For a sequence like [CLS] I have a dog . [SEP], when I input this to BERT and take the last hidden layer of the sequence output, let's say the output is "vector", is vector[0] the embedding of [CLS], vector[1] the embedding of "I", and so on, with vector[-1] the embedding of [SEP]?

@heslowen How did you extract features after training a classifier on top of BERT? I've been trying to do the same, but I'm unable to do so.
Do I first follow run_classifier.py, and then extract the features from tf.Estimator?

@rvoak I use PyTorch. I did it as in the demo in extract_features.py. It is easy: you just need to load a tokenizer and a BERT model, tokenize your sentences, and then run the model to get the encoded_layers.
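
A minimal sketch of that workflow with pytorch_pretrained_bert, assuming bert-base-uncased and one example sentence (roughly the pattern from extract_features.py reduced to a single input):

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("I have a dog.") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    encoded_layers, _ = model(input_ids)

# encoded_layers[-1] is the last hidden layer: position 0 is [CLS],
# the last position is [SEP], and the rest line up with the tokens in between.
last_layer = encoded_layers[-1][0]          # (seq_len, hidden_size)
for token, vector in zip(tokens, last_layer):
    print(token, vector.shape)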

@RomanShen yes you're right

@heslowen Thanks for your reply!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@heslowen Sorry about my English. I am now working on a sentence embedding task. I fine-tuned on my corpus with this library and received config.json, vocab.txt and a model .bin file, but in BERT's original repo, feature extraction loads from a TensorFlow ckpt checkpoint. According to your answer, I must write the feature extraction for torch, is that right? Please help me.

@hungph-dev-ict Do you mind opening a new issue with your problem? I'll try and help you out.

@LysandreJik Thank you for your help. I will find a solution for my problem; it uses the last hidden layer of the BERT model, but if you have a better solution, can you help me?
I also have more concerns about my corpus: this library's code uses the tokenizer from the pretrained BERT model, but I want to use only the BasicTokenizer. Can you help me?
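
If it helps, here is a minimal sketch for loading a fine-tuned PyTorch checkpoint from a local directory and extracting the last hidden layer. The directory name is hypothetical, and the exact filenames the library expects for the weights and config depend on your version of the library:

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

model_dir = "./my_finetuned_bert"   # hypothetical directory holding the config, vocab and weight files

tokenizer = BertTokenizer.from_pretrained(model_dir)   # reads vocab.txt from the directory
model = BertModel.from_pretrained(model_dir)           # reads the config and weights
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("an example sentence") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    encoded_layers, pooled_output = model(input_ids)

last_hidden = encoded_layers[-1]    # the last hidden layer mentioned above

Note that the full BertTokenizer (BasicTokenizer plus WordPiece) is needed to map tokens to the model's vocabulary ids; BasicTokenizer alone only splits text and does not produce WordPiece ids.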

How long should extract_features.py take to complete?

When using 'bert-large-uncased' it takes seconds, however it writes a blank file.
When using 'bert-base-uncased' it's been running for over 30 minutes.

any advice?

the code I used:

!python extract_features.py \
--input_file data/src_train.txt \
--output_file data/output1.jsonl \
--bert_model bert-base-uncased \
--layers -1

You can look at what the BertForSequenceClassification model https://github.com/huggingface/transformers/blob/3ba5470eb85464df62f324bea88e20da234c423f/pytorch_pretrained_bert/modeling.py#L867 does in its forward method.
The pooled_output obtained from self.bert would seem to be the features you are looking for.
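
A minimal sketch of that, assuming pytorch_pretrained_bert and bert-base-uncased (num_labels and the example sentence are placeholders):

import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model.eval()

tokens = ['[CLS]'] + tokenizer.tokenize("a sentence to classify") + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # model.bert is the underlying BertModel; its pooled_output is the feature
    # vector that the classification head consumes in forward()
    _, pooled_output = model.bert(input_ids, output_all_encoded_layers=False)

features = pooled_output[0]   # (hidden_size,) vector usable as classification features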

