Fasttext: Questions regarding the embeddings produced by the `skipgram` and `supervised` options

Created on 20 Aug 2016 · 3 Comments · Source: facebookresearch/fastText

Hello!

As far as I understand, fastText implements two research papers [1, 2], and both can be used to learn word embeddings:

[1] learns word embeddings with a skip-gram objective: each word is represented by its character n-grams, and that representation is used to predict the surrounding words
[2] learns word embeddings that are specifically geared towards a classification task

A few questions:

1. Given that both systems have an embedding component, I was wondering whether (i) you tried to perform the classification task on the skip-gram embeddings from [1], and (ii) you could modify the architecture in [2] to work on character n-grams.
2. In [2] you average word embeddings to obtain the embedding of a sentence. Does averaging make sense for the skip-gram embeddings from [1] as well? More generally, when is it a good idea to average embeddings to obtain the embedding of a larger chunk of text? This question might be related to #26.
3. The help output suggests that the two parts of the code (`skipgram` and `supervised`) accept the same arguments. Is this right? Do you use character n-grams for `supervised` (the `minn` and `maxn` options) or word n-grams for `skipgram` (the `wordNgrams` option)?
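To make the distinction in the last question concrete, here is a minimal sketch of the two kinds of n-grams. The helper names `char_ngrams` and `word_ngrams` are hypothetical and are not part of fastText; the `<` and `>` boundary symbols follow the description in [1]. The real implementation also hashes n-grams into buckets, which this sketch leaves out.

```python
from typing import List


def char_ngrams(word: str, minn: int, maxn: int) -> List[str]:
    """Character n-grams of length minn..maxn, with boundary symbols as in [1]."""
    padded = f"<{word}>"  # mark the start and end of the word
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """Unigrams plus all contiguous word n-grams up to length n."""
    grams = list(tokens)
    for size in range(2, n + 1):
        for i in range(len(tokens) - size + 1):
            grams.append(" ".join(tokens[i:i + size]))
    return grams


print(char_ngrams("where", 3, 3))
print(word_ngrams(["a", "nice", "movie"], 2))
```

Character n-grams look inside a single word, while word n-grams capture local word order within a sentence.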

Thanks!

[1] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification

Source

danoneata

Most helpful comment

1. We tried using character n-grams for supervised classification. Early experiments on the sentiment analysis datasets used in [2] showed little or no improvement.
2. Averaging the vectors from a pre-trained skip-gram model to obtain vectors for larger chunks of text does not work well for classification tasks, as multiple authors in the NLP community have observed.
3. Currently, `supervised` does not use character n-grams, and `skipgram` and `cbow` do not use word n-grams. We are working on adding these features.

EdouardGrave on 22 Aug 2016

👍7

All 3 comments

How does fastText output sentence representations with a supervised model? I am using the supervised model and want the vector representation for each sentence.
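The thread does not answer this directly, but the question post notes that [2] averages word embeddings to obtain a sentence embedding. A minimal sketch of that averaging step is shown below, under stated assumptions: `TOY_VECTORS` is a made-up embedding table and `sentence_vector` is a hypothetical helper, not a fastText API. Skipping unknown tokens and returning a zero vector for an empty sentence are choices made for this sketch only.

```python
from typing import Dict, List

# Toy embedding table; the values are invented for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "bad": [-1.0, 0.0],
}


def sentence_vector(tokens: List[str],
                    vectors: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Average the vectors of the known tokens in a sentence."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known token: fall back to a zero vector (assumption of this sketch).
        return [0.0] * dim
    # zip(*known) iterates over dimensions; average each dimension.
    return [sum(column) / len(known) for column in zip(*known)]


print(sentence_vector("good movie".split(), TOY_VECTORS, dim=2))
```

With a trained supervised model, the table would hold the learned word vectors instead of the toy values.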