fastText: What is the difference between textVector and sentenceVector?

Created on 27 Sep 2017 · 9 comments · Source: facebookresearch/fastText

In the source code, textVector() and sentenceVector() both generate a vector for a sequence of words. If the model is a supervised text-classification model, textVector() is used; otherwise sentenceVector() is used. The only difference between them is that sentenceVector() applies a norm and textVector() does not. What is the reason for this?

XuelinZeng

Most helpful comment

Hello @XuelinZeng and @ilaxes,

Thank you for your post. The difference between getWordVector and getSentenceVector is that the latter uses getWordVector to assemble a single vector for a sequence of tokens (words).

That means you use getWordVector if you want an embedding for a single word, and getSentenceVector if you want an embedding for a sentence (a sequence of words).

For a supervised model, getSentenceVector simply averages the word vectors of every word in a line of text. For all other models (cbow and skipgram), getSentenceVector divides each word vector by its norm and then averages them. It is important to keep in mind that every sentence ends with a newline. That means "one two" actually translates into the vectors for "one", "two" and EOS (</s>).

$ ./fasttext supervised -input dbpedia.train -output model -thread 10 -epoch 8 -verbose 2 -dim 2
Read 32M words
Number of words:  803537
Number of labels: 14
Progress: 100.0% words/sec/thread: 3130504 lr:  0.000000 loss:  0.507742 ETA:   0h 0m

$ ./fasttext print-sentence-vectors model.bin
one two
2.6772 -3.0886

$ ./fasttext print-word-vectors model.bin
one
one -0.0065439 0.1416
two
two 0.086216 0.10804
</s>
</s> 7.9518 -9.5155

We now have (-0.0065439 + 0.086216 + 7.9518) / 3 = 2.6772 and (0.1416 + 0.10804 + -9.5155) / 3 = -3.0886 as expected for a supervised model.
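The arithmetic above can be checked with a short standard-library Python sketch. It assumes the rule stated in this comment: for a supervised model, the sentence vector is the plain (unnormalized) mean of the printed word vectors of every token in the line, including </s>. Later comments in this thread report that this simple rule stops matching once subwords (minn/maxn) or word n-grams are enabled, so treat it as valid for this setup only. The helper name mean_vector is hypothetical.

```python
# Sketch: reproduce the supervised-model arithmetic from the dbpedia example.
# Assumption: sentence vector = plain mean of the word vectors of every token
# in the line, including the end-of-sentence token </s>.

def mean_vector(vectors):
    """Component-wise arithmetic mean of equal-length vectors (hypothetical helper)."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

# Word vectors as printed by print-word-vectors for the dim=2 model.
word_vectors = {
    "one": [-0.0065439, 0.1416],
    "two": [0.086216, 0.10804],
    "</s>": [7.9518, -9.5155],
}

# The input line "one two" plus its newline yields the tokens one, two, </s>.
tokens = ["one", "two", "</s>"]
sentence_vector = mean_vector([word_vectors[t] for t in tokens])
print([round(x, 4) for x in sentence_vector])  # [2.6772, -3.0886]
```

Dropping "</s>" from tokens averages only two vectors and no longer reproduces the printed 2.6772 -3.0886.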

I'm closing this issue now as I consider it resolved, but please feel encouraged to reopen it at any time if you don't.

Thanks,
Christian

cpuhrsch on 20 Dec 2017

All 9 comments

I'm following up on this question because mine is related.

I ran a simple classification example and looked at the word and sentence vectors.
From the various discussions and the code, the sentence vector is supposed to be the average of the normalized word vectors, but when I do the calculation myself I don't match the output:

echo -e "__1 one two three\n__0 four five six" > train.txt
./fastText/fasttext supervised -input train.txt -output model -dim 2 -label __
echo "one two" | ./fastText/fasttext print-word-vectors model.bin
one -0.46124 -0.36163 
two 0.092583 -0.19925
echo "one two" | ./fastText/fasttext print-sentence-vectors model.bin
-0.27689 -0.083967

My result for the sentence vector: -0.18278494 -0.761943035
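The hand-computed figure above can be reproduced by L2-normalizing the two printed vectors and averaging them with no </s> term, i.e. the non-supervised rule from the most helpful comment rather than the supervised one (plain mean including </s>). A minimal standard-library Python check, with the vectors copied from the output above; the helper names l2_normalize and mean_vector are hypothetical:

```python
import math

def l2_normalize(v):
    """Divide a vector by its Euclidean (L2) norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_vector(vectors):
    """Component-wise arithmetic mean of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

# Vectors printed by print-word-vectors for the toy dim=2 supervised model.
one = [-0.46124, -0.36163]
two = [0.092583, -0.19925]

# Hand calculation: normalize each word vector, then average; </s> is not included.
by_hand = mean_vector([l2_normalize(one), l2_normalize(two)])
print([round(x, 4) for x in by_hand])  # [-0.1828, -0.7619]
```

Under the supervised rule, the printed -0.27689 -0.083967 should instead be the plain mean of "one", "two" and </s>; the </s> vector was not printed here, so that cannot be checked from this output alone.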

When I look at the .vec file there is a special token, </s>. My guess is that it marks the start/end of a sentence, but it is not included in the word vectors of my input. What is it exactly?

Thanks,

ilaxes on 18 Oct 2017

Hi,

I trained my supervised model with minn=2 and maxn=2.

print-word-vectors gave

a 1.2892 0.35762
</s> -4.0258 4.9202

but print-sentence-vectors showed

-0.063226 1.2798

Why isn't it (1.2892 + -4.0258) / 2 = -1.3683?

seanappler on 24 Apr 2018

Hi @cpuhrsch, thanks for the helpful comment. But the case you described probably works only when wordNgrams=1. How is the sentence vector calculated when, for example, wordNgrams=2? I suppose the vectors of the n-grams are also included in the average. But I am trying to get the idea with just one word, and when I calculate (word_vec('x') + word_vec('</s>'))/2, it is not equal to sent_vec('x').

Thanks in advance.

sipan17 on 13 Nov 2018

Hi @cpuhrsch, what kind of norm does the unsupervised model use? Also, is this </s> token necessary when training on a corpus?

Solved

After reading the code at https://github.com/facebookresearch/fastText/blob/master/src/vector.cc#L35, I found that it is the L2 norm.

Also, there is no </s> for the unsupervised model.

1049451037 on 21 Feb 2019

After hours of searching, I think I need to clarify this.

From @cpuhrsch's comment:

For all other models (cbow and skipgram) getSentenceVector will divide each word vector by its norm and then average them

Note that the averaging involves dividing each word vector by its norm, which is why your results differ, @seanappler and @sipan17. You can see the source code here.

The same code also shows that getSentenceVector only averages the vectors that have a positive L2 norm (see the variable count). For example, if you use cc.en.300, a newline "\n" has an L2 norm of 0, so if your sentence is only "x", the sum of the vectors is divided by 1 (not 2).
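The rule described above for cbow/skipgram models (divide each word vector by its L2 norm, and count only vectors whose norm is positive) can be sketched in a few lines of standard-library Python. This illustrates the description in these comments and is not fastText's own code; the function names and the two-dimensional example vectors are hypothetical.

```python
import math

def l2_norm(v):
    """Euclidean (L2) norm: square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in v))

def unsupervised_sentence_vector(word_vectors):
    """Mean of unit-normalized word vectors, skipping zero-norm vectors.

    Expects a non-empty list. Only vectors with a positive norm are added
    and counted, so a zero-norm token does not enlarge the denominator.
    """
    dim = len(word_vectors[0])
    total = [0.0] * dim
    count = 0
    for v in word_vectors:
        norm = l2_norm(v)
        if norm > 0:
            total = [t + x / norm for t, x in zip(total, v)]
            count += 1
    if count == 0:
        return total  # every vector had zero norm: return the zero vector
    return [t / count for t in total]

# Hypothetical vectors: a word "x" and a token whose vector has zero norm.
x_vec = [3.0, 4.0]
zero_vec = [0.0, 0.0]
print(unsupervised_sentence_vector([x_vec, zero_vec]))  # [0.6, 0.8]: divided by 1, not 2
```

Dividing by 2 instead (a plain mean over both tokens) would halve the result to [0.3, 0.4], which is the divisor difference the comment describes.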

rianrajagede on 24 May 2019

The approach @rianrajagede suggests is not necessarily correct: it doesn't work in my case, where minn and maxn are 4 in a fastText supervised classification model. The caveat is that my model is in fact a supervised model, so simple averaging should work, but it doesn't.

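import numpy as np
import numpy.linalg as LA

# ft_model_2: the supervised fastText model (minn = maxn = 4), loaded beforehand (loading not shown).
sentence_vector_2 = ft_model_2.get_sentence_vector('cordless drills')
# get_subwords(w) returns (subword strings, subword ids); [0] keeps the strings.
n_grams = (ft_model_2.get_subwords('cordless')[0]
           + ft_model_2.get_subwords('drills')[0]
           + ft_model_2.get_subwords('</s>')[0])

def div_norm(x):
    """Return x scaled to unit L2 norm, or None if its norm is zero."""
    norm_value = LA.norm(x)
    if norm_value > 0:
        return x / norm_value
    return None

print('N-Grams:', n_grams)
start = np.zeros(ft_model_2.get_dimension())
count = 0
for word in n_grams:
    # ft_model_2[word] calls get_word_vector(word), so each n-gram string is looked up as a word.
    add = div_norm(ft_model_2[word])
    if add is not None:  # skip zero-norm vectors
        start += add
        count += 1
print(count)
recreated_2 = start / count
# Element-wise comparison after rounding both vectors to 3 decimals.
print(np.round(recreated_2, 3) == np.round(sentence_vector_2, 3))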

This yields an array of False values:

N-Grams: ['cordless', '<cor', 'cord', 'ordl', 'rdle', 'dles', 'less', 'ess>', 'drills', '<dri', 'dril', 'rill', 'ills', 'lls>', '</s>']
15
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False]

I have tried everything I can think of, including simple averaging and averaging the vectors divided by their L2 norms, and I am still unclear on exactly how fastText builds the sentence vector in the subword case. If anyone has any insight on this, please share. Much appreciated.

Umar-Ayub on 4 Dec 2019

@cpuhrsch, you are right for the simple case where minn and maxn are 0. However, this does not hold when minn and maxn are greater than 0 in a supervised model: the sentence vector does not seem to be a simple average of the n-gram vectors. This issue should not be closed. For evidence, see https://github.com/facebookresearch/fastText/issues/966.

Umar-Ayub on 4 Dec 2019

Has there been any clarification provided on this?