Fasttext: Output of print-sentence-vectors is hard to parse

Created on 30 Aug 2017 · 6 Comments · Source: facebookresearch/fastText

As tangentially noted by @kootenpv in #129, the output has no delimiter, so there is no clear way to tell which part of an output line is the sentence and which part is the vector.

For example:

This is not a test. -0.039222 -0.002648 -0.028442 0.039124 -0.0020073 -0.0052479 -0.020197 -0.028812 0.035525 -0.00065622 0.057748 0.026362 0.038559 0.10918 0.034084 -0.086161 0.01623 -0.064122

Now imagine the sentence is 'This is not a test. -0.039222': its trailing number cannot be told apart from the first vector component. The correct parse can be inferred by looking across all the lines, but that takes quite a bit of code and may error out or make a false assumption.

So for now, code that uses this output should know what the sentence was, or know the number of dimensions.
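For the "know the number of dimensions" case, a minimal sketch of such a parser follows. It assumes the output line is whitespace-separated and that the last `dim` tokens are the vector; the function name `split_sentence_and_vector` is made up for this illustration and is not part of fastText.

```python
def split_sentence_and_vector(line, dim):
    """Split one output line into (sentence, vector).

    Assumption: the final `dim` whitespace-separated tokens are the vector
    components, and everything before them is the sentence.
    """
    if dim < 1:
        raise ValueError("dim must be a positive integer")

    # Split from the right at most `dim` times, so numeric-looking tokens
    # inside the sentence stay attached to the sentence.
    parts = line.rsplit(None, dim)
    if len(parts) < dim:
        raise ValueError("line has fewer than %d tokens" % dim)

    values = parts[-dim:]
    # If the line held only the vector, there is no sentence part left.
    sentence = parts[0].strip() if len(parts) == dim + 1 else ""
    return sentence, [float(v) for v in values]


if __name__ == "__main__":
    # Hypothetical 2-dimensional output line whose sentence ends in a number.
    example = "This is not a test. -0.039222 -0.002648 0.039124 "
    print(split_sentence_and_vector(example, 2))
```

Splitting from the right is what keeps a sentence such as 'This is not a test. -0.039222' intact, but it only works when `dim` matches the model that produced the output.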

bittlingmayer

👍2

All 6 comments

I don't have a solution for this, I just wanted to note how insane it is.

cigrainger on 31 Oct 2017

I have a (really slow) workaround: reverse each line, split on spaces and keep only the leading fields that hold the vector (the count follows from the -dim you set), then reverse again and write the result to a file.

./fasttext print-sentence-vectors model.bin < sentences.txt | rev | cut -d ' ' -f 1-301 | rev > docvecs

cigrainger on 31 Oct 2017

Hello @bittlingmayer,

Thank you for your post. This has been fixed within recent commits. You will now only see the vector, but not the sentence itself. For example:

$ ./fasttext print-sentence-vectors model.bin
one two
2.6772 -3.0886

I'm going to close this issue now, but please feel encouraged to reopen it at any time if you don't consider this issue to be resolved.

Thanks,
Christian

cpuhrsch on 20 Dec 2017

👍1

Hello @cpuhrsch
Which release are you referring to? I just downloaded version v0.1.0 (2nd Dec) and the problem is still there!

Thanks,
Mohammad

akbari59 on 10 Jan 2018

👍2

It looks like the fix works only for supervised models. If I train a skipgram model and run print-sentence-vectors on it, the input sentences (from stdin) still appear in the output.

mmrnustik on 18 Sep 2018

👍1

IMHO this is somewhat resolved by the fact that there are now official Python bindings, so bash scripts are no longer necessary.
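As a rough sketch of what that looks like, assuming the official `fasttext` Python package is installed and using its `load_model` and `get_sentence_vector` calls. The helper names `embed_sentences` and `load_and_embed`, and the file name `model.bin`, are placeholders for this illustration.

```python
def embed_sentences(model, sentences):
    """Return (sentence, vector) pairs, so no text parsing is needed.

    `model` is assumed to expose get_sentence_vector(text), as the model
    object returned by fasttext.load_model does.
    """
    pairs = []
    for sentence in sentences:
        # Strip the line ending first: the vector is computed per line.
        text = sentence.strip()
        pairs.append((text, model.get_sentence_vector(text)))
    return pairs


def load_and_embed(model_path, sentences):
    """Load a trained model from `model_path` and embed each sentence."""
    # Imported here so the helper above stays usable without the package.
    import fasttext

    model = fasttext.load_model(model_path)
    return embed_sentences(model, sentences)


# Example usage (requires a trained model file, here assumed "model.bin"):
#     with open("sentences.txt") as handle:
#         for text, vector in load_and_embed("model.bin", handle):
#             print(text, len(vector))
```

Because each vector comes back already paired with its sentence, the delimiter problem does not arise.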

bittlingmayer on 20 Sep 2018
