I am currently trying to train a classifier using pretrained vectors. I am basically using the command:
./fasttext supervised -input <path-to-input-file> -output <path-to-output-file> -pretrainedVectors <path-to-pretrained-vectors-file>
Unfortunately, I always get the following error message:
Read 0M words
Number of words: 13564
Number of labels: 2
Dimension of pretrained vectors does not match -dim option
Even when I explicitly set the dimension to the size of my pretrained vectors (using the "-dim" option), I still get the same error message. Have I misunderstood something? Thank you.
Just to clarify: I am using embeddings that I trained with Word2Vec. The embeddings file is in the non-binary (text) format.
What is the dim of your pretrained vector file?
400 dimensions
Is that a .bin file?
The file is not a binary file. It is a text file which is human readable.
I found the mistake: my file was missing the first line of the Word2Vec text format, which explicitly states the size of the vocabulary and the number of dimensions. With this header line included, fastText seems to accept the file.
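For anyone hitting the same error, here is a minimal Python sketch of the fix described above. It assumes a headerless text file with one "word v1 v2 ... vN" entry per line, separated by whitespace. The function name add_word2vec_header and the file names are hypothetical, chosen only for illustration.

```python
def add_word2vec_header(src_path, dst_path):
    """Copy a headerless text-vector file, prepending '<vocab_size> <dim>'.

    Assumes each non-blank line is 'word v1 v2 ... vN' (whitespace separated).
    Returns the (vocab_size, dim) pair that was written as the header.
    """
    vocab_size = 0
    dim = None

    # First pass: count entries and check that every vector has the same length.
    with open(src_path, encoding="utf-8") as src:
        for line_no, line in enumerate(src, start=1):
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            current_dim = len(parts) - 1  # first token is the word itself
            if dim is None:
                dim = current_dim
            elif current_dim != dim:
                raise ValueError(
                    f"line {line_no}: expected {dim} values, found {current_dim}"
                )
            vocab_size += 1

    if dim is None:
        raise ValueError("no vectors found in input file")

    # Second pass: write the header line, then copy the vectors unchanged.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(f"{vocab_size} {dim}\n")
        for line in src:
            if line.strip():
                dst.write(line if line.endswith("\n") else line + "\n")

    return vocab_size, dim
```

Calling add_word2vec_header("vectors.txt", "vectors_with_header.vec") returns the vocabulary size and dimension it detected. The dimension should then match the value passed to "-dim" (400 in this thread) when the new file is given to "-pretrainedVectors".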