Fasttext: Loading pre-trained .bin models for supervised training

Created on 25 Jul 2017 · 2 Comments · Source: facebookresearch/fastText

I'm having trouble loading the pre-trained Wikipedia word vector models in the .bin file format for supervised training.

Specifically,
./fasttext supervised -input fb_1.txt -output fb -pretrainedVectors wiki.id.bin -dim 300
yields:
Dimension of pretrained vectors does not match -dim option
Here's the fb_1.txt file.

I am using the 300-dimensional vectors and have confirmed that the .vec file contains 300 dimensions. Does anyone know how to do this?
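A .vec file is plain text whose first line is a header giving the number of words and the vector dimension, while a .bin file is a binary model. The error is consistent with fastText reading that text header from whatever file -pretrainedVectors points to, so a .bin file produces a nonsense "dimension". A minimal sketch for checking a file before training, assuming a POSIX shell with head and awk; check_pretrained is a hypothetical helper, not part of fastText, and the file names in the usage line are placeholders:

```shell
# check_pretrained FILE DIM  (hypothetical helper, not part of fastText)
# Succeeds only if FILE starts with a text header "<word count> <dimension>",
# as .vec files do, and that dimension equals DIM.
check_pretrained() {
  file=$1
  want=$2
  # Parse only the first line; a binary .bin file will not yield two integers.
  dim=$(head -n 1 "$file" | LC_ALL=C awk 'NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $2 }')
  if [ -z "$dim" ]; then
    echo "error: $file has no '<count> <dim>' header (a .bin file?)" >&2
    return 1
  fi
  if [ "$dim" -ne "$want" ]; then
    echo "error: $file has dimension $dim, but -dim is $want" >&2
    return 1
  fi
  echo "ok: $file has dimension $dim"
}

# Usage (placeholder file names):
#   check_pretrained wiki.id.vec 300 &&
#     ./fasttext supervised -input fb_1.txt -output fb_model -pretrainedVectors wiki.id.vec -dim 300
```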

I'm also having trouble loading .bin word vector models that I trained myself using:
./fasttext skipgram -input fb_1_unlabeled.txt -output fb_1_unsup
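The skipgram command writes two files, fb_1_unsup.bin and fb_1_unsup.vec. Since no -dim was given, those vectors have fastText's default dimension of 100, so the supervised run needs -dim 100 (or skipgram needs -dim 300 as well). A sketch that prints a supervised command reusing the .vec file, with -dim read from its header; supervised_cmd and the output name fb_1_clf are hypothetical:

```shell
# supervised_cmd PREFIX INPUT OUTPUT  (hypothetical helper)
# Prints a supervised-training command that reuses PREFIX.vec, the text file
# that `fasttext skipgram -output PREFIX` writes next to PREFIX.bin, taking
# -dim from the second field of its header so the two always agree.
supervised_cmd() {
  prefix=$1
  input=$2
  output=$3
  dim=$(head -n 1 "$prefix.vec" | awk '{ print $2 }')
  echo "./fasttext supervised -input $input -output $output -pretrainedVectors $prefix.vec -dim $dim"
}

# Usage: inspect the command first, then run it, e.g.
#   supervised_cmd fb_1_unsup fb_1.txt fb_1_clf
#   eval "$(supervised_cmd fb_1_unsup fb_1.txt fb_1_clf)"
```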

The reason I want to do this is that, to my understanding, the .bin model contains subword information, such as character n-grams, as well as model parameters that allow training to be continued, all of which should help build a better classifier. Am I wrong?
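As far as I know, the .vec file holds only whole-word vectors; the character n-gram vectors exist only in the .bin model. One possible way to carry some subword information into supervised training is to let the .bin model compute vectors for the words you need with print-word-vectors (which builds vectors for unseen words from their n-grams) and then add the .vec header yourself. A sketch, assuming a hypothetical vocab.txt with one word per line; add_vec_header is a hypothetical helper:

```shell
# add_vec_header < word_vectors.txt > out.vec  (hypothetical helper)
# Prepends the "<count> <dim>" header that -pretrainedVectors reads to lines
# of the form "word v1 ... vD", the format print-word-vectors prints.
add_vec_header() {
  vec_tmp=$(mktemp)
  cat > "$vec_tmp"
  count=$(wc -l < "$vec_tmp" | tr -d ' ')
  # Dimension = number of fields on the first line minus the word itself.
  dim=$(head -n 1 "$vec_tmp" | awk '{ print NF - 1 }')
  echo "$count $dim"
  cat "$vec_tmp"
  rm -f "$vec_tmp"
}

# Usage (placeholder file names):
#   ./fasttext print-word-vectors fb_1_unsup.bin < vocab.txt | add_vec_header > fb_1_vocab.vec
```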

Comments (2)

I am new to fastText, but the docs and examples imply that you should pass a .vec file to -pretrainedVectors. I would also expect to be able to pass the embedding you produced with fastText (unsupervised), but my guess is that this would have complicated their implementation. I hope I am wrong. Anyway, I am interested in how they use the vectors for classification: what happens to words that appear in a document but not in the .vec file?
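Judging from the fastText source of that period (not checked against every version), the words in the pretrained file are added to the vocabulary, the input matrix is initialized randomly, and only the rows for words that have a pretrained vector are overwritten. Training words missing from the .vec file would therefore start from random vectors and be learned during training. A sketch for listing those words, assuming whitespace tokenization and the default __label__ prefix; missing_words is a hypothetical helper:

```shell
# missing_words VEC_FILE TRAIN_FILE  (hypothetical helper)
# Prints, in order of first appearance, the distinct tokens of TRAIN_FILE that
# have no vector in VEC_FILE. Tokens starting with __label__ (fastText's
# default label prefix) are skipped; whitespace splitting approximates
# fastText's own tokenizer.
missing_words() {
  awk 'NR == FNR { if (FNR > 1) known[$1] = 1; next }
       {
         for (i = 1; i <= NF; i++)
           if ($i !~ /^__label__/ && !($i in known) && !($i in seen)) {
             seen[$i] = 1
             print $i
           }
       }' "$1" "$2"
}

# Usage (placeholder file names): count training words without a pretrained vector
#   missing_words wiki.id.vec fb_1.txt | wc -l
```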

Yes, I think you just need to use the .vec file.

The example below trains OK for me:

./fasttext supervised \
  -pretrainedVectors mj/corpus/wiki.zh.vec \
  -input mj/corpus/faq.train \
  -dim 300 \
  -output mj/corpus/faq.model

However, it seems to be taking a long while :)
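Part of the slowness may come from file size: judging from the fastText source of that time, every word in the pretrained file is added to the model's vocabulary, so a full Wikipedia .vec makes loading and saving the model expensive. One possible workaround, sketched here with a hypothetical trim_vec helper, is to keep only the vectors for tokens that occur in the training file. The trade-off is that words seen only at prediction time would no longer get a pretrained vector.

```shell
# trim_vec VEC_FILE TRAIN_FILE > trimmed.vec  (hypothetical helper)
# Keeps only the vectors whose word occurs somewhere in TRAIN_FILE and
# rewrites the "<count> <dim>" header to match the smaller file.
trim_vec() {
  awk 'NR == FNR { for (i = 1; i <= NF; i++) used[$i] = 1; next }
       FNR == 1 { dim = $2; next }
       ($1 in used) { kept[++n] = $0 }
       END { print n + 0, dim; for (i = 1; i <= n; i++) print kept[i] }' "$2" "$1"
}

# Usage (placeholder file names):
#   trim_vec mj/corpus/wiki.zh.vec mj/corpus/faq.train > mj/corpus/wiki.zh.trimmed.vec
```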
