Fasttext: Default -maxn parameter

Created on 20 Apr 2017 · 7 Comments · Source: facebookresearch/fastText

Hello!

I have been playing with fastText and stumbled upon something that confused me a little bit. According to the default parameters in the documentation it seems to me that -maxn is set to 0 by default:

-minn min length of char ngram [0]
-maxn max length of char ngram [0]
-thread number of threads [12]

However, when I run the code, the running time differs significantly depending on whether I call ./fasttext without -maxn or explicitly set it to 0, which leads me to believe that the default value is not 0. For example:

With -maxn 0: running time of ~3 minutes
Without -maxn: running time of ~18 minutes

Hopefully you can clarify this for me. :)

arnor-sigurdsson

Most helpful comment

Hi @arnor-sigurdsson, @bkj and @fnielsen,

This is correct: the default parameters are not the same for the different modes of fastText.

The best way to get the default parameters for a given mode is to run fastText in that mode without any other arguments, e.g.

./fasttext skipgram

./fasttext supervised

We will make this clearer in the documentation.
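To compare the defaults of two modes side by side, the bracketed values in the usage text can be pulled out with a few lines of Python. This is only a sketch: `parse_defaults` and `read_defaults` are hypothetical helper names, the sample text is copied from the help output quoted in this thread, and `read_defaults` assumes the binary prints its usage text when given only a mode, as described above.

```python
import re
import subprocess

# Matches help lines such as "  -maxn   max length of char ngram [6]".
HELP_LINE = re.compile(r"^\s*(-\w+)\s+(.+?)\s*\[([^\]]*)\]\s*$")


def parse_defaults(help_text):
    """Map each option name to the default shown in square brackets."""
    defaults = {}
    for line in help_text.splitlines():
        match = HELP_LINE.match(line)
        if match:
            name, _description, value = match.groups()
            defaults[name] = value
    return defaults


def read_defaults(binary, mode):
    """Run e.g. `./fasttext cbow` and parse whatever usage text it prints.

    Assumption: the usage text may arrive on stdout or stderr, so both
    streams are parsed.
    """
    result = subprocess.run([binary, mode], capture_output=True, text=True)
    return parse_defaults(result.stdout + result.stderr)


# Sample lines taken from the help output quoted in this thread.
SAMPLE = """
  -minn               min length of char ngram [3]
  -maxn               max length of char ngram [6]
"""
print(parse_defaults(SAMPLE))
```

Calling `read_defaults("./fasttext", "cbow")` and `read_defaults("./fasttext", "supervised")` and comparing the two dictionaries would show which defaults differ between the modes on a given build.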

EdouardGrave on 2 May 2017

👍2

All 7 comments

Which model are you talking about?

bkj on 20 Apr 2017

I am using cbow in both cases, here are the parameters and running times:

-maxn 0:

time ./fasttext cbow -input -lr 0.1 -epoch 5 -ws 5 -minCount 10 -neg 5 -loss ns -t 0.001 -thread 12 -wordNgrams 0 -maxn 0 -output

Read 100M words
Number of words: 279974
Number of labels: 0
Progress: 100.0% words/sec/thread: 395830 lr: 0.000000 loss: 1.265415 eta: 0h0m

real 3m14.749s
user 21m22.638s
sys 0m11.140s

Default -maxn

time ./fasttext cbow -input -lr 0.1 -epoch 5 -ws 5 -minCount 10 -neg 5 -loss ns -t 0.001 -thread 12 -wordNgrams 1 -output

Read 100M words
Number of words: 279974
Number of labels: 0
Progress: 36.1% words/sec/thread: 64077 lr: 0.063866 loss: 1.862925 eta: 0h6m ^C

real 6m48.212s
user 47m8.746s
sys 0m17.913s

As can be seen, I did not let the run with the default -maxn finish, but at ~36% progress it had already taken longer than the complete -maxn 0 run, and its words/sec/thread count is much lower.

arnor-sigurdsson on 20 Apr 2017

The defaults differ depending on the model, and I don't think that the default maxn for CBOW is 0. Try running fasttext cbow. I see:

  -minn               min length of char ngram [3]
  -maxn               max length of char ngram [6]

(though I'm not sure about the version on this machine)

bkj on 20 Apr 2017

👍1

I think the documentation in https://github.com/facebookresearch/fastText/blob/master/README.md could be confusing. The program itself prints out the "correct" default parameters for unsupervised training when executed:

$ fasttext cbow
[...]
  -minn               min length of char ngram [3]
  -maxn               max length of char ngram [6]

This corresponds to https://github.com/facebookresearch/fastText/blob/master/src/args.cc#L32

Only when the program is called as fasttext supervised is "maxn=0" displayed. So, if I understand correctly, the default for cbow or skipgram training is not 0 but 6, and that could explain the timing results.
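To illustrate why a non-zero -maxn would cost more time: with subword information enabled, each word is represented by its own vector plus the vectors of its character n-grams, so every token touches many more vectors during training. The sketch below is a simplified illustration, not fastText's actual code (`char_ngrams` is a hypothetical helper, and the real implementation also hashes the n-grams into buckets). It wraps the word in the `<` and `>` boundary markers and counts the n-grams for both settings.

```python
def char_ngrams(word, minn, maxn):
    """Return the character n-grams of `word` for lengths minn..maxn.

    Simplified sketch: the word is wrapped in "<" and ">" boundary
    markers and every substring of each allowed length is collected.
    With maxn == 0 the loop never runs, so no n-grams are produced.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(max(minn, 1), maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


print(char_ngrams("where", 3, 3))       # ['<wh', 'whe', 'her', 'ere', 're>']
print(len(char_ngrams("where", 3, 6)))  # 14 n-grams with -minn 3 -maxn 6
print(len(char_ngrams("where", 0, 0)))  # 0 n-grams with -maxn 0
```

In this simplified picture, a five-letter word goes from one vector (the word itself) with -maxn 0 to fifteen vectors with -minn 3 -maxn 6, which is consistent with the much lower words/sec/thread figure reported above.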

I wonder if the README.md should be changed/extended with the default parameters for the unsupervised case.