fastText: Crashes with "Encountered NaN"

Created on 15 Dec 2017 · 6 comments · Source: facebookresearch/fastText

I stumbled on a problem that seems to consistently crash fastText when I use a particular training data set for classification with specific training options. The crash is very sensitive to both the data set and the parameters: if I change almost anything, it most likely goes away.

I'm using fastText 97fcde80ea107ca52d3d778a083564619175039c (Dec 14, 2017) from the master branch. My OS is Ubuntu 16.04 amd64. gcc --version reports gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609.

Here is what happens:

$ ./fasttext supervised -input train200k.txt -output model2 -lr 1.0 -epoch 25 -loss hs
Read 2M words
Number of words:  360860
Number of labels: 25272
Progress:   12.4% words/s:   1441702 lr:  0.875922 loss: 118.363022 eta:   0h 0mterminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
  what():  Encountered NaN.terminate called recursively
terminate called recursively
Aborted (core dumped)

The error message is often shorter than this; this was one of the more verbose runs. The crash also happens at slightly different progress percentages, somewhere between 12% and 13%.

I've put up the training data here (gzipped): http://tester-os-kktest.lib.helsinki.fi/fastText/train200k.txt.gz

This originally happened with a larger training data set, but I was able to cut it down to these 200k lines. If I cut it further, the crash no longer happens.

I tried on a different Ubuntu 16.04 machine too (same gcc version) and got the same crash, so I believe this should be reproducible.

All 6 comments

Hello @osma,

Thank you for your post. Please consult the FAQ. Let me know if this helps.

Thanks,
Christian

Ah right. I had missed the FAQ entry on NaNs (the error message is often truncated, so the NaN message was shown in only about one out of four tries). Thanks for your quick response, and sorry for the noise!

Hello @osma,

By the way, thank you for your perfectly detailed original post! I agree that the error message is not as visible as I'd like either. We'll work on this :)

Thanks,
Christian

I get this error after increasing the learning rate. It's amazing! There is no error with this command:

fasttext cbow -input data/pre_Manga_fruta.txt \
              -output models/cbow_manga_fruta -ws 7 \
              -lr 0.7 -epoch 22000 -dim 2 -minCount 1 -minn 1 -thread 1

But I get the error with this one:

fasttext cbow -input data/pre_Manga_fruta.txt \
              -output models/cbow_manga_fruta -ws 7 \
              -lr 0.8 -epoch 22000 -dim 2 -minCount 1 -minn 1 -thread 1

terminate called after throwing an instance of 'std::runtime_error'
what(): Encountered NaN.
Aborted (core dumped)
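
A cutoff this sharp (0.7 works, 0.8 fails) is typical of gradient-descent divergence. The following is a minimal Python sketch on a toy one-dimensional quadratic, not fastText's actual model: each update multiplies the weight by (1 - lr * curvature), so the iterates shrink when lr < 2 / curvature and grow geometrically otherwise, until they overflow to inf and then turn into NaN. The function name `sgd_on_quadratic` and the curvature value are illustrative assumptions; in a real model the threshold depends on the data and cannot simply be read off like this.

```python
import math


def sgd_on_quadratic(lr: float, curvature: float = 1.0,
                     w0: float = 1.0, steps: int = 10_000) -> float:
    """Plain gradient descent on the toy loss f(w) = 0.5 * curvature * w**2.

    Hypothetical illustration only, not fastText code. The gradient is
    curvature * w, so each step multiplies w by (1 - lr * curvature).
    Once lr * grad overflows to +/-inf, w becomes +/-inf, and the next
    update evaluates inf - inf, which is NaN.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w
        w = w - lr * grad  # |1 - lr * curvature| > 1 makes |w| grow every step
    return w


if __name__ == "__main__":
    curvature = 1.0
    print(f"divergence threshold: lr > {2.0 / curvature}")
    for lr in (1.9, 2.1):
        w = sgd_on_quadratic(lr, curvature)
        status = "NaN" if math.isnan(w) else ("inf" if math.isinf(w) else "finite")
        print(f"lr={lr}: final w = {w!r} ({status})")
```

With lr=1.9 the weight decays toward 0; with lr=2.1 its magnitude grows by a factor of about 1.1 per step, overflows after several thousand steps, and ends as NaN. Raising the learning rate by a small amount is enough to cross from one regime to the other.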

Hi,
I'm having the same issue running on Colab with a learning rate of 0.3.
Does it depend on the available RAM?

Would it be better if I trained the supervised model on a cluster?
Cheers

The error in the original question occurred in supervised mode, and the loss was very high: 118.363022! The main purpose of the supervised model is classification (of languages or labels, for example). In fact, the number of labels here, 25272, is of a quite unusual order of magnitude!

Unfortunately, fastText doesn't have a constraint to prevent gradient explosion!
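
For comparison, one common constraint of this kind is gradient clipping: the gradient is capped at a maximum magnitude before each update. The minimal Python sketch below applies it to a toy quadratic (not fastText code; `clipped_sgd_on_quadratic` and `max_grad` are hypothetical names). Clipping keeps the weights finite even at a learning rate that would otherwise diverge, but it does not by itself make training converge.

```python
import math


def clipped_sgd_on_quadratic(lr: float, max_grad: float = 1.0,
                             curvature: float = 1.0, w0: float = 1.0,
                             steps: int = 10_000) -> float:
    """Gradient descent on f(w) = 0.5 * curvature * w**2 with gradient clipping.

    Hypothetical illustration only. The gradient is clamped to
    [-max_grad, max_grad], so no single update can move w by more than
    lr * max_grad, and w cannot run away to inf or NaN.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w
        grad = max(-max_grad, min(max_grad, grad))  # clip the gradient
        w = w - lr * grad
    return w


if __name__ == "__main__":
    # Without clipping, lr=2.1 diverges to NaN on this toy (threshold is 2.0).
    w = clipped_sgd_on_quadratic(2.1)
    print(f"clipped, lr=2.1: final w = {w!r}, finite = {math.isfinite(w)}")
```

Here the clipped run simply oscillates between roughly 1.0 and -1.1 instead of converging, which is why lowering the learning rate is still the actual fix.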

I suggest reducing the learning rate below 0.1.
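
If you drive training from a script, one way to apply this advice automatically is to retry with a smaller learning rate whenever the result contains NaN. The sketch below is hypothetical: `train` stands in for whatever training call you use (here a toy quadratic, not fastText), and the halving factor and retry limit are arbitrary choices.

```python
import math
from typing import Callable, Tuple


def train_toy(lr: float, steps: int = 10_000) -> float:
    """Toy stand-in for a training run: gradient descent on f(w) = 0.5 * w**2.

    Returns the final weight; it ends up NaN when lr > 2.0.
    """
    w = 1.0
    for _ in range(steps):
        w = w - lr * w  # the gradient of 0.5 * w**2 is w
    return w


def train_with_lr_backoff(train: Callable[[float], float], lr: float,
                          factor: float = 0.5,
                          max_tries: int = 10) -> Tuple[float, float]:
    """Call train(lr); if the result is NaN or inf, shrink lr and try again.

    Returns (lr_used, result). Raises RuntimeError if every attempt fails.
    """
    for _ in range(max_tries):
        result = train(lr)
        if math.isfinite(result):
            return lr, result
        lr *= factor  # back off to a smaller learning rate
    raise RuntimeError("training diverged at every learning rate tried")


if __name__ == "__main__":
    lr_used, w = train_with_lr_backoff(train_toy, lr=3.0)
    print(f"finished with lr={lr_used}, final w = {w!r}")
```

Starting from lr=3.0, the first attempt ends in NaN and the second (lr=1.5) converges. Note that the check only catches NaN and inf: a run that stays finite without converging (lr=2.0 on this toy just oscillates between 1 and -1) would pass it.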
