This doesn't seem right. LDA training on enwiki with 1000 topics (gensim, unmodified):
2015-08-02 12:09:07,550 : INFO : merging changes from 3750 documents into a model of 3831719 documents
2015-08-02 12:09:35,378 : INFO : topic #938 (0.001): 0.037*census + 0.034*population + 0.027*unincorporated + 0.020*community + 0.017*households + 0.016*landmarks + 0.016*$
2015-08-02 12:09:35,522 : INFO : topic #986 (0.001): 0.015*festival + 0.014*films + 0.013*documentary + 0.010*director + 0.009*award + 0.008*directed + 0.008*producer + 0.$
2015-08-02 12:09:35,666 : INFO : topic #492 (0.001): 0.066*kaunas + 0.048*davidson + 0.037*rosenberg + 0.034*kalamazoo + 0.026*blood + 0.024*sha + 0.023*thorpe + 0.022*vei$
2015-08-02 12:09:35,811 : INFO : topic #392 (0.001): 0.018*laser + 0.016*tucker + 0.015*optical + 0.014*forensic + 0.012*imaging + 0.011*pulse + 0.011*lab + 0.009*sample +$
2015-08-02 12:09:35,954 : INFO : topic #890 (0.001): 0.126*dutch + 0.116*van + 0.071*netherlands + 0.069*amsterdam + 0.034*holland + 0.027*hague + 0.022*der + 0.021*willem$
2015-08-02 12:09:36,098 : INFO : topic #769 (0.001): 0.064*icf + 0.053*cove + 0.050*newfoundland + 0.043*vancouver + 0.041*nunataks + 0.036*columbia + 0.030*labrador + 0.0$
2015-08-02 12:09:36,242 : INFO : topic #75 (0.001): 0.043*dong + 0.042*xu + 0.042*yi + 0.025*narayana + 0.024*tao + 0.023*bingham + 0.023*fei + 0.020*parr + 0.020*ren + 0.$
2015-08-02 12:09:36,386 : INFO : topic #742 (0.001): 0.040*peters + 0.031*leith + 0.030*kahn + 0.028*levy + 0.028*bart + 0.022*hedley + 0.019*bandit + 0.018*robyn + 0.017*$
2015-08-02 12:09:36,529 : INFO : topic #438 (0.001): 0.035*editor + 0.035*newspaper + 0.034*magazine + 0.021*published + 0.018*news + 0.016*daily + 0.014*journalism + 0.01$
2015-08-02 12:09:36,673 : INFO : topic #410 (0.001): 0.046*forest + 0.030*reserve + 0.028*forests + 0.024*species + 0.023*conservation + 0.020*habitat + 0.016*moist + 0.01$
2015-08-02 12:09:36,816 : INFO : topic #322 (0.001): 0.000*jawbone + 0.000*antiochus + 0.000*ddr + 0.000*gault + 0.000*noon + 0.000*fahey + 0.000*toth + 0.000*toto + 0.000$
2015-08-02 12:09:36,960 : INFO : topic #407 (0.001): 0.000*jawbone + 0.000*antiochus + 0.000*ddr + 0.000*gault + 0.000*noon + 0.000*fahey + 0.000*toth + 0.000*toto + 0.000$
2015-08-02 12:09:37,103 : INFO : topic #808 (0.001): 0.091*sf + 0.067*jensen + 0.066*isaac + 0.056*slater + 0.047*informatics + 0.045*hospice + 0.045*rot + 0.042*koblenz +$
2015-08-02 12:09:37,248 : INFO : topic #282 (0.001): 0.000*jawbone + 0.000*antiochus + 0.000*ddr + 0.000*gault + 0.000*noon + 0.000*fahey + 0.000*toth + 0.000*toto + 0.000$
2015-08-02 12:09:37,391 : INFO : topic #894 (0.001): 0.000*jawbone + 0.000*antiochus + 0.000*ddr + 0.000*gault + 0.000*noon + 0.000*fahey + 0.000*toth + 0.000*toto + 0.000$
2015-08-02 12:09:37,606 : INFO : topic diff=inf, rho=0.008998
2015-08-02 12:09:37,902 : INFO : PROGRESS: pass 0, dispatched chunk #12366 = documents up to #3091750/3831719, outstanding queue size 3
2015-08-02 12:09:55,582 : INFO : PROGRESS: pass 0, dispatched chunk #12367 = documents up to #3092000/3831719, outstanding queue size 2
2015-08-02 12:10:03,008 : INFO : PROGRESS: pass 0, dispatched chunk #12368 = documents up to #3092250/3831719, outstanding queue size 3
2015-08-02 12:10:17,426 : INFO : PROGRESS: pass 0, dispatched chunk #12369 = documents up to #3092500/3831719, outstanding queue size 3
Well, the sampler is not guaranteed to converge :) And the perplexity was high and oscillating a lot. I'll post back if it works next time.
This seems to be due to the previous divide-by-zero error. It's not limited to ldamulticore; it also occurs in ldamodel when simply trying to model Wikipedia with 1000 topics.
I have been running further tests, and it occurs with 750 topics but not with 500, when using a 100,000-word vocab on the English Wikipedia.
I received your log, I'm on it.
Sorry this is taking so long Brian. We're moving countries and I've only had time for "trivial" open source fixes lately. Debugging this one looks more substantial :)
Oh no worries, I am not trying to rush or anything. I didn't even realize they were the same bug at first.
Experiencing the same issue, but only when adjusting the eta prior
@brianmingus Is this resolved? If not, could you please post the link to the log gist? Thanks
I doubt this is resolved - it won't be resolved by accident.
@brianmingus Ok, could you please turn this into a more tractable bug report?
Upload the log to a gist, provide code to reproduce, etc.
This is a serious bug in gensim where it fails to converge when there are a certain number of topics. I think this bug is sufficiently spec'd out - @piskvorky seems to grok it.
I got the same bug when I set topics=1000, and I solved the problem by setting the parameters alpha=50/topic_num, eta=0.1, iterations=500.
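As an editorial illustration of the workaround described above (the variable names and the commented-out model call are a sketch, not the commenter's actual code — gensim's LdaModel does accept alpha, eta, and iterations keyword arguments):

```python
import numpy as np

# Sketch of the reported workaround: widen the symmetric priors so the
# per-topic word probabilities stay well away from float32 underflow.
num_topics = 1000

lda_params = {
    "num_topics": num_topics,
    "alpha": 50.0 / num_topics,  # symmetric document-topic prior (here 0.05)
    "eta": 0.1,                  # symmetric topic-word prior
    "iterations": 500,           # inference iterations per chunk
}

# Hypothetical usage with your own corpus/dictionary:
# model = gensim.models.LdaModel(corpus=corpus, id2word=id2word, **lda_params)
print(lda_params["alpha"])
```

Note that the default symmetric eta for a 1000-topic wiki-sized vocabulary is orders of magnitude smaller than 0.1, which is consistent with the underflow explanation later in this thread.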
@brianmingus @ocsponge please attach concrete code & dataset for reproducing your problem
You do not "need info" for this bug. It is sufficiently spec'd out. Please stop asking for more info.
@brianmingus I don't agree with you because I can't reproduce it now, for this reason, I asked for additional information (code and dataset).
I provided enough info to replicate; @piskvorky did not ask for more info.
If you are interested in working on this ticket, the appropriate steps are to check out gensim from the date the ticket was posted, and a current version. If you can replicate it on the old one but not the new one, it's fixed.
@menshikh-iv , @tmylk , @piskvorky , I'm having the same issue and am including my dataset, dictionary, and code. This is a corpus pulled from gutenberg project, split into 3.5 M documents using a rather clipped vocabulary of ~66000 words. I did not have problems when trying a 400-topic version but did run into issues with 1000 topics.
Dataset is 2GB zipped and can be downloaded from google drive.
Dictionary and repro code are attached as zips.
Create_LDA_Model_repro.zip
dictionary.zip
When I run the code I get a numerical value for topic diff in the first tranche of documents viewed. But then later I get topic diff=inf.
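A rough sketch of how a finite diff turns into inf (this is an illustration based on the divide-by-zero warning reported later in this thread, not gensim's exact code — the matrices here are made up):

```python
import numpy as np

# The "topic diff" is a mean absolute difference between log topic-word
# matrices. If expElogbeta contains float32 zeros (underflowed entries),
# np.log maps them to -inf and the mean becomes inf.
prev_elogbeta = np.full((2, 4), -10.0, dtype=np.float32)
exp_elogbeta = np.exp(np.array([[-11.0, -12.0, -120.0, -13.0],
                                [-11.5, -118.0, -12.5, -14.0]],
                               dtype=np.float32))  # exp(-120) underflows to 0.0

with np.errstate(divide="ignore"):
    diff = np.log(exp_elogbeta) - prev_elogbeta  # -inf entries appear here

print(np.mean(np.abs(diff)))  # inf
```

A single underflowed entry out of millions is enough to make the reported diff inf, which is why the first merge can look healthy while later ones do not.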
Here is the logging information:
C:\Winpython\WinPython-64bit-3.5.4.0Qt5\python-3.5.4.amd64\lib\site-packages\gensim\utils.py:862: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
2017-10-30 17:30:34,223 : INFO : loading Dictionary object from clean_vcompact_dictionary.pickle
2017-10-30 17:30:34,253 : INFO : loaded clean_vcompact_dictionary.pickle
2017-10-30 17:30:34,699 : INFO : loaded corpus index from E:/Clean_Corpus/Full_Corpus_LD.mm.index
2017-10-30 17:30:34,700 : INFO : initializing corpus reader from E:/Clean_Corpus/Full_Corpus_LD.mm
2017-10-30 17:30:34,700 : INFO : accepted corpus with 3443509 documents, 66457 features, 612903679 non-zero entries
2017-10-30 17:30:34,703 : INFO : using symmetric alpha at 0.001
2017-10-30 17:30:34,703 : INFO : using symmetric eta at 1.5047323833456219e-05
2017-10-30 17:30:34,712 : INFO : using serial LDA version on this node
2017-10-30 17:36:51,844 : INFO : running online LDA training, 1000 topics, 1 passes over the supplied corpus of 3443509 documents, updating every 4000 documents, evaluating every ~40000 documents, iterating 500x with a convergence threshold of 0.001000
2017-10-30 17:36:51,849 : INFO : training LDA model using 2 processes
C:\Winpython\WinPython-64bit-3.5.4.0Qt5\python-3.5.4.amd64\lib\site-packages\gensim\utils.py:862: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
C:\Winpython\WinPython-64bit-3.5.4.0Qt5\python-3.5.4.amd64\lib\site-packages\gensim\utils.py:862: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
2017-10-30 17:36:52,407 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #2000/3443509, outstanding queue size 1
2017-10-30 17:36:52,718 : INFO : loading Dictionary object from clean_vcompact_dictionary.pickle
2017-10-30 17:36:52,718 : INFO : loading Dictionary object from clean_vcompact_dictionary.pickle
2017-10-30 17:36:52,760 : INFO : loaded clean_vcompact_dictionary.pickle
2017-10-30 17:36:52,762 : INFO : loaded clean_vcompact_dictionary.pickle
2017-10-30 17:36:53,317 : INFO : loaded corpus index from E:/Clean_Corpus/Full_Corpus_LD.mm.index
2017-10-30 17:36:53,317 : INFO : initializing corpus reader from E:/Clean_Corpus/Full_Corpus_LD.mm
2017-10-30 17:36:53,318 : INFO : accepted corpus with 3443509 documents, 66457 features, 612903679 non-zero entries
2017-10-30 17:36:53,330 : INFO : loaded corpus index from E:/Clean_Corpus/Full_Corpus_LD.mm.index
2017-10-30 17:36:53,330 : INFO : initializing corpus reader from E:/Clean_Corpus/Full_Corpus_LD.mm
2017-10-30 17:36:53,330 : INFO : accepted corpus with 3443509 documents, 66457 features, 612903679 non-zero entries
2017-10-30 17:36:54,732 : INFO : PROGRESS: pass 0, dispatched chunk #1 = documents up to #4000/3443509, outstanding queue size 2
2017-10-30 17:36:57,326 : INFO : PROGRESS: pass 0, dispatched chunk #2 = documents up to #6000/3443509, outstanding queue size 3
2017-10-30 17:36:59,845 : INFO : PROGRESS: pass 0, dispatched chunk #3 = documents up to #8000/3443509, outstanding queue size 4
2017-10-30 17:37:00,760 : INFO : PROGRESS: pass 0, dispatched chunk #4 = documents up to #10000/3443509, outstanding queue size 5
2017-10-30 17:37:01,560 : INFO : PROGRESS: pass 0, dispatched chunk #5 = documents up to #12000/3443509, outstanding queue size 6
2017-10-30 17:38:09,246 : INFO : PROGRESS: pass 0, dispatched chunk #6 = documents up to #14000/3443509, outstanding queue size 6
2017-10-30 17:38:26,611 : INFO : merging changes from 4000 documents into a model of 3443509 documents
2017-10-30 17:38:37,241 : INFO : topic #124 (0.001): 0.009*"like" + 0.007*"alexandra" + 0.007*"came" + 0.007*"went" + 0.006*"boys" + 0.005*"mother" + 0.005*"oak" + 0.005*"mrs" + 0.005*"looked" + 0.005*"read"
2017-10-30 17:38:37,242 : INFO : topic #64 (0.001): 0.037*"shall" + 0.029*"thy" + 0.022*"lord" + 0.020*"god" + 0.018*"king" + 0.016*"unto" + 0.015*"things" + 0.015*"precious" + 0.014*"forth" + 0.014*"nephi"
2017-10-30 17:38:37,243 : INFO : topic #287 (0.001): 0.022*"lord" + 0.014*"man" + 0.010*"unto" + 0.010*"came" + 0.009*"shall" + 0.009*"god" + 0.007*"power" + 0.006*"thee" + 0.006*"gold" + 0.006*"mormon"
2017-10-30 17:38:37,244 : INFO : topic #188 (0.001): 0.048*"god" + 0.032*"unto" + 0.031*"hezekiah" + 0.026*"hand" + 0.023*"people" + 0.022*"deliver" + 0.019*"saying" + 0.014*"lord" + 0.013*"fathers" + 0.013*"king"
2017-10-30 17:38:37,244 : INFO : topic #173 (0.001): 0.043*"unto" + 0.028*"shall" + 0.019*"god" + 0.014*"things" + 0.013*"jesus" + 0.012*"came" + 0.012*"lord" + 0.012*"come" + 0.012*"people" + 0.010*"hath"
2017-10-30 17:38:37,593 : INFO : topic diff=985.619743, rho=1.000000
2017-10-30 17:38:37,656 : INFO : PROGRESS: pass 0, dispatched chunk #7 = documents up to #16000/3443509, outstanding queue size 6
2017-10-30 17:39:29,406 : INFO : PROGRESS: pass 0, dispatched chunk #8 = documents up to #18000/3443509, outstanding queue size 6
C:\Winpython\WinPython-64bit-3.5.4.0Qt5\python-3.5.4.amd64\lib\site-packages\gensim\models\ldamodel.py:728: RuntimeWarning: divide by zero encountered in log
diff = np.log(self.expElogbeta)
2017-10-30 17:39:50,225 : INFO : merging changes from 4000 documents into a model of 3443509 documents
2017-10-30 17:40:00,745 : INFO : topic #153 (0.001): 0.010*"letters" + 0.009*"caps" + 0.007*"small" + 0.007*"mother" + 0.007*"book" + 0.006*"word" + 0.005*"little" + 0.005*"long" + 0.005*"anne" + 0.004*"father"
2017-10-30 17:40:00,746 : INFO : topic #398 (0.001): 0.017*"shall" + 0.012*"come" + 0.010*"man" + 0.009*"father" + 0.009*"unto" + 0.009*"like" + 0.008*"thee" + 0.008*"know" + 0.008*"think" + 0.007*"world"
2017-10-30 17:40:00,747 : INFO : topic #264 (0.001): 0.015*"father" + 0.014*"unto" + 0.010*"girl" + 0.008*"went" + 0.008*"let" + 0.008*"away" + 0.008*"came" + 0.008*"jesus" + 0.007*"tarzan" + 0.006*"shall"
2017-10-30 17:40:00,748 : INFO : topic #134 (0.001): 0.016*"shall" + 0.014*"things" + 0.012*"unto" + 0.009*"know" + 0.008*"come" + 0.007*"let" + 0.007*"thy" + 0.007*"man" + 0.006*"hath" + 0.006*"alma"
2017-10-30 17:40:00,749 : INFO : topic #917 (0.001): 0.070*"sir" + 0.018*"gareth" + 0.013*"knight" + 0.012*"smote" + 0.011*"encountered" + 0.010*"came" + 0.009*"spear" + 0.009*"king" + 0.009*"lord" + 0.008*"unto"
2017-10-30 17:40:01,201 : INFO : topic diff=inf, rho=0.333333
2017-10-30 17:40:01,292 : INFO : PROGRESS: pass 0, dispatched chunk #9 = documents up to #20000/3443509, outstanding queue size 6
2017-10-30 17:40:33,389 : INFO : PROGRESS: pass 0, dispatched chunk #10 = documents up to #22000/3443509, outstanding queue size 6
2017-10-30 17:41:06,384 : INFO : merging changes from 4000 documents into a model of 3443509 documents
2017-10-30 17:41:17,228 : INFO : topic #35 (0.001): 0.016*"shall" + 0.009*"lord" + 0.008*"came" + 0.007*"alice" + 0.007*"anne" + 0.007*"went" + 0.007*"thee" + 0.007*"hath" + 0.006*"know" + 0.005*"little"
2017-10-30 17:41:17,230 : INFO : topic #802 (0.001): 0.027*"unto" + 0.017*"lord" + 0.017*"god" + 0.012*"shall" + 0.011*"men" + 0.010*"came" + 0.008*"hand" + 0.007*"jacob" + 0.006*"david" + 0.006*"man"
2017-10-30 17:41:17,232 : INFO : topic #713 (0.001): 0.013*"thy" + 0.008*"shall" + 0.006*"power" + 0.006*"people" + 0.005*"time" + 0.005*"lord" + 0.005*"ether" + 0.005*"thou" + 0.005*"account" + 0.004*"unto"
2017-10-30 17:41:17,234 : INFO : topic #958 (0.001): 0.013*"companions" + 0.010*"dog" + 0.009*"ape" + 0.009*"men" + 0.009*"man" + 0.008*"ancestors" + 0.008*"great" + 0.008*"preparations" + 0.008*"traveling" + 0.008*"selfish"
2017-10-30 17:41:17,235 : INFO : topic #819 (0.001): 0.012*"tom" + 0.011*"jim" + 0.009*"let" + 0.009*"cor" + 0.008*"warmed" + 0.008*"life" + 0.007*"time" + 0.006*"got" + 0.006*"come" + 0.006*"says"
2017-10-30 17:41:19,478 : INFO : topic diff=inf, rho=0.200000
2017-10-30 17:41:19,555 : INFO : PROGRESS: pass 0, dispatched chunk #11 = documents up to #24000/3443509, outstanding queue size 6
2017-10-30 17:41:21,343 : INFO : PROGRESS: pass 0, dispatched chunk #12 = documents up to #26000/3443509, outstanding queue size 6
2017-10-30 17:41:53,421 : INFO : merging changes from 4000 documents into a model of 3443509 documents
2017-10-30 17:42:04,382 : INFO : topic #395 (0.001): 0.040*"shall" + 0.012*"unto" + 0.010*"man" + 0.007*"days" + 0.007*"king" + 0.006*"came" + 0.006*"vision" + 0.006*"lord" + 0.005*"thee" + 0.005*"hand"
2017-10-30 17:42:04,383 : INFO : topic #965 (0.001): 0.013*"unto" + 0.012*"shall" + 0.012*"god" + 0.011*"man" + 0.009*"alma" + 0.008*"came" + 0.007*"went" + 0.006*"great" + 0.006*"mosiah" + 0.006*"yea"
2017-10-30 17:42:04,384 : INFO : topic #234 (0.001): 0.028*"shall" + 0.011*"lord" + 0.009*"thy" + 0.009*"unto" + 0.008*"god" + 0.007*"nephi" + 0.007*"day" + 0.006*"hath" + 0.006*"behold" + 0.006*"know"
2017-10-30 17:42:04,385 : INFO : topic #856 (0.001): 0.046*"shall" + 0.008*"president" + 0.008*"lord" + 0.007*"king" + 0.007*"day" + 0.006*"priest" + 0.005*"like" + 0.005*"chuse" + 0.005*"man" + 0.005*"let"
2017-10-30 17:42:04,386 : INFO : topic #440 (0.001): 0.007*"heaven" + 0.005*"great" + 0.004*"far" + 0.004*"time" + 0.004*"like" + 0.004*"old" + 0.004*"place" + 0.003*"feet" + 0.003*"looked" + 0.003*"deep"
2017-10-30 17:42:06,639 : INFO : topic diff=inf, rho=0.142857
2017-10-30 17:42:06,716 : INFO : PROGRESS: pass 0, dispatched chunk #13 = documents up to #28000/3443509, outstanding queue size 6
2017-10-30 17:42:08,490 : INFO : PROGRESS: pass 0, dispatched chunk #14 = documents up to #30000/3443509, outstanding queue size 6
2017-10-30 17:42:40,719 : INFO : merging changes from 4000 documents into a model of 3443509 documents
2017-10-30 17:42:53,718 : INFO : topic #466 (0.001): 0.008*"half" + 0.007*"state" + 0.007*"man" + 0.006*"came" + 0.006*"like" + 0.006*"apology" + 0.006*"lord" + 0.006*"thousand" + 0.006*"owes" + 0.005*"place"
2017-10-30 17:42:53,719 : INFO : topic #12 (0.001): 0.014*"lord" + 0.007*"unto" + 0.007*"people" + 0.007*"know" + 0.006*"day" + 0.005*"jacob" + 0.005*"time" + 0.005*"shall" + 0.005*"like" + 0.005*"god"
2017-10-30 17:42:53,721 : INFO : topic #460 (0.001): 0.029*"cook" + 0.017*"thee" + 0.013*"come" + 0.013*"blood" + 0.012*"place" + 0.012*"bid" + 0.011*"god" + 0.010*"emma" + 0.009*"shall" + 0.009*"things"
2017-10-30 17:42:53,722 : INFO : topic #6 (0.001): 0.095*"captain" + 0.038*"shall" + 0.015*"camp" + 0.012*"house" + 0.012*"pitch" + 0.012*"children" + 0.010*"man" + 0.010*"sanctuary" + 0.009*"lord" + 0.008*"unto"
2017-10-30 17:42:53,723 : INFO : topic #165 (0.001): 0.014*"shall" + 0.008*"day" + 0.008*"like" + 0.007*"man" + 0.007*"lord" + 0.006*"hath" + 0.006*"great" + 0.006*"come" + 0.005*"know" + 0.005*"way"
2017-10-30 17:42:54,143 : INFO : topic diff=inf, rho=0.111111
2017-10-30 17:42:54,228 : INFO : PROGRESS: pass 0, dispatched chunk #15 = documents up to #32000/3443509, outstanding queue size 6
2017-10-30 17:42:55,937 : INFO : PROGRESS: pass 0, dispatched chunk #16 = documents up to #34000/3443509, outstanding queue size 6
2017-10-30 17:43:27,559 : INFO : merging changes from 4000 documents into a model of 3443509 documents
2017-10-30 17:43:38,648 : INFO : topic #773 (0.001): 0.009*"god" + 0.007*"suburbs" + 0.006*"unto" + 0.006*"little" + 0.005*"lord" + 0.005*"man" + 0.005*"thy" + 0.005*"like" + 0.005*"shall" + 0.004*"know"
2017-10-30 17:43:38,649 : INFO : topic #834 (0.001): 0.018*"party" + 0.012*"man" + 0.010*"person" + 0.010*"property" + 0.009*"common" + 0.009*"point" + 0.009*"officer" + 0.008*"duty" + 0.008*"object" + 0.008*"case"
2017-10-30 17:43:38,650 : INFO : topic #124 (0.001): 0.010*"like" + 0.008*"got" + 0.007*"went" + 0.007*"mother" + 0.006*"come" + 0.006*"room" + 0.005*"mrs" + 0.005*"night" + 0.005*"long" + 0.005*"came"
2017-10-30 17:43:38,651 : INFO : topic #16 (0.001): 0.019*"king" + 0.010*"lord" + 0.010*"shall" + 0.008*"unto" + 0.008*"come" + 0.007*"man" + 0.007*"men" + 0.007*"let" + 0.006*"know" + 0.005*"thy"
2017-10-30 17:43:38,652 : INFO : topic #463 (0.001): 0.013*"lord" + 0.011*"house" + 0.009*"chest" + 0.008*"man" + 0.007*"otter" + 0.007*"went" + 0.006*"came" + 0.006*"money" + 0.005*"day" + 0.005*"shall"
2017-10-30 17:43:41,012 : INFO : topic diff=inf, rho=0.090909
Based on @ocsponge's post, I tried modifying alpha and eta. I found that in my case the problem goes away if I set eta=0.01 but persists if I set eta=0.001. With 2000 topics, default alpha, and eta=0.01, my topics were converging fine.
Thank you very much @TC-Rudel for additional information, now this problem can be reproduced.
Are there any updates on this issue?
@stevemarin not yet
Same here, I am getting "topic diff=inf" on the log after the second merge (running multicore).
The topic diff is 25.4 after the first merge.
What does "topic diff=inf" actually mean and what are potential causes? It would be good to understand the meaning of this better in order to come up with strategies for how to avoid this. Previous comments mentioned changing the number of topics, the eta, or maybe alpha or the number of iterations, but I do not understand how those settings are related to the topic diff? Could the vocabulary size have an influence?
What does "topic diff=inf" actually mean and what are potential causes?
This means that an overflow happens somewhere (typically a division by an "almost-zero" value), and the model breaks (produces infs/NaNs). Related issue: #2115
@johann-petrak we applied some "workaround" for this, see #2308, hope that helps
@menshikh-iv I don't think that "workaround" will solve this problem. I've had the same problem even after my patch. I can try to explore this a bit later.
@horpto I still hope that #2308 at least reduces the number of "overflow-related" errors.
I can try to explore this a bit later.
Sounds pretty useful, please go ahead when you have time!
This issue is caused by the width of the dtype. First of all, I got a warning on diff = np.log(self.expElogbeta) in the second M-step: RuntimeWarning: divide by zero encountered in log. That's why infs appear in the output (self.expElogbeta contained zeros). get_Elogbeta() after the first blend returned something like this:
[[ -11.186146 -13.545639 -11.4461155 ... -112.541405 -112.541405
-112.541405 ]
[ -11.8831415 -11.548369 -9.9233265 ... -112.556595 -112.556595
-112.556595 ]
[ -11.561755 -10.991329 -11.953122 ... -112.475 -112.475
-112.475 ]
...
[ -11.4945545 -11.350912 -9.209938 ... -112.61384 -112.61384
-112.61384 ]
[ -11.081068 -12.508811 -10.777531 ... -112.40563 -112.40563
-112.40563 ]
[ -11.711579 -13.1611 -13.570866 ... -112.315475 -112.315475
-112.315475 ]]
It's not obvious at first glance (of course, everyone thinks that log(exp(x)) == x), but there is a surprise:
>>> np.exp(-123)
3.817497188671175e-54
>>> np.exp(-123, dtype=np.float32)
0.0
The default dtype of LDA is np.float32. After I changed it to np.float64, the problem disappeared.
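The underflow boundary, and the workaround, can be shown in a couple of lines (the commented-out model call is a sketch; recent gensim versions expose a dtype keyword on LdaModel, but check that your installed version supports it):

```python
import numpy as np

# The same exponent that underflows to zero in float32 is perfectly
# representable in float64, so the later log() never sees a zero.
print(np.exp(-123, dtype=np.float32))  # 0.0 -> np.log of this gives -inf
print(np.exp(-123, dtype=np.float64))  # ~3.8e-54, still positive

# Hypothetical usage of the dtype workaround:
# model = gensim.models.LdaModel(corpus, id2word=id2word,
#                                num_topics=1000, dtype=np.float64)
```

The trade-off is that float64 doubles the RAM for the big parameter matrices, which is exactly the concern raised in the next comment.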
@horpto do you see a way to use float64 precision only where needed (internal calculations), but keep the big parameter matrices in float32 (less RAM)?
IIRC the only reason for the float32 default was to save memory.
@piskvorky I guess we can change the diff = np.log(self.expElogbeta) line to diff = self.state.get_Elogbeta(), due to the invariant self.expElogbeta == np.exp(self.state.get_Elogbeta()), but that does not solve the underlying problem of zeros in self.expElogbeta where there should be small values.
I can add a PR if you agree with this suggestion.
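A small numpy sketch of the point being made here (illustrative values, not gensim internals): the float32 exp/log round-trip destroys large-magnitude entries, while the log-space matrix that get_Elogbeta() would return keeps them intact.

```python
import numpy as np

# Log-space values survive in float32; the exp/log round-trip does not,
# because exp(-112.5) underflows to 0.0 and log(0.0) is -inf.
elogbeta = np.array([-11.2, -50.0, -112.5], dtype=np.float32)

with np.errstate(divide="ignore"):
    roundtrip = np.log(np.exp(elogbeta))  # last entry becomes -inf

print(roundtrip)   # first two entries recovered, last is -inf
print(elogbeta)    # all three values are finite in log space
```

This is why reading the log-space state directly avoids the spurious inf in the diff, even though the underflowed zeros still exist in expElogbeta and can bite elsewhere.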
Can you suggest what the value of topic_diff should be in general?
@horpto do you see a way to use float64 precision only where needed (internal calculations), but keep the big parameter matrices in float32 (less RAM)?
IIRC the only reason for the float32 default was to save memory.
Then is there any issue with np.float16? When I changed to np.float16 I got the same thing as with np.float32.
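That is expected: float16 is narrower, not wider, so it underflows far sooner than float32 and makes the problem worse. A quick check (illustrative values chosen by the editor):

```python
import numpy as np

# The smallest positive float16 (subnormal) is ~6e-8, so exp(x) already
# rounds to zero by x ~ -18. float32 holds out until roughly x ~ -103.
print(np.exp(-15, dtype=np.float16))  # tiny but still nonzero
print(np.exp(-20, dtype=np.float16))  # 0.0 -- float16 has underflowed
print(np.exp(-20, dtype=np.float32))  # still positive in float32
```

So narrowing the dtype cannot fix the inf diff; only widening it (or keeping the computation in log space) can.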