Gensim: Online LDA with infinite vocabulary

Created on 23 Jun 2014 · 5 Comments · Source: RaRe-Technologies/gensim

Potentially quite a biggie, this one, and I'm fully expecting a "patches welcome" response, but: when doing true online learning over document streams, it's quite nice not to have to fix the vocabulary upfront. If you want to model the long tail of the vocabulary, it would also be nice to have a model whose update steps aren't linear in the vocabulary size.

There's a recent paper, "Online Latent Dirichlet Allocation with Infinite Vocabulary", which extends the online variational inference approach from gensim's LdaModel to work in this setting, and could be a good starting point.
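To make the "fix the vocabulary upfront" point concrete, here is a minimal stdlib-only sketch of the bookkeeping side of the problem. It is not the paper's algorithm and not gensim code: `FixedVocabulary` and `GrowingVocabulary` are hypothetical classes made up for illustration, and the method name `doc2bow` is only borrowed from gensim's `Dictionary.doc2bow`. The assumption is that a frozen vocabulary silently drops unseen tokens, while an open one assigns each new token the next free id.

```python
from collections import Counter


class FixedVocabulary:
    """Hypothetical vocabulary frozen before streaming; unseen tokens are dropped."""

    def __init__(self, tokens):
        # Ids are assigned once, from the tokens known at construction time.
        self.token2id = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

    def doc2bow(self, document):
        counts = Counter(self.token2id[t] for t in document if t in self.token2id)
        return sorted(counts.items())


class GrowingVocabulary:
    """Hypothetical open vocabulary; a new token gets the next free id on first sight."""

    def __init__(self):
        self.token2id = {}

    def doc2bow(self, document):
        for tok in document:
            if tok not in self.token2id:
                self.token2id[tok] = len(self.token2id)
        counts = Counter(self.token2id[t] for t in document)
        return sorted(counts.items())


# A toy document stream: the second document introduces two new words.
stream = [
    ["topic", "model", "topic"],
    ["stream", "model", "vocabulary"],
]

fixed = FixedVocabulary(stream[0])  # vocabulary built from the first document only
growing = GrowingVocabulary()

fixed_bows = [fixed.doc2bow(doc) for doc in stream]
growing_bows = [growing.doc2bow(doc) for doc in stream]

print(fixed_bows)    # "stream" and "vocabulary" never appear in the fixed bags
print(growing_bows)  # the growing vocabulary keeps them under new ids
```

This only shows the id-assignment problem. A topic-word matrix with one column per vocabulary entry would still have to grow alongside the ids, and that modelling question is the part the paper addresses.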

Labels: difficulty hard, feature, wishlist

All 5 comments

There's also some Python code here: https://github.com/kzhai/InfVocLDA

Nice. I am actually working on some research that definitely needs this, and was just about to do a literature search on the topic (this afternoon, too!). I would definitely be interested in working on a branch for bringing this to Gensim.

How to do it on a single machine?

I run into out-of-memory errors with the existing LDA in gensim when I have a huge vocabulary. Does this resolve that issue?

@gauravkoradiya no, not related. Please stop hijacking unrelated issues. If you have some question, articulate it properly and use the mailing list.
