There are a number of datasets and even pre-trained models that are suitable as gensim input.
Collect them and create and promote a page that links to these resources.
Example:
Here's another resource. I'm still looking for a pre-trained doc2vec model.
https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models
Hi @panamantis Thanks for the link. Did you come across any pre-trained doc2vec models?
I'm checking the pre-trained word2vec and topic-modelling models mentioned in https://github.com/ai-ku/wvec
and
http://www.pdhillon.com/code.html
Hey! I found the following pre-trained word2vec resources to be relevant as well.
https://github.com/alexandres/lexvec
http://cistern.cis.lmu.de/meta-emb/
https://github.com/icoxfog417/fastTextJapaneseTutorial
https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
http://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes
Two pre-trained doc2vec models, one for English Wikipedia and another for Associated Press News, are available here: https://github.com/jhlau/doc2vec
More pre-trained word2vec models from @akutuzov
@tmylk the preferred link to the WebVectors service has changed:
http://ltr.uio.no/semvec/ is deprecated; the correct URL is now http://vectors.nlpl.eu/explore/embeddings/
Will be resolved in #1492, #1453
Resolved in #1705
@menshikh-iv which of the resources above (from @akutuzov , @chinmayapancholi13 , @joyjeni , @panamantis ) are already included? Any plans to include others (where relevant)? Thanks.