There are a number of datasets and even pre-trained models that are suitable as gensim input.
Collect them and create and promote a page that links to these resources.
Example:
Here's another resource. I'm still looking for a pre-trained doc2vec model.
https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models
Hi @panamantis Thanks for the link. Did you come across any pre-trained doc2vec models?
I'm checking the pre-trained word2vec and topic-modelling models mentioned in https://github.com/ai-ku/wvec
and
http://www.pdhillon.com/code.html
Hey! I found the following pre-trained word2vec resources to be relevant as well.
https://github.com/alexandres/lexvec
http://cistern.cis.lmu.de/meta-emb/
https://github.com/icoxfog417/fastTextJapaneseTutorial
https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
http://www.socher.org/index.php/Main/ImprovingWordRepresentationsViaGlobalContextAndMultipleWordPrototypes
Two pre-trained doc2vec models, one for English Wikipedia and another for Associated Press News, are available here: https://github.com/jhlau/doc2vec
More pre-trained word2vec models from @akutuzov
@tmylk the preferred link to the WebVectors service has changed:
http://ltr.uio.no/semvec/ is deprecated; the correct URL is now http://vectors.nlpl.eu/explore/embeddings/
Will be resolved in #1492, #1453
Resolved in #1705
@menshikh-iv which of the resources above (from @akutuzov , @chinmayapancholi13 , @joyjeni , @panamantis ) are already included? Any plans to include others (where relevant)? Thanks.