Hi,
Thanks a lot for your fantastic tool, keep up the good work!
I'd like to ask about the difference between Google's word2vec vectors ( https://code.google.com/archive/p/word2vec/ ) and the ones you use in spaCy.
Kind regards
Google's word2vec is a tool for training word vectors from text. spaCy makes it easy to load these and other pre-trained word vectors so that you can use them in your NLP tasks.
By default, spaCy currently loads vectors produced by the Levy and Goldberg (2014) dependency-based word2vec model, but you can also load Google's word2vec or GloVe vectors. Please see this blog post for more details on how to do that:
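As a toy illustration of what word vectors are for (this is plain Python with made-up 3-d vectors, not spaCy's API; real word2vec/GloVe vectors are typically 300-dimensional):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors purely for illustration.
vectors = {
    'king':  [0.8, 0.3, 0.1],
    'queen': [0.7, 0.4, 0.1],
    'apple': [0.1, 0.1, 0.9],
}

print(cosine(vectors['king'], vectors['queen']))  # close to 1.0
print(cosine(vectors['king'], vectors['apple']))  # much smaller
```

Semantically related words end up with similar vectors, which is what makes them useful as features in downstream NLP tasks.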
Thanks Yasser
The easiest way to load GloVe vectors is now:
import spacy
nlp = spacy.load('en', vectors='en_glove_cc_300_1m')
This will load a subset of the GloVe Common Crawl vectors: it'll give you vectors for 1m words. This vocabulary is large enough that you should get high coverage, without the huge memory requirements of the original unpruned data.
This function isn't well documented yet, because we've only recently stabilised the API. I'll fix the blog post.
This doesn't work and throws an exception:
name = 'en_glove_cc_300_1m'

def get_lang_class(name):
    lang = re.split('[^a-zA-Z0-9_]', name, 1)[0]
    if lang not in LANGUAGES:
        raise RuntimeError('Language not supported: %s' % lang)

RuntimeError: Language not supported: en_glove_cc_300_1m
The reason is that the regex should be just '_', which works for both 'en' and 'en_glove_cc_300_1m', returning the desired 'en'.
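A quick standalone check of the two split patterns (the function names here are mine, purely for illustration):

```python
import re

def lang_from_name(name):
    # Current behaviour: split on the first non-word character.
    # 'en_glove_cc_300_1m' contains none, so the whole string
    # comes back unchanged.
    return re.split('[^a-zA-Z0-9_]', name, 1)[0]

def lang_from_name_fixed(name):
    # Proposed fix: split on '_' instead, which yields 'en' for
    # both 'en' and 'en_glove_cc_300_1m'.
    return re.split('_', name, 1)[0]

print(lang_from_name('en_glove_cc_300_1m'))        # -> 'en_glove_cc_300_1m'
print(lang_from_name_fixed('en_glove_cc_300_1m'))  # -> 'en'
print(lang_from_name_fixed('en'))                  # -> 'en'
```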
However, even after fixing the regex, there is another exception:
name = 'en_glove_cc_300_1m', via = None

def get_package_by_name(name=None, via=None):
    if name is None:
        return
    lang = get_lang_class(name)
    try:
        return sputnik.package(about.title, about.version,
                               name, data_path=via)
    except PackageNotFoundException as e:
        raise RuntimeError("Model '%s' not installed. Please run 'python -m "
                           "%s.download' to install latest compatible "
                           "model." % (name, lang.module))

RuntimeError: Model 'en_glove_cc_300_1m' not installed. Please run 'python -m spacy.en.download' to install latest compatible model.
Running "python -m spacy.en.download --force all" doesn't help.
I'm running version 0.101.0.
Any thoughts?
Ran into the same issue. Per @aie0's suggestion I switched lang = re.split('[^a-zA-Z0-9_]', name, 1)[0] to lang = re.split('_', name, 1)[0]. Also, I used nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors') instead of nlp = spacy.load('en', vectors='en_glove_cc_300_1m'). The extra _vectors did the trick for me.
This should all be cleaned up in 1.0 — the GloVe vectors are installed by default, and it's much easier to use different vectors.
I always get this error, even after installing 'en':
ValueError: Word vectors set to length 0. This may be because the data is not installed. If you haven't already, run
python -m spacy.en.download all
to install the data.