I am trying to train Word2Vec and tune the ns_exponent hyperparameter. When I initialize the model I set ns_exponent=0.5, but immediately after initialization I find it has reset to the default of ns_exponent=0.75.
I looked through the Word2Vec source code for any mention of ns_exponent, but found no reason for the class to ignore my argument. I suspected the Vocabulary initialization might have something to do with it, but that seems to take its argument straight from __init__. Nor do I believe I am overriding the ns_exponent setting with one of the other parameters, because this occurs even when ns_exponent is the only one explicitly set.
from gensim.models import Word2Vec

model = Word2Vec(ns_exponent=0.5)
print(model.ns_exponent)
The printed output is:
0.75
and the resulting model's ns_exponent attribute is set to 0.75 as well.
Windows-10-10.0.18362-SP0
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
NumPy 1.16.0
SciPy 1.1.0
gensim 3.6.0
FAST_VERSION 0
Thanks for the clear report! The problem can be even more compactly demonstrated:
In [1]: from gensim.models import Word2Vec
In [2]: model = Word2Vec(ns_exponent=0.1)
In [3]: model.ns_exponent
Out[3]: 0.75
While this is a confusing bit of model state, I'm pretty sure your intended value still took effect – it's just that it was passed into a separate model.vocabulary.ns_exponent property, where it was consulted to build the scaled-cumulative-proportions table used by the model. If I continue the reproduction REPL above:
In [4]: model.vocabulary.ns_exponent
Out[4]: 0.1
The code problem is that Word2Vec.__init__() isn't including the provided non-default ns_exponent value in its call to its abstract superclass's __init__, which then caches the default value into the (redundant and not-consulted) self.ns_exponent property, while having no effect on the actually-operative self.vocabulary.ns_exponent.
There's a refactor in progress (#2698) that will resolve this by making model.ns_exponent the sole operative location. In the meantime, your requested alternate value should still be taking effect and be visible in model.vocabulary.ns_exponent, so no explicit workaround is necessary.
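For anyone curious what the ns_exponent actually does: negative sampling draws noise words from a unigram distribution raised to this exponent. A minimal sketch, independent of gensim's internals (the function name and toy counts here are illustrative, not gensim API):

```python
import numpy as np

# Hypothetical unigram counts for a tiny vocabulary: one very frequent
# word, one moderately frequent word, and one rare word.
counts = np.array([100.0, 10.0, 1.0])

def noise_distribution(counts, ns_exponent):
    """Scale raw frequencies by ns_exponent and renormalize, in the
    style of word2vec negative sampling."""
    scaled = counts ** ns_exponent
    return scaled / scaled.sum()

# ns_exponent=1.0 samples in proportion to raw frequency,
# ns_exponent=0.0 samples uniformly, and values in between
# (like the 0.75 default, or 0.5 here) flatten the distribution,
# boosting rare words relative to frequent ones.
print(noise_distribution(counts, 1.0))
print(noise_distribution(counts, 0.75))
print(noise_distribution(counts, 0.5))
print(noise_distribution(counts, 0.0))
```

This is why the setting matters enough to verify: a smaller exponent gives rare words a noticeably larger share of the negative samples.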
Thank you for the thorough and prompt response! This sets my heart at ease. I'll edit the issue as you suggested and use your fix.
Fixed by #2698.