Gensim: Word2Vec ns_exponent cannot be changed from default

Created on 6 Feb 2020 · 3 Comments · Source: RaRe-Technologies/gensim

Problem description

I am trying to train Word2Vec and tune the ns_exponent hyperparameter. When I initialize the model, I set ns_exponent = 0.5, but find that it has reset to the default of ns_exponent = 0.75 immediately after initializing.

I looked through the Word2Vec source code for any mentions of ns_exponent, but found no reason for the class to ignore my argument. I suspected the Vocabulary initialization may have something to do with it, but that seems to take its argument straight from the __init__. Neither do I believe that I am overriding the ns_exponent setting with one of the other parameters, because this occurs even when ns_exponent is the only one explicitly set.

Steps/code/corpus to reproduce

from gensim.models import Word2Vec

model = Word2Vec(ns_exponent=0.5)
print(model.ns_exponent)

The printed output is:

0.75

and the resulting model's ns_exponent attribute is set to 0.75 as well.

Versions

Windows-10-10.0.18362-SP0
Python 3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
NumPy 1.16.0
SciPy 1.1.0
gensim 3.6.0
FAST_VERSION 0
bug

All 3 comments

Thanks for the clear report! The problem can be even more compactly demonstrated:

In [1]: from gensim.models import Word2Vec                                                        
In [2]: model = Word2Vec(ns_exponent=0.1)                                                         
In [3]: model.ns_exponent                                                                         
Out[3]: 0.75

While this is a confusing bit of model state, I'm pretty sure your intended value still took effect. It was simply passed into a separate model.vocabulary.ns_exponent property, where it was consulted to build the scaled-cumulative-proportions table used by the model. If I continue the reproduction REPL above:

In [4]: model.vocabulary.ns_exponent                                                              
Out[4]: 0.1
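
For readers wondering what that table does with the exponent, here is a simplified pure-Python sketch. It is not gensim's actual implementation: the function names, the toy counts, and the default domain size are all assumptions for illustration. Each word's count is raised to ns_exponent, and the running sum is scaled into an integer range so that a uniform random draw can be mapped back to a word.

```python
from itertools import accumulate


def cumulative_table(counts, ns_exponent=0.75, domain=2**31 - 1):
    """Build a scaled cumulative table from raw word counts (illustrative only)."""
    # Weight each word by count ** ns_exponent.
    weights = [count ** ns_exponent for count in counts]
    # Running totals of the weights; the last entry is the overall total.
    running = list(accumulate(weights))
    total = running[-1]
    # Scale each running total into the integer range [0, domain].
    return [round(value / total * domain) for value in running]


def sampling_probabilities(counts, ns_exponent=0.75):
    """Return the per-word sampling probability implied by the weights."""
    weights = [count ** ns_exponent for count in counts]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical counts for a frequent, a medium, and a rare word.
counts = [100, 10, 1]
for exponent in (1.0, 0.75, 0.1):
    probabilities = sampling_probabilities(counts, exponent)
    print(exponent, [round(p, 3) for p in probabilities])
```

Lower exponents flatten the distribution, so rare words are drawn as negative samples more often. That is why it matters which of the two ns_exponent attributes is actually consulted.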

The code problem is that Word2Vec.__init__() doesn't pass the provided non-default ns_exponent value in its call to its abstract superclass's __init__, which then caches the default value in the (redundant and not-consulted) self.ns_exponent property, while having no effect on the actually-operative self.vocabulary.ns_exponent.
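
The pattern can be reproduced without gensim at all. The sketch below uses made-up classes (BaseModel, Vocab, BuggyModel, and ForwardingModel are hypothetical names, not gensim's) to show how a subclass that forgets to forward an argument leaves a stale default on the instance, while the helper object still receives the requested value. ForwardingModel shows one possible repair for illustration only; it is not necessarily what #2698 does.

```python
class BaseModel:
    """Stand-in for the abstract superclass; it caches ns_exponent on the model."""

    def __init__(self, ns_exponent=0.75):
        self.ns_exponent = ns_exponent


class Vocab:
    """Stand-in for the vocabulary helper that actually uses the value."""

    def __init__(self, ns_exponent=0.75):
        self.ns_exponent = ns_exponent


class BuggyModel(BaseModel):
    def __init__(self, ns_exponent=0.75):
        # The helper receives the caller's value...
        self.vocabulary = Vocab(ns_exponent=ns_exponent)
        # ...but the superclass call omits it, so the default is cached.
        super().__init__()


class ForwardingModel(BaseModel):
    def __init__(self, ns_exponent=0.75):
        self.vocabulary = Vocab(ns_exponent=ns_exponent)
        # Forwarding the argument keeps both attributes in agreement.
        super().__init__(ns_exponent=ns_exponent)


buggy = BuggyModel(ns_exponent=0.1)
forwarding = ForwardingModel(ns_exponent=0.1)
print(buggy.ns_exponent, buggy.vocabulary.ns_exponent)
print(forwarding.ns_exponent, forwarding.vocabulary.ns_exponent)
```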

There's a refactor in progress (#2698) that will resolve this by making model.ns_exponent the sole operative location. In the meantime, your requested alternate value should still be taking effect and be visible in model.vocabulary.ns_exponent, so no explicit workaround is necessary.
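
If you want logging or tuning code that reports the right value both before and after that refactor, a small helper along these lines might do. It is only a sketch: effective_ns_exponent is a hypothetical name, and it assumes that affected versions expose model.vocabulary.ns_exponent while refactored versions keep the value only on the model. The stand-in objects below are plain namespaces rather than real Word2Vec instances.

```python
from types import SimpleNamespace


def effective_ns_exponent(model):
    """Return the ns_exponent that is presumed to be operative for `model`."""
    # Prefer the vocabulary helper's value when that helper exists.
    vocabulary = getattr(model, "vocabulary", None)
    if vocabulary is not None and hasattr(vocabulary, "ns_exponent"):
        return vocabulary.ns_exponent
    # Otherwise fall back to the attribute on the model itself.
    return model.ns_exponent


# Stand-ins mimicking the two layouts described in this thread.
old_style = SimpleNamespace(
    ns_exponent=0.75,
    vocabulary=SimpleNamespace(ns_exponent=0.1),
)
new_style = SimpleNamespace(ns_exponent=0.1)

print(effective_ns_exponent(old_style))
print(effective_ns_exponent(new_style))
```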

Thank you for the thorough and prompt response! This sets my heart at ease. I'll edit the issue as you suggested and use your fix.

Fixed by #2698.
