`CoherenceModel` with `coherence='c_v'` crashes on Windows when calling `get_coherence()`.
I tried playing around with `freeze_support()`, since this looks like a forking issue, but that doesn't seem to solve it.
PS: I will be away for the week, so please allow a bit of time for me to respond to queries.
(https://github.com/RaRe-Technologies/gensim/files/2660747/be5c63b4eb44f9d24906fb68a2608a6a-52d47e58268f41e8ef822208610be2192f91d065.zip)
The attachment above is a simplified sample of the code.
Expected result: the coherence value. Instead, `get_coherence()` raises:
```
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
```
```
Windows-10-10.0.17134-SP0
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]
NumPy 1.15.4
SciPy 1.1.0
C:\Users\kokho\AppData\Local\Programs\Python\Python37\lib\site-packages\gensim\utils.py:1212: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
gensim 3.6.0
FAST_VERSION 0
```
@khng90 This is really a multiprocessing issue on Windows.
To avoid it, wrap your code in a `main()` function and call it only under an `if __name__ == "__main__":` guard, i.e.
```python
from gensim.models import CoherenceModel, LdaModel
from gensim.corpora.dictionary import Dictionary


def main():
    texts = [
        ['human', 'interface', 'computer'],
        ['survey', 'user', 'computer', 'system', 'response', 'time'],
        ['eps', 'user', 'interface', 'system'],
        ['system', 'human', 'system', 'eps'],
        ['user', 'response', 'time'],
        ['trees'],
        ['graph', 'trees'],
        ['graph', 'minors', 'trees'],
        ['graph', 'minors', 'survey']
    ]
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    goodLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=50, num_topics=2)
    badLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=1, num_topics=2)

    # 'c_v' needs the tokenized texts; 'u_mass' works from the bag-of-words corpus.
    goodcm = CoherenceModel(model=goodLdaModel, texts=texts, corpus=corpus, dictionary=dictionary, coherence='c_v')
    badcm = CoherenceModel(model=badLdaModel, corpus=corpus, dictionary=dictionary, coherence='u_mass')

    print(badcm.get_coherence())
    print(goodcm.get_coherence())


# On Windows, worker processes re-import this script, so the top-level call
# must only run in the parent process.
if __name__ == "__main__":
    main()
```
I checked that this code works correctly on Windows; hope that helps!
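For context, here is a minimal standard-library sketch of the same idiom without gensim. It assumes the behaviour described in the Python `multiprocessing` documentation: under the "spawn" start method (the only one available on Windows) each worker is a fresh interpreter that re-imports the main script, so top-level code that starts processes must sit behind the `__name__` guard. The worker function `math.factorial` and its inputs are arbitrary stand-ins chosen for illustration; they are not what gensim runs internally.

```python
import math
import multiprocessing as mp


def main():
    # Force the "spawn" start method, the default (and only) one on Windows,
    # so the sketch behaves the same way on every platform.
    ctx = mp.get_context("spawn")
    # Each spawned worker re-imports this script (as "__mp_main__") before it
    # can run any task.
    with ctx.Pool(processes=2) as pool:
        # The worker function lives in the standard library here, much as
        # gensim's worker functions live inside the gensim package.
        return pool.map(math.factorial, [3, 4, 5])


# Without this guard, every spawned worker would call main() again while
# importing the script and fail with the "bootstrapping phase" RuntimeError.
if __name__ == "__main__":
    results = main()
    print(results)  # [6, 24, 120]
```

The worker being defined in a library does not help: the child still re-imports your script, which is why the guard is needed around your own top-level code.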