Gensim: CoherenceModel with coherence='c_v' restarts Python

Created on 9 Dec 2018 · 1 Comment · Source: RaRe-Technologies/gensim

Description

CoherenceModel with coherence='c_v' crashes on Windows when attempting to evaluate get_coherence().

I tried experimenting with freeze_support(), since this looks like a process-forking issue, but it did not solve the problem.

P.S. I will be away for the week, so please allow some time for me to respond to questions.

Steps/Code/Corpus to Reproduce


Sample code is attached: https://github.com/RaRe-Technologies/gensim/files/2660747/be5c63b4eb44f9d24906fb68a2608a6a-52d47e58268f41e8ef822208610be2192f91d065.zip

The attachment is a simplified sample of the code.

Expected Results


Value of coherence.

Actual Results


RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
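The traceback names the fix itself. On Windows, multiprocessing can only start workers with the "spawn" method, which launches a fresh interpreter and re-imports the main script; any process-starting code left at the top level of that script runs again while the child is still bootstrapping, which raises this RuntimeError. The sketch below reproduces the required idiom with the standard library only (no gensim). The names square, run_parallel and main are illustrative, and it requests the "spawn" context explicitly so the behaviour matches Windows on any OS.

```python
import multiprocessing as mp


def square(x: int) -> int:
    # Workers must be defined at module top level: a spawned child
    # locates them by importing this file, not by copying the parent.
    return x * x


def run_parallel(values, processes: int = 2):
    # "spawn" is the only start method on Windows; requesting it
    # explicitly reproduces the Windows behaviour on any OS.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)


def main() -> None:
    print(run_parallel([1, 2, 3, 4]))


if __name__ == "__main__":
    # Spawned children import this file under the name "__mp_main__",
    # so this block runs only in the parent process.
    mp.freeze_support()  # has no effect unless running as a frozen Windows executable
    main()
```

Run as a script, it prints [1, 4, 9, 16]. If main() were called without the guard, each spawned child would try to create its own pool while still importing the file and fail with the error above; freeze_support() alone does not change that.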

Versions


Windows-10-10.0.17134-SP0
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]
NumPy 1.15.4
SciPy 1.1.0
C:\Users\kokho\AppData\Local\Programs\Python\Python37\lib\site-packages\gensim\utils.py:1212: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
gensim 3.6.0
FAST_VERSION 0

Most helpful comment

@khng90 This is really a Windows multiprocessing issue.

To avoid it, wrap your code in a main() function and call it only from an if __name__ == "__main__": guard, e.g.:

from gensim.models import CoherenceModel, LdaModel
from gensim.corpora.dictionary import Dictionary


def main():
    texts = [
        ['human', 'interface', 'computer'],
        ['survey', 'user', 'computer', 'system', 'response', 'time'],
        ['eps', 'user', 'interface', 'system'],
        ['system', 'human', 'system', 'eps'],
        ['user', 'response', 'time'],
        ['trees'],
        ['graph', 'trees'],
        ['graph', 'minors', 'trees'],
        ['graph', 'minors', 'survey']
    ]

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    goodLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=50, num_topics=2)
    badLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=1, num_topics=2)

    goodcm = CoherenceModel(model=goodLdaModel, texts=texts, corpus=corpus, dictionary=dictionary, coherence='c_v')
    badcm = CoherenceModel(model=badLdaModel, corpus=corpus, dictionary=dictionary, coherence='u_mass')

    print(badcm.get_coherence())
    print(goodcm.get_coherence())

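# On Windows, multiprocessing starts workers by re-importing this module;
# the guard keeps that import from running main() a second time.
if __name__ == "__main__":
    main()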

I checked that this code works correctly on Windows; hope that helps!
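Whether a script needs this guard depends on the platform's default start method, which is why the same code can run on Linux yet crash on Windows. The helper below is a small sketch (needs_main_guard is a hypothetical name, not a gensim or standard-library API) that reports whether a start method prepares children by importing the main module: "spawn" (the only option on Windows, and the macOS default since Python 3.8) and "forkserver" both do, while "fork" copies the already-running parent process instead.

```python
import multiprocessing as mp
from typing import Optional


def needs_main_guard(method: Optional[str] = None) -> bool:
    """Return True if children started with `method` re-import __main__.

    "spawn" and "forkserver" import the main module in the new process,
    so unguarded top-level code runs again there; "fork" does not.
    Defaults to the current platform's start method.
    """
    if method is None:
        method = mp.get_start_method()
    return method != "fork"


if __name__ == "__main__":
    print("available start methods:", mp.get_all_start_methods())
    print("default start method:", mp.get_start_method())
    print("needs __main__ guard:", needs_main_guard())
```

Writing the guard unconditionally is the portable choice: it costs nothing under "fork" and is required under the other two methods.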
