I'm having trouble setting up my environment path and getting my LdaMallet model to run. The code I'm using is below.
This same code runs without problems on my coworker's PC, and I'm not sure why. I've seen this issue in other posts and have tried every combination of slashes (`/` vs `\` vs `\\` vs `//`), and I have re-downloaded the MALLET zip file to various locations on my PC. Nothing has worked. When I run the command string from the code directly in the command prompt, it seems to execute just fine.
Any insight would be greatly appreciated.
```python
import os
import gensim

# corpus and id2word are built earlier in the script
os.environ.update({'MALLET_HOME':r'C:\\new_mallet\\mallet-2.0.8/'})
mallet_path = 'C:\\new_mallet\\mallet-2.0.8\\bin\\mallet'
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=10, id2word=id2word)
```
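gensim pastes `mallet_path` straight into a shell command string (`cmd = self.mallet_path + ' train-topics ...'` in the traceback). One way to take the slash guesswork out of the picture is to build both paths with `pathlib` instead of hand-escaping them. This is a minimal sketch that assumes MALLET was unzipped to `C:\new_mallet\mallet-2.0.8` as above; `MALLET_DIR` and `mallet_paths` are hypothetical names. `PureWindowsPath` yields a Windows-style string regardless of where the snippet runs. It only normalizes the paths and does nothing about whatever makes the MALLET command itself fail.

```python
import os
from pathlib import PureWindowsPath
from typing import Tuple

# Hypothetical: the folder the MALLET zip was extracted to, as in the code above.
MALLET_DIR = PureWindowsPath(r"C:\new_mallet\mallet-2.0.8")


def mallet_paths(mallet_dir: PureWindowsPath) -> Tuple[str, str]:
    """Return (MALLET_HOME, path to the bin\\mallet launcher) as Windows path strings."""
    home = str(mallet_dir)
    launcher = str(mallet_dir / "bin" / "mallet")
    return home, launcher


mallet_home, mallet_path = mallet_paths(MALLET_DIR)
# Affects this Python process and the child processes it starts (such as
# the shell gensim launches for MALLET), not the system-wide environment.
os.environ["MALLET_HOME"] = mallet_home

print(mallet_home)  # C:\new_mallet\mallet-2.0.8
print(mallet_path)  # C:\new_mallet\mallet-2.0.8\bin\mallet
```

`mallet_path` can then be passed to `LdaMallet` the same way as in the original snippet.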
```
CalledProcessError Traceback (most recent call last)
----> 1 ldamallet = gensim.models.wrappers.LdaMallet(mallet_path,corpus=corpus,num_topics=10,id2word=id2word)

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py in __init__(self, mallet_path, corpus, num_topics, alpha, id2word, workers, prefix, optimize_interval, iterations, topic_threshold, random_seed)
129 self.random_seed = random_seed
130 if corpus is not None:
--> 131 self.train(corpus)
132
133 def finferencer(self):

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py in train(self, corpus)
270
271 """
--> 272 self.convert_input(corpus, infer=False)
273 cmd = self.mallet_path + ' train-topics --input %s --num-topics %s --alpha %s --optimize-interval %s '
274 '--num-threads %s --output-state %s --output-doc-topics %s --output-topic-keys %s '

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\wrappers\ldamallet.py in convert_input(self, corpus, infer, serialize_corpus)
259 cmd = cmd % (self.fcorpustxt(), self.fcorpusmallet())
260 logger.info("converting temporary corpus to MALLET format with %s", cmd)
--> 261 check_output(args=cmd, shell=True)
262
263 def train(self, corpus):

C:\ProgramData\Anaconda3\lib\site-packages\gensim\utils.py in check_output(stdout, *popenargs, **kwargs)
1916 error = subprocess.CalledProcessError(retcode, cmd)
1917 error.output = output
-> 1918 raise error
1919 return output
1920 except KeyboardInterrupt:

CalledProcessError: Command 'C:\new_mallet\mallet-2.0.8\bin\mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\CST~1.JEO\AppData\Local\Temp\c24558_corpus.txt --output C:\Users\CST~1.JEO\AppData\Local\Temp\c24558_corpus.mallet' returned non-zero exit status 1.
```
Unless you're sure this is a bug, "I'm having a problem" questions like this are better addressed on the project discussion list, https://groups.google.com/forum/#!forum/gensim, than on this bug/feature-request tracker. That said, you may get a more informative error message by running the exact command mentioned in the error outside of Python/gensim, to see what error it reports there:

```
C:\new_mallet\mallet-2.0.8\bin\mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\CST~1.JEO\AppData\Local\Temp\c24558_corpus.txt --output C:\Users\CST~1.JEO\AppData\Local\Temp\c24558_corpus.mallet
```
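The same check can also be scripted from Python. This is a minimal sketch using a hypothetical helper, `run_and_report`, that runs the command through the shell with `subprocess.run(..., shell=True)` (as gensim does), but captures and prints stdout and stderr instead of raising, so MALLET's own message stays visible. `MALLET_CMD` is just the command copied from the error above.

```python
import subprocess

# Command copied verbatim from the CalledProcessError message above.
MALLET_CMD = (
    r'C:\new_mallet\mallet-2.0.8\bin\mallet import-file --preserve-case '
    r'--keep-sequence --remove-stopwords --token-regex "\S+" '
    r'--input C:\Users\CST~1.JEO\AppData\Local\Temp\c24558_corpus.txt '
    r'--output C:\Users\CST~1.JEO\AppData\Local\Temp\c24558_corpus.mallet'
)


def run_and_report(cmd: str) -> subprocess.CompletedProcess:
    """Run a shell command string; print its exit status, stdout and stderr.

    Unlike gensim's check_output wrapper, this does not raise on a non-zero
    exit status, so the failing tool's own error text stays visible.
    """
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    print("exit status:", result.returncode)
    if result.stdout:
        print("stdout:\n" + result.stdout)
    if result.stderr:
        print("stderr:\n" + result.stderr)
    return result


if __name__ == "__main__":
    run_and_report(MALLET_CMD)
```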
Update: I reinstalled Java, and the code now executes without error.
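Since reinstalling Java resolved it, a quick sanity check when a MALLET call fails this way is whether a working `java` is reachable from the environment Python runs in. This is a sketch under the assumption that MALLET's launcher finds Java via `PATH`; setups that locate Java some other way would need a different check. The helper `java_version` is a hypothetical name: it looks up the executable with `shutil.which` and runs `java -version`, which traditionally prints its banner to stderr.

```python
import shutil
import subprocess
from typing import Optional


def java_version(java_name: str = "java") -> Optional[str]:
    """Return the first line of `<java_name> -version`, or None if not on PATH."""
    java = shutil.which(java_name)
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` traditionally writes its banner to stderr, not stdout.
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = java_version()
    if version is None:
        print("java is not on PATH for this Python process")
    else:
        print("found java:", version)
```

A `None` result, or a version line that differs from what the command prompt reports, would suggest Python and the shell are not seeing the same Java installation.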