Saving Word2Vec during an on_batch_end call fails with something that looks a lot like a race condition: some internal dict within gensim appears to still be changing during the call to save.
Train W2V with a callback that looks like:
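import os
from datetime import datetime, timedelta

# Method of the reporter's CallbackAny2Vec subclass; get_output_path is a project helper not shown here.
def on_batch_end(self, model):
    current_timestamp = datetime.utcnow()
    # Write a partial checkpoint at most once per hour.
    if current_timestamp - self._last_temporary_save >= timedelta(hours=1):
        relative_path = get_output_path(
            'PartialCheckpoint', add_kwargs=True, relative=self.base_path, epoch=self.epoch, batch=self.batch)
        output_path = os.path.join(self.base_path, relative_path)
        model.save(output_path)
        self._last_temporary_save = current_timestamp
    self.batch += 1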
Expected result: a model checkpoint is saved after every hour of training.
Actual result, while running train:
Exception in thread Thread-15:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 167, in _worker_loop
callback.on_batch_end(self)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/meli_recsys/pipelines/machine_learning/meta_prod2vec/mp2v_train.py", line 60, in on_batch_end
self._save_checkpoint(model, output_path)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/meli_recsys/pipelines/machine_learning/meta_prod2vec/mp2v_train.py", line 90, in _save_checkpoint
model.save(path)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1214, in save
super(Word2Vec, self).save(*args, **kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 501, in save
super(BaseAny2VecModel, self).save(fname_or_handle, **kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 682, in save
self._smart_save(fname_or_handle, separately, sep_limit, ignore, pickle_protocol=pickle_protocol)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 536, in _smart_save
compress, subname)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 592, in _save_specials
for attrib, val in iteritems(self.__dict__):
RuntimeError: dictionary changed size during iteration
Exception in thread Thread-17:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 167, in _worker_loop
callback.on_batch_end(self)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/meli_recsys/pipelines/machine_learning/meta_prod2vec/mp2v_train.py", line 60, in on_batch_end
self._save_checkpoint(model, output_path)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/meli_recsys/pipelines/machine_learning/meta_prod2vec/mp2v_train.py", line 90, in _save_checkpoint
model.save(path)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1214, in save
super(Word2Vec, self).save(*args, **kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 501, in save
super(BaseAny2VecModel, self).save(fname_or_handle, **kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 682, in save
self._smart_save(fname_or_handle, separately, sep_limit, ignore, pickle_protocol=pickle_protocol)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 536, in _smart_save
compress, subname)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 592, in _save_specials
for attrib, val in iteritems(self.__dict__):
RuntimeError: dictionary changed size during iteration
Exception in thread Thread-10:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-19:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-18:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-11:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-16:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-8:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-13:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-12:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-14:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-9:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Linux-4.4.0-1062-aws-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Dec 4 2017, 14:50:18) \n[GCC 5.4.0 20160609]')
('NumPy', '1.15.1')
('SciPy', '0.19.1')
('gensim', '3.5.0')
('FAST_VERSION', 1)
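The RuntimeError at the bottom of the first two tracebacks is what Python raises when a dict changes size while it is being iterated. In the report, _save_specials is walking self.__dict__ while, presumably, another thread's concurrent save() removes attributes from the same object. The sketch below reproduces that error deterministically in a single thread; toy_save_specials and the attrs dict are hypothetical stand-ins for gensim's method and the model's __dict__, not gensim code.

```python
def toy_save_specials(attrs, sep_limit=3, concurrent_mutation=None):
    """Hypothetical stand-in for a loop over self.__dict__ like the one in _save_specials.

    attrs plays the role of model.__dict__; concurrent_mutation plays the role of
    another thread's save() deleting an attribute while this loop is running.
    """
    separately = []
    for i, (attrib, val) in enumerate(attrs.items()):
        if i == 0 and concurrent_mutation is not None:
            concurrent_mutation(attrs)  # the dict shrinks mid-iteration
        if isinstance(val, list) and len(val) >= sep_limit:
            separately.append(attrib)  # "large" attributes would be stored separately
    return separately


attrs = {"vectors": [0.0] * 5, "syn1neg": [0.0] * 5, "window": 5}

# Without interference the loop finishes normally.
print(toy_save_specials(dict(attrs)))  # ['vectors', 'syn1neg']

# With an attribute removed mid-iteration, Python aborts the loop.
try:
    toy_save_specials(dict(attrs), concurrent_mutation=lambda d: d.pop("syn1neg"))
except RuntimeError as exc:
    print("RuntimeError:", exc)  # RuntimeError: dictionary changed size during iteration
```

In the real case the mutation comes from a different thread, so whether the loop trips over it depends on timing.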
Hello @jbayardo, thanks for the report. I reproduced a similar issue (not identical, but it also looks like a race condition).
I'm using gensim==3.5.0 and Python 2.7.
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile
import gensim.downloader as api

corpus = api.load("text8")

class BatchSaver(CallbackAny2Vec):
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.batch = 0

    def on_batch_end(self, model):
        output_path = get_tmpfile('{}_batch_{}.model'.format(self.path_prefix, self.batch))
        model.save(output_path)
        print("Model saved to {}".format(output_path))
        self.batch += 1

bs = BatchSaver("w2v")
model = Word2Vec(corpus, iter=5, callbacks=[bs])
Expected result
Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_1.model
Model saved to /tmp/w2v_batch_2.model
Model saved to /tmp/w2v_batch_3.model
...
Actual result
First variant (almost always):
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-10:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-11:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'
Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_3.model
Model saved to /tmp/w2v_batch_4.model
Model saved to /tmp/w2v_batch_5.model
Model saved to /tmp/w2v_batch_6.model
Model saved to /tmp/w2v_batch_7.model
Model saved to /tmp/w2v_batch_8.model
Model saved to /tmp/w2v_batch_9.model
Model saved to /tmp/w2v_batch_10.model
Model saved to /tmp/w2v_batch_11.model
Model saved to /tmp/w2v_batch_12.model
Model saved to /tmp/w2v_batch_13.model
Model saved to /tmp/w2v_batch_14.model
Model saved to /tmp/w2v_batch_15.model
Model saved to /tmp/w2v_batch_16.model
Model saved to /tmp/w2v_batch_17.model
Model saved to /tmp/w2v_batch_18.model
Model saved to /tmp/w2v_batch_19.model
Model saved to /tmp/w2v_batch_20.model
Model saved to /tmp/w2v_batch_21.model
Model saved to /tmp/w2v_batch_22.model
Model saved to /tmp/w2v_batch_23.model
Model saved to /tmp/w2v_batch_24.model
Model saved to /tmp/w2v_batch_25.model
Model saved to /tmp/w2v_batch_26.model
Model saved to /tmp/w2v_batch_27.model
Model saved to /tmp/w2v_batch_28.model
Model saved to /tmp/w2v_batch_29.model
Model saved to /tmp/w2v_batch_30.model
Model saved to /tmp/w2v_batch_31.model
Model saved to /tmp/w2v_batch_32.model
Model saved to /tmp/w2v_batch_33.model
Model saved to /tmp/w2v_batch_34.model
Model saved to /tmp/w2v_batch_35.model
Model saved to /tmp/w2v_batch_36.model
Model saved to /tmp/w2v_batch_37.model
Model saved to /tmp/w2v_batch_38.model
Model saved to /tmp/w2v_batch_39.model
Model saved to /tmp/w2v_batch_40.model
Model saved to /tmp/w2v_batch_41.model
Model saved to /tmp/w2v_batch_42.model
Model saved to /tmp/w2v_batch_43.model
Model saved to /tmp/w2v_batch_44.model
Model saved to /tmp/w2v_batch_45.model
Model saved to /tmp/w2v_batch_46.model
Model saved to /tmp/w2v_batch_47.model
Model saved to /tmp/w2v_batch_48.model
Model saved to /tmp/w2v_batch_49.model
Model saved to /tmp/w2v_batch_50.model
Model saved to /tmp/w2v_batch_51.model
Model saved to /tmp/w2v_batch_52.model
Model saved to /tmp/w2v_batch_53.model
Model saved to /tmp/w2v_batch_54.model
Model saved to /tmp/w2v_batch_55.model
Model saved to /tmp/w2v_batch_56.model
Model saved to /tmp/w2v_batch_57.model
Model saved to /tmp/w2v_batch_58.model
Model saved to /tmp/w2v_batch_59.model
Model saved to /tmp/w2v_batch_60.model
Model saved to /tmp/w2v_batch_61.model
Model saved to /tmp/w2v_batch_62.model
Model saved to /tmp/w2v_batch_63.model
Model saved to /tmp/w2v_batch_64.model
Model saved to /tmp/w2v_batch_65.model
Model saved to /tmp/w2v_batch_66.model
Model saved to /tmp/w2v_batch_67.model
Model saved to /tmp/w2v_batch_68.model
Model saved to /tmp/w2v_batch_69.model
Model saved to /tmp/w2v_batch_70.model
Model saved to /tmp/w2v_batch_71.model
Model saved to /tmp/w2v_batch_72.model
Model saved to /tmp/w2v_batch_73.model
Model saved to /tmp/w2v_batch_74.model
Model saved to /tmp/w2v_batch_75.model
Model saved to /tmp/w2v_batch_76.model
Model saved to /tmp/w2v_batch_77.model
Model saved to /tmp/w2v_batch_78.model
Model saved to /tmp/w2v_batch_79.model
Model saved to /tmp/w2v_batch_80.model
Model saved to /tmp/w2v_batch_81.model
Model saved to /tmp/w2v_batch_82.model
Model saved to /tmp/w2v_batch_83.model
Model saved to /tmp/w2v_batch_84.model
Model saved to /tmp/w2v_batch_85.model
Model saved to /tmp/w2v_batch_86.model
Model saved to /tmp/w2v_batch_87.model
Model saved to /tmp/w2v_batch_88.model
Model saved to /tmp/w2v_batch_89.model
Model saved to /tmp/w2v_batch_90.model
Model saved to /tmp/w2v_batch_91.model
Model saved to /tmp/w2v_batch_92.model
Model saved to /tmp/w2v_batch_93.model
Model saved to /tmp/w2v_batch_94.model
Model saved to /tmp/w2v_batch_95.model
Model saved to /tmp/w2v_batch_96.model
Model saved to /tmp/w2v_batch_97.model
Model saved to /tmp/w2v_batch_98.model
Model saved to /tmp/w2v_batch_99.model
Model saved to /tmp/w2v_batch_100.model
Second variant (happened only once):
Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-9:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-10:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'
Model saved to /tmp/w2v_batch_1.model
Model saved to /tmp/w2v_batch_4.model
Fatal Python error: GC object already tracked
Aborted (core dumped)
For some reason, this happened to me every time I ran the training. Worth noting: I was using 14 workers.
I tried to do something very similar to @jbayardo, and had a very similar problem.
I'm also saving in on_batch_end, once an hour of training has elapsed.
import time

from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile

class BatchSaver(CallbackAny2Vec):
    def __init__(self, path_prefix, start_time):
        self.path_prefix = path_prefix
        self.last_checkpoint = start_time

    def on_batch_end(self, model):
        cur_time = time.time()
        # Back up the model once more than an hour has passed since the last backup.
        if cur_time - self.last_checkpoint > 60 * 60:
            output_path = get_tmpfile('/localdata/gustav/backup/w2v/{}_backup.model'.format(self.path_prefix))
            model.save(output_path)
            self.last_checkpoint = cur_time
The first time the model is saved, 5 out of 6 worker threads crash:
...
2019-03-19 15:17:48,068: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 40836 words/s, in_qsize 0, out_qsize 0
2019-03-19 15:17:48,598: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:17:48,603: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:17:48,604: INFO: storing np array 'vectors' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.wv.vectors.npy
2019-03-19 15:17:48,615: INFO: storing np array 'syn1' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.trainables.syn1.npy
2019-03-19 15:17:48,730: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
Exception in thread Thread-6:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
2019-03-19 15:17:48,978: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:20:13,438: INFO: storing np array 'syn1neg' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.trainables.syn1neg.npy
2019-03-19 15:20:25,566: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
Exception in thread Thread-5:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
2019-03-19 15:20:25,569: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 39100 words/s, in_qsize 0, out_qsize 0
2019-03-19 15:20:40,967: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
2019-03-19 15:20:40,979: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 38940 words/s, in_qsize 0, out_qsize 0
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'
2019-03-19 15:22:03,181: INFO: not storing attribute vectors_norm
2019-03-19 15:22:03,182: INFO: not storing attribute cum_table
2019-03-19 15:24:40,421: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
File "gensim/models/word2vec_inner.pyx", line 484, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecTrainables' object has no attribute 'syn1'
2019-03-19 15:24:40,423: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
2019-03-19 15:24:40,442: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 36581 words/s, in_qsize 10, out_qsize 1
...
After this, training continues with one worker thread until the first epoch is finished, at which point the process hangs, waiting for the workers that were killed during the first save.
Note that in my case four threads crash with AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors', and one with AttributeError: 'Word2VecTrainables' object has no attribute 'syn1'.
I'm using gensim 3.7.1 and Python 3.6.8.
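The hang described above fits how, as far as I can tell, gensim 3.x's base_any2vec.py ends an epoch: each worker thread puts a None sentinel on a progress queue when it runs out of jobs, and the manager keeps reading that queue until it has seen one sentinel per worker. A worker killed by an uncaught exception never sends its sentinel, so the manager waits forever. The sketch below mimics that handshake with hypothetical worker, wait_for_workers and run_epoch functions (not gensim code); unlike the loop it imitates, it uses a timeout so it terminates.

```python
import queue
import threading


def worker(jobs, progress, crash_on_job):
    """Hypothetical stand-in for a training worker thread."""
    while True:
        job = jobs.get()
        if job is None:  # no more work for this epoch
            break
        if crash_on_job:
            # Simulates the AttributeError raised while another thread is saving.
            raise AttributeError("simulated: attribute removed during save")
        progress.put(("done", job))
    progress.put(None)  # completion sentinel; never sent if the thread crashed


def wait_for_workers(progress, n_workers, timeout):
    """Count completion sentinels; return how many workers never reported back.

    The timeout exists only so this sketch terminates; without it the loop
    would block forever on a crashed worker.
    """
    unfinished = n_workers
    while unfinished > 0:
        try:
            report = progress.get(timeout=timeout)
        except queue.Empty:
            return unfinished
        if report is None:
            unfinished -= 1
    return 0


def run_epoch(n_workers=3, crashing=(1,), timeout=1.0):
    progress = queue.Queue()
    for i in range(n_workers):
        jobs = queue.Queue()  # one private job queue per worker keeps the demo deterministic
        jobs.put(i)
        jobs.put(None)        # end-of-epoch marker
        t = threading.Thread(target=worker, args=(jobs, progress, i in crashing))
        t.daemon = True
        t.start()
    return wait_for_workers(progress, n_workers, timeout)


print(run_epoch())  # 1 worker never reported back
```

The crashing thread prints its own traceback, much like the Thread-N tracebacks in the log, and from then on only the sentinel count reveals that it is gone.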
on_batch_end() is a really bad place to attempt a whole-model save: it's called inside every separate worker thread, many times per training epoch, with absolutely no coordination with the other in-progress worker threads. Race conditions when saving from that callback are to be expected.
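One concrete form of that race, going by gensim 3.x's utils.py (SaveLoad._smart_save and _save_specials): attributes that get stored separately (large arrays such as vectors or syn1, plus ignored ones like cum_table) are temporarily removed from the object with delattr and only put back in a finally block after pickling. Any worker thread that reads model.wv.vectors or model.trainables.syn1 during that window finds the attribute missing, which would match the AttributeErrors above. ToySaveable below is a hypothetical, flattened mimic of that behavior, not gensim code; in gensim the arrays live on sub-objects such as model.wv and are handled recursively.

```python
class ToySaveable(object):
    """Hypothetical stand-in for a gensim 3.x model; not gensim code."""

    def __init__(self):
        self.vectors = [0.0] * 1000  # pretend these are large numpy arrays
        self.syn1 = [0.0] * 1000
        self.window = 5

    def save(self, sep_limit=100, while_pickling=None):
        # Pick the large attributes first (list() so we don't mutate while iterating).
        large = [k for k, v in list(vars(self).items())
                 if isinstance(v, list) and len(v) >= sep_limit]
        asides = {}
        for attrib in large:
            asides[attrib] = getattr(self, attrib)
            delattr(self, attrib)  # gensim writes these to separate .npy files instead
        try:
            if while_pickling is not None:
                while_pickling(self)  # what any concurrently running thread would see now
        finally:
            for attrib, val in asides.items():
                setattr(self, attrib, val)  # restored only after pickling completes


def observe(model):
    """Record which attributes a concurrent reader would find right now."""
    return {name: hasattr(model, name) for name in ("vectors", "syn1", "window")}


model = ToySaveable()
snapshots = []
model.save(while_pickling=lambda m: snapshots.append(observe(m)))
print(snapshots[0])    # {'vectors': False, 'syn1': False, 'window': True}
print(observe(model))  # {'vectors': True, 'syn1': True, 'window': True}
```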
on_epoch_end(), instead, is only called from the single manager thread, when all worker-threads have finished their work. It's a more appropriate place for something that wants to write the whole state of an unchanging model.
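A minimal sketch of the hourly checkpoint moved to on_epoch_end, assuming a checkpoint granularity of whole epochs is acceptable. EpochCheckpointSaver, its path scheme, and the injectable clock are hypothetical; the try/except import only exists so the sketch runs without gensim installed.

```python
import time

try:
    from gensim.models.callbacks import CallbackAny2Vec
except ImportError:  # lets the sketch run without gensim installed
    CallbackAny2Vec = object


class EpochCheckpointSaver(CallbackAny2Vec):
    """Save the whole model at an epoch boundary, at most once per min_interval seconds.

    on_epoch_end() runs in the single manager thread after the worker threads
    have finished the epoch, so model.save() does not race with training.
    """

    def __init__(self, path_prefix, min_interval=60 * 60, clock=time.time):
        self.path_prefix = path_prefix    # hypothetical naming scheme
        self.min_interval = min_interval  # seconds between checkpoints
        self.clock = clock                # injectable so the logic can be tested
        self.last_checkpoint = self.clock()
        self.epoch = 0

    def on_epoch_end(self, model):
        now = self.clock()
        if now - self.last_checkpoint >= self.min_interval:
            model.save("{}_epoch_{}.model".format(self.path_prefix, self.epoch))
            self.last_checkpoint = now
        self.epoch += 1
```

With gensim 3.x this would be passed as Word2Vec(corpus, iter=5, callbacks=[EpochCheckpointSaver("/tmp/w2v")]); gensim 4.x renamed iter to epochs. The trade-off is that a very long epoch still gets no intermediate checkpoint.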
In fact, I'd recommend removing the on_batch_begin() and on_batch_end() callbacks from CallbackAny2Vec entirely. They arrived in the under-reviewed #1777 PR and it's hard for me to imagine a compelling use for them, occurring as they do many essentially-random times, within each worker-thread, each training epoch. (If not removed, their doc-comments should warn that they're happening in a worker thread while lots of other worker threads are mutating the model or executing other simultaneous on_batch_end() calls.)
Also, per a question on StackOverflow, I've just noticed the batch-related callbacks have never been fired for the worker_loop code paths.
So I'd again recommend removing on_batch_begin and on_batch_end from all modes & documentation, entirely, ASAP.
Thanks for following up @gojomo. I'm marking this ticket for 4.0.0, I think it fits a major release well.
I'll do another Gensim sprint again soon, to finish 4.0.0. I haven't seen any worthwhile feedback from beta users, so the plan is to just tie up any loose ends & release.