Gensim: Word2Vec keeps on training during on_batch_end call

Created on 12 Sep 2018 · 6 Comments · Source: RaRe-Technologies/gensim

Description

Saving a Word2Vec model during an on_batch_end call fails with something that looks a lot like a race condition. It seems some internal dict within gensim is still being changed while save is running.
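The RuntimeError in the traceback below is Python's standard guard against a dict changing size while it is being iterated. A minimal single-threaded sketch (no gensim involved; `Model`, `scan_attributes` and `add_attribute` are hypothetical) triggers the same error deterministically. In the report, the mutation would come from another thread rather than from the loop body.

```python
class Model(object):
    """Hypothetical stand-in for an object whose __dict__ a save routine scans."""

    def __init__(self):
        self.vectors = [0.0] * 10
        self.alpha = 0.025


def scan_attributes(obj, mutate=None):
    """Iterate obj.__dict__ the way a save routine might, optionally mutating it mid-loop."""
    names = []
    # On Python 3, dict.items() is a live view, like Python 2's iteritems().
    for name, _value in obj.__dict__.items():
        names.append(name)
        if mutate is not None:
            mutate(obj)  # stands in for another thread changing the model
    return names


def add_attribute(obj):
    """Add a new key to obj.__dict__, changing its size."""
    obj.new_attribute = True


print(sorted(scan_attributes(Model())))  # ['alpha', 'vectors']

try:
    scan_attributes(Model(), mutate=add_attribute)
except RuntimeError as exc:
    print("RuntimeError:", exc)  # dictionary changed size during iteration
```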

Steps/Code/Corpus to Reproduce

Train W2V with a callback that looks like:

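    import os
    from datetime import datetime, timedelta


    # Method of a CallbackAny2Vec subclass; get_output_path() is a project-specific
    # helper, and self.base_path, self.epoch, self.batch and self._last_temporary_save
    # are set elsewhere in the class.
    def on_batch_end(self, model):
        current_timestamp = datetime.utcnow()
        # Checkpoint at most once per hour of training
        if current_timestamp - self._last_temporary_save >= timedelta(hours=1):
            relative_path = get_output_path(
                'PartialCheckpoint', add_kwargs=True, relative=self.base_path, epoch=self.epoch, batch=self.batch)
            output_path = os.path.join(self.base_path, relative_path)

            model.save(output_path)
            self._last_temporary_save = current_timestamp

        self.batch += 1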

Expected Results

A model checkpoint saved after every hour of training.

Actual Results

While running train:

Exception in thread Thread-15:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 167, in _worker_loop
    callback.on_batch_end(self)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/meli_recsys/pipelines/machine_learning/meta_prod2vec/mp2v_train.py", line 60, in on_batch_end
    self._save_checkpoint(model, output_path)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/meli_recsys/pipelines/machine_learning/meta_prod2vec/mp2v_train.py", line 90, in _save_checkpoint
    model.save(path)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1214, in save
    super(Word2Vec, self).save(*args, **kwargs)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 501, in save
    super(BaseAny2VecModel, self).save(fname_or_handle, **kwargs)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 682, in save
    self._smart_save(fname_or_handle, separately, sep_limit, ignore, pickle_protocol=pickle_protocol)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 536, in _smart_save
    compress, subname)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/utils.py", line 592, in _save_specials
    for attrib, val in iteritems(self.__dict__):
RuntimeError: dictionary changed size during iteration

(The same RuntimeError traceback was also raised in Thread-17.)

Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 771, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 497, in gensim.models.word2vec_inner.train_batch_sg
    cdef REAL_t *syn0 = <REAL_t *>(np.PyArray_DATA(model.wv.vectors))
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

(The same AttributeError traceback was also raised in Thread-19, Thread-18, Thread-11, Thread-16, Thread-8, Thread-13, Thread-12, Thread-14 and Thread-9.)

Versions

Linux-4.4.0-1062-aws-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Dec 4 2017, 14:50:18) \n[GCC 5.4.0 20160609]')
('NumPy', '1.15.1')
('SciPy', '0.19.1')
('gensim', '3.5.0')
('FAST_VERSION', 1)

Labels: bug, difficulty medium

All 6 comments

Hello @jbayardo, thanks for the report. I reproduced an issue (not exactly the same, but similar, and also a race condition).
I'm using gensim==3.5.0 and Python 2.7.

    from gensim.models import Word2Vec
    from gensim.models.callbacks import CallbackAny2Vec
    from gensim.test.utils import get_tmpfile
    import gensim.downloader as api


    corpus = api.load("text8")


    class BatchSaver(CallbackAny2Vec):
        """Save the full model after every batch (called from the worker threads)."""

        def __init__(self, path_prefix):
            self.path_prefix = path_prefix
            self.batch = 0

        def on_batch_end(self, model):
            output_path = get_tmpfile('{}_batch_{}.model'.format(self.path_prefix, self.batch))
            model.save(output_path)
            print("Model saved to {}".format(output_path))
            self.batch += 1


    bs = BatchSaver("w2v")
    model = Word2Vec(corpus, iter=5, callbacks=[bs])

Expected result

Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_1.model
Model saved to /tmp/w2v_batch_2.model
Model saved to /tmp/w2v_batch_3.model
...

Actual result

First variant (almost always):

Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'

Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-11:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'

Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_3.model
Model saved to /tmp/w2v_batch_4.model
Model saved to /tmp/w2v_batch_5.model
...
Model saved to /tmp/w2v_batch_100.model

Second variant (happened only once):

Model saved to /tmp/w2v_batch_0.model
Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'

Model saved to /tmp/w2v_batch_0.model
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/base_any2vec.py", line 164, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ivan/.virtualenvs/math/local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 773, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 663, in gensim.models.word2vec_inner.train_batch_cbow
    cum_table = <np.uint32_t *>(np.PyArray_DATA(model.vocabulary.cum_table))
AttributeError: 'Word2VecVocab' object has no attribute 'cum_table'

Model saved to /tmp/w2v_batch_1.model
Model saved to /tmp/w2v_batch_4.model
Fatal Python error: GC object already tracked
Aborted (core dumped)

For some reason, this happened to me every time I ran the training. Worth keeping in mind: I was using 14 workers.

I tried to do something very similar to @jbayardo, and had a very similar problem.

I'm also saving in on_batch_end, once per hour of training.

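    import time

    from gensim.models.callbacks import CallbackAny2Vec
    from gensim.test.utils import get_tmpfile


    class BatchSaver(CallbackAny2Vec):
        def __init__(self, path_prefix, start_time):
            self.path_prefix = path_prefix
            self.last_checkpoint = start_time

        def on_batch_end(self, model):
            cur_time = time.time()
            # Checkpoint at most once per hour. Several worker threads can pass this
            # check before last_checkpoint is updated, so more than one save may start.
            if cur_time - self.last_checkpoint > 60 * 60:
                output_path = get_tmpfile('/localdata/gustav/backup/w2v/{}_backup.model'.format(self.path_prefix))
                model.save(output_path)
                self.last_checkpoint = cur_time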

The first time the model is saved, 5 out of 6 worker threads crash:

...
2019-03-19 15:17:48,068: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 40836 words/s, in_qsize 0, out_qsize 0
2019-03-19 15:17:48,598: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:17:48,603: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:17:48,604: INFO: storing np array 'vectors' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.wv.vectors.npy
2019-03-19 15:17:48,615: INFO: storing np array 'syn1' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.trainables.syn1.npy
2019-03-19 15:17:48,730: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

2019-03-19 15:17:48,978: INFO: saving Word2Vec object under /localdata/gustav/backup/w2v/word2vec_large_backup.model, separately None
2019-03-19 15:20:13,438: INFO: storing np array 'syn1neg' to /localdata/gustav/backup/w2v/word2vec_large_backup.model.trainables.syn1neg.npy
2019-03-19 15:20:25,566: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

2019-03-19 15:20:25,569: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 39100 words/s, in_qsize 0, out_qsize 0
2019-03-19 15:20:40,967: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
2019-03-19 15:20:40,979: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 38940 words/s, in_qsize 0, out_qsize 0
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 478, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors'

2019-03-19 15:22:03,181: INFO: not storing attribute vectors_norm
2019-03-19 15:22:03,182: INFO: not storing attribute cum_table
2019-03-19 15:24:40,421: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 211, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 819, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 530, in gensim.models.word2vec_inner.train_batch_sg
  File "gensim/models/word2vec_inner.pyx", line 484, in gensim.models.word2vec_inner.init_w2v_config
AttributeError: 'Word2VecTrainables' object has no attribute 'syn1'

2019-03-19 15:24:40,423: INFO: saved /localdata/gustav/backup/w2v/word2vec_large_backup.model
2019-03-19 15:24:40,442: INFO: EPOCH 1 - PROGRESS: at 2.34% examples, 36581 words/s, in_qsize 10, out_qsize 1
...

After this, training continues with a single worker thread until the first epoch finishes, at which point the process hangs, waiting for the worker threads that were killed during the first save.
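The hang at the end of the epoch can be modelled with a simplified manager/worker pattern. This is a pure-Python sketch under an assumption: that, as gensim 3.x's `_worker_loop` appears to do, each worker posts a `None` sentinel to a progress queue when it runs out of jobs, and the manager counts those sentinels before ending the epoch. A worker that dies from an exception never posts its sentinel, so an unbounded `get()` would wait forever. The function names are hypothetical, and the timeout exists only so the demo terminates.

```python
import queue
import threading


def worker(jobs, progress, crash):
    """Hypothetical worker: report each finished job, then a None sentinel when told to stop."""
    while True:
        job = jobs.get()
        if crash:
            # Simulates a thread dying mid-training (like the AttributeErrors above):
            # it exits without ever posting its None sentinel.
            raise AttributeError("'Model' object has no attribute 'vectors'")
        if job is None:
            progress.put(None)  # "this worker is finished"
            return
        progress.put(job)  # progress report for one job


def run_epoch(n_workers, crashing_workers, n_jobs=10, timeout=1.0):
    """Return how many workers never reported completion."""
    jobs, progress = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, progress, i < crashing_workers), daemon=True)
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for job in range(n_jobs):
        jobs.put(job)
    for _ in range(n_workers):
        jobs.put(None)  # one stop sentinel per worker

    unfinished = n_workers
    while unfinished > 0:
        try:
            report = progress.get(timeout=timeout)
        except queue.Empty:
            break  # a plain progress.get() would block here forever
        if report is None:
            unfinished -= 1
    return unfinished


print(run_epoch(n_workers=6, crashing_workers=5))  # 5
```

The crashing threads also print their own "Exception in thread ..." tracebacks to stderr, much like the logs above.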

Note that in my case, four threads crashed with AttributeError: 'Word2VecKeyedVectors' object has no attribute 'vectors', and one with AttributeError: 'Word2VecTrainables' object has no attribute 'syn1'.
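One plausible reading of these particular errors (an assumption; nobody in the thread confirms it) is that `save()` does more than read the model. Judging by log lines such as "storing np array 'vectors'" and "not storing attribute cum_table", gensim 3.x's save routine seems to temporarily detach large arrays and ignored attributes from the object while pickling it, and restore them afterwards. Any worker that reads one of those attributes in that window would see exactly this kind of AttributeError. A toy single-threaded sketch of the mechanism (`ToyModel` and `worker_reads_vectors` are hypothetical, not gensim code):

```python
import pickle


class ToyModel(object):
    """Hypothetical stand-in for a model object; not gensim's class."""

    def __init__(self):
        self.vectors = [[0.1, 0.2], [0.3, 0.4]]  # big array a real save would write to its own file
        self.cum_table = [1, 3, 6]  # attribute a real save would skip entirely

    def save(self, during_save=None):
        """Pickle the object without its special attributes, detaching them temporarily.

        The separate-file writes a real implementation would do are omitted here.
        """
        asides = {}
        for name in ("vectors", "cum_table"):
            asides[name] = getattr(self, name)
            delattr(self, name)  # the attribute is missing from here...
        try:
            if during_save is not None:
                during_save(self)  # stands in for a worker thread reading the model right now
            return pickle.dumps(self.__dict__)
        finally:
            for name, value in asides.items():
                setattr(self, name, value)  # ...until it is restored here


def worker_reads_vectors(model):
    return len(model.vectors)


model = ToyModel()
try:
    model.save(during_save=worker_reads_vectors)
except AttributeError as exc:
    print("worker saw:", exc)
print(hasattr(model, "vectors"))  # True: restored once save() returns
```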

I'm using gensim 3.7.1 and Python 3.6.8.

on_batch_end() is a really bad place to try a whole-model save, because it's called inside every separate worker thread, many times per training epoch, with absolutely no coordination with the other in-progress worker threads. Race issues when using that callback are to be expected.
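To make "no coordination" concrete: in the pattern the tracebacks above show, each worker thread invokes `on_batch_end` itself, so several invocations can be in flight at the same moment while other threads keep working on the model. A pure-Python sketch with hypothetical names (no gensim involved); the barrier inside the callback is there only to make the overlap deterministic for the demo.

```python
import threading


class OverlapRecorder(object):
    """Hypothetical batch callback that records how many calls overlap."""

    def __init__(self, n_workers):
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(n_workers)  # forces overlap, for the demo only
        self.active = 0
        self.max_active = 0
        self.thread_names = set()

    def on_batch_end(self, model):
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            self.thread_names.add(threading.current_thread().name)
        # Every worker is inside on_batch_end once this returns (timeout guards against hangs).
        self._barrier.wait(timeout=5)
        with self._lock:
            self.active -= 1


def worker_loop(model, callback):
    # ... train on one batch, mutating `model` ...
    callback.on_batch_end(model)  # called from the worker thread itself


recorder = OverlapRecorder(n_workers=4)
threads = [threading.Thread(target=worker_loop, args=({}, recorder)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(recorder.max_active, len(recorder.thread_names))  # 4 4
```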

on_epoch_end(), instead, is only called from the single manager thread, when all worker-threads have finished their work. It's a more appropriate place for something that wants to write the whole state of an unchanging model.
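A minimal sketch of the hourly checkpoint moved to `on_epoch_end`, as suggested here. `EpochCheckpointer`, its parameters and the file-naming scheme are illustrative, not part of gensim; the `clock` argument only exists to make the throttle testable, and the fallback base class lets the sketch run where gensim is not installed.

```python
import time

try:
    from gensim.models.callbacks import CallbackAny2Vec
except ImportError:
    # Fallback base class so the sketch also runs where gensim is not installed.
    class CallbackAny2Vec(object):
        pass


class EpochCheckpointer(CallbackAny2Vec):
    """Save the model at the end of an epoch, at most once per `interval` seconds."""

    def __init__(self, path_prefix, interval=60 * 60, clock=time.time):
        self.path_prefix = path_prefix
        self.interval = interval
        self.clock = clock  # injectable so the throttle can be tested without waiting
        self.last_checkpoint = clock()
        self.epoch = 0

    def on_epoch_end(self, model):
        # Per the comment above, this runs once per epoch in the manager thread,
        # after the worker threads have finished, so the model is not being mutated.
        now = self.clock()
        if now - self.last_checkpoint >= self.interval:
            model.save("{}_epoch{}.model".format(self.path_prefix, self.epoch))
            self.last_checkpoint = now
        self.epoch += 1
```

One trade-off: checkpoints can only happen at epoch boundaries, so if a single epoch takes much longer than `interval`, saves will be less frequent than hourly. With gensim 3.x the callback would be passed as `Word2Vec(corpus, iter=5, callbacks=[EpochCheckpointer("/tmp/w2v")])` (`iter` was renamed `epochs` in gensim 4.0).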

In fact, I'd recommend removing the on_batch_begin() and on_batch_end() callbacks from CallbackAny2Vec entirely. They arrived in the under-reviewed #1777 PR, and it's hard for me to imagine a compelling use for them, occurring as they do at many essentially random times, within each worker thread, in each training epoch. (If not removed, their doc-comments should warn that they run in a worker thread while many other worker threads are mutating the model or executing their own simultaneous on_batch_end() calls.)

Also, per a question on StackOverflow, I've just noticed the batch-related callbacks have never been fired for the worker_loop code paths.

So I'd again recommend removing on_batch_begin and on_batch_end from all modes & documentation, entirely, ASAP.

Thanks for following up @gojomo. I'm marking this ticket for 4.0.0, I think it fits a major release well.

I'll do another Gensim sprint soon to finish 4.0.0. I haven't seen any worthwhile feedback from beta users, so the plan is to just tie up any loose ends and release.
