Gensim: Word2Vec and Doc2Vec do not update word embeddings if `negative` keyword is set to 0

Created on 17 Mar 2018 · 5 comments · Source: RaRe-Technologies/gensim

Description

Setting the `negative` keyword to 0 for Doc2Vec causes training to skip updating the word embeddings after their random initialisation.
This happens silently and is behavior I wasn't expecting.

Steps/Code/Corpus to Reproduce

```python
from sklearn.datasets import fetch_20newsgroups
import pandas as pd
from gensim.models import Doc2Vec, Word2Vec
from gensim.models.doc2vec import TaggedDocument

df = pd.DataFrame(fetch_20newsgroups().data)
df[0] = df[0].str.split(' ')
# note: x.index yields the same tag for every document (the logs below report
# "1 unique tags"); per-document tags would be e.g. [x.name]
documents = df.apply(lambda x: TaggedDocument(x[0], x.index), axis=1)

model1a = Doc2Vec(documents, negative=1, epochs=1)
model1b = Doc2Vec(documents, negative=0, epochs=1)
model2a = Doc2Vec(documents, negative=1, epochs=2)
model2b = Doc2Vec(documents, negative=0, epochs=2)

print('model1a:', model1a.wv.most_similar('test'))
print('model1b:', model1b.wv.most_similar('test'))
print('model2a:', model2a.wv.most_similar('test'))
print('model2b:', model2b.wv.most_similar('test'))

model1a = Word2Vec(df[0], negative=1, iter=1)
model1b = Word2Vec(df[0], negative=0, iter=1)
model2a = Word2Vec(df[0], negative=1, iter=2)
model2b = Word2Vec(df[0], negative=0, iter=2)

print('model1a:', model1a.most_similar('test'))
print('model1b:', model1b.most_similar('test'))
print('model2a:', model2a.most_similar('test'))
print('model2b:', model2b.most_similar('test'))
```

Results

As can be seen below, the models trained with negative=0 return identical results after 1 and 2 epochs of training, whereas the models with negative=1 return different (and somewhat more sensible) results.
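The flat, low similarity scores (roughly 0.34 to 0.42) of the negative=0 models are what purely random vectors produce. A quick numpy sketch illustrates this, assuming gensim-style per-component uniform initialization and using the vocabulary size from the logs (an illustration, not gensim's actual code):

```python
import numpy as np

# gensim-style uniform init: U(-0.5/dim, 0.5/dim) per component
dim, vocab = 100, 40708
rng = np.random.default_rng(42)
vecs = rng.uniform(-0.5 / dim, 0.5 / dim, size=(vocab, dim))
unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# cosine similarity of one "query" vector against all others
sims = unit @ unit[0]
top = float(np.sort(sims)[-2])  # best match, excluding the query itself
print(f"top cosine similarity among random vectors: {top:.2f}")
```

For random 100-dimensional vectors the pairwise cosine similarities cluster around 0, and even the best match over a ~40k vocabulary stays around 0.4, matching the reported output above.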
_Doc2Vec_:

model1a:
[('time', 0.9929366111755371),
 ('either', 0.9923557639122009),
 ('up', 0.9921339154243469),
 ('problem', 0.9915313720703125),
 ('being', 0.9915310144424438),
 ('getting', 0.991266131401062),
 ('group', 0.991013765335083),
 ('keeping', 0.9908334016799927),
 ('players', 0.9906938672065735),
 ('further', 0.9902615547180176)]

model1b:
[('518-393-7228', 0.4212157428264618),
 ('anyone?', 0.4167076647281647),
 ('it,', 0.3915032744407654),
 ('deliver', 0.3873376250267029),
 ('Books', 0.3643316328525543),
 ('stuck', 0.35024553537368774),
 ("o'clock", 0.34999915957450867),
 ('(Dostoevsky)', 0.34075409173965454),
 ('(Thyagi', 0.33959853649139404),
 ('MSDOS', 0.3370114862918854)]

model2a:
[('chip,', 0.9874706268310547),
 ('moves', 0.9828106164932251),
 ('board', 0.9789682626724243),
 ('adding', 0.978229820728302),
 ('express', 0.9764397144317627),
 ('sport', 0.9763677716255188),
 ('correctly', 0.9756811261177063),
 ('restricted', 0.9725382328033447),
 ('concern', 0.9719469547271729),
 ('user,', 0.9711147546768188)]

model2b:
[('518-393-7228', 0.4212157428264618),
 ('anyone?', 0.4167076647281647),
 ('it,', 0.3915032744407654),
 ('deliver', 0.3873376250267029),
 ('Books', 0.3643316328525543),
 ('stuck', 0.35024553537368774),
 ("o'clock", 0.34999915957450867),
 ('(Dostoevsky)', 0.34075409173965454),
 ('(Thyagi', 0.33959853649139404),
 ('MSDOS', 0.3370114862918854)]

_Word2Vec_:

model1a:
[('moral', 0.9974657297134399),
 ('obvious', 0.9970457553863525),
 ('high', 0.9967347979545593),
 ('ago,', 0.9966593384742737),
 ('food', 0.9964239001274109),
 ('case,', 0.996358335018158),
 ('in,', 0.9963352084159851),
 ('problems', 0.996278703212738),
 ('doctor', 0.9962717890739441),
 ('kept', 0.9961578845977783)]

model1b:
[('Fulk)\nSubject:', 0.43792209029197693),
 ('weak', 0.3926801085472107),
 ('Provine', 0.382274866104126),
 ('suspension,', 0.37375500798225403),
 ('[email protected]', 0.3638245761394501),
 ('negligible', 0.3633933365345001),
 ('frozen', 0.36065810918807983),
 ('notch', 0.35705092549324036),
 ('_|_|_', 0.3445291221141815),
 ('(Grant', 0.3377472162246704)]

model2a:
[('motorcycle', 0.9882928729057312),
 ('grounds', 0.9797140955924988),
 ('charge', 0.977377712726593),
 ('goes', 0.9750747084617615),
 ('trip', 0.9731242060661316),
 ('mark', 0.9729797840118408),
 ('needed', 0.9718480706214905),
 ('directly', 0.9717421531677246),
 ('group,', 0.9714720845222473),
 ('store', 0.971139669418335)]

model2b:
[('Fulk)\nSubject:', 0.43792209029197693),
 ('weak', 0.3926801085472107),
 ('Provine', 0.382274866104126),
 ('suspension,', 0.37375500798225403),
 ('[email protected]', 0.3638245761394501),
 ('negligible', 0.3633933365345001),
 ('frozen', 0.36065810918807983),
 ('notch', 0.35705092549324036),
 ('_|_|_', 0.3445291221141815),
 ('(Grant', 0.3377472162246704)]

Logs during training

##### Doc2Vec

**model1a:**

```
2018-03-17 12:48:28,330 : INFO : collecting all words and their counts
2018-03-17 12:48:28,331 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-03-17 12:48:28,966 : INFO : PROGRESS: at example #10000, processed 3198312 words (5044963/s), 390776 word types, 1 tags
2018-03-17 12:48:29,049 : INFO : collected 427021 word types and 1 unique tags from a corpus of 11314 examples and 3593473 words
2018-03-17 12:48:29,050 : INFO : Loading a fresh vocabulary
2018-03-17 12:48:29,283 : INFO : min_count=5 retains 40708 unique words (9% of original 427021, drops 386313)
2018-03-17 12:48:29,284 : INFO : min_count=5 leaves 3082977 word corpus (85% of original 3593473, drops 510496)
2018-03-17 12:48:29,373 : INFO : deleting the raw counts dictionary of 427021 items
2018-03-17 12:48:29,379 : INFO : sample=0.001 downsamples 32 most-common words
2018-03-17 12:48:29,379 : INFO : downsampling leaves estimated 2056839 word corpus (66.7% of prior 3082977)
2018-03-17 12:48:29,490 : INFO : estimated required memory for 40708 words and 100 dimensions: 52920800 bytes
2018-03-17 12:48:29,490 : INFO : resetting layer weights
2018-03-17 12:48:29,840 : INFO : training model with 3 workers on 40708 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=1 window=5
2018-03-17 12:48:30,848 : INFO : EPOCH 1 - PROGRESS: at 56.44% examples, 1165806 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:48:31,587 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:48:31,588 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:48:31,594 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:48:31,595 : INFO : EPOCH - 1 : training on 3593473 raw words (2068258 effective words) took 1.8s, 1180569 effective words/s
2018-03-17 12:48:31,595 : INFO : training on a 3593473 raw words (2068258 effective words) took 1.8s, 1178829 effective words/s

```

**model1b:**

```

2018-03-17 12:48:31,596 : INFO : collecting all words and their counts
2018-03-17 12:48:31,597 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-03-17 12:48:32,172 : INFO : PROGRESS: at example #10000, processed 3198312 words (5569233/s), 390776 word types, 1 tags
2018-03-17 12:48:32,252 : INFO : collected 427021 word types and 1 unique tags from a corpus of 11314 examples and 3593473 words
2018-03-17 12:48:32,253 : INFO : Loading a fresh vocabulary
2018-03-17 12:48:32,413 : INFO : min_count=5 retains 40708 unique words (9% of original 427021, drops 386313)
2018-03-17 12:48:32,414 : INFO : min_count=5 leaves 3082977 word corpus (85% of original 3593473, drops 510496)
2018-03-17 12:48:32,506 : INFO : deleting the raw counts dictionary of 427021 items
2018-03-17 12:48:32,511 : INFO : sample=0.001 downsamples 32 most-common words
2018-03-17 12:48:32,512 : INFO : downsampling leaves estimated 2056839 word corpus (66.7% of prior 3082977)
2018-03-17 12:48:32,557 : INFO : estimated required memory for 40708 words and 100 dimensions: 36637600 bytes
2018-03-17 12:48:32,557 : INFO : resetting layer weights
2018-03-17 12:48:32,910 : INFO : training model with 3 workers on 40708 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=0 window=5
2018-03-17 12:48:33,913 : INFO : EPOCH 1 - PROGRESS: at 66.48% examples, 1398264 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:48:34,396 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:48:34,403 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:48:34,409 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:48:34,412 : INFO : EPOCH - 1 : training on 3593473 raw words (2067706 effective words) took 1.5s, 1378469 effective words/s
2018-03-17 12:48:34,412 : INFO : training on a 3593473 raw words (2067706 effective words) took 1.5s, 1376358 effective words/s

```

**model2a:**

```

2018-03-17 12:48:34,413 : INFO : collecting all words and their counts
2018-03-17 12:48:34,415 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-03-17 12:48:35,073 : INFO : PROGRESS: at example #10000, processed 3198312 words (4869749/s), 390776 word types, 1 tags
2018-03-17 12:48:35,152 : INFO : collected 427021 word types and 1 unique tags from a corpus of 11314 examples and 3593473 words
2018-03-17 12:48:35,153 : INFO : Loading a fresh vocabulary
2018-03-17 12:48:35,452 : INFO : min_count=5 retains 40708 unique words (9% of original 427021, drops 386313)
2018-03-17 12:48:35,453 : INFO : min_count=5 leaves 3082977 word corpus (85% of original 3593473, drops 510496)
2018-03-17 12:48:35,563 : INFO : deleting the raw counts dictionary of 427021 items
2018-03-17 12:48:35,568 : INFO : sample=0.001 downsamples 32 most-common words
2018-03-17 12:48:35,569 : INFO : downsampling leaves estimated 2056839 word corpus (66.7% of prior 3082977)
2018-03-17 12:48:35,686 : INFO : estimated required memory for 40708 words and 100 dimensions: 52920800 bytes
2018-03-17 12:48:35,687 : INFO : resetting layer weights
2018-03-17 12:48:36,044 : INFO : training model with 3 workers on 40708 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=1 window=5
2018-03-17 12:48:37,052 : INFO : EPOCH 1 - PROGRESS: at 42.69% examples, 886813 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:48:38,052 : INFO : EPOCH 1 - PROGRESS: at 98.46% examples, 1013776 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:48:38,090 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:48:38,091 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:48:38,102 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:48:38,102 : INFO : EPOCH - 1 : training on 3593473 raw words (2067777 effective words) took 2.1s, 1006127 effective words/s
2018-03-17 12:48:39,107 : INFO : EPOCH 2 - PROGRESS: at 57.09% examples, 1186646 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:48:39,865 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:48:39,873 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:48:39,874 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:48:39,875 : INFO : EPOCH - 2 : training on 3593473 raw words (2067767 effective words) took 1.8s, 1168711 effective words/s
2018-03-17 12:48:39,875 : INFO : training on a 7186946 raw words (4135544 effective words) took 3.8s, 1079693 effective words/s

```

**model2b:**

```

2018-03-17 12:48:39,876 : INFO : collecting all words and their counts
2018-03-17 12:48:39,878 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2018-03-17 12:48:40,616 : INFO : PROGRESS: at example #10000, processed 3198312 words (4332917/s), 390776 word types, 1 tags
2018-03-17 12:48:40,697 : INFO : collected 427021 word types and 1 unique tags from a corpus of 11314 examples and 3593473 words
2018-03-17 12:48:40,698 : INFO : Loading a fresh vocabulary
2018-03-17 12:48:40,982 : INFO : min_count=5 retains 40708 unique words (9% of original 427021, drops 386313)
2018-03-17 12:48:40,985 : INFO : min_count=5 leaves 3082977 word corpus (85% of original 3593473, drops 510496)
2018-03-17 12:48:41,106 : INFO : deleting the raw counts dictionary of 427021 items
2018-03-17 12:48:41,111 : INFO : sample=0.001 downsamples 32 most-common words
2018-03-17 12:48:41,112 : INFO : downsampling leaves estimated 2056839 word corpus (66.7% of prior 3082977)
2018-03-17 12:48:41,156 : INFO : estimated required memory for 40708 words and 100 dimensions: 36637600 bytes
2018-03-17 12:48:41,157 : INFO : resetting layer weights
2018-03-17 12:48:41,517 : INFO : training model with 3 workers on 40708 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=0 window=5
2018-03-17 12:48:42,525 : INFO : EPOCH 1 - PROGRESS: at 65.36% examples, 1362872 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:48:43,161 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:48:43,166 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:48:43,169 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:48:43,170 : INFO : EPOCH - 1 : training on 3593473 raw words (2067697 effective words) took 1.7s, 1252184 effective words/s
2018-03-17 12:48:44,173 : INFO : EPOCH 2 - PROGRESS: at 55.19% examples, 1149188 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:48:44,820 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:48:44,825 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:48:44,828 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:48:44,829 : INFO : EPOCH - 2 : training on 3593473 raw words (2067746 effective words) took 1.7s, 1248269 effective words/s
2018-03-17 12:48:44,829 : INFO : training on a 7186946 raw words (4135443 effective words) took 3.3s, 1248589 effective words/s


```

##### Word2Vec

**model1a:**

```

2018-03-17 12:54:55,113 : INFO : collecting all words and their counts
2018-03-17 12:54:55,114 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2018-03-17 12:54:55,710 : INFO : PROGRESS: at sentence #10000, processed 3198312 words, keeping 390776 word types
2018-03-17 12:54:55,789 : INFO : collected 427021 word types from a corpus of 3593473 raw words and 11314 sentences
2018-03-17 12:54:55,789 : INFO : Loading a fresh vocabulary
2018-03-17 12:54:55,943 : INFO : min_count=5 retains 40708 unique words (9% of original 427021, drops 386313)
2018-03-17 12:54:55,943 : INFO : min_count=5 leaves 3082977 word corpus (85% of original 3593473, drops 510496)
2018-03-17 12:54:56,036 : INFO : deleting the raw counts dictionary of 427021 items
2018-03-17 12:54:56,041 : INFO : sample=0.001 downsamples 32 most-common words
2018-03-17 12:54:56,042 : INFO : downsampling leaves estimated 2056839 word corpus (66.7% of prior 3082977)
2018-03-17 12:54:56,154 : INFO : estimated required memory for 40708 words and 100 dimensions: 52920400 bytes
2018-03-17 12:54:56,154 : INFO : resetting layer weights
2018-03-17 12:54:56,501 : INFO : training model with 3 workers on 40708 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=1 window=5
2018-03-17 12:54:57,505 : INFO : EPOCH 1 - PROGRESS: at 78.98% examples, 1626661 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:54:57,768 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:54:57,769 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:54:57,772 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:54:57,773 : INFO : EPOCH - 1 : training on 3593473 raw words (2056644 effective words) took 1.3s, 1620306 effective words/s
2018-03-17 12:54:57,773 : INFO : training on a 3593473 raw words (2056644 effective words) took 1.3s, 1618042 effective words/s

```

**model1b:**

```

2018-03-17 12:54:57,780 : INFO : collecting all words and their counts
2018-03-17 12:54:57,782 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2018-03-17 12:54:58,335 : INFO : PROGRESS: at sentence #10000, processed 3198312 words, keeping 390776 word types
2018-03-17 12:54:58,412 : INFO : collected 427021 word types from a corpus of 3593473 raw words and 11314 sentences
2018-03-17 12:54:58,413 : INFO : Loading a fresh vocabulary
2018-03-17 12:54:58,703 : INFO : min_count=5 retains 40708 unique words (9% of original 427021, drops 386313)
2018-03-17 12:54:58,703 : INFO : min_count=5 leaves 3082977 word corpus (85% of original 3593473, drops 510496)
2018-03-17 12:54:58,791 : INFO : deleting the raw counts dictionary of 427021 items
2018-03-17 12:54:58,796 : INFO : sample=0.001 downsamples 32 most-common words
2018-03-17 12:54:58,797 : INFO : downsampling leaves estimated 2056839 word corpus (66.7% of prior 3082977)
2018-03-17 12:54:58,841 : INFO : estimated required memory for 40708 words and 100 dimensions: 36637200 bytes
2018-03-17 12:54:58,842 : INFO : resetting layer weights
2018-03-17 12:54:59,194 : INFO : training model with 3 workers on 40708 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=0 window=5
2018-03-17 12:55:00,184 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:55:00,189 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:55:00,189 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:55:00,190 : INFO : EPOCH - 1 : training on 3593473 raw words (2057464 effective words) took 1.0s, 2069005 effective words/s
2018-03-17 12:55:00,190 : INFO : training on a 3593473 raw words (2057464 effective words) took 1.0s, 2064908 effective words/s

```

**model2a:**

```

2018-03-17 12:55:00,198 : INFO : collecting all words and their counts
2018-03-17 12:55:00,200 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2018-03-17 12:55:00,760 : INFO : PROGRESS: at sentence #10000, processed 3198312 words, keeping 390776 word types
2018-03-17 12:55:00,835 : INFO : collected 427021 word types from a corpus of 3593473 raw words and 11314 sentences
2018-03-17 12:55:00,836 : INFO : Loading a fresh vocabulary
2018-03-17 12:55:01,001 : INFO : min_count=5 retains 40708 unique words (9% of original 427021, drops 386313)
2018-03-17 12:55:01,001 : INFO : min_count=5 leaves 3082977 word corpus (85% of original 3593473, drops 510496)
2018-03-17 12:55:01,102 : INFO : deleting the raw counts dictionary of 427021 items
2018-03-17 12:55:01,108 : INFO : sample=0.001 downsamples 32 most-common words
2018-03-17 12:55:01,108 : INFO : downsampling leaves estimated 2056839 word corpus (66.7% of prior 3082977)
2018-03-17 12:55:01,215 : INFO : estimated required memory for 40708 words and 100 dimensions: 52920400 bytes
2018-03-17 12:55:01,215 : INFO : resetting layer weights
2018-03-17 12:55:01,583 : INFO : training model with 3 workers on 40708 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=1 window=5
2018-03-17 12:55:02,588 : INFO : EPOCH 1 - PROGRESS: at 70.71% examples, 1471948 words/s, in_qsize 5, out_qsize 0
2018-03-17 12:55:02,957 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:55:02,958 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:55:02,960 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:55:02,961 : INFO : EPOCH - 1 : training on 3593473 raw words (2056026 effective words) took 1.4s, 1494852 effective words/s
2018-03-17 12:55:03,970 : INFO : EPOCH 2 - PROGRESS: at 78.28% examples, 1614596 words/s, in_qsize 6, out_qsize 0
2018-03-17 12:55:04,240 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:55:04,241 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:55:04,244 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:55:04,245 : INFO : EPOCH - 2 : training on 3593473 raw words (2057234 effective words) took 1.3s, 1611148 effective words/s
2018-03-17 12:55:04,245 : INFO : training on a 7186946 raw words (4113260 effective words) took 2.7s, 1545423 effective words/s

```

**model2b:**

```

2018-03-17 12:55:04,255 : INFO : collecting all words and their counts
2018-03-17 12:55:04,257 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2018-03-17 12:55:04,810 : INFO : PROGRESS: at sentence #10000, processed 3198312 words, keeping 390776 word types
2018-03-17 12:55:04,882 : INFO : collected 427021 word types from a corpus of 3593473 raw words and 11314 sentences
2018-03-17 12:55:04,882 : INFO : Loading a fresh vocabulary
2018-03-17 12:55:05,177 : INFO : min_count=5 retains 40708 unique words (9% of original 427021, drops 386313)
2018-03-17 12:55:05,177 : INFO : min_count=5 leaves 3082977 word corpus (85% of original 3593473, drops 510496)
2018-03-17 12:55:05,278 : INFO : deleting the raw counts dictionary of 427021 items
2018-03-17 12:55:05,283 : INFO : sample=0.001 downsamples 32 most-common words
2018-03-17 12:55:05,284 : INFO : downsampling leaves estimated 2056839 word corpus (66.7% of prior 3082977)
2018-03-17 12:55:05,329 : INFO : estimated required memory for 40708 words and 100 dimensions: 36637200 bytes
2018-03-17 12:55:05,329 : INFO : resetting layer weights
2018-03-17 12:55:05,674 : INFO : training model with 3 workers on 40708 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=0 window=5
2018-03-17 12:55:06,562 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:55:06,566 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:55:06,567 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:55:06,567 : INFO : EPOCH - 1 : training on 3593473 raw words (2056953 effective words) took 0.9s, 2308747 effective words/s
2018-03-17 12:55:07,450 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-03-17 12:55:07,451 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-03-17 12:55:07,458 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-03-17 12:55:07,459 : INFO : EPOCH - 2 : training on 3593473 raw words (2056387 effective words) took 0.9s, 2313723 effective words/s
2018-03-17 12:55:07,462 : INFO : training on a 7186946 raw words (4113340 effective words) took 1.8s, 2300980 effective words/s


```

#### Versions

```
Python 3.6.3 (default, Oct 3 2017, 21:45:48)
[GCC 7.2.0]
NumPy 1.14.1
SciPy 1.0.0
gensim 3.4.0
FAST_VERSION 1
```

Labels: bug, difficulty easy, documentation


All 5 comments

Thanks for the report @swierh :+1:

CC: @gojomo @manneshiva is this expected behavior, or how should this work?

I agree it could be surprising, and there should be a warning or exception when this error is made. (The chief hint currently is the near-instantaneous training. You might get a similar fast-but-useless result if setting window=0 or size=0 or min_count or sample at some extreme values that drop all/almost-all words.)

But note there's a level at which this behavior makes logical sense: with zero negative examples with which to do negative-sampling, and with hierarchical-softmax not enabled (left at its default hs=0 value), there is no backprop-correction method specified, and thus all 'training' is necessarily a no-op. The user is getting what they (mistakenly) requested: an initialized model with no backprop-learning method configured.
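The no-op can be sketched in plain numpy, mirroring the gating described above (a toy negative-sampling update for one context/target pair; an illustration of the logic, not gensim's actual code):

```python
import numpy as np

def train_pair(syn0, syn1neg, word_idx, context_idx, negative, hs, alpha, rng):
    """Toy update mirroring gensim's gating: the negative-sampling branch
    runs only when `negative` > 0, and the hierarchical-softmax branch only
    when `hs` is set. With negative=0 and hs=0, no weights are touched."""
    neu1e = np.zeros(syn0.shape[1])
    if negative:
        # one positive target plus `negative` sampled noise words
        targets = [(word_idx, 1.0)] + [
            (int(rng.integers(len(syn1neg))), 0.0) for _ in range(negative)
        ]
        for t, label in targets:
            f = 1.0 / (1.0 + np.exp(-syn0[context_idx] @ syn1neg[t]))
            g = (label - f) * alpha       # gradient scaled by learning rate
            neu1e += g * syn1neg[t]       # accumulate error for the input vector
            syn1neg[t] += g * syn0[context_idx]
    # (hierarchical-softmax branch omitted; it is likewise skipped when hs=0)
    syn0[context_idx] += neu1e  # a zero vector when both branches were skipped
```

With `negative=0` and `hs=0` both branches are skipped, so `syn0` is incremented by an all-zero error vector and the embeddings never move.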

@gojomo
should we log a warning when such extreme cases are set so that the problem isn't silent if it isn't what the user wants?

A warning, or even a ValueError, if `hs` is false-ish and `negative` is 0 would make sense.
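Such a guard might look like the following (a hypothetical helper, not an actual gensim function; the name and message are made up):

```python
def validate_training_mode(hs: int, negative: int) -> None:
    """Hypothetical guard: reject configurations where neither hierarchical
    softmax nor negative sampling is enabled, i.e. training would be a no-op."""
    if not hs and negative <= 0:
        raise ValueError(
            "both hs=0 and negative=0: no training objective is configured, "
            "so word/doc vectors would never move from random initialization"
        )
```

Called once during model construction, this would turn the silent no-op into an immediate, explicit error.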

The warning should also happen if this error is made with FastText: https://groups.google.com/d/msg/gensim/9WPVoeiq8Mk/V_x7_L6bAgAJ
