When I run the vectors_fast_text.py example code, I encounter a problem where the new vocab words are not being added to the vocabulary. I can see that the words and vectors look fine as I read them in, e.g.
...
leicht
[ 0.36014 0.18184 -0.11139 -0.12497 -0.13684 ...]
besitz
[-0.04354 0.15431 -0.13097 -0.23924001 -0.14453 ... ]
...
but after each iteration, the vocab size remains at 173, all of which seem to be pre-defined terms, maybe for the tokenizer? First few: 'ü.', 'XD', ';-D', '>:o', ':-)', '<3', ' ', '8)', '\\n', 'c.', ':>'
I first ran into this problem in other code and could reproduce it with the example. Is this a problem with nlp.vocab.set_vector?
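To make this concrete, here is a minimal sketch of the behavior (assuming a recent spaCy v2.x and a blank German pipeline as a stand-in for the example's setup; the random vector is a placeholder for a fastText row from the .vec file):

```python
import numpy
import spacy

# Assumption: a blank German pipeline stands in for the example's setup.
nlp = spacy.blank("de")

# The lexemes present at startup come from tokenizer exceptions
# (emoticons, abbreviations, ...) registered when the tokenizer is
# built, which would explain the pre-defined terms listed above.
print(len(nlp.vocab))          # lexeme count at startup
print(len(nlp.vocab.vectors))  # vector rows: 0 for a blank pipeline

# Stand-in for one fastText row; the real code reads these from the .vec file.
vector = numpy.random.uniform(-1, 1, (300,)).astype("f")
nlp.vocab.set_vector("leicht", vector)

print(len(nlp.vocab.vectors))          # 1: the vector was stored
print(nlp.vocab.has_vector("leicht"))  # True
print(len(nlp.vocab))                  # on the affected version: unchanged

# Lexemes are created lazily, so simply touching the word adds it:
lex = nlp.vocab["leicht"]
print(len(nlp.vocab))                  # now one larger
```

If len(nlp.vocab.vectors) grows while len(nlp.vocab) stays put, the vectors are being stored: len(nlp.vocab) only counts lexemes, which spaCy creates lazily on first lookup rather than when a vector is set.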
Linking to @honnibal's workaround in the Prodigy forum: https://support.prodi.gy/t/working-with-languages-not-yet-supported-by-spacy/206/11
Thanks @ahalterman -- wish I'd read this a little sooner! I got back from Australia last week, and promptly got sick. Air travel...