Per user report on SO, neither assignment to a bracketed-access (as would be implemented by __setitem__()) nor use of the add() method will successfully mutate a Doc2VecKeyedVectors object.
Looking closer, it seems the superclass __setitem__() passes through to the superclass add(), which was only ever implemented for word-centric sets of vectors – consulting/updating properties like .vocab that only exist as empty values in Doc2VecKeyedVectors because of the currently confused inheritance created by #1777.
As an addition to the SO post, I want to add new documents to the model.
It seems this should be done with the add() method, but since that is not working, I worked out the following workaround:
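import numpy as np
from gensim.models.doc2vec import Doc2Vec

# PATH_to_model, new_vec and new_identifier are placeholders supplied by the caller
model = Doc2Vec.load(PATH_to_model)
# Append the new vector and its identifier to the existing values
model.docvecs.vectors_docs = np.vstack([model.docvecs.vectors_docs, new_vec])
model.docvecs.index2entity.append(new_identifier)
# Check that the new document is included
model.docvecs.most_similar(positive=[new_vec])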
Calling the most_similar() method returns results that include this new document, even after saving and reloading the model. So it seems to work.
My question is whether this is a 'correct' way of working around this bug, or if I am missing something.
@ThijsKranenburg - If it works for your purposes, it's good enough! Note, though, that you've not yet done enough to look up the new vectors by identifier: that would also require adding entries to the model.docvecs.doctags dict. And the possible effects of such a workaround on any further training are unclear.
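To illustrate the bookkeeping point in the reply above, here is a minimal sketch of why the identifier-to-offset mapping has to be updated alongside the vectors and the identifier list. ToyDocVectors is a hypothetical stand-in written for this example, not gensim's Doc2VecKeyedVectors; it uses plain lists instead of a NumPy array, assumes non-zero vectors, and stores a bare integer offset in its doctags dict, whereas the real dict may hold richer records.

```python
import math


class ToyDocVectors:
    """Hypothetical stand-in for a doc-vector store; not gensim's class."""

    def __init__(self):
        self.vectors = []       # one row per document (stands in for vectors_docs)
        self.index2entity = []  # row offset -> identifier
        self.doctags = {}       # identifier -> row offset

    def add(self, identifier, vector):
        if identifier in self.doctags:
            raise KeyError(f"{identifier!r} is already present")
        # Update all three structures together so they cannot drift apart.
        self.doctags[identifier] = len(self.vectors)
        self.index2entity.append(identifier)
        self.vectors.append(list(vector))

    def __getitem__(self, identifier):
        # Lookup by identifier only works because doctags was updated in add().
        return self.vectors[self.doctags[identifier]]

    def most_similar(self, vector, topn=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norms

        scored = [(tag, cosine(vector, row))
                  for tag, row in zip(self.index2entity, self.vectors)]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


store = ToyDocVectors()
store.add("doc_a", [1.0, 0.0])
store.add("doc_b", [0.0, 1.0])
store.add("doc_new", [0.6, 0.8])

best_tag, best_score = store.most_similar([0.6, 0.8])[0]
```

In this toy version, similarity search needs only the vectors and the identifier list, which matches what the workaround updates. Lookup by identifier additionally needs the dict entry, which is the step the workaround leaves out.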