Motivated by the SO question: https://stackoverflow.com/questions/55768598/interpret-the-doc2vec-vectors-clusters-representation/55779049#55779049
Doc2Vec could plausibly have a function that's reverse-inference: take a doc-vector, return a (ranked) list of words most-predicted by that input vector. It'd work highly analogously to Word2Vec.predict_output_word(). Such a list of words might be useful as a sort-of summary or label for a doc-vector.
Hi @gojomo Is this still open? Can I take it up?
@saraswatmks Yes, and further, you don't have to ask permission: a good PR submission will be reviewed/welcomed even without having declared your interest first. It's more important to "show working code" than "declare interest".
@gojomo Just to be sure, I have two possible implementations in mind:

Option 1: Find the words whose word-vectors are most similar to the given doc-vector (e.g. via a `most_similar()`-style lookup) and return them as the ranked list.

Option 2: Look up the given doc-tag and return the words from the document it was trained on.

What do you think? Which one should we go with? Or if there's another way this could be done, please let me know.
Re: Option 1 – It's not "most similar" words which are needed here. Rather, it's "most predicted". The logic and behavior should be highly analogous to Word2Vec.predict_output_word(), except taking a single vector rather than a list-of-words.
(Option 2 is impossible, as the model includes no record of the words which occurred in that doc-id.)
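A minimal sketch of that "most predicted" logic, assuming a negative-sampling model whose output-layer weights and vocabulary list are accessible (the parameter names `output_weights` and `index_to_word` here are illustrative stand-ins, not gensim's public API; in gensim's internals the output matrix is `syn1neg`):

```python
import numpy as np

def predict_output_words(doc_vector, output_weights, index_to_word, topn=10):
    """Rank vocabulary words by how strongly the model's output layer
    predicts them for the given doc-vector (negative-sampling case).

    doc_vector     : (d,) array, e.g. from infer_vector() or dv[tag]
    output_weights : (V, d) output-layer matrix (gensim's `syn1neg`)
    index_to_word  : list mapping word index -> word string
    """
    # Raw prediction score for every vocabulary word.
    scores = np.dot(output_weights, doc_vector)
    # Numerically stable softmax over the whole vocabulary.
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Return the topn words with their predicted probabilities.
    top = np.argsort(-probs)[:topn]
    return [(index_to_word[i], float(probs[i])) for i in top]
```

This mirrors what `Word2Vec.predict_output_word()` does internally, except the input is a single vector rather than an averaged list-of-words context.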