Hi, I am seeing a strange issue: when I pass the exact same text to the following function,
gensim.summarization.keywords(text1, ratio=0.9, pos_filter=('NP')).split("\n")
I get two different result sets across runs, even though the parameters are identical. The output should be the same for a given text.
How is it possible that a few extracted phrases are included in some runs and excluded in others?
The difference is shown below: ['data'] vs. ['static data'], and ['dynamic'] was not returned at all in the second run.
I have attached a screenshot for reference. Any guidance will be appreciated.

import gensim
text1 = 'The method according to claim3, wherein the step of collecting further comprises: receiving the static data in the management data through a notification about change of the at least one cloud server being reported by a protocol agent which is configured to collect the management data from the at least one cloud server; and requesting and receiving the dynamic data in the management data from the protocol agent.'
phrase_token = gensim.summarization.keywords(text1, ratio=0.9, pos_filter=('NP')).split("\n")
phrase_token
Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.16.4
SciPy 1.2.1
gensim 3.7.3
FAST_VERSION 1
I can reproduce this issue.
@piskvorky @menshikh-iv Is summarization supposed to be deterministic?
This was a student / contributed project; unfortunately, I'm not familiar with the algo or the code at all.
I think given the momentum behind deprecating and eventually removing summarisation, we can sweep this one under the rug, right @piskvorky?
I'm not familiar with the module, so it's hard to make the call.
If it's something useful to users, I'd prefer to fix it. IIRC the summarization algo was standard (blog post). But if it's one of the badly-motivated-badly-executed student projects, then yeah, let's cut it.
gensim.summarization.keywords is not deterministic due to non-determinism of eig(s) in numpy (for example https://github.com/numpy/numpy/issues/6378).
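A quick way to confirm this kind of run-to-run variation is to call the function repeatedly on the same input and count the distinct outputs. The following is only a minimal sketch: `distinct_outputs` is a hypothetical helper name, and `flaky_keywords` / `stable_keywords` are made-up stand-ins (not gensim code) so the snippet runs without gensim installed.

```python
import random
from collections import Counter


def distinct_outputs(func, *args, runs=20, **kwargs):
    """Call func repeatedly with identical arguments and count each distinct result."""
    counts = Counter()
    for _ in range(runs):
        result = func(*args, **kwargs)
        # Lists are unhashable, so store each result as a tuple.
        counts[tuple(result)] += 1
    return counts


def stable_keywords(text):
    """Deterministic stand-in: the same input always gives the same output."""
    return sorted(set(text.lower().split()))


def flaky_keywords(text, rng):
    """Non-deterministic stand-in (hypothetical): randomly drops one word per call."""
    words = sorted(set(text.lower().split()))
    drop = rng.randrange(len(words))
    return [w for i, w in enumerate(words) if i != drop]


if __name__ == "__main__":
    text = "static data dynamic data management data"
    # One distinct output means the function behaved deterministically over these runs.
    print(len(distinct_outputs(stable_keywords, text)))
    # More than one distinct output exposes run-to-run variation.
    print(len(distinct_outputs(flaky_keywords, text, random.Random(0))))
```

With gensim 3.x installed, `func` could instead be a small wrapper around the `gensim.summarization.keywords(...).split("\n")` call from the report; more than one key in the returned counter would reproduce the behaviour described above.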
Okay, so unless it is updated, using summarization can lead to different results. Are there any similar techniques for generating summary phrases (probably only noun phrases) from long texts that I can test instead? I guess a workaround could be to use the nltk regex parser to find the subtrees labeled 'NP' in a sentence and then join their leaves into phrases? I appreciate all the assistance.
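The chunking idea above can be sketched in plain Python. This is a simplified stand-in, not nltk's regex parser: it assumes the tokens are already POS-tagged with Penn Treebank style tags (for example by a tagger such as `nltk.pos_tag`), and it treats a noun phrase as a run of optional adjectives followed by one or more nouns. The function name `noun_phrases` and the hand-assigned tags in the example are assumptions for illustration.

```python
def noun_phrases(tagged_tokens):
    """Join runs of adjectives followed by nouns into phrases.

    tagged_tokens is a list of (word, tag) pairs with Penn Treebank style tags.
    A run is kept only if it contains at least one noun (tag starting with "NN").
    """
    phrases = []
    current = []      # words in the run being built
    has_noun = False  # whether the current run already contains a noun

    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            current.append(word)
            has_noun = True
        elif tag.startswith("JJ") and not has_noun:
            # Adjectives are only allowed before the nouns of a run.
            current.append(word)
        else:
            # Any other token ends the run; keep it if it contained a noun.
            if current and has_noun:
                phrases.append(" ".join(current))
            # An adjective that follows a noun starts a new run.
            current = [word] if tag.startswith("JJ") else []
            has_noun = False

    if current and has_noun:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    # Hand-tagged fragment of the sentence from the report (tags are assumed).
    tagged = [
        ("receiving", "VBG"), ("the", "DT"), ("static", "JJ"), ("data", "NNS"),
        ("in", "IN"), ("the", "DT"), ("management", "NN"), ("data", "NNS"),
    ]
    print(noun_phrases(tagged))
```

Because the function is a single pass over the tagged tokens with no numerical solver involved, the same tagged input always yields the same phrases; any variation would have to come from the tagger.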
@JayeetaP We use GitHub tickets for error reports only, so I think your questions are out of scope for this ticket. Could you please ask on the mailing list instead?