I'm using the BM25 implementation and there is currently no way of defining its hyperparameters.
It would be nice to tune them according to the corpus being used.
Thank you in advance
CC @Witiko, can you comment?
The summarization.bm25 module has the PARAM_K1, PARAM_B, and EPSILON global variables, which you can tune.
As far as I can tell, there is currently no way to instantiate several BM25 models with different parameters. You can work around that by tuning the parameters in parallel using process-based parallelism (every process will have separate global variables).
Oh I see
I was expecting something more like a regular initialization
BM25(corpus, k1=1.5, b=0.75, epsilon=0.25)
Is there a reason it isn't designed like this?
I'm going to set the global variables for now
Thank you for the help and thank you for the quick response too
It would be useful to add k1, b, and epsilon as model parameters, which would default to the values of PARAM_K1, PARAM_B, and EPSILON for backwards compatibility. The parameters would also need to be added to the get_bm25_weights and iter_bm25_bow functions in the BM25 module. This should be a fairly straightforward change to implement.
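A minimal sketch of what such a constructor could look like, not the actual Gensim code. The class name SketchBM25 is hypothetical, the default values are taken from the example call earlier in this thread, and the IDF formula and the use of epsilon to replace negative IDF values are assumptions based on the classic BM25 formulation.

```python
import math
from collections import Counter

# Module-level defaults, kept so that existing callers see no change.
# Names come from the thread; values are assumed from the example call above.
PARAM_K1 = 1.5
PARAM_B = 0.75
EPSILON = 0.25


class SketchBM25:
    """Hypothetical BM25 scorer with per-instance hyperparameters."""

    def __init__(self, corpus, k1=PARAM_K1, b=PARAM_B, epsilon=EPSILON):
        # corpus is a list of tokenized documents (lists of strings).
        self.k1, self.b, self.epsilon = k1, b, epsilon
        self.doc_freqs = [Counter(doc) for doc in corpus]
        self.doc_len = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_len) / len(corpus)

        # Number of documents that contain each term.
        df = Counter()
        for freqs in self.doc_freqs:
            df.update(freqs.keys())

        n = len(corpus)
        self.idf = {
            term: math.log(n - f + 0.5) - math.log(f + 0.5)
            for term, f in df.items()
        }
        # Replace negative IDF values with epsilon * average IDF (assumed behavior).
        average_idf = sum(self.idf.values()) / len(self.idf)
        for term in [t for t, value in self.idf.items() if value < 0]:
            self.idf[term] = self.epsilon * average_idf

    def get_score(self, query, index):
        """BM25 score of a tokenized query against the document at `index`."""
        freqs = self.doc_freqs[index]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[index] / self.avgdl)
        score = 0.0
        for term in query:
            if term in freqs:
                f = freqs[term]
                score += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return score
```

With this shape, SketchBM25(corpus) behaves as before, while SketchBM25(corpus, k1=2.0) and SketchBM25(corpus, b=0.5) can coexist in one process without touching the globals.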
Moreover, it would also be useful to make BM25 a subclass of TransformerABC, so that indexing produces BOW documents with adjusted weights, where the inner product with a query BOW produces a BM25 score. Alternatively, the BM25 weighting could be added to the existing TfidfModel class. This would help streamline working with document vector space models in Gensim.
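To illustrate the inner-product idea, here is a rough sketch in plain Python. It is not tied to TransformerABC or TfidfModel: the helper names are hypothetical, documents are dicts keyed by token rather than Gensim's (id, count) pairs, and the epsilon handling of negative IDF values is left out for brevity.

```python
import math
from collections import Counter


def bm25_document_vectors(corpus, k1=1.5, b=0.75):
    """Return one {term: weight} vector per tokenized document (hypothetical helper)."""
    n = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n
    # Number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    idf = {t: math.log(n - f + 0.5) - math.log(f + 0.5) for t, f in df.items()}

    vectors = []
    for doc in corpus:
        # Length normalization depends only on the document, so it is
        # folded into the document weights at indexing time.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        vectors.append({
            term: idf[term] * f * (k1 + 1) / (f + norm)
            for term, f in Counter(doc).items()
        })
    return vectors


def inner_product(query_bow, doc_vector):
    """Dot product of a {term: count} query with a {term: weight} document."""
    return sum(count * doc_vector.get(term, 0.0) for term, count in query_bow.items())


corpus = [["cat", "cat", "dog"], ["dog", "fish"], ["bird"], ["fish", "bird", "bird"]]
vectors = bm25_document_vectors(corpus)
query_bow = Counter(["cat", "dog"])
scores = [inner_product(query_bow, vec) for vec in vectors]
```

Because all BM25 terms except the query term count live in the document weights, scoring a query reduces to a sparse dot product, which is the same operation used for TF-IDF vectors.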
@marcelo-dalmeida are you able to implement @Witiko 's suggestion and open a PR?
Sure
I may take a little while to stop and actually do it, but sure can do.
I hope to start this week, at least the first part
@marcelo-dalmeida, has there been any progress? If you have any questions, don't hesitate to ask. If you don't plan to tackle the issue, let me know and I will make a PR for this.
@Witiko, I do plan to do it
I am going to stop today to do it, after work (around 19:00 UTC-03:00)
Hi @Witiko
I'm having some trouble with the unit test part

Can you give me some guidance?
Thank you
I am sorry to say that I don't have any experience with getting unit tests working on a Windows box. However, CI services will run the unit tests for you when you submit the PR, so this is perhaps not a concern.
I submitted the first part and I intend to start the second one tomorrow.
This is going to be my first contribution to an open-source project. I'm glad I can contribute something.
Thank you for the opportunity
I'm having some trouble with the unit test part
You should be able to run at least part of the tests. Run pip install flake8 and flake8 gensim to check your coding style. Run python -m unittest gensim/test/test_BM25.py to run a specific module with unit tests.
@Witiko Thank you for the help. I also switched to a Linux setup to avoid Windows-specific setup issues.