In the closed issue, the last comment mentions that it would take some work to implement. I wanted to request that this part be readdressed to include additional BM25F support.
@rmuir as far as I remember, we need Lucene API changes to allow this in a non-messy way. Given the current efforts on consolidating boolean query / filter etc., is this something that becomes more reasonable to support in Lucene in the future?
I don't think BM25F is worth the effort.
I'm sorry, the math makes sense, but when I test these things on _real datasets_ like what users use (not ridiculously nested XML ones or other things), and, you know, I test different multi-field approaches like BM25F, DisjunctionMax, and a naive boolean query, I don't see statistically significant differences.
So I don't want to add the complexity to Lucene. Nothing against the guys that did this work, but in most cases it frankly just does not matter.
Thanks man, appreciated! I agree, if it's not worth the complexity then let's skip it. Closing...
It would be interesting to have some numbers demonstrating why BM25F isn't necessary.
An alternative solution to varying term frequencies across fields is described in this paper:
[Robertson CIKM-2004] S. Robertson, H. Zaragoza, M. Taylor: Simple BM25 Extension to Multiple Weighted Fields. In Proc. of CIKM 2004.
http://www.hugo-zaragoza.net/academic/pdf/robertson_cikm04.pdf
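For anyone skimming the thread, here is a rough Python sketch of the idea in that paper as I read it: combine the per-field term frequencies with field weights first, then apply the usual BM25 saturation once on the combined value, rather than scoring each field separately and adding up the scores. The function name, the argument layout, and the IDF variant are my own assumptions for illustration; this is not Lucene code and not necessarily the exact formulation the authors evaluated.

```python
import math

def bm25f_score(query_terms, doc_fields, field_weights,
                avg_weighted_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Hypothetical helper: score one document with a simple BM25F variant.

    doc_fields:       {field name: list of tokens} for this document
    field_weights:    {field name: weight}
    avg_weighted_len: average weighted document length over the collection
    doc_freq:         {term: number of documents containing the term}
    """
    # Weighted document length: each field's length counts w_f times.
    weighted_len = sum(field_weights[f] * len(tokens)
                       for f, tokens in doc_fields.items())
    # Length normalisation is applied once, on the combined document.
    norm = 1.0 - b + b * weighted_len / avg_weighted_len

    score = 0.0
    for term in query_terms:
        # Combine term frequencies across fields BEFORE saturation.
        tf = sum(field_weights[f] * tokens.count(term)
                 for f, tokens in doc_fields.items())
        if tf == 0:
            continue
        n = doc_freq.get(term, 0)
        # Assumed IDF variant (always non-negative).
        idf = math.log(1.0 + (num_docs - n + 0.5) / (n + 0.5))
        score += idf * tf * (k1 + 1.0) / (tf + k1 * norm)
    return score


weights = {"title": 3.0, "body": 1.0}
doc_a = {"title": ["lucene"], "body": ["search", "engine"]}
doc_b = {"title": ["search"], "body": ["lucene", "engine"]}
df = {"lucene": 1, "search": 2}

score_a = bm25f_score(["lucene"], doc_a, weights, 5.0, df, 10)
score_b = bm25f_score(["lucene"], doc_b, weights, 5.0, df, 10)
```

With these toy documents, a match in the heavier `title` field should outscore the same match in `body`, since both documents have the same weighted length and only the combined term frequency differs.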