Elasticsearch: include BM25F Scoring

Created on 9 Feb 2015 · 4 Comments · Source: elastic/elasticsearch

#2388 closed the issue of adding BM25 support, which is great. But based on the reports, the BM25F variant is better for documents with multiple fields (title, summary, body, etc.), which are pretty common.

In the closed issue, the last comment mentions that it would take some work to implement. I wanted to request that this part be readdressed to include additional BM25F support.

won't fix

All 4 comments

@rmuir as far as I remember, we need Lucene API changes to allow this in a non-messy way. Given the current efforts on consolidating boolean query / filter etc., is this something that becomes more reasonable to support in Lucene in the future?

I don't think BM25F is worth the effort.

I'm sorry, the math makes sense, but when I test these things on _real datasets_ like the ones users actually have (not ridiculously nested XML ones or other things), and compare different multi-field approaches such as BM25F, DisjunctionMax, and a naive boolean query, I don't see statistically significant differences.

So I don't want to add the complexity to Lucene. Nothing against the guys that did this work, but in most cases it frankly just does not matter.
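
For readers comparing the alternatives named above, here is a rough sketch of how the DisjunctionMax and naive boolean approaches might look as Elasticsearch query bodies. The helper name `multi_field_queries` and the field names are hypothetical, and this is not the test setup used in the comment, only an illustration of the two query shapes.

```python
def multi_field_queries(text, fields):
    """Build two multi-field query bodies for the same text (illustrative helper)."""
    # One match clause per field; the field names are supplied by the caller.
    clauses = [{"match": {field: text}} for field in fields]
    return {
        # dis_max: the best-matching field dominates the score;
        # tie_breaker controls how much the other fields contribute.
        "dis_max": {
            "query": {"dis_max": {"queries": clauses, "tie_breaker": 0.0}}
        },
        # Naive boolean query: per-field scores of matching clauses are summed.
        "bool_should": {
            "query": {"bool": {"should": clauses}}
        },
    }
```

Either body could be sent to a search endpoint as-is; the difference is only in how the per-field scores are combined.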

Thanks man, appreciated! I agree, if it's not worth the complexity then let's skip it. Closing...

It would be interesting to have some numbers demonstrating why BM25F isn't necessary.
An alternative solution to varying term frequencies across fields is described in this paper:
[Robertson CIKM-2004] S. Robertson, H. Zaragoza, M. Taylor: Simple BM25 Extension to Multiple Weighted Fields. In Proc. of CIKM 2004.

http://www.hugo-zaragoza.net/academic/pdf/robertson_cikm04.pdf
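
As a rough illustration of the idea in that paper, the sketch below combines per-field term frequencies into one weighted frequency and then applies ordinary BM25 saturation once, rather than scoring each field separately. The function name `bm25f_score`, its parameters, and the IDF formula are assumptions made for this example; it is not Lucene or Elasticsearch code and it leaves out details such as re-tuning `k1` for the chosen weights.

```python
import math
from collections import Counter


def bm25f_score(query_terms, doc, field_weights, avg_weighted_len,
                doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one document; `doc` maps field name -> list of tokens."""
    weighted_tf = Counter()
    weighted_len = 0.0
    for field, tokens in doc.items():
        w = field_weights.get(field, 1.0)
        # Each occurrence in a field counts w times, as if the field
        # had been repeated w times in a single flat document.
        weighted_len += w * len(tokens)
        for tok in tokens:
            weighted_tf[tok] += w

    # Length normalization over the weighted document length.
    norm = k1 * ((1 - b) + b * weighted_len / avg_weighted_len)

    score = 0.0
    for term in query_terms:
        tf = weighted_tf.get(term, 0.0)
        if tf == 0:
            continue
        n = doc_freq.get(term, 0)
        # Assumed IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
        idf = math.log(1 + (num_docs - n + 0.5) / (n + 0.5))
        # Saturation is applied once, to the combined frequency.
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score
```

With a title weight larger than the body weight, a term appearing in the title of one document should outscore the same term appearing only in the body of an otherwise equal-length document.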
