The ability to normalize the relevance scores of documents on a per-query basis would be incredibly useful. It would let someone meaningfully combine multiple queries in a bool query without a single clause overtaking the others. This would be a huge improvement over boosting the individual queries.
I could see this being done via:
I personally would prefer option 2, but recognize option 3 would allow individuals to design their own normalization functions to best suit their needs.
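To make the request concrete, here is a minimal client-side sketch of per-query normalization, assuming each query has already been run separately and its raw scores collected into a dict. The function names (`min_max_normalize`, `combine`) and the sample scores are hypothetical and not part of Elasticsearch. Min-max scaling is only one possible normalization function.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: raw_score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored the same; treat them as equally good matches.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(*per_query_scores):
    """Sum the normalized scores of each query, per document."""
    combined = {}
    for scores in per_query_scores:
        for doc, s in min_max_normalize(scores).items():
            combined[doc] = combined.get(doc, 0.0) + s
    return combined


# Hypothetical raw scores from two separately executed queries.
title_scores = {"a": 12.0, "b": 3.0, "c": 7.5}
body_scores = {"a": 0.2, "b": 0.9}

combined = combine(title_scores, body_scores)
```

Without normalization the title query, whose raw scores are an order of magnitude larger, would dominate the sum; after scaling, both queries contribute on the same [0, 1] range.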
Note:
I'm not the only one looking for this functionality. There is a StackOverflow discussion with 20+ votes on attempts (none successful) to achieve this here.
This is not something that can be done easily. This feature request makes the assumption that we are able to qualify how good a match is, which is not the case. This page gives some background about the issue https://wiki.apache.org/lucene-java/ScoresAsPercentages.
One way to work around the issue could be to use `constant_score` queries, or the upcoming `boolean` similarity, which generate predictable scores.
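A rough sketch of the `constant_score` workaround, written as a Python dict that serializes to the query DSL. The field names (`title`, `body`), the search text, and the boost values are assumptions chosen for illustration; each matching clause contributes its fixed boost instead of a score derived from term statistics.

```python
import json


def constant_clause(field, text, boost):
    """Wrap a match query so it contributes a fixed score when it matches."""
    return {
        "constant_score": {
            "filter": {"match": {field: text}},
            "boost": boost,
        }
    }


# Hypothetical fields and weights: a title match counts twice as much as a body match.
query = {
    "query": {
        "bool": {
            "should": [
                constant_clause("title", "quick fox", 2.0),
                constant_clause("body", "quick fox", 1.0),
            ]
        }
    }
}

# JSON request body that could be sent to the _search endpoint.
request_body = json.dumps(query, indent=2)
```

The trade-off is that every match on a given clause scores the same, so the ranking no longer reflects how well a document matches within a clause.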
But I agree with you that we need to make it easier to combine full-text relevance with other sources of relevance. This has been researched (in particular with geo), and we need to make progress on exposing better ways to combine full-text scores with other kinds of scores.
We just discussed it in FixitFriday and decided to close it in favour of a new issue: #23850.