Currently we can use a bool query to combine the results of different queries as a sum, or dis_max, which takes the max. But sometimes people might want to combine the results of several queries in different ways, for example: http://stackoverflow.com/questions/31755642/how-can-i-multiply-the-score-of-two-queries-together-in-elasticsearch
function_score could be changed to enable multiply, min, and average.
Currently function_score can only make use of the score of one query and then combine that with some functions. But that could be changed by replacing the filter for each function with a query and then giving access to the individual scores of these queries inside the functions.
Something like this (not even trying to name stuff...):
POST _search
{
  "query": {
    "function_score": {
      "query": {
        // here be a query or not, resulting score can be used later via _score
      },
      "functions": [
        {
          "query": {
            // another fancy query, maybe even another function_score?, resulting score can be used via _xyz_score
          },
          "boost_mode": "multiply", // need this here now too because we need to know how to combine the function with the _xyz_score
          "script_score": {
            "script": "_xyz_score * _score"
          }
        },
        {
          "query": {
            // even more query!, resulting score can be used via _xyz_score
          },
          "boost_mode": "sum", // need this here now too because we need to know how to combine the function with the _xyz_score
          "script_score": {
            "script": "_xyz_score * doc['b'].value"
          }
        },
        ...
      ],
      "score_mode": "multiply"
    }
  }
}
I think there was an issue about that already somewhere but I cannot find it.
This might relate to https://github.com/elastic/elasticsearch/issues/10049 because people could then make arbitrarily complicated combinations using scripts and nested function score queries. It would be crude though.
Just discussed this in fixit friday and now we think it should be structured differently, more like in #10049:
Each function produces a variable which can be named with some parameter (var_name?).
We add an additional option score_mode: script that has the results of the functions as variables. The final score is then the result of the script.
In addition, we need a different function, query_function, which returns the result of a query. We thought that the above approach (putting the filters we have now together with the function scores) would be confusing and convolute things too much.
Something like:
POST _search
{
  "query": {
    "function_score": {
      "query": {
        // same as before, score will be accessible via _score
      },
      "functions": [
        {
          "query_function": {
            "query": {
              // here be any query, can also be function_score
            },
            "var_name": "score_a"
          }
        },
        {
          "random_score": {
            "var_name": "score_b",
            ...
          }
        },
        ...
      ],
      "score_mode": "script",
      "combine_script": "score_a * score_b + _score"
    }
  }
}
Cool!
I'm the OP of #10049 and #17820 - both seem to be satisfied by the proposed solution, so looking forward to this implementation.
I guess best would be to split this in two: 1. implement query function and 2. implement custom combine. I'll start working on this unless anyone else calls dibs.
@JnBrymn-EB and I discussed a little about the combine script parts and we thought that we should probably change the above syntax. The variable name per function could be on the same level as the filter, weight and function instead of being a parameter inside the function definition because each function score can be assigned to a variable just like every function can have a weight or a filter. Also, the script should probably follow the same script syntax we have elsewhere. The query would then look like this:
POST _search
{
  "query": {
    "function_score": {
      "query": {
        // same as before, score will be accessible via _score
      },
      "functions": [
        {
          "query_function": {
            "query": {
              // here be any query, can also be function_score
            }
          },
          "var_name": "score_a",
          "filter": {
            // some filter
          }
        },
        {
          "random_score": {
            ...
          },
          "var_name": "score_b",
          "weight": 3.33
        },
        ...
      ],
      "score_mode": "script",
      "combine_script": {
        "lang": "groovy",
        "inline": "score_a * score_b + _score"
      }
    }
  }
}
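To make the proposed semantics concrete, here is a minimal Python sketch of how the named variables might be combined. It is not Elasticsearch code: the helper `combine_scores`, the function `example_script`, and the sample values are made up for illustration, and a plain Python function stands in for the Groovy script.

```python
from typing import Callable, Dict


def combine_scores(
    query_score: float,
    function_scores: Dict[str, float],
    combine: Callable[..., float],
) -> float:
    """Hypothetical stand-in for score_mode "script".

    Each function's result is exposed under its var_name, and the main
    query's score is exposed as _score, mirroring the proposal above.
    """
    variables = dict(function_scores)
    variables["_score"] = query_score
    return combine(**variables)


# Stand-in for the inline script "score_a * score_b + _score".
def example_script(score_a: float, score_b: float, _score: float) -> float:
    return score_a * score_b + _score


final = combine_scores(
    query_score=1.5,
    function_scores={"score_a": 2.0, "score_b": 3.0},
    combine=example_script,
)
print(final)  # 7.5
```

One thing this sketch leaves open is what happens when a function never produces a value for a document, so that the script would see an undefined variable.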
I'd suggest changing query_function to query_score, and combine_script to score_script. Otherwise looks great!
I'm building the combine part as we speak. Should we go with var_name as stated above or should we use _name as I've seen in other places?
We settled on var_name.
In addition, another question came up: A function might be associated with a filter that does not match. What value do we assign to the variable in this case? I have the feeling we need a default value here. Something like:
...
"functions": [
  {
    "script_variable": {
      "name": "score_a",
      "default": 123
    },
    "filter": {
      // some filter
    },
    "field_value_factor": {...}
  }
  ....
Could we add a missing field here, just as with field_value_factor, and make it default to 0 for the sake of a score_script? We'd have to be careful not to affect existing functionality like score_mode=avg, which just assumes that the value doesn't exist. It might be a bit misleading.
Maybe another take would be adding a default_vals key to the combine_script that would enumerate the value of each clause that might be missing.
I'd go with missing, and in fact we should probably apply this to all functions (this has come up before). I'm wondering if the change to score_mode:avg is a problem?
Just to be clear: I meant to add a default if the "filter" does not match. In case the field is missing it would still be up to the function to decide what to do.
I'll explain in more detail what I mean.
We have two cases:
1. The field that a function needs is missing in the document.
2. The filter associated with a function does not match.
In the first case, we have three functions that have to deal with it: field_value_factor (takes a missing parameter and, if the value is missing, uses that instead of an actual value), decay_function (assumes the value is exactly at the origin, which has greatly annoyed many users and might change, see https://github.com/elastic/elasticsearch/issues/18892) and script_score, where everyone has to adjust the script to deal with it.
In the second case, function_score currently acts for this document as if the function did not exist at all.
I was only talking about 2., filter not matching.
We could add a score_missing or default parameter that would do the following: if the filter for a function does not match, then we always return this value.
This would also have the advantage of letting everyone control not only the input to individual functions in case a field is missing (with the missing parameter) but also the output, like so:
"function_score": {
  "functions": [
    {
      "filter": {
        "exists": {
          "field": "age"
        }
      },
      "field_value_factor": {
        "field": "age",
        "modifier": "ln"
      },
      "score_missing": 5
    }
  ]
}
Also, it would allow people to control what score_mode: avg means in case a filter is not matching, which is awkward right now.
For example in this case:
"function_score": {
  "score_mode": "avg",
  "functions": [
    {
      "filter": {
        "term": {
          "skill": "codes_java"
        }
      },
      "weight": 5,
      "score_missing": 0
    },
    {
      "filter": {
        "term": {
          "skill": "speaks_human"
        }
      },
      "weight": 2,
      "score_missing": 0
    }
  ]
}
If the term codes_java is not in the field skill, the score would be computed as (0+2)/(5+2) instead of just 2/2, which is the default right now and might not be desirable.
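The arithmetic can be checked with a small Python sketch. This is only an illustration of the weighted average described here, not the actual function_score implementation: the helper `weighted_avg` is hypothetical, and it assumes that a weight-only function scores 1 and that score_missing is added to the numerator as-is while the function's weight still counts in the denominator.

```python
from typing import List, Optional, Tuple

# Each entry is (weight, function_score); function_score is None when the
# function's filter did not match the document.
FunctionResult = Tuple[float, Optional[float]]


def weighted_avg(
    results: List[FunctionResult],
    score_missing: Optional[float] = None,
) -> float:
    """Weighted average over function results (illustrative only).

    With score_missing=None, non-matching functions are skipped entirely.
    Otherwise they add score_missing to the numerator and their weight
    to the denominator (an assumption of this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for weight, score in results:
        if score is None:
            if score_missing is None:
                continue
            numerator += score_missing
            denominator += weight
        else:
            numerator += weight * score
            denominator += weight
    if denominator == 0:
        raise ValueError("no function contributed to the average")
    return numerator / denominator


# codes_java (weight 5) does not match, speaks_human (weight 2) matches.
results = [(5.0, None), (2.0, 1.0)]
current = weighted_avg(results)                          # 2/2
with_default = weighted_avg(results, score_missing=0.0)  # (0+2)/(5+2)
print(current, with_default)
```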
For the combine_script we should then enforce that this parameter exists if a function is associated with a filter.
I would not call it missing because I at least might mix that up with the missing parameter used in case the field does not exist in the doc.
This makes sense to me. What about calling it no_match_score or default_score? I think I prefer the former because it is more explicit.
Any timeline for this feature? When is it going to be released?
@mckinnovations https://github.com/elastic/elasticsearch/pull/19710
Have the query_score or query_function keywords been added?
I am trying to compute max scores for docs from two queries, one calculating a field_value_factor for the updated field from a child doc and the other a field_value_factor for the parent's updated value. So the doc score I need is max(child.updated, doc.updated). I see no way to tell elasticsearch to return such a max updated currently.
I think you can use dis_max query of 2 function_score queries
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html
But the question about timeline for query_score is really important.
Guys, do you plan to implement this feature?
This is becoming more important for upcoming work at Eventbrite.
We've been rethinking this approach. Apparently, according to research, the best way to combine scores is to add them together (which the bool query does, now that coordination and query norm are gone).
So we're looking at better ways of exposing primitives for incorporating non-textual scores into the overall score.
Closing in favour of https://github.com/elastic/elasticsearch/issues/23850
"coordination and query norm are gone" - do you have any documentation on that @clintongormley ?
@JnBrymn-EB they've been removed in Lucene 7 https://issues.apache.org/jira/browse/LUCENE-7347
query coordination was a hack to make TF/IDF work better in the face of poor TF saturation, and query norm (i believe) was essentially a failed experiment to try to make the scores from different queries comparable.
with those removed, the bool query now just does a simple sum, and boosting clauses is a much simpler calculation than before.
Fascinating! I'll have to soak this in.
For anyone else struggling with the absence of a way to multiply scores directly, note that it is possible to take logarithms using function_score/script_score or using a modifier field. The addition of logarithms is equivalent to multiplication for scoring.
@marcusklaas, I understand the math of logarithms (score=A * B * C sorts the same as score=log(A * B * C), and log(A * B * C) = log(A) + log(B) + log(C)) but I'm unclear how this helps. Do you have an example? For instance, if I want to multiply the values of 3 fields together, then I would just use score_mode=multiply. But if I wanted to make an interesting combination of field values like A*B + C then the logarithm trick doesn't help me because that isn't a bunch of products.
And if I want to get to arbitrary polynomials of the fields, and to incorporate the text score in the mix, then all the more: what do I do?
@JnBrymn-EB
I agree. The Math.log method doesn't help in the case you presented. It only helps when you want to just _multiply_ scores from several different queries.
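A quick way to convince yourself of the ranking equivalence is the Python sketch below. The document ids and scores are made up, and `rank` is a hypothetical helper. It only shows that sorting by a sum of logarithms gives the same order as sorting by the product, provided all scores are positive.

```python
import math

# Hypothetical per-document scores from two queries, A and B.
docs = {
    "doc1": (2.0, 0.5),
    "doc2": (1.5, 1.5),
    "doc3": (4.0, 0.1),
}


def rank(score_fn):
    """Return doc ids ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score_fn(*docs[d]), reverse=True)


by_product = rank(lambda a, b: a * b)
by_log_sum = rank(lambda a, b: math.log(a) + math.log(b))

print(by_product)  # ['doc2', 'doc1', 'doc3']
print(by_log_sum)  # ['doc2', 'doc1', 'doc3']
```

Note that the logarithm of a score below 1 is negative, so the summed value itself is only useful for ordering, not as an absolute score.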
For reference for future readers, the example we used in our company:
{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": someQueryA,
            "boost_mode": "replace",
            "script_score": {
              "script": {
                "source": "Math.log(_score)"
              }
            }
          }
        },
        {
          "function_score": {
            "query": someQueryB,
            "boost_mode": "replace",
            "script_score": {
              "script": {
                "source": "Math.log(_score)"
              }
            }
          }
        }
      ]
    }
  }
}
This is a neat example @PeledYuval. I really hadn't thought through how I'd implement text score multiplication. I might use this at some point. But I think you'll agree that it's awkward and inflexible. (Can't do A*B+C.)
Now the thing I would really like to see, with the introduction of the script score query: it would be spectacular if I could use the named query (_name) functionality to refer to the scores for those clauses inside of a script score query and combine them as I please.
Something like:
{
  "query" : {
    "script_score" : {
      "query" : {
        "bool" : {
          "should": [
            {"match": { "message": { "query": "elasticsearch", "_name": "A" } } },
            {"match": { "message.trigrammed": { "query": "elasticsearch", "_name": "B" } } }
          ]
        }
      },
      "script" : {
        "source" : "_subscore['A'] + _subscore['B'] + 0.1*_subscore['A']*_subscore['B']"
      }
    }
  }
}