Elasticsearch: Add option to combine several query scores with multiply or other

Created on 15 Mar 2016 · 28 Comments · Source: elastic/elasticsearch

Currently we can use a bool query to combine the results of different queries as a sum, or dis_max, which takes the max. But sometimes people might want to combine the results of several queries in different ways, for example: http://stackoverflow.com/questions/31755642/how-can-i-multiply-the-score-of-two-queries-together-in-elasticsearch
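
To make the difference concrete, here is a minimal Python sketch of the three combinations being discussed. It only models the arithmetic on already-computed clause scores and ignores any coordination or normalization factors; the function names are made up for illustration and are not Elasticsearch APIs.

```python
from math import prod


def bool_sum(scores):
    """Model of a bool query: the matching clause scores are added."""
    return sum(scores)


def dis_max(scores, tie_breaker=0.0):
    """Model of dis_max: best clause score plus tie_breaker times the others."""
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def multiply(scores):
    """The combination requested in this issue (hypothetical, not built in)."""
    return prod(scores)


clause_scores = [2.0, 0.5]
print(bool_sum(clause_scores))  # 2.5
print(dis_max(clause_scores))   # 2.0
print(multiply(clause_scores))  # 1.0
```

Note how the product can rank a document lower than either the sum or the max when one clause scores below 1, which is why it cannot be emulated by simply reweighting a bool query.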

function_score could be changed to enable multiply, min, and average.
Currently function_score can only use the score of one query and combine it with some functions. That could be changed by replacing the filter for each function with a query and then giving the functions access to the individual scores of these queries.

Something like this (not even trying to name stuff...):

POST _search
{
  "query": {
    "function_score": {
      "query": {
        // here be a query or not, resulting score can be used later via _score
      },
      "functions": [
        {
          "query": {
            // another fancy query, maybe even another function_score?, resulting score can be used via _xyz_score
          }, 
          "boost_mode": "multiply", // need this here now too because we need to know how to combine the function with the _xyz_score
          "script_score": {
            "script": "_xyz_score * _score"
          }
        },
        {
          "query": {
            // even more query!, resulting score can be used via _xyz_score
          }, 
          "boost_mode": "sum", // need this here now too because we need to know how to combine the function with the _xyz_score
          "script_score": {
            "script": "_xyz_score * doc['b'].value"
          }
        },
        ...
      ],
      "score_mode": "multiply"
    }
  }
}

I think there was an issue about that already somewhere but I cannot find it.
This might relate to https://github.com/elastic/elasticsearch/issues/10049 because people could then make arbitrary complicated combinations using scripts and nesting function score queries. It would be crude though.

Labels: :Search/Search, >feature, discuss, help wanted

brwe

👍11

Most helpful comment

This is a neat example @PeledYuval. I really hadn't thought through how I'd implement text score multiplication. I might use this at some point. But I think you'll agree that it's awkward and inflexible. (Can't do A*B+C.)

Now the thing I would really like to see is with the introduction of script score query, it would be spectacular if I could use the _named query clause functionality to refer to the scores for those clauses inside of a script score query and combine them as I please.

Something like:

{
    "query" : {
        "script_score" : {
            "query" : {
                "bool" : {
                    "should": [
                        {"match": { "message": "elasticsearch" }, "_name": "A"},
                        {"match": { "message.trigrammed": "elasticsearch" }, "_name": "B"}
                    ]
                }
            },
            "script" : {
                "source" : "_subscore['A'] + _subscore['B'] + 0.1*_subscore['A']*_subscore['B']"
            }
        }
     }
}

JnBrymn on 30 Aug 2019

❤7

All 28 comments

Just discussed this in fixit friday and now we think it should be differently structured, more like in #10049:

Each function produces a variable which can be named with some parameter (var_name?).

We add an additional option score_mode: script that has the results of the functions as variables. The final score is then the result of the script.

In addition, we need a new function, query_function, which returns the result of a query. We thought that the above approach (combining the filters we have now with function scores) would be confusing and would convolute things too much.

Something like:

POST _search
{
  "query": {
    "function_score": {
      "query": {
        // same as before, score will be accessible via _score
      },
      "functions": [
        {
          "query_function": {
            "query": {
              // here be any query, can also be function_score
            },
            "var_name": "score_a"
          }
        },
        {
          "random_score": {
            "var_name": "score_b",
            ...
          }
        },
        ...
      ],
      "score_mode": "script",
      "combine_script": "score_a * score_b + _score"
    }
  }
}
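
The intended semantics can be sketched in a few lines of Python. This is only a model of the proposal, not an existing Elasticsearch feature: `combine`, the stand-in functions, and their fixed return values are hypothetical, and the sketch assumes the second function is bound to `score_b`, as the combine script implies.

```python
def combine(base_score, functions, combine_script):
    """Evaluate each function, bind its result to its var_name, then run the script."""
    variables = {"_score": base_score}
    for var_name, function in functions:
        variables[var_name] = function(base_score)
    return combine_script(variables)


functions = [
    ("score_a", lambda _score: 2.0),  # stands in for a query_function result
    ("score_b", lambda _score: 0.5),  # stands in for a random_score result
]

# Equivalent of "combine_script": "score_a * score_b + _score"
script = lambda v: v["score_a"] * v["score_b"] + v["_score"]

print(combine(3.0, functions, script))  # 4.0
```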

brwe on 18 Mar 2016

❤2

Cool!

babadofar on 27 Mar 2016

I'm the OP of #10049 and #17820 - both seem to be satisfied by the proposed solution, so looking forward to this implementation.

synhershko on 24 Apr 2016

I guess best would be to split this in two: 1. implement query function and 2. implement custom combine. I'll start working on this unless anyone else calls dibs.

brwe on 26 May 2016

@JnBrymn-EB and I discussed a little about the combine script parts and we thought that we should probably change the above syntax. The variable name per function could be on the same level as the filter, weight and function instead of being a parameter inside the function definition because each function score can be assigned to a variable just like every function can have a weight or a filter. Also, the script should probably follow the same script syntax we have elsewhere. The query would then look like this:

POST _search
{
  "query": {
    "function_score": {
      "query": {
        // same as before, score will be accessible via _score
      },
      "functions": [
        {
          "query_function": {
            "query": {
              // here be any query, can also be function_score
            }
          },
          "var_name": "score_a",
          "filter": {
               // some filter
          }
        },
        {
          "random_score": {
            ...
          },
          "var_name": "score_b",
          "weight": 3.33
        },
        ...
      ],
      "score_mode": "script",
      "combine_script": {
          "lang": "groovy",
          "inline": "score_a * score_b + _score"
      }
    }
  }
}

brwe on 30 May 2016

I'd suggest changing query_function to query_score, and combine_script to score_script. otherwise looks great!

clintongormley on 1 Jun 2016

👍1

I'm building the combine part as we speak. Should we go with var_name as stated above or should we use _name as I've seen in other places?

JnBrymn-EB on 6 Jun 2016

We settled for var_name.

In addition, another question came up: A function might be associated with a filter that does not match. What value do we assign to the variable in this case? I have the feeling we need a default value here. Something like:

...
"functions": [
        {
          "script_variable": {
             "name": "score_a",
             "default": 123
          },
          "filter": {
               // some filter
          },
          "field_value_factor": {...}
        }
....

brwe on 28 Jun 2016

Could we add a missing field here just with the field_value_factor and make it default to 0 for the sake of a score_script? We'd have to be careful not to affect existing functionality like score_mode=avg which just assumes that the value doesn't exist. -- It might be a bit misleading.

Maybe another take would be adding a default_vals key to the combine_script that would enumerate the value of each clause that might be missing.

JnBrymn-EB on 28 Jun 2016

I'd go with missing, and in fact we should probably apply this to all functions (this has come up before). I'm wondering if the change to score_mode:avg is a problem?

clintongormley on 1 Jul 2016

Just to be clear: I meant to add a default if the "filter" does not match. In case the field is missing it would still be up to the function to decide what to do.

brwe on 1 Jul 2016

I'll explain in more detail what I mean.
We have two cases:

1. the field is missing in the document
2. the filter associated with the function does not match

In the first case, we have three functions that have to deal with it: field_value_factor (takes a missing parameter and if the value is missing uses that instead of an actual value), decay_function (assumes the value is perfectly at the origin, which has greatly annoyed many users and might change, see https://github.com/elastic/elasticsearch/issues/18892) and script_score where everyone has to adjust the script to deal with it.

In the second case, function_score currently acts for this document as if the function did not exist at all.

I was only talking about 2., filter not matching.

We could add a score_missing or default parameter that would do the following: If the filter for a function does not match then we always return this value.

This would also have the advantage that it would allow everyone to control not only the input to individual functions in case the field is missing (with the missing parameter) but also the output, like so:

"function_score": {
      "functions": [
        {
          "filter": {
            "exists": {
              "field": "age"
            }
          },
          "field_value_factor": {
            "field": "age",
            "modifier": "ln"
          },
          "score_missing": 5 
        }
      ]
    }

Also, it would allow people to control what score_mode: avg means in case a filter is not matching, which is awkward right now.

For example in this case:

"function_score": {
      "score_mode": "avg", 
      "functions": [
        {
          "filter": {
            "term": {
              "skill": "codes_java"
            }
          },
          "weight": 5, 
          "score_missing": 0
        },
        {
          "filter": {
            "term": {
              "skill": "speaks_human"
            }
          },
          "weight": 2, 
          "score_missing": 0
        }
      ]
    }

In case the term codes_java is not in the field skill, the score would be computed as (0+2)/(5+2) instead of just 2/2, which is the default right now and might not be desirable.
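
The arithmetic can be checked with a small Python model of the weighted average. This is a sketch under stated assumptions: `avg_score` is a hypothetical helper, a weight-only function is taken to yield 1.0 when its filter matches, and `score_missing` (the proposed parameter, not an existing one) is treated as the function's already-weighted result when the filter does not match.

```python
def avg_score(functions, use_score_missing):
    """Weighted average over function results.

    functions: list of (weight, filter_matched, score_missing) tuples.
    Assumes at least one function contributes to the average.
    """
    numerator = 0.0
    denominator = 0.0
    for weight, matched, score_missing in functions:
        if matched:
            # A weight-only function yields 1.0, scaled by its weight.
            numerator += weight * 1.0
            denominator += weight
        elif use_score_missing:
            # Proposed behaviour: the function still counts, with score_missing
            # as its result, so its weight enters the denominator.
            numerator += score_missing
            denominator += weight
    return numerator / denominator


# codes_java does not match (weight 5), speaks_human matches (weight 2).
functions = [(5, False, 0), (2, True, 0)]

print(avg_score(functions, use_score_missing=False))  # 2/2 = 1.0
print(avg_score(functions, use_score_missing=True))   # (0+2)/(5+2), about 0.2857
```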

For the combine_script we should then enforce that this parameter exists if a function is associated with a filter.

I would not call it missing because I at least might mix that up with the missing in case the field does not exist in the doc.

brwe on 1 Jul 2016

This makes sense to me. What about calling it no_match_score or default_score? I think I prefer the former because it is more explicit.

clintongormley on 4 Jul 2016

Any timeline for this feature? When is it going to be released?

mckinnovations on 18 Jul 2016

@mckinnovations https://github.com/elastic/elasticsearch/pull/19710

JnBrymn-EB on 1 Aug 2016

👍2

Have query_score or query_function keywords been added?
I am trying to compute the max score for docs from two queries: one computes a field_value_factor on the updated field of a child doc, and the other a field_value_factor on the parent's updated value. So the doc score I need is max(child.updated, doc.updated). I currently see no way to tell Elasticsearch to return such a max.

serj-p on 2 Mar 2017

I think you can use dis_max query of 2 function_score queries
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html

erebus1 on 2 Mar 2017

But the question about timeline for query_score is really important.

erebus1 on 2 Mar 2017

Guys, do you plan to implement this feature?

erebus1 on 30 Mar 2017

This is becoming more important for upcoming work at Eventbrite.

JnBrymn-EB on 30 Mar 2017

We've been rethinking this approach. Apparently, according to research, the best way to combine scores is to add them together (which the bool query does, now that coordination and query norm are gone).

So we're looking at better ways of exposing primitives for incorporating non-textual scores into the overall score.

Closing in favour of https://github.com/elastic/elasticsearch/issues/23850

clintongormley on 31 Mar 2017

"coordination and query norm are gone" - do you have any documentation on that @clintongormley ?

JnBrymn-EB on 31 Mar 2017

@JnBrymn-EB they've been removed in Lucene 7 https://issues.apache.org/jira/browse/LUCENE-7347

Query coordination was a hack to make TF/IDF work better in the face of poor TF saturation, and query norm (I believe) was essentially a failed experiment to try to make the scores from different queries comparable.

With those removed, the bool query now just does a simple sum, and boosting clauses is a much simpler calculation than before.

clintongormley on 3 Apr 2017

Fascinating! I'll have to soak this in.

JnBrymn-EB on 3 Apr 2017

For anyone else struggling with the absence of a way to multiply scores directly, note that it is possible to take logarithms using function_score/script_score or using a modifier field. The addition of logarithms is equivalent to multiplication for scoring.
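
A quick Python check of that equivalence, using made-up scores for three documents: ranking by the sum of logarithms gives the same order as ranking by the product. One caveat is that the logarithm is negative for scores below 1, and recent Elasticsearch versions reject negative scores as far as I know, so real queries may need to shift or clamp the values.

```python
import math

# Hypothetical (score_a, score_b) pairs per document.
docs = {"doc1": (2.0, 3.0), "doc2": (4.0, 1.0), "doc3": (1.5, 5.0)}

by_product = sorted(docs, key=lambda d: docs[d][0] * docs[d][1], reverse=True)
by_log_sum = sorted(
    docs,
    key=lambda d: math.log(docs[d][0]) + math.log(docs[d][1]),
    reverse=True,
)

print(by_product)  # ['doc3', 'doc1', 'doc2']
print(by_log_sum)  # ['doc3', 'doc1', 'doc2']
```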

marcusklaas on 26 Apr 2018

@marcusklaas, I understand the math of logarithms (score = A * B * C sorts the same as score = log(A * B * C), and log(A * B * C) = log(A) + log(B) + log(C)), but I'm unclear how this helps. Do you have an example? For instance, if I want to multiply the values of 3 fields together, then I would just use score_mode=multiply. But if I wanted to make an interesting combination of field values like A*B + C, then the logarithm trick doesn't help me because that isn't a pure product.

And if you want to get to arbitrary polynomials of the fields and if you want to incorporate the text score in the mix, then all the more what do I do?

JnBrymn-EB on 30 Apr 2018

@JnBrymn-EB
I agree. The Math.log method doesn't help in the case you presented. It only helps when you want to just _multiply_ scores from several different queries.

For reference for future readers, the example we used in our company:

{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": someQueryA,
            "script_score": {
              "script": {
                "source": "Math.log(_score)"
              }
            }
          }
        },
        {
          "function_score": {
            "query": someQueryB,
            "script_score": {
              "script": {
                "source": "Math.log(_score)"
              }
            }
          }
        }
      ]
    }
  }
}

PeledYuval on 27 Aug 2019
