Elasticsearch: Deprecate Function Score Query in favour of Script Score Query

Created on 3 Jun 2019 · 32 Comments · Source: elastic/elasticsearch

We would like to deprecate Function Score Query in 8.0 and completely remove it starting from 9.0 (the versions were pushed back as we are already deep into 7.x).

The new script score query will replace it.
Here we describe how a function score query can be replaced with a script score query.

We would like to hear from the community about cases that were possible with Function Score Query but that the new Script Score Query can't address.


All 32 comments

Pinging @elastic/es-search

We have identified two function_score query functionalities that are missing in the script_score query:

  1. score_mode first: applying the score from the function with the first matching filter (we wonder how prevalent this use case is).
  2. _explain: explanation of the score calculation doesn't work in the script_score query.

For score_mode first, we had a discussion before.

A proposal is to have a new type of compound query that has an ability to combine scores from different queries:

{
  "query": {
    "compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "score_mode": "first"
    }
  }
}

score_mode has the same values and definitions as in Function Score Query

score_mode | definition
-- | --
sum | scores are summed (default)
multiply | scores are multiplied
avg | scores are averaged
first | the first function that has a matching filter is applied
max | maximum score is used
min | minimum score is used
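
To make the table concrete, here is a minimal Python sketch of how each score_mode might combine per-query scores. The function name `combine_scores` and the convention that a non-matching query is represented by `None` are assumptions for illustration, not part of any Elasticsearch API. Note that `avg` is shown as a plain mean here.

```python
from functools import reduce
from typing import List, Optional


def combine_scores(scores: List[Optional[float]], score_mode: str = "sum") -> float:
    """Combine per-query scores; None marks a query that did not match."""
    matched = [s for s in scores if s is not None]
    if not matched:
        return 0.0  # assumption: no matching query contributes nothing
    if score_mode == "sum":
        return sum(matched)
    if score_mode == "multiply":
        return reduce(lambda a, b: a * b, matched)
    if score_mode == "avg":
        return sum(matched) / len(matched)
    if score_mode == "first":
        return matched[0]  # first matching query in declaration order
    if score_mode == "max":
        return max(matched)
    if score_mode == "min":
        return min(matched)
    raise ValueError(f"unknown score_mode: {score_mode}")


scores = [None, 2.0, 3.0]  # the first query did not match
print(combine_scores(scores, "sum"))       # 5.0
print(combine_scores(scores, "first"))     # 2.0
print(combine_scores(scores, "multiply"))  # 6.0
```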

Or as an alternative we can have a script compound query where scores can be freely combined using script:

{
  "query": {
    "script_compound": {
      "queries": [
        {"match": {
          "message": "elasticsearch"
        }},
        {"match": {
          "author": "shay banon"
        }}
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}

But for the first mode, we would still need to implement a type of in-order query, as it is difficult to implement this logic through a script. A possible API for in_order query:

{
  "query": {
    "in_order": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "score_mode" : "first"
    }
  }
}

Not sure what other score_modes are useful here.

@mayya-sharipova I am not 100% sure whether these changes will break our usage of Elasticsearch, so I've prepared the following comparison. Could you confirm that it is a proper translation of the feature to the new query model, please? Thank you for any tips!

Actual request (ES 6.6)

{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          },
          "weight": 65
        },
        {
          "filter": {
            "terms": {
              "location.city_id": [
                "1"
              ]
            }
          },
          "weight": 35
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

New request (ES 7.x / ES 8.0 compatible)

_[especially not sure about how to pass filter scoring behaviour... 🤔]_

{
  "explain": true,
  "query": {
    "script_compound": {
      "queries": [
        {
          "script_score": {
            "query": {
              "match_all": {}
            },
            "script": {
              "lang": "painless",
              "source": "return (doc['ids'].containsAll(params.ids) ? 1 : 0) * params.weight;",
              "params": {
                "ids": [1, 2],
                "weight": 65
              }
            }
          }
        },
        {
          "script_score": {
            "query": {
              "match": {
                "terms": {
                  "location.city_id": [
                    "1"
                  ]
                }
              }
            },
            "script": {
              "lang": "painless",
              "source": "return params._score * params.weight;",
              "params": {
                "weight": 35
              }
            }
          }
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

PS

I recommend one little fix of the proposed query (as for now it isn't JSON format compliant):
Instead of this:

Or as an alternative we can have a script compound query where scores can be freely combined using script:

{
  "query": {
    "script_compound": {
      "queries": [
        "match": {
          "message": "elasticsearch"
        },
        "match": {
          "author": "shay banon"
        }
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}

it should look like this:
Or as an alternative we can have a script compound query where scores can be freely combined using script:

{
  "query": {
    "script_compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}

@mayya-sharipova Please give us any tips on the topic, thanks 🙏🙂

@lrynek Sorry for the late reply, I've been away.
Thank you for your tips. Indeed, with a new query type we need to cover all existing functionalities of the function_score query before we can deprecate it.
And you are right, a script doesn't allow filtering functions. For now, the plan is to investigate the implementation of a compound query without script, so your query could be translated to something like this:

{
  "query": {
    "compound": {
      "queries": [
        {
          "script_score": {
            "script": {
              "source": "return doc['ids'].containsAll(params.ids) ? 65 : 0;",
              "params": {
                "ids": [1,2]
              }
            }
          }
        },
        {
          "terms": {
            "location.city_id": [
              "1"
            ],
            "boost": 35
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}

@mayya-sharipova Thank you for the response 👍 It's now clear to me.
Please share any resources regarding this future implementation, as my coworkers and I are very interested in being kept updated on the topic. Thank you! 🙂

@lrynek we plan to add a new compound query to handle some of the unique functionalities that the function_score query provides. However, looking at your example, I don't think you need any replacement here. When the scores of sub-queries are summed together, a plain bool query would work fine:

{
  "query": {
    "bool": {
      "filter": {
        "match_all": {}
      },
      "should": [
        {
          "script_score": {
            "script": {
              "source": "return doc['ids'].containsAll(params.ids) ? 65 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          }
        },
        {
          "terms": {
            "location.city_id": [
              "1"
            ],
            "boost": 35
          }
        }
      ]
    }
  }
}

The case that we want to handle in the new query is the first mode, which picks the score of the first matching query. This is not possible to achieve with a bool query, so we need a replacement.
There are other cases, but they are esoteric, for instance multiplying the scores of multiple queries instead of summing them as the bool query does.

@jimczi Thanks for the comment 👍 But we will need to implement precisely such _esoteric_ use cases as multiplying the scores of multiple queries... 😏 So are you going to get rid of those possibilities? 🤔 😨

@jimczi @mayya-sharipova Kind reminder about my previous question 🙂 ☝️

@mayya-sharipova @jimczi Hello - I'm a happy user of elasticsearch and the current script_score functionality. I have a use case related to compound / multiple script scores and thought I'd share it here along with some ideas. I hope this is the correct GitHub issue and is on-topic - let me know if not and I'll be happy to move/hide this comment.

I'm using script_score to handle a situation where multiple sort criteria are involved. For example: I'd like results to appear sorted based on criteria A, and then use criteria B as a tie-breaker.

To help explain this with a concrete use case: my documents are recipes - and I'd like to sort based on factors including number of matched ingredients (desc), and then recipe rating (desc).

Currently I use script_score to produce a single output value to achieve this -- it calculates an integer value A (count of ingredients matched) and a decimal value B which is normalized to the range 0...1 (float) based on the recipe rating (code ref).

The script score returns the sum of A+B -- so 'three matched ingredients on a recipe with a rating of 4 stars' becomes 3.4 in the document _score, and will rank above 'three matched ingredients on a recipe of 2 stars', for example, thanks to the sort order.

This works, although in an ideal world I'd prefer to calculate and refer to the two distinct values separately. Combining them and restricting B to 0...1 makes the intent of the calculation and sorting less clear.
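
The packing trick described above can be sketched in Python. The sample recipes and the `rating / 10` normalization are assumptions chosen to reproduce the 3.4 example; the tuple sort shows what keeping the two values separate would look like. The packed form only works while B stays strictly below 1, which is the restriction mentioned above.

```python
# Hypothetical sample data: matched ingredient count and star rating.
recipes = [
    {"name": "soup", "matched": 3, "rating": 2},
    {"name": "stew", "matched": 3, "rating": 4},
    {"name": "salad", "matched": 2, "rating": 5},
]


def packed_score(recipe):
    # A: integer count of matched ingredients; B: rating squeezed into 0...1
    return recipe["matched"] + recipe["rating"] / 10


# Single packed value, as produced by one script_score.
by_packed = sorted(recipes, key=packed_score, reverse=True)

# Two separate sort keys: A first, B as the tie-breaker.
by_tuple = sorted(recipes, key=lambda r: (r["matched"], r["rating"]), reverse=True)

print([r["name"] for r in by_packed])  # ['stew', 'soup', 'salad']
print(by_packed == by_tuple)           # True
```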

One idea I had was to align with the 'sort by multiple fields' approach which works for 'native' document fields (as in this example sorting by post_date, user, ...), by using named outputs (similar to the way that aggregations can be named).

If users could define 'named script queries' then those could be referenced in the sort parameter - and it'd also make it easier to use a mix of sort orders (asc, desc, ...) on the different scripted outputs.

"script_scores": {
  "calculation_a": {
    "script": "return ...",
    ...
  },
  "calculation_b": {
    "script": "return ...",
    ...
  },
  "_score": {  # the document score -- i.e. same as the current script_score
    "script": "return ...",
    ...
  }
},
"sort": [
  {"calculation_a": "asc"},  # alternative: "_score.calculation_a" ?
  {"calculation_b": "desc"},
  ...
]

As far as I can tell, this isn't currently possible, but I'd be glad to be corrected if there's a way to use multiple script outputs at the moment. Either way I thought I'd share some thoughts from working through this problem. Thanks for your time and work on ES!

@lrynek

But we precisely will need to implement such esoteric use cases as multiplying score of multiple queries... 😏 So are you going to get rid of those possibilities?

We are not going to get rid of possibilities of function_score query unless we have other alternative ways to implement them.

@jayaddison Thank you for sharing your use-case, it is very interesting.
I will bring your proposal to my team for a discussion.
I can see how this could be used with the compound query we are investigating.

{
  "query": {
    "compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "script_score": {
            ....
          }
        }
      ],
      "score_mode": "first"
    }
  },
  "sort" : [
    { scores[0] : "asc" },
    { scores[1] : "desc" }
  ]
}

@lrynek We had a discussion within the team and found the behaviour where scores from multiple queries are multiplied to be really esoteric, and not a behaviour that we would like to encourage among our users. That's why we have decided, for now, to implement only the first behaviour of the function_score query and to drop all other esoteric behaviours (multiply, min, and avg).

Nevertheless, we would still like to learn more about the use case where you need to multiply scores from multiple queries, in case we missed something. We would appreciate it if you shared more details about your multiply use case. Thanks a lot.

@mayya-sharipova Thanks for your answer! 👍 ... and sorry for the delayed response on my side 😏

In the meantime we decided to use the sum value of the score_mode configuration in our function_score queries, so I would like to make sure that future changes in the Elasticsearch API won't affect the usage we have now 👉 see the example query we run and its explanation from ES:

Example

Request

Expand request body

{
  "size": 1,
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "match_all": {}
            }
          ]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "0.89",
              "params": {
                "param_1": [],
                "param_2": [],
                "param_3": []
              }
            }
          },
          "weight": 65
        },
        {
          "filter": {
            "term": {
              "location.city_id": 1
            }
          },
          "weight": 35
        },
        {
          "field_value_factor": {
            "field": "factors.some_indexed_factor_value",
            "missing": 0
          },
          "weight": 18
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

Response

Expand response explanation JSON

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 53280,
    "max_score": 57.85,
    "hits": [
      {
        "_shard": "__SHARD__",
        "_node": "cp9Uzt8pQUeI9BGiS1eu2Q",
        "_index": "__INDEX__",
        "_type": "_doc",
        "_id": "__ID__",
        "_score": 57.85,
        "_source": {},
        "_explanation": {
          "value": 57.85,
          "description": "sum of:",
          "details": [
            {
              "value": 57.85,
              "description": "min of:",
              "details": [
                {
                  "value": 57.85,
                  "description": "function score, score mode [sum]",
                  "details": [
                    {
                      "value": 57.85,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 0.89,
                          "description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='0.89', options={}, params={param_2=[], param_3=[], param_1=[]}}\" and parameters: \n{param_2=[], param_3=[], param_1=[]}",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "_score: ",
                              "details": [
                                {
                                  "value": 1.0,
                                  "description": "*:*",
                                  "details": []
                                }
                              ]
                            }
                          ]
                        },
                        {
                          "value": 65.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 0.0,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 0.0,
                          "description": "field value function: none(doc['factors.some_indexed_factor_value'].value?:0.0 * factor=1.0)",
                          "details": []
                        },
                        {
                          "value": 18.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    }
                  ]
                },
                {
                  "value": 3.4028235E38,
                  "description": "maxBoost",
                  "details": []
                }
              ]
            },
            {
              "value": 0.0,
              "description": "match on required clause, product of:",
              "details": [
                {
                  "value": 0.0,
                  "description": "# clause",
                  "details": []
                },
                {
                  "value": 1.0,
                  "description": "DocValuesFieldExistsQuery [field=_primary_term]",
                  "details": []
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
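
The explanation above can be checked by hand: the script function contributes 0.89 × 65 = 57.85, the field value function contributes 0 × 18 = 0, and the filter function with weight 35 does not appear, presumably because its filter did not match this hit. The following Python sketch walks a trimmed copy of the explanation tree recursively to collect the leaf values. The trimming (the script score node is treated as a leaf) and the helper names are assumptions for illustration.

```python
# Trimmed copy of the "function score, score mode [sum]" subtree.
explanation = {
    "value": 57.85, "description": "function score, score mode [sum]",
    "details": [
        {"value": 57.85, "description": "product of:", "details": [
            {"value": 0.89, "description": "script score function", "details": []},
            {"value": 65.0, "description": "weight", "details": []},
        ]},
        {"value": 0.0, "description": "product of:", "details": [
            {"value": 0.0, "description": "field value function", "details": []},
            {"value": 18.0, "description": "weight", "details": []},
        ]},
    ],
}


def leaves(node):
    """Yield (description, value) for every node that has no details."""
    if not node["details"]:
        yield node["description"], node["value"]
    for child in node["details"]:
        yield from leaves(child)


def product(node):
    """Multiply all leaf values below a 'product of:' node."""
    result = 1.0
    for _, value in leaves(node):
        result *= value
    return result


# score_mode sum: add up the per-function products.
total = sum(product(child) for child in explanation["details"])
print(round(total, 2))  # 57.85
```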

Needs

So I mostly would like you to:

  1. Preserve the ability of sum for all scores being involved in a query
  2. Keep this ability of explicit weight API for each of the scoring functions if possible (as it is more elegant to pass explicitly the weight from a script that generates the ElasticSearch query)
  3. Add a way to retrieve particular scores from the explanation (without the need for recursive searches through the nested explanation JSON, with all those value, description, details triplets 😉), as stated in my feature request proposal.

Are the above ☝️ requirements possible to sustain in the brand new approach you propose here? (It's important for us as a company 🙏.)

If you are able and willing to sync via a remote Zoom call, please propose a schedule, so I will be able to explain everything in detail 🙂

Thanks for any further insights on the topic 🙂

@lrynek Thank you for posting your query and explanations. Just wanted to let you know that I have read your post and am aware of it. I will get back to you with an answer when I have something concrete; there are still some things we want to discuss within the search team.

Another use case for function score query. Our application constructs Elasticsearch queries using multiple components. Each component may implement its own scoring method. Multiple scoring functions are defined and combined with function score query. This helps isolate the implementation of each function. With the script query, all functions have to be implemented in a single script, which doesn't support modularity of the code.

@yuri-lab Thanks for your comment.

Multiple scoring functions are defined and combined with function score query.

You can define multiple functions in painless script as well, for example:

{
  "query": {
    "script_score" : {
      "query" : {
        "match_all" : {}
      },
      "script" : {
        "source": """
          long myFun1(long x) { x * 10 }
          long myFun2(long x) { x * 100 }
          return myFun1(doc['field1'].value) + myFun2(doc['field2'].value)
        """
      }
    }
  }
}

The only limitation of the script_score query in comparison with the function_score query is that you can't apply separate filters for the functions in the script_score query. For this, you would need to write a more complex bool query.

Does this satisfy your requirement?

@lrynek I have thought about your query, and I can see that we can implement it through a bool and script_score queries, something like this:

{
  "query" : {
    "bool" : {
      "should" : [
        {
          "constant_score" : {
            "filter" : {
              "term" : { "location.city_id": 1}
            },
            "boost" : 35
          }
        },
        {
          "script_score" : {
            "query" : {"match_all": {}},
            "script": {
              "source": """
                0.89 * 65 + 18 * doc['factors.some_indexed_factor_value'].value
              """,
              "params": {
                <params>
              }
            }
          }
        }
      ]
    }
  }
}

About your needs:

  • sum. A bool query sums the scores of its clauses. In script score you can write any scoring formula you want.
  • weight. Many ES queries (including script_score and bool) support the boost param.
  • explanation. What we have right now is that you can provide a name for each query, and the search response will include, for each hit, the matched_queries it matched on. There is also a feature to provide a custom script explanation. This doesn't completely address your explanation request, but maybe later we can consider including named queries in explanations as well.

@yuri-lab I cannot agree more with you - it's also our use case 👍

@mayya-sharipova I can see the possibility of transferring and reproducing our current query needs into the bool and script_score combinations, but for me it degrades the experience of the new approach versus the one we are used to... 😞
The explicit and consistent approach in function_score is far better IMHO... Not sure what the reasoning behind deprecating it is 🤷‍♂️

To give you even more examples from real usage, where we are not manipulating Elasticsearch requests directly but via other language components that add their own scoring factors (it is very modular, reusable and scalable at the same time; here in PHP each factor generator class is defined as a service that can depend on other services in order to build the final function_score query function):


PHP Factor builders / generators classes:

namespace SearchBridge\Factor;

use SearchEngine\Domain\ValueObject\Query;

final class FirstFactor implements FactorInterface
{
    public function key(): string
    {
        return 'first_factor';
    }

    public function definition(Query $query): array
    {
        return [
            'script_score' => [
                'script' => [
                    'lang'   => 'painless',
                    'source' => $this->scriptSource(),
                    'params' => $this->scriptParams($query),
                ],
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

final class SecondFactor implements FactorInterface
{
    public function key(): string
    {
        return 'second_factor';
    }

    public function definition(Query $query): array
    {
        $cityIds = $this->cityResolver->resolve($query);

        if ($cityIds->isEmpty())
        {
            return [];
        }

        return [
            'filter' => [
                'terms' => [
                    'location.city_id' => $cityIds->all(),
                ],
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

final class ThirdFactor implements FactorInterface
{
    public function key(): string
    {
        return 'third_factor';
    }

    public function definition(Query $query): array
    {
        return [
            'field_value_factor' => [
                'field' => 'factors.some_indexed_factor_value',
                'missing' => 0,
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

With the approach you suggest, my team will be forced to get rid of this very good architecture and to parse parts of the Elasticsearch query (scripts, etc.) as strings and inject specific logic here or there. For me it is unacceptable and will force me into _(sarcasm starts)_ a sort of programmer's seppuku... _(sarcasm ended 😉😄 🏯⚔️)_

@mayya-sharipova Conceptually yes, but in practice this approach requires string manipulation to construct source for the script, which is error-prone. Also it goes against best practice recommended by Elasticsearch to keep script source static and change only script parameters.
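
For readers weighing the modularity concern, here is a hedged Python sketch of one way independent factor builders might each emit a bool should clause, with static script sources and all variable values carried in params. The helper names (`script_factor`, `filter_factor`, `build_query`) are hypothetical, and the sketch only builds a request body: it has not been run against a cluster and is not a recommendation made in this thread.

```python
def script_factor(source, params, weight):
    # Static script source; everything variable travels in params.
    return {"script_score": {
        "query": {"match_all": {}},
        "script": {"source": source, "params": params},
        "boost": weight,
    }}


def filter_factor(filter_query, weight):
    # constant_score gives every matching document a score equal to boost.
    return {"constant_score": {"filter": filter_query, "boost": weight}}


def build_query(factors):
    # A bool query sums the scores of its matching should clauses.
    # Factors that opted out (None) are skipped.
    return {"query": {"bool": {
        "filter": {"match_all": {}},
        "should": [f for f in factors if f],
    }}}


query = build_query([
    script_factor("doc['ids'].containsAll(params.ids) ? 1 : 0", {"ids": [1, 2]}, 65),
    filter_factor({"terms": {"location.city_id": ["1"]}}, 35),
    None,  # a factor with nothing to contribute for this request
])
print(len(query["query"]["bool"]["should"]))  # 2
```

Each builder stays isolated, much like the factor classes above, and no script source is assembled by string manipulation.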

We have discussed this issue again within the team, and the conclusion is that we would like ES users to use the bool query instead of the function_score query to combine queries/functions. A lot of optimizations have been done for bool queries on the Lucene side to make them smarter and more efficient, which the function_score query doesn't have.

@mayya-sharipova Can we please sync via a Zoom call on the topic? (maybe even with some other devs involved on your side) It would be cool to explain everything this way 🙏
I would like to keep the possibility of declaring these functions apart from the bool queries, as the Lucene optimizations concern word analysis and text-matching scoring, which I would like to use as just one of the factors. With this in mind I will have full control over the ranking on my side, with many other cofactors apart from the mere text matching / bool query matching.

We have an ecommerce store with a use case of combining the scores of two factors:

  1. The Lucene score of text matching / bool query matching.
  2. Other factors like 'newness' and numerical factors like 'number of images' of products, 'number of words in name/description', and some other numerical factors.

Right now they are used in a very clean manner using 'function_score query'. Based on the discussion above it looks like 'script score query' will result in too much complexity.

@mayya-sharipova: What are the specific advantages you and your team see in deprecating function score query in favour of script score query?

We have a solution in place which is very similar to that of AakashMittal.

We've got two crucial points:

  1. The score of the document gets multiplied by factors like 'newness', individual boost score, etc. (boostMode = MULTIPLY). The factors themselves get combined by weighted average (scoreMode = AVG).
    At first we had these factors added to the score, but the impact on the final score varied too much across different requests (too little on high scores, too much on low scores).

  2. Some factors are only applied, if the documents meet the (sub-)query of the function score factor. E.g. Only give a boost, if document X belongs to category XY.
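
The combination described in these two points can be sketched in Python. As I understand the function_score documentation, score_mode avg is a weighted average over the functions whose filter matched; the factor values, weights, and the fallback of 1.0 when nothing matches are assumptions for illustration.

```python
def weighted_avg(factors):
    """factors: list of (value, weight, matched) tuples."""
    active = [(v, w) for v, w, matched in factors if matched]
    if not active:
        return 1.0  # assumption: no matching factor leaves the score unchanged
    return sum(v * w for v, w in active) / sum(w for _, w in active)


def final_score(query_score, factors):
    # boostMode = MULTIPLY: the query score is multiplied by the combined factor.
    return query_score * weighted_avg(factors)


factors = [
    (1.0, 3.0, True),    # e.g. newness
    (2.0, 4.0, True),    # e.g. individual boost
    (5.0, 10.0, False),  # category boost whose filter did not match
]
print(round(final_score(2.0, factors), 4))  # 3.1429
```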

Our greatest concern is whether this is still possible when migrating to script score query?

@AakashMittal thanks for providing your use-case. All these factors can be combined using script_score, and I can't see much complexity in this apart from the trouble of rewriting queries.

What are the specific advantages you and your team see in deprecating function score query in favour of script score query?

It is quite a bulky query, and difficult to reason about. It has a number of bugs for edge cases and unintuitive behaviours (how weights get propagated, behaviour in nested context, etc.). We want to replace it with simple script_score and bool queries, which we have put, and are still putting, a lot of work into optimizing.

@webemat thank you for providing your use-case.

The Score of the document gets multiplied by the factors like 'newness', individual boost score, etc. (boostMode = MULTIPLY).

You can do this combination through script_score query.

Some factors are only applied, if the documents meet the (sub-)query of the function score factor. E.g. Only give a boost, if document X belongs to category XY.

Depending on what exactly these factors are, some of them can be implemented through a script_score query. In this example a doc score will get a boost depending on the category it belongs to.

{
  "query": {
    "script_score": {
      "query": {
        "match": {
          "message": "apple"
        }
      },
      "script": {
        "source": "_score * params.getOrDefault(doc['category'].value, 1)",
        "params": {
          "books": 10,
          "computers": 100,
          "food": 1
        }
      }
    }
  }
}
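
The effect of that script can be emulated in a few lines of Python. The function name `boosted_score` and the sample scores are assumptions for illustration; `dict.get` with a default plays the role of `params.getOrDefault`.

```python
def boosted_score(score, category, boosts):
    # dict.get mirrors Painless params.getOrDefault(key, 1):
    # unknown categories leave the score unchanged.
    return score * boosts.get(category, 1)


boosts = {"books": 10, "computers": 100, "food": 1}
print(boosted_score(0.5, "computers", boosts))  # 50.0
print(boosted_score(0.5, "toys", boosts))       # 0.5
```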

We would also like to emphasize that we have not found much evidence in the literature that multiplying scores from two textual queries is useful, which is why we encourage ES users to combine queries by summing scores through bool queries.

@mayya-sharipova can you please refer also to my latest comment? Thank you! 🙂

@mayya-sharipova I currently use function_score to multiply the values of a standard query (with should, must, boosts, etc) and a custom query (involving a dense_vector dotProduct). How can I replicate that with script_score? I see your suggestion about putting the custom script in a bool, but bool sums the results whereas I need them multiplied.

@timforr

I currently use function_score to multiply the values of a standard query (with should, must, boosts, etc) and a custom query (involving a dense_vector dotProduct). How can I replicate that with script_score?

What does your second, "custom query" involving the dot product look like? It uses script_score, doesn't it? Then your script can return _score * dotProduct(...); you don't need function score for that.

@telendt Great, thank you, that does indeed work.
