elasticsearch 🚀 - Deprecate Function Score Query in favour of Script Score Query

Pinging @elastic/es-search

elasticmachine on 3 Jun 2019

We have identified 2 function_score query functionalities that are missing in the script_score query

score_mode first: applying the score from the function with the first matching filter (I wonder how prevalent this use case is).
_explain: the explanation of the score calculation doesn't work in the script_score query.

mayya-sharipova on 23 Jul 2019

For score_mode first, we had a discussion before.

A proposal is to have a new type of compound query that has an ability to combine scores from different queries:

{
  "query": {
    "compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "score_mode": "first"
    }
  }
}

score_mode has the same values and definitions as in Function Score Query
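
To make the `first` semantics concrete, here is a minimal Python sketch (not Elasticsearch code) of how such a combiner could pick or merge sub-query scores. The function name `combine_scores` and the convention of using `None` for a non-matching sub-query are assumptions made for illustration only; the proposed compound query did not exist at the time of this comment.

```python
from typing import List, Optional


def combine_scores(scores: List[Optional[float]], score_mode: str) -> Optional[float]:
    """Combine per-sub-query scores; None means the sub-query did not match."""
    matched = [s for s in scores if s is not None]
    if not matched:
        return None  # the document matches no sub-query
    if score_mode == "first":
        return matched[0]  # score of the first matching sub-query, in order
    if score_mode == "sum":
        return sum(matched)
    if score_mode == "max":
        return max(matched)
    if score_mode == "min":
        return min(matched)
    raise ValueError(f"unsupported score_mode: {score_mode}")


# The first sub-query does not match, so "first" falls through to the second.
print(combine_scores([None, 2.5], "first"))  # 2.5
print(combine_scores([1.0, 2.5], "sum"))     # 3.5
```

The point of `first` is that order matters: unlike `sum` or `max`, the result depends on which sub-query is listed earlier.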

mayya-sharipova on 26 Aug 2019

Or as an alternative we can have a script compound query where scores can be freely combined using script:

{
  "query": {
    "script_compound": {
      "queries": [
        {"match": {
          "message": "elasticsearch"
        }},
        {"match": {
          "author": "shay banon"
        }}
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}

But for the first mode, we would still need to implement a type of in-order query, as it is difficult to implement this logic through a script. A possible API for in_order query:

{
  "query": {
    "in_order": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "score_mode" : "first"
    }
  }
}

Not sure what other score_modes are useful here.

mayya-sharipova on 26 Aug 2019

@mayya-sharipova I am not 100% sure that these changes won't break our usage of Elasticsearch, so I've prepared the following comparison. Could you confirm that it is a proper translation of the features to the new query model, please? Thank you for any tips!

Actual request (ES 6.6)

{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          },
          "weight": 65
        },
        {
          "filter": {
            "terms": {
              "location.city_id": [
                "1"
              ]
            }
          },
          "weight": 35
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

New request (ES 7.x / ES 8.0 compatible)

_[especially not sure about how to pass filter scoring behaviour... 🤔]_

{
  "explain": true,
  "query": {
    "script_compound": {
      "queries": [
        {
          "script_score": {
            "query": {
              "match_all": {}
            },
            "script": {
              "lang": "painless",
              "source": "return (doc['ids'].containsAll(params.ids) ? 1 : 0) * params.weight;",
              "params": {
                "ids": [1, 2],
                "weight": 65
              }
            }
          }
        },
        {
          "script_score": {
            "query": {
              "match": {
                "terms": {
                  "location.city_id": [
                    "1"
                  ]
                }
              }
            },
            "script": {
              "lang": "painless",
              "source": "return params._score * params.weight;",
              "params": {
                "weight": 35
              }
            }
          }
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

PS

I recommend one little fix of the proposed query (as for now it isn't JSON format compliant):
Instead of this:

Or as an alternative we can have a script compound query where scores can be freely combined using script:
{
  "query": {
    "script_compound": {
      "queries": [
        "match": {
          "message": "elasticsearch"
        },
        "match": {
          "author": "shay banon"
        }
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}
it should look like this:
Or as an alternative we can have a script compound query where scores can be freely combined using script:
{
  "query": {
    "script_compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}

lrynek on 1 Oct 2019

👍6

@mayya-sharipova Please give us any tip on the topic, thanks 🙏🙂

lrynek on 17 Oct 2019

@lrynek Sorry for the late reply, I've been away.
Thank you for your tips. Indeed, with a new query type we need to cover all existing functionalities of the function_score query before we can deprecate it.
And you are right, a script doesn't allow filtering functions. For now, the plan is to investigate implementing a compound query without a script, so your query could be translated to something like this:

{
  "query": {
    "compound": {
      "queries": [
        {
          "script_score": {
            "script": {
              "source": "return doc['ids'].containsAll(params.ids) ? 65 : 0;",
              "params": {
                "ids": [1,2]
              }
            }
          }
        },
        {
          "terms": {
            "location.city_id": [
              "1"
            ],
            "boost": 35
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}

mayya-sharipova on 23 Oct 2019

👍3

@mayya-sharipova Thank you for the response 👍 It's now clear to me.
Please share any resources regarding this future implementation, as my coworkers and I are very interested in staying updated on the topic. Thank you! 🙂

lrynek on 29 Oct 2019

👍1

@lrynek we plan to add a new compound query to handle some of the unique functionalities that the function_score query provides. However, looking at your example, I don't think you need any replacement here. When the scores of sub-queries are summed together, a plain bool query would work fine:
{
  "query": {
    "bool": {
      "filter": {
        "match_all": {}
      },
      "should": [
        {
          "script_score": {
            "script": {
              "source": "return doc['ids'].containsAll(params.ids) ? 65 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          }
        },
        {
          "terms": {
            "location.city_id": [
              "1"
            ],
            "boost": 35
          }
        }
      ]
    }
  }
}
The case that we want to handle in the new query is the first mode that picks the score of the first matching query. This is not possible to achieve with a bool query so we need a replacement.
There are other cases but they are esoteric like for instance if you want to multiply the score of multiple queries instead of using a sum like the bool query is doing.

jimczi on 30 Oct 2019

👍2

@jimczi Thanks for the comment 👍 But we precisely will need to implement such _esoteric_ use cases as multiplying score of multiple queries... 😏 So are you going to get rid of those possibilities? 🤔 😨

lrynek on 4 Nov 2019

@jimczi @mayya-sharipova Kind reminder about my previous question 🙂 ☝️

lrynek on 4 Jan 2020

@mayya-sharipova @jimczi Hello - I'm a happy user of elasticsearch and the current script_score functionality. I have a use case related to compound / multiple script scores and thought I'd share it here along with some ideas. I hope this is the correct GitHub issue and is on-topic - let me know if not and I'll be happy to move/hide this comment.

I'm using script_score to handle a situation where multiple sort criteria are involved. For example: I'd like results to appear sorted based on criteria A, and then use criteria B as a tie-breaker.

To help explain this with a concrete use case: my documents are recipes - and I'd like to sort based on factors including number of matched ingredients (desc), and then recipe rating (desc).

Currently I use script_score to produce a single output value to achieve this -- it calculates an integer value A (count of ingredients matched) and a decimal value B which is normalized to the range 0...1 (float) based on the recipe rating (code ref).

The script score returns the sum of A+B -- so 'three matched ingredients on a recipe with a rating of 4 stars' becomes 3.4 in the document _score, and will rank above 'three matched ingredients on a recipe of 2 stars', for example, thanks to the sort order.
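
As a rough illustration of the A+B packing described above, here is a minimal Python sketch. The function name `combined_score`, the sample recipes, and the assumption that ratings lie in [0, 10) (so that 4 stars maps to 0.4) are all hypothetical; the actual script referenced in this comment is not reproduced here.

```python
def combined_score(matched_ingredients: int, rating: float) -> float:
    """Pack a primary integer key and a tie-breaker into one float score."""
    # Assumption: ratings lie in [0, 10), so rating / 10 stays below 1 and
    # can never outweigh a difference of one matched ingredient.
    return matched_ingredients + rating / 10.0


recipes = [
    {"name": "a", "matched": 3, "rating": 2.0},
    {"name": "b", "matched": 3, "rating": 4.0},
    {"name": "c", "matched": 2, "rating": 5.0},
]

# Sorting by the packed score gives the same order as sorting by the
# (matched, rating) pair, which is what a multi-key sort would express.
ranked = sorted(
    recipes,
    key=lambda r: combined_score(r["matched"], r["rating"]),
    reverse=True,
)
print([r["name"] for r in ranked])  # ['b', 'a', 'c']
```

The packing only works while the tie-breaker stays strictly below 1, which is the restriction on B mentioned in this comment.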

This works, although in an ideal world I'd prefer to calculate and refer to the two distinct values separately. Combining them and restricting B to 0...1 makes the intent of the calculation and sorting less clear.

One idea I had was to align with the 'sort by multiple fields' approach which works for 'native' document fields (as in this example sorting by post_date, user, ...), by using named outputs (similar to the way that aggregations can be named).

If users could define 'named script queries' then those could be referenced in the sort parameter - and it'd also make it easier to use a mix of sort orders (asc, desc, ...) on the different scripted outputs.

"script_scores": {
  "calculation_a": {
    "script": "return ...",
    ...
  },
  "calculation_b": {
    "script": "return ...",
    ...
  },
  "_score": {  # the document score -- i.e. same as the current script_score
    "script": "return ...",
    ...
  }
},
"sort": [
  {"calculation_a": "asc"},  # alternative: "_score.calculation_a" ?
  {"calculation_b": "desc"},
  ...
]

As far as I can tell, this isn't currently possible, but I'd be glad to be corrected if there's a way to use multiple script outputs at the moment. Either way I thought I'd share some thoughts from working through this problem. Thanks for your time and work on ES!

jayaddison on 20 Jan 2020

👍2

@lrynek

But we precisely will need to implement such esoteric use cases as multiplying score of multiple queries... 😏 So are you going to get rid of those possibilities?

We are not going to get rid of possibilities of function_score query unless we have other alternative ways to implement them.

mayya-sharipova on 29 Jan 2020

🎉1 👍1

@jayaddison Thank you for sharing your use-case, it is very interesting.
I will bring your proposal to my team for a discussion.
I can see how this could be used with a compound query we are investigating.

{
  "query": {
    "compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "script_score": {
            ....
          }
        }
      ],
      "score_mode": "first"
    }
  },
  "sort" : [
    { scores[0] : "asc" },
    { scores[1] : "desc" }
  ]
}

mayya-sharipova on 29 Jan 2020

@lrynek We had a discussion within the team and found the behaviour where scores from multiple queries are multiplied to be really esoteric, and not a behaviour that we would like to encourage among our users. That's why we have decided, for now, to implement only the first behaviour of the function_score query and to drop all the other esoteric behaviours (multiply, min, and avg).

Nevertheless, we would still like to learn more about the use case where you need to multiply scores from multiple queries, in case we missed something. We would appreciate it if you shared more details about your multiply use case. Thanks a lot.

mayya-sharipova on 18 Feb 2020

👍1

@mayya-sharipova Thanks for your answer! 👍 ... and sorry for the delayed response on my side 😏

In the meantime we decided to use the sum value of the score_mode setting in our function_score queries, so I would like to make sure that future changes in the Elasticsearch API won't affect our current usage 👉 see the example query we run and its explanation from ES:

Example

Request

{
  "size": 1,
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "match_all": {}
            }
          ]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "0.89",
              "params": {
                "param_1": [],
                "param_2": [],
                "param_3": []
              }
            }
          },
          "weight": 65
        },
        {
          "filter": {
            "term": {
              "location.city_id": 1
            }
          },
          "weight": 35
        },
        {
          "field_value_factor": {
            "field": "factors.some_indexed_factor_value",
            "missing": 0
          },
          "weight": 18
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

Response

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 53280,
    "max_score": 57.85,
    "hits": [
      {
        "_shard": "__SHARD__",
        "_node": "cp9Uzt8pQUeI9BGiS1eu2Q",
        "_index": "__INDEX__",
        "_type": "_doc",
        "_id": "__ID__",
        "_score": 57.85,
        "_source": {},
        "_explanation": {
          "value": 57.85,
          "description": "sum of:",
          "details": [
            {
              "value": 57.85,
              "description": "min of:",
              "details": [
                {
                  "value": 57.85,
                  "description": "function score, score mode [sum]",
                  "details": [
                    {
                      "value": 57.85,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 0.89,
                          "description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='0.89', options={}, params={param_2=[], param_3=[], param_1=[]}}\" and parameters: \n{param_2=[], param_3=[], param_1=[]}",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "_score: ",
                              "details": [
                                {
                                  "value": 1.0,
                                  "description": "*:*",
                                  "details": []
                                }
                              ]
                            }
                          ]
                        },
                        {
                          "value": 65.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 0.0,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 0.0,
                          "description": "field value function: none(doc['factors.some_indexed_factor_value'].value?:0.0 * factor=1.0)",
                          "details": []
                        },
                        {
                          "value": 18.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    }
                  ]
                },
                {
                  "value": 3.4028235E38,
                  "description": "maxBoost",
                  "details": []
                }
              ]
            },
            {
              "value": 0.0,
              "description": "match on required clause, product of:",
              "details": [
                {
                  "value": 0.0,
                  "description": "# clause",
                  "details": []
                },
                {
                  "value": 1.0,
                  "description": "DocValuesFieldExistsQuery [field=_primary_term]",
                  "details": []
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

Needs

So I mostly would like you to:

Preserve the ability of sum for all scores being involved in a query
Keep this ability of explicit weight API for each of the scoring functions if possible (as it is more elegant to pass explicitly the weight from a script that generates the ElasticSearch query)
Add a functionality of particular scores retrieval from the explanation (without a need of recursive searches through the nested explanation JSON // with all those value, description, details triplets 😉) as stated in my feature request proposal.
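
Until something like the last point exists server-side, the walk through the value / description / details triplets has to happen on the client. Below is a minimal Python sketch of such a helper. The name `leaf_scores` is hypothetical, and the sample is a trimmed-down, hand-written copy of one branch of the explanation above, not a complete response.

```python
from typing import Any, Dict, Iterator, Tuple


def leaf_scores(explanation: Dict[str, Any]) -> Iterator[Tuple[str, float]]:
    """Yield (description, value) for every node that has no child details."""
    details = explanation.get("details") or []
    if not details:
        yield explanation["description"], explanation["value"]
        return
    for child in details:
        yield from leaf_scores(child)


# Trimmed-down copy of the "product of:" branch for the script_score function.
sample = {
    "value": 57.85,
    "description": "product of:",
    "details": [
        {"value": 0.89, "description": "script score function", "details": []},
        {"value": 65.0, "description": "weight", "details": []},
    ],
}

print(list(leaf_scores(sample)))
# [('script score function', 0.89), ('weight', 65.0)]
```

This flattens the tree into individual factors, but it still depends on matching free-text descriptions, which is exactly the fragility the feature request is trying to avoid.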

Are the above ☝️ requirements possible to sustain in the brand new approach you propose here? (It's important for us as a company 🙏.)

If you are able / want to sync via a Zoom call, please propose a time, so I can explain everything in detail 🙂

Thanks for any further insights on the topic 🙂

lrynek on 8 Apr 2020

👍9

@lrynek Thanks for posting your query and explanations. Just wanted to let you know that I have read and am aware of your post. I will get back to you with an answer when I have something concrete; there are still some things we want to discuss within the search team.

mayya-sharipova on 20 Apr 2020

👍1

Another use case for the function score query: our application constructs Elasticsearch queries using multiple components. Each component may implement its own scoring method. Multiple scoring functions are defined and combined with function score query. It helps isolate the implementation of each function. With the script query, all functions have to be implemented in a single script, which doesn't support modularity of the code.

yuri-lab on 21 Apr 2020

👍2

@yuri-lab Thanks for your comment.

Multiple scoring functions are defined and combined with function score query.

You can define multiple functions in painless script as well, for example:

{
  "query": {
    "script_score" : {
      "query" : {
        "match_all" : {}
      },
      "script" : {
        "source": """
          long myFun1(long x) { x * 10 }
          long myFun2(long x) { x * 100 }
          return myFun1(doc['field1'].value) + myFun2(doc['field2'].value)
        """
      }
    }
  }
}

The only limitation of the script_score query in comparison with the function_score query is that you can't apply separate filters for the functions in a script_score query. For this, you would need to write a more complex bool query.

Does this satisfy your requirement?

mayya-sharipova on 23 Apr 2020

@lrynek I have thought about your query, and I can see that we can implement it through bool and script_score queries, something like this:

{
  "query" : {
    "bool" : {
      "should" : [
        {
          "constant_score" : {
            "filter" : {
              "term" : { "location.city_id": 1}
            },
            "boost" : 35
          }
        },
        {
          "script_score" : {
            "query" : {"match_all": {}},
            "script": {
              "source": """
                0.89 * 65 + 18 * doc['factors.some_indexed_factor_value'].value
              """,
              "params": {
                <params>
              }
            }
          }
        }
      ]
    }
  }
}

About your needs:

sum. The bool query sums the scores of its clauses. In script_score you can write any scoring formula you want.
weight. Many ES queries (including script_score and bool) support the boost param.
explanation. Right now, you can provide a name for each query, and the search response will include, for each hit, the matched_queries it matched on. There is also a feature to provide a custom script explanation. This doesn't completely address your explanation request, but maybe later we can consider including named queries in explanations as well.

mayya-sharipova on 23 Apr 2020

😕1

@yuri-lab I cannot agree more with you - it's also our use case 👍

@mayya-sharipova I can see the possibility of transferring and reproducing our current query needs into the bool and script_score combinations but for me it degrades the experience of the new approach versus the one we are used to... 😞
The explicit and consistent approach in function_score is far better IMHO... Not sure what is the reasoning behind deprecating it 🤷‍♂️

To give you even more examples from real usage: we are not manipulating Elasticsearch requests directly but via components in another language that add their own scoring factors (it is very modular, reusable and scalable at the same time; here in PHP each factor generator class is defined as a service that can depend on other services in order to build the final function_score query function):

PHP Factor builders / generators classes:

namespace SearchBridge\Factor;

use SearchEngine\Domain\ValueObject\Query;

final class FirstFactor implements FactorInterface
{
    public function key(): string
    {
        return 'first_factor';
    }

    public function definition(Query $query): array
    {
        return [
            'script_score' => [
                'script' => [
                    'lang'   => 'painless',
                    'source' => $this->scriptSource(),
                    'params' => $this->scriptParams($query),
                ],
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

final class SecondFactor implements FactorInterface
{
    public function key(): string
    {
        return 'second_factor';
    }

    public function definition(Query $query): array
    {
        $cityIds = $this->cityResolver->resolve($query);

        if ($cityIds->isEmpty())
        {
            return [];
        }

        return [
            'filter' => [
                'terms' => [
                    'location.city_id' => $cityIds->all(),
                ],
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

final class ThirdFactor implements FactorInterface
{
    public function key(): string
    {
        return 'third_factor';
    }

    public function definition(Query $query): array
    {
        return [
            'field_value_factor' => [
                'field' => 'factors.some_indexed_factor_value',
                'missing' => 0,
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

With the approach you suggest, my team will be forced to get rid of this very good architecture and to parse parts of ElasticSearch query (scripts, etc.) as strings and inject here or there specific logic. For me it is unacceptable // will force me to _(sarcasm starts)_ sort of programmer's seppuku... _(sarcasm ended 😉😄 🏯⚔️)_

lrynek on 24 Apr 2020

👍13

@mayya-sharipova Conceptually yes, but in practice this approach requires string manipulation to construct source for the script, which is error-prone. Also it goes against best practice recommended by Elasticsearch to keep script source static and change only script parameters.

yuri-lab on 24 Apr 2020

👍2

We have discussed this issue again within the team, and the conclusion is that we would like ES users to use the bool query instead of the function_score query to combine queries/functions. A lot of optimizations have been done for bool queries on the Lucene side to make them smarter and more efficient, which the function_score query doesn't have.
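
For readers worried about modularity, here is one possible way to keep per-factor builders while targeting a bool query. It is only a sketch under stated assumptions: the helper names (`script_factor`, `city_factor`, `build_query`) are hypothetical, the script source `0.89 * params.weight` is a placeholder mirroring the earlier example, and the query shape follows the bool / constant_score / script_score translation suggested earlier in this thread. Each script source stays static; only `params` change.

```python
import json
from typing import Any, Dict, List


def script_factor(weight: float) -> Dict[str, Any]:
    # Static script source; the weight travels in params, not in the source.
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "0.89 * params.weight",
                "params": {"weight": weight},
            },
        }
    }


def city_factor(city_ids: List[int], weight: float) -> Dict[str, Any]:
    # A filter that contributes a fixed score (the boost) when it matches.
    return {
        "constant_score": {
            "filter": {"terms": {"location.city_id": city_ids}},
            "boost": weight,
        }
    }


def build_query(factors: List[Dict[str, Any]]) -> Dict[str, Any]:
    # The bool query sums the scores of the should clauses that match.
    return {"query": {"bool": {"should": factors}}}


query = build_query([script_factor(65), city_factor([1], 35)])
print(json.dumps(query, indent=2))
```

Each factor remains an independent unit that returns one clause, so no string manipulation of a shared script is needed. Whether this covers every function_score feature (for example weighted averages) is a separate question.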

mayya-sharipova on 5 May 2020

👀1

@mayya-sharipova Can we please sync via a Zoom call on the topic? (maybe even with some other devs involved on your side) It would be cool to explain everything this way 🙏
I would like to keep the ability to declare these functions separately from bool queries, because the Lucene optimizations concern word analysis and text-matching scoring, which I would like to use as just one of the factors. That way I keep full control over the ranking on my side, with many other factors apart from mere text matching / bool query matching.

lrynek on 12 May 2020

👍17

We have an ecommerce store whose use case is combining the scores of two kinds of factors:

Lucene score of text matching / bool query matching
Other factors like 'newness' and numerical factors like the 'number of images' of products, the 'number of words in name/description', and some others.

Right now they are used in a very clean manner using 'function_score query'. Based on the discussion above it looks like 'script score query' will result in too much complexity.

@mayya-sharipova : What are the specific advantages you and your team see in deprecating function score query in favour of script score query?

AakashMittal on 23 May 2020

👍3

We have a solution in place which is very similar to that of AakashMittal.

We've got two crucial points:

The Score of the document gets multiplied by the factors like 'newness', individual boost score, etc. (boostMode = MULTIPLY). The factors themselves get combined by weighted average (scoreMode = AVG)
At first we added these factors to the score, but their impact on the final score varied too much across requests (too little on high scores, too much on low scores).
Some factors are only applied, if the documents meet the (sub-)query of the function score factor. E.g. Only give a boost, if document X belongs to category XY.
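
The arithmetic behind these two points can be sketched in a few lines of Python. This is an illustration, not Elasticsearch code: the function names are hypothetical, and the treatment of the "no function matched" case as leaving the score unchanged is an assumption of this sketch. The weighted average follows the function_score definition, where each factor value is weighted by its own weight.

```python
from typing import List, Tuple


def weighted_avg(factors: List[Tuple[float, float]]) -> float:
    """factors: (value, weight) pairs for the functions whose filter matched."""
    total_weight = sum(weight for _, weight in factors)
    return sum(value * weight for value, weight in factors) / total_weight


def final_score(query_score: float, factors: List[Tuple[float, float]]) -> float:
    """boostMode = MULTIPLY applied on top of scoreMode = AVG."""
    if not factors:
        # Assumption in this sketch: with no matching function the
        # query score is left unchanged.
        return query_score
    return query_score * weighted_avg(factors)


# Two matching factors: value 1 with weight 3, value 2 with weight 4.
# Weighted average is (1*3 + 2*4) / (3 + 4) = 11/7.
print(final_score(2.0, [(1.0, 3.0), (2.0, 4.0)]))
```

Because the factor is multiplied rather than added, its relative impact is the same for high and low query scores, which is the motivation given above.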

Our greatest concern is whether this is still possible when migrating to script score query?

webemat on 25 May 2020

👍3

@AakashMittal thanks for providing your use case. All these factors can be combined using script_score, and I can't see much complexity in this apart from the trouble of rewriting queries.

What are the specific advantages you and your team see in deprecating function score query in favour of script score query?

It is quite a bulky query and difficult to reason about. It has a number of bugs in edge cases and unintuitive behaviours (how weights get propagated, behaviour in a nested context, etc.). We want to replace it with the simpler script_score and bool queries, which we have put, and are still putting, a lot of work into optimizing.

mayya-sharipova on 4 Jun 2020

@webemat thank you for providing your use-case.

The Score of the document gets multiplied by the factors like 'newness', individual boost score, etc. (boostMode = MULTIPLY).

You can do this combination through script_score query.

Some factors are only applied, if the documents meet the (sub-)query of the function score factor. E.g. Only give a boost, if document X belongs to category XY.

Depending on what exactly these factors are, some of them can be implemented through a script_score query. In this example, a doc's score gets a boost depending on the category it belongs to.

{
  "query": {
    "script_score": {
      "query": {
        "match": {
          "message": "apple"
        }
      },
      "script": {
        "source": "_score * params.getOrDefault(doc['category'].value, 1)",
        "params": {
          "books": 10,
          "computers": 100,
          "food": 1
        }
      }
    }
  }
}
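
To show what the script in this query computes per document, here is a small Python simulation (not Painless). The function name `boosted_score` and the sample scores are made up for illustration; the boosts mirror the `params` above.

```python
from typing import Dict


def boosted_score(score: float, category: str, boosts: Dict[str, float]) -> float:
    # Mirrors params.getOrDefault(category, 1): unknown categories are neutral.
    return score * boosts.get(category, 1)


boosts = {"books": 10, "computers": 100, "food": 1}

print(boosted_score(1.5, "books", boosts))   # 15.0
print(boosted_score(1.5, "garden", boosts))  # 1.5
```

A category missing from the params map falls back to a factor of 1, so the text-matching score passes through unchanged.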

We also would like to emphasize that we have not found much evidence from literature where it is useful to multiply scores from two textual queries, that's why we encourage ES users to combine queries by summing scores through bool queries.

mayya-sharipova on 4 Jun 2020

@mayya-sharipova can you please refer also to my latest comment? Thank you! 🙂

lrynek on 4 Jun 2020

@mayya-sharipova I currently use function_score to multiply the values of a standard query (with should, must, boosts, etc) and a custom query (involving a dense_vector dotProduct). How can I replicate that with script_score? I see your suggestion about putting the custom script in a bool, but bool sums the results whereas I need them multiplied.

timforr on 23 Jun 2020

👍1

@timforr

I currently use function_score to multiply the values of a standard query (with should, must, boosts, etc) and a custom query (involving a dense_vector dotProduct). How can I replicate that with script_score?

What does your second, "custom query" involving dot product look like? It uses script_score, doesn't it? Then your script can return _score * dotProduct(...); you don't need function score for that.

telendt on 23 Jun 2020

👍1

@telendt Great, thank you, that does indeed work.

timforr on 23 Jun 2020
