Elasticsearch: Function score functions filter executed in query context

Created on 6 Nov 2018  路  16Comments  路  Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version):

Version: 6.4.2, Build: default/deb/04711c2/2018-09-26T13:34:09.098244Z, JVM: 1.8.0_191

Description of the problem including expected versus actual behavior:

When executing function score query with a bunch of function, each with its filter clause I would expect them (the filter queries) to be executed in filter context, but they seem to be executed in query context instead.

Steps to reproduce:

Step 1: Index 2 documents:

curl -H 'Content-Type: application/x-ndjson' -XPOST localhost:9200/_bulk --data-binary '
{"index":{"_index":"test","_type":"_doc","_id":"1"}}
{"tags":["a", "b"]}
{"index":{"_index":"test","_type":"_doc","_id":"2"}}
{"tags":["a", "c"]}
'

Step 2: Prepare query

{
  "query": {
    "function_score": {
      "functions": [
    {
      "filter": {
        "bool": {
          "must":   {"match": {"tags": "a"}},
          "should": {"match": {"tags": "c"}}
        }
      },
      "weight": 100
    }
      ]
    }
  }
}

Step 3: Observe the results

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 100,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "2",
        "_score": 100,
        "_source": {
          "tags": [
            "a",
            "c"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 100,
        "_source": {
          "tags": [
            "a",
            "b"
          ]
        }
      }
    ]
  }
}

Analysis

Both documents get 100 score, which means that the (only) function from function_score applied there. Which means that the bool query from its filter clause matched both documents. Which -- according to documentation -- means that the bool query has been executed in query context:

SHOULD: The clause (query) should appear in the matching document. If the bool query is in a query context and has a must or filter clause then a document will match the bool query even if none of the should queries match. In this case these clauses are only used to influence the score. If the bool query is a filter context or has neither must or filter then at least one of the should queries must match a document for it to match the bool query. This behavior may be explicitly controlled by settings the minimum_should_match parameter.

Since the bool query has a must clause ({"match": {"tags": "a"}}) that matches both documents, the behaviour of should depends on execution context. According to the quoted text in filter context there's an implicit minimum_should_match=1. The fact that both documents matched suggests that it got executed in query context though. But that seems counter intuitive to:

Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.

Also, this fragment from the docs says:

The scores produced by the filtering query of each function do not matter.

Which would suggest filter context.

Final words

I'm not 100% sure whether it's really an implementation bug, documentation bug or me not understanding the docs and would be happy if someone could help me figure that out.

:SearcRanking >docs

Most helpful comment

To be clear about what I meant in this comment, when running the query in the original issue, the builder for the boolean query sees context.isFilter() == false and leaves minimumShouldMatch set to default, which is no required optional clauses in the lucene BooleanQuery

All 16 comments

Pinging @elastic/es-search-aggs

It looks like the queries given in a filter for a function scores are actually executed in a query context, which is consistent with the behavior you're seeing here. It's definitely reasonable to infer that they would be executed in a filter context given the parameter name.

It also makes sense that they would be in a filter context considering their scores are not used, but there may be another consideration here I'm not aware of. @mayya-sharipova do you think filter queries here should be executed in a filter context? If not it seems like we should make it more clear in the docs that they're in a query context.

The filter is correctly executed in the filter context, what happens here is that we set the default score to 1 for functions that don't define a [score functions]:(https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#score-functions)
This is why a weight of 100 produces a score equals to 100. You can set weight to 0 if you want to disable the score entirely in the function.
However I agree that the documentation should indicate the default score for functions that contains a filter and no score-function.

Pinging @elastic/es-docs

@jimczi : I'm sorry but I don't follow. Yes, weight of 100 results in score of 100. But only on those documents that match the filter, right? If that's the case then why does document _id=1 matches?
It clearly doesn't match this filter:

{
  "query": {
    "bool": {
      "filter": {
    "bool": {
      "must":   {"match": {"tags": "a"}},
      "should": {"match": {"tags": "c"}}
    }
      }
    }
  }
}

Sorry I don't follow either, document _id=1 matches the filter because it contains the mandatory clause (must):
{"tags":["a", "b"]}
The should clause is not mandatory so the document matches with "must": {"match": {"tags": "a"}}.

@jimczi

The should clause is not mandatory

It is in the filter context (implicit minimum_should_match=1), it's not in the query context. Could you please try to run the last query I shared?

I think I might have confused you with my initial answer regarding filter contexts. What I meant is that it is executed in a filter context because we don't take into account the score produced by the inner query. That's the only difference between a filter context and a query context. The part about minimum_should_match implicitly set in filter context is just wrong and I'll be happy to fix this in the documentation if you can point me where it is (I couldn't find it).
A filter in a bool query is just a way to disable the score in the sub queries but in terms of matching it is equivalent to a must clause.

@jimczi
Naturally. It's here:

[...] If the bool query is a <>
or has neither must or filter then at least one of the should queries
must match a document for it to match the bool query

And here:

If this query is used in a filter context and it has should
clauses then at least one should clause is required to match.

(And probably in other places as well.)

To be clear about what I meant in this comment, when running the query in the original issue, the builder for the boolean query sees context.isFilter() == false and leaves minimumShouldMatch set to default, which is no required optional clauses in the lucene BooleanQuery

Yes thanks @andyb-elastic and sorry @telendt I completely skipped the fact that we have a different behavior for the boolean query in es. This is not the case at the Lucene level where I checked. I still don't understand why we need to change minimumShouldMatch in a filter context. I think this is due to an old behavior in an old version but I don't think it is required anymore. The query/filter context should be handled at the Lucene level only so I'll work on a pr to see if we can remove this buggy behavior in es. Bottom line is that should clauses in a filter clause should behave the same if put in a non-filtering clause (must).

@jimczi Thanks, it makes sense. So the plan is just to change BoolQueryBuilder so that filter context doesn't set it up minimumShouldMatch = "1"? We don't need any changes for function score query, right?

Also if we get rid of setting minimumShouldMatch = "1", it would not make sense to have should clause inside any filter context, as this filter will be useless, as all documents will be returned any way.

So the plan is just to change BoolQueryBuilder so that filter context doesn't set it up minimumShouldMatch = "1"? We don't need any changes for function score query, right?

I opened https://github.com/elastic/elasticsearch/pull/35354 to completely remove the filter context.

Also if we get rid of setting minimumShouldMatch = "1", it would not make sense to have should clause inside any filter context, as this filter will be useless, as all documents will be returned any way.

True but I find very confusing to set a different minimumShouldMatch based on the context. That's not how Lucene does so we should be consistent here. In Lucene should clauses are ignored in a filter context only if they are mixed with other required clauses, otherwise at least one clause should match.
IMHO setting a different minimum should match in es seems arbitrary and makes the interpretation of should clauses difficult for users (and devs that are not aware of this behavior ;) ).

IMHO setting a different minimum should match in es seems arbitrary and makes the interpretation of should clauses difficult for users (and devs that are not aware of this behavior ;) ).

I can confirm that. I understand Lucene's implicit minimumShouldMatch=1 when only should clauses are present -- in any other case I prefer setting minimumShouldMatch explicitly.

In this particular case the problem popped up when one other dev in my team moved boolean query from filter (that relied on this behaviour) and pasted it into function's filter. We were surprised to see that different set of documents got boosted from the set of documents previously returned by the filter.

It looks that you added this behaviour to BoolQueryBuilder on top of Lucene's default. My complain here is more about of lack of consistency between regular filter and function filter but I don't mind if you drop your concept of filter context and fall back to Lucene's default behaviour. It's just that it's a breaking change and many users don't like that (which you are probably perfectly aware of).

It's just that it's a breaking change and many users don't like that (which you are probably perfectly aware of).

I share this concern which is why we'd make this change in a major release (7.0) if we decide that it is the right thing to do.

The filter is correctly executed in the filter context, what happens here is that we set the default score to 1 for functions that don't define a [score functions]:(https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#score-functions)
This is why a weight of 100 produces a score equals to 100. You can set weight to 0 if you want to disable the score entirely in the function.
However I agree that the documentation should indicate the default score for functions that contains a filter and no score-function.

@jimczi Hi Jim, I was wondering if we could reset the default score (which is 1) to any value that we could config (like 0). And if we could do that, then how? Thanks a lot!

Was this page helpful?
0 / 5 - 0 ratings