Elasticsearch: bool query with only 'must' clauses and "minimum_should_match": "1" returns no results

Created on 29 Nov 2016 · 11 Comments · Source: elastic/elasticsearch

Elasticsearch version:

$ curl localhost:9200
{
  "name" : "xXsJYfj",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "dIRPk3KwQJ-B0bHGPEI5FA",
  "version" : {
    "number" : "5.0.1",
    "build_hash" : "080bb47",
    "build_date" : "2016-11-11T22:08:49.812Z",
    "build_snapshot" : false,
    "lucene_version" : "6.2.1"
  },
  "tagline" : "You Know, for Search"
}

Plugins installed: []

JVM version:

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

OS version: Xubuntu 16.04 64-bit

Description of the problem including expected versus actual behavior:
I use three 'match' queries nested in one 'must' clause, and every one of them matches. If I don't set "minimum_should_match": "1", the result is as expected, but if I do set "minimum_should_match": "1", the result is empty. I don't think this option should affect the result of a query that only has 'must' clauses.
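To make the reported behaviour concrete, here is a simplified model of the match rule as a small bash sketch. It is an assumption for illustration, not Elasticsearch code: a document matches a bool query when it matches every must clause and at least minimum_should_match should clauses (filter and must_not are ignored). The function name bool_matches and its argument order are hypothetical. With three matching must clauses and no should clauses, a minimum_should_match of 1 can never be met.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical model of the bool query match rule (not ES source).
#   $1 = number of must clauses
#   $2 = number of must clauses the document matches
#   $3 = number of should clauses the document matches
#   $4 = resolved minimum_should_match (0 when the parameter is absent
#        and the query has at least one must clause)
bool_matches() {
  local must_total=$1 must_hit=$2 should_hit=$3 msm=$4
  # Every must clause has to match ...
  (( must_hit == must_total )) || { echo no; return; }
  # ... and at least msm should clauses have to match.
  (( should_hit >= msm )) || { echo no; return; }
  echo yes
}

# The report: three must clauses that all match "this is a test", no should clauses.
bool_matches 3 3 0 0   # minimum_should_match not set      -> yes
bool_matches 3 3 0 1   # "minimum_should_match": "1"       -> no
```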

Steps to reproduce:

  1. Build a demo index
curl -X DELETE 'localhost:9200/i/'

curl -XPUT http://localhost:9200/i/ -d' 
{
    "index" : {
    }
}'

curl -XPOST http://localhost:9200/i/blog/_mapping -d'
{
    "blog": {
        "properties": {
            "prefix": {
                "type": "string", 
                "analyzer": "default"
            },
            "name": {
                "type": "string", 
                "analyzer": "default"
            },
            "original": {
                "type": "string",
                "analyzer": "default"
            }
        }
    }
}'

curl -XPOST http://localhost:9200/_bulk?pretty -d'
{ "index":  { "_index": "i", "_type": "blog"}}
{ "prefix": "this is a test", "name": "this is a test", "original": "this is a test"}
{ "index":  { "_index": "i", "_type": "blog"}}
{ "prefix": "this is another test", "name": "this is another test", "original": "this is another test"}
'
  2. Search the index with "minimum_should_match": "1"
curl -XGET http://localhost:9200/i/blog/_search?pretty -d '
{
    "from" : 0,
    "size" : 10,
    "query": {
        "bool": {
            "minimum_should_match": "1",
            "must": [
                {
                    "match": {
                        "name": {
                            "query": "test",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "name": {
                            "query": "a",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "name": {
                            "query": "this",
                            "type": "phrase"
                        }
                    }
                }
            ]
        }
    }
}
'

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
  3. Search the index without "minimum_should_match": "1"
curl -XGET http://localhost:9200/i/blog/_search?pretty -d '
{
    "from" : 0,
    "size" : 10,
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": {
                            "query": "test",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "name": {
                            "query": "a",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "name": {
                            "query": "this",
                            "type": "phrase"
                        }
                    }
                }
            ]
        }
    }
}
'

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0577903,
    "hits" : [
      {
        "_index" : "i",
        "_type" : "blog",
        "_id" : "AVit2k-adyQR6_lrinhz",
        "_score" : 1.0577903,
        "_source" : {
          "prefix" : "this is a test",
          "name" : "this is a test",
          "original" : "this is a test"
        }
      }
    ]
  }
}

All 11 comments

But you are using must clauses here, not should clauses. Why?

@dadoonet I use the Go elastic.v3 client to build the query DSL, and in some cases there is no 'should' clause under the bool query. I know I can check the query first and then decide whether to set "minimum_should_match", but I think the Elasticsearch service could handle this itself. In Elasticsearch 2.3 this works fine.
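The client-side check described above can be sketched as follows. The reporter builds queries with the Go elastic.v3 client; this bash version only illustrates the idea and does not use that client. build_bool_query is a hypothetical helper that takes a minimum_should_match value plus comma-separated must and should clause JSON, and emits minimum_should_match only when at least one should clause is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any Elasticsearch client).
#   $1 = minimum_should_match value to use when there are should clauses
#   $2 = comma-separated JSON objects for "must" (may be empty)
#   $3 = comma-separated JSON objects for "should" (may be empty)
build_bool_query() {
  local msm=$1 must=$2 should=$3
  local body="\"must\": [${must}]"
  if [[ -n $should ]]; then
    # Only send minimum_should_match when there are should clauses for it to count.
    body+=", \"should\": [${should}], \"minimum_should_match\": \"${msm}\""
  fi
  printf '{"query": {"bool": {%s}}}\n' "$body"
}

# No should clauses: minimum_should_match is left out, so the must clauses decide alone.
build_bool_query 1 '{"match": {"name": "test"}}' ''
```

The resulting JSON could then be passed as the -d body of a _search request like the ones in the report.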

I can indeed reproduce it:

DELETE test
PUT test
POST test/blog/_bulk
{ "index":  { }}
{ "name": "this is a test" }
{ "index":  { }}
{ "name": "this is another test" }
GET test/blog/_search
{
    "query": {
        "bool": {
            "minimum_should_match": "1",
            "must": [
                {
                    "match": {
                        "name": {
                            "query": "test",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "name": {
                            "query": "a",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "name": {
                            "query": "this",
                            "type": "phrase"
                        }
                    }
                }
            ]
        }
    }
}

I'm not sure what the solution is here, but maybe we should reject the query if minimum_should_match is used without any should clause?
@jpountz WDYT?

This is the result of this change https://github.com/elastic/elasticsearch/pull/15571 which was opened because of https://github.com/elastic/elasticsearch/issues/15521.

minimum_should_match used to be adjusted based on how many should clauses there were, and now it isn't. @jpountz thought this was a bug (https://github.com/elastic/elasticsearch/issues/15521#issuecomment-165777211), but I'm not sure I agree with him. To satisfy correctness in an edge case, the change has made working with minimum_should_match harder for the ordinary case.
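The difference described here can be sketched for an integer minimum_should_match. This bash function is an illustration, not Elasticsearch source: modelling the old "adjustment" as capping the value at the number of should clauses is an assumption, and resolve_integer_msm is a hypothetical name. With zero should clauses and a value of 1, the capped (2.x-style) model requires 0 matching should clauses, while the uncapped (5.x-style) model requires 1, which no document can satisfy.

```shell
#!/usr/bin/env bash
# resolve_integer_msm SPEC N_SHOULD MODE  (hypothetical model)
#   SPEC     integer minimum_should_match; a negative value means
#            "all should clauses except this many"
#   N_SHOULD number of should clauses in the bool query
#   MODE     "capped" (assumed pre-5.0 adjustment) or "uncapped" (5.x)
resolve_integer_msm() {
  local spec=$1 n=$2 mode=$3 result
  if (( spec < 0 )); then
    result=$(( n + spec ))
  else
    result=$spec
  fi
  if (( result < 0 )); then
    result=0
  fi
  # Assumed old behaviour: never require more should clauses than exist.
  if [[ $mode == capped ]] && (( result > n )); then
    result=$n
  fi
  echo "$result"
}

resolve_integer_msm 1 0 capped     # 0: the must-only query still matches
resolve_integer_msm 1 0 uncapped   # 1: no document can match
```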

I still think the current behaviour is better than the old one: ignoring parameters because we think they were put there by mistake feels wrong to me. If we want minimum_should_match to depend on the number of should clauses, we can use the conditional syntax. For instance, based on the above use case and query, I think the query should look like this:

{
    "from" : 0,
    "size" : 10,
    "query": {
        "bool": {
            "minimum_should_match": "0<1",
            "must": [
                {
                    "match": {
                        "name": {
                            "query": "test",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "name": {
                            "query": "a",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "name": {
                            "query": "this",
                            "type": "phrase"
                        }
                    }
                }
            ]
        }
    }
}

This will require 0 matching should clauses if the query has no should clauses and 1 otherwise.
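How "0<1" resolves can be sketched with a simplified bash model of the minimum_should_match syntax: a plain integer, a percentage, and a single "N<spec" condition meaning "if there are at most N should clauses, all of them are required; otherwise apply spec". This is a hedged approximation, not the Elasticsearch implementation: resolve_msm is a hypothetical name, it assumes no spaces in the spec, and it does not handle several space-separated conditions. Percentages are rounded down, as the minimum_should_match documentation describes.

```shell
#!/usr/bin/env bash
# resolve_msm SPEC N_SHOULD  (hypothetical, simplified model)
# Prints how many should clauses must match.
resolve_msm() {
  local spec=$1 n=$2 result
  if [[ $spec == *'<'* ]]; then
    local bound=${spec%%<*}
    # At or below the bound, every should clause is required
    # (zero clauses when the query has none).
    if (( n <= bound )); then
      echo "$n"
      return
    fi
    spec=${spec#*<}
  fi
  if [[ $spec == *% ]]; then
    local pct=${spec%\%}
    if (( pct < 0 )); then
      # Negative percentage: this share of the clauses may be missing.
      result=$(( n - (n * -pct) / 100 ))
    else
      result=$(( n * pct / 100 ))
    fi
  elif (( spec < 0 )); then
    result=$(( n + spec ))
  else
    result=$spec
  fi
  if (( result < 0 )); then
    result=0
  fi
  echo "$result"
}

resolve_msm '0<1' 0   # 0: no should clauses, so none are required
resolve_msm '0<1' 3   # 1: at least one of the three should clauses must match
```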

@jpountz much better solution. Closing

@djschny and I ran into an interesting variation of this issue during the training today. Here is the repro. I am not sure about query_string requests, but I fully expected the match query to fail with a meaningful error instead of silently returning no results.

curl -XDELETE "localhost:9200/test?pretty"
curl -XPUT "localhost:9200/test/doc/1?pretty&refresh" -d '{
  "foo": "bar baz"
}'
curl -XGET "localhost:9200/test/_search?pretty" -d '{
  "query": {
    "match": {
      "foo": {
        "query": "bar baz",
        "operator": "and",
        "minimum_should_match": 1
      }
    }
  }
}'
curl -XGET "localhost:9200/test/_search?pretty" -d '{
  "query": {
    "query_string": {
      "query": "bar baz",
      "default_operator": "and",
      "minimum_should_match": 1
    }
  }
}'
curl -XGET "localhost:9200/test/_search?pretty" -d '{
  "query": {
    "query_string": {
      "query": "+bar +baz",
      "minimum_should_match": 1
    }
  }
}'

The same applies here: setting minimum_should_match to 0<1 or to 80% works correctly in both cases. I'd say that using a percentage for matching against a query string makes much more sense than specifying an absolute value. I think what @jpountz said still stands.
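Why the match query in this repro returns nothing can be sketched the same way: with operator "and", every term of "bar baz" becomes a required clause, so no optional clauses are left for "minimum_should_match": 1 to count. The bash functions below are a simplified illustration with hypothetical names; splitting on whitespace stands in for the real analyzer, and the 0<1 and 80% cases are written out as their already-resolved value, which is 0 when there are no optional clauses.

```shell
#!/usr/bin/env bash
# match_clause_counts TEXT OPERATOR  (hypothetical model)
# Prints "must=<n> should=<n>" for a match query on TEXT.
match_clause_counts() {
  local text=$1 op=$2
  local -a terms
  # Crude whitespace split standing in for the field's analyzer.
  read -r -a terms <<< "$text"
  if [[ $op == and ]]; then
    echo "must=${#terms[@]} should=0"    # every term is required
  else
    echo "must=0 should=${#terms[@]}"    # every term is optional
  fi
}

# can_satisfy REQUIRED_SHOULD SHOULD_COUNT -> "yes" or "no"
can_satisfy() {
  if (( $2 >= $1 )); then echo yes; else echo no; fi
}

match_clause_counts "bar baz" and   # must=2 should=0
can_satisfy 1 0                     # no:  "minimum_should_match": 1
can_satisfy 0 0                     # yes: "0<1" or "80%" resolve to 0 here
```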

What could be improved is the documentation.

@jpountz do you think this change (using minimum_should_match with a value greater than 0 together with only must clauses now matches no documents) should be documented somehow? None of the three docs I looked at makes it clear that this is a tricky combination users should be aware of: the bool query documentation, the query_string documentation (where minimum_should_match combined with default_operator: and leads to the same result), and the minimum_should_match page itself.

I think we need to improve the documentation. There is no mention of this change, and judging by the current behavior, it has changed since 2.x.

Please add a mention of this to the docs 👍 We ran into this when upgrading our Elasticsearch cluster from 2.x to 6.x.
