Elasticsearch version:
$ curl localhost:9200
{
"name" : "xXsJYfj",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "dIRPk3KwQJ-B0bHGPEI5FA",
"version" : {
"number" : "5.0.1",
"build_hash" : "080bb47",
"build_date" : "2016-11-11T22:08:49.812Z",
"build_snapshot" : false,
"lucene_version" : "6.2.1"
},
"tagline" : "You Know, for Search"
}
Plugins installed: []
JVM version:
$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
OS version: xubuntu 16.04 64bit
Description of the problem including expected versus actual behavior:
I run three 'match' queries nested in one 'must' clause, and each of them matches on its own. If I don't set "minimum_should_match": "1", the result is fine, but if I do set "minimum_should_match": "1", the result is empty. I don't think this option should affect the result of a 'must' query.
Steps to reproduce:
curl -X DELETE 'localhost:9200/i/'
curl -XPUT http://localhost:9200/i/ -d'
{
"index" : {
}
}'
curl -XPOST http://localhost:9200/i/blog/_mapping -d'
{
"blog": {
"properties": {
"prefix": {
"type": "string",
"analyzer": "default"
},
"name": {
"type": "string",
"analyzer": "default"
},
"original": {
"type": "string",
"analyzer": "default"
}
}
}
}'
curl -XPOST http://localhost:9200/_bulk?pretty -d'
{ "index": { "_index": "i", "_type": "blog"}}
{ "prefix": "this is a test", "name": "this is a test", "original": "this is a test"}
{ "index": { "_index": "i", "_type": "blog"}}
{ "prefix": "this is another test", "name": "this is another test", "original": "this is another test"}
'
curl -XGET http://localhost:9200/i/blog/_search?pretty -d '
{
"from" : 0,
"size" : 10,
"query": {
"bool": {
"minimum_should_match": "1",
"must": [
{
"match": {
"name": {
"query": "test",
"type": "phrase"
}
}
},
{
"match": {
"name": {
"query": "a",
"type": "phrase"
}
}
},
{
"match": {
"name": {
"query": "this",
"type": "phrase"
}
}
}
]
}
}
}
'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
curl -XGET http://localhost:9200/i/blog/_search?pretty -d '
{
"from" : 0,
"size" : 10,
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "test",
"type": "phrase"
}
}
},
{
"match": {
"name": {
"query": "a",
"type": "phrase"
}
}
},
{
"match": {
"name": {
"query": "this",
"type": "phrase"
}
}
}
]
}
}
}
'
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0577903,
"hits" : [
{
"_index" : "i",
"_type" : "blog",
"_id" : "AVit2k-adyQR6_lrinhz",
"_score" : 1.0577903,
"_source" : {
"prefix" : "this is a test",
"name" : "this is a test",
"original" : "this is a test"
}
}
]
}
}
But you are using must clauses here, not should clauses. Why is that?
@dadoonet I use the golang elastic.v3 client to build the DSL query, and in some cases there is no 'should' clause under the bool query. I know I can check the query first and then decide whether to set "minimum_should_match", but I think the Elasticsearch service could handle this itself. In Elasticsearch 2.3 this works fine.
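The client-side check mentioned above can be sketched roughly as follows. This is a minimal Python sketch that builds the request body as a plain dict; it is not the elastic.v3 API, and `build_bool_query` and its parameters are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Optional


def build_bool_query(
    must: Optional[List[Dict[str, Any]]] = None,
    should: Optional[List[Dict[str, Any]]] = None,
    minimum_should_match: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a bool query body, attaching minimum_should_match only
    when there is at least one should clause (hypothetical helper)."""
    bool_body: Dict[str, Any] = {}
    if must:
        bool_body["must"] = list(must)
    if should:
        bool_body["should"] = list(should)
        # Only meaningful when optional clauses exist.
        if minimum_should_match is not None:
            bool_body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_body}}


must_only = build_bool_query(
    must=[{"match": {"name": "test"}}, {"match": {"name": "this"}}],
    minimum_should_match="1",
)
with_should = build_bool_query(
    must=[{"match": {"name": "test"}}],
    should=[{"match": {"name": "another"}}],
    minimum_should_match="1",
)
```

Here `must_only` carries no `minimum_should_match` at all, while `with_should` keeps the value it was given.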
I can indeed reproduce it:
DELETE test
PUT test
POST test/blog/_bulk
{ "index": { }}
{ "name": "this is a test" }
{ "index": { }}
{ "name": "this is another test" }
GET test/blog/_search
{
"query": {
"bool": {
"minimum_should_match": "1",
"must": [
{
"match": {
"name": {
"query": "test",
"type": "phrase"
}
}
},
{
"match": {
"name": {
"query": "a",
"type": "phrase"
}
}
},
{
"match": {
"name": {
"query": "this",
"type": "phrase"
}
}
}
]
}
}
}
Not sure what the solution is here, but maybe we should reject the query if minimum_should_match is used without any should clause?
@jpountz WDYT?
This is the result of this change https://github.com/elastic/elasticsearch/pull/15571 which was opened because of https://github.com/elastic/elasticsearch/issues/15521.
minimum_should_match used to be adjusted based on how many should clauses there were, and now it isn't. @jpountz thought this was a bug (https://github.com/elastic/elasticsearch/issues/15521#issuecomment-165777211) but I'm not sure I agree with him. To satisfy correctness in an edge case, it has made working with minimum_should_match harder for the ordinary case.
I still think the current behaviour is better than the old one; ignoring parameters because we think they were put there by mistake feels wrong to me. If we want minimum_should_match to depend on the number of should clauses, we can use the conditional syntax? For instance, based on the above use-case and query, I think the query should look like this:
{
"from" : 0,
"size" : 10,
"query": {
"bool": {
"minimum_should_match": "0<1",
"must": [
{
"match": {
"name": {
"query": "test",
"type": "phrase"
}
}
},
{
"match": {
"name": {
"query": "a",
"type": "phrase"
}
}
},
{
"match": {
"name": {
"query": "this",
"type": "phrase"
}
}
}
]
}
}
}
This will require 0 matching should clauses if the query has no should clauses and 1 otherwise.
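The arithmetic behind the conditional syntax can be illustrated with a small Python sketch. It follows the rules described on the minimum_should_match documentation page (integer, percentage, and `N<M` conditions) and assumes, as described in this thread for 5.x, that the value is not capped by the number of should clauses. `required_should_matches` is a hypothetical helper written for illustration, not Elasticsearch code.

```python
def required_should_matches(spec: str, should_count: int) -> int:
    """Return how many should clauses must match for a given
    minimum_should_match spec (illustrative sketch only)."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional form "N<M": if there are N or fewer optional
        # clauses, all of them are required; otherwise M applies.
        bound, rest = spec.split("<", 1)
        if should_count <= int(bound):
            return should_count
        return required_should_matches(rest, should_count)
    if spec.endswith("%"):
        # Percentage of the optional clauses, truncated toward zero.
        calc = int(should_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # Negative values mean "this many may be missing".
    result = should_count + calc if calc < 0 else calc
    return max(result, 0)


# A bool query with only must clauses has zero should clauses.
plain = required_should_matches("1", 0)
conditional_empty = required_should_matches("0<1", 0)
conditional_some = required_should_matches("0<1", 3)
percent_empty = required_should_matches("80%", 0)
```

With `"1"` and no should clauses the helper returns 1, which can never be satisfied, consistent with the empty result reported above. With `"0<1"` or `"80%"` and no should clauses it returns 0, so the must clauses alone decide the match.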
@jpountz much better solution. Closing
@djschny and I ran into an interesting variation of this issue during the training today. Here is the repro. I am not sure about query_string requests, but I was totally expecting the match query to fail with some meaningful error instead of silently returning no results.
curl -XDELETE "localhost:9200/test?pretty"
curl -XPUT "localhost:9200/test/doc/1?pretty&refresh" -d '{
"foo": "bar baz"
}'
curl -XGET "localhost:9200/test/_search?pretty" -d '{
"query": {
"match": {
"foo": {
"query": "bar baz",
"operator": "and",
"minimum_should_match": 1
}
}
}
}'
curl -XGET "localhost:9200/test/_search?pretty" -d '{
"query": {
"query_string": {
"query": "bar baz",
"default_operator": "and",
"minimum_should_match": 1
}
}
}'
curl -XGET "localhost:9200/test/_search?pretty" -d '{
"query": {
"query_string": {
"query": "+bar +baz",
"minimum_should_match": 1
}
}
}'
The same applies here - using minimum_should_match with 0<1 or 80% both work correctly. I'd say that using a percentage for matching against a query string makes much more sense than specifying an absolute value. I think what @jpountz said still stands.
What could be improved is the documentation.
@jpountz do you think this change (which basically says that using minimum_should_match: [value_greater_than_0] with only must statements will match no documents) should be documented somehow? It is not clear from the three docs I looked at - the bool documentation, the query_string documentation (here minimum_should_match combined with default_operator: and will lead to the same result), and minimum_should_match's own page - that this is a tricky combination that users should be aware of.
I think that we need to improve the documentation. There is no indication of this change, and the current behavior is different from what it was in 2.x.
Add a mention of this to the docs please 👍 Got this when updating our 2.x elastic to 6.x.