I noticed that when using a named query within a nested query, the top-level field matched_queries contains the wrong names.
It looks like something is evaluated in the top-level context instead of within the nested context.
Reprex
(with 2 different examples within):
Suppose I have the following index, with this document:
PUT newindex
{
  "mappings": {
    "properties": {
      "root": {
        "type": "nested",
        "properties": {
          "foo": {
            "type": "keyword"
          },
          "bar": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
PUT newindex/_doc/1
{
  "root": [
    {
      "foo": "gdhjkl",
      "bar": "gdhjkl2"
    },
    {
      "foo": "not_filled"
    }
  ]
}
And I want to write a query that returns "incomplete" docs, i.e. those where not all values are present for foo and bar, or where foo contains the value not_filled:
GET newindex/_search
{
  "query": {
    "nested": {
      "path": "root",
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must_not": [
                  {
                    "exists": {
                      "field": "root.foo"
                    }
                  }
                ],
                "_name": "no foo"
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "exists": {
                      "field": "root.bar"
                    }
                  }
                ],
                "_name": "no bar"
              }
            },
            {
              "match": {
                "root.foo": {
                  "query": "not_filled",
                  "_name": "foo has wrong value"
                }
              }
            }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
Actual output:
{
  ...
  "hits" : [
    {
      "_index" : "newindex",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 0.6931472,
      "_source" : {
        "root" : [
          {
            "foo" : "gdhjkl",
            "bar" : "gdhjkl2"
          },
          {
            "foo" : "not_filled"
          }
        ]
      },
      "matched_queries" : [
        "no bar",
        "no foo"
      ],
      "inner_hits" : {
        "root" : {
          "hits" : {
            "total" : {
              "value" : 1,
              "relation" : "eq"
            },
            "max_score" : 0.6931472,
            "hits" : [
              {
                "_index" : "newindex",
                "_type" : "_doc",
                "_id" : "1",
                "_nested" : {
                  "field" : "root",
                  "offset" : 1
                },
                "_score" : 0.6931472,
                "_source" : {
                  "foo" : "not_filled"
                },
                "matched_queries" : [
                  "foo has wrong value",
                  "no bar"
                ]
                ... (closing brackets/braces)...
Expected output
In the top-level matched_queries-field, I'd expect either nothing (as the queries only hit nested documents, nothing on the top-level), or ["no bar", "foo has wrong value"] (order is not really relevant)
In this case, the name no foo is returned, even though this query did not give me a hit.
But at the same time, the name foo has wrong value is not returned, even though it did cause a hit.
Note that the names in the inner_hits-section are as expected
On the top-level field matched_queries, I'd expect either:
1. []. Any matched queries from a nested query can then be shown within the inner_hits-section (as they are now as well).
2. ["no bar", "foo has wrong value"], i.e. the names from the matched_queries-field in the inner_hits-section.
My impression is that the query itself is executed correctly, but that the named queries are afterwards evaluated separately, looking only at the document as a whole. There, the fields root.foo and root.bar are missing (they are not really part of the bare document), so "no foo" and "no bar" match, while the match-query on root.foo fails.
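To make that hypothesis concrete, here is a toy model in Python. It is not Elasticsearch code and says nothing certain about the real implementation; it only assumes that the nested objects live in separate hidden documents, so the root document carries no root.* fields. The predicates and the helper matched_names are made up for illustration.

```python
# Toy model, NOT Elasticsearch internals: nested objects are assumed to be
# separate hidden documents, so the root document has no root.* fields.
root_doc = {}  # the bare parent document from the first example
nested_docs = [
    {"root.foo": "gdhjkl", "root.bar": "gdhjkl2"},
    {"root.foo": "not_filled"},
]

# The three named clauses of the query, written as plain predicates.
named_clauses = {
    "no foo": lambda d: "root.foo" not in d,
    "no bar": lambda d: "root.bar" not in d,
    "foo has wrong value": lambda d: d.get("root.foo") == "not_filled",
}


def matched_names(doc):
    """Names of all clauses whose predicate holds for this one document."""
    return sorted(name for name, pred in named_clauses.items() if pred(doc))


# Evaluating the names against the root document reproduces the wrong
# top-level list; evaluating them per nested document gives the inner_hits list.
top_level = matched_names(root_doc)
per_nested = [matched_names(d) for d in nested_docs]
print(top_level)   # ['no bar', 'no foo']
print(per_nested)  # [[], ['foo has wrong value', 'no bar']]
```

Under these assumptions the root-level evaluation yields exactly the names reported in the top-level matched_queries, and the per-nested-document evaluation yields the names reported in inner_hits.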
Fix
For option 1-output, we could simply ignore any named queries within nested-queries when computing the top-level matched_queries-field.
Option 2-output may be a bit more complicated, but we could compute the option 1-set of names, then later combine them with the names from the inner_hits (removing duplicates along the way).
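Until something like this exists server-side, the option 2 output can be approximated on the client. The following is only a sketch: the function combined_matched_queries and its nested_names parameter are hypothetical, and the caller has to know which names were defined inside nested queries.

```python
def combined_matched_queries(hit, nested_names):
    """Option 2 sketch: non-nested top-level names plus inner_hits names.

    `nested_names` (an assumption of this sketch) is the set of all names
    that were defined inside nested queries; the caller must supply it.
    """
    # Option 1 set: drop every top-level name that belongs to a nested query.
    names = [n for n in hit.get("matched_queries", []) if n not in nested_names]
    # Add the names reported per nested document, skipping duplicates.
    for section in hit.get("inner_hits", {}).values():
        for inner in section["hits"]["hits"]:
            for n in inner.get("matched_queries", []):
                if n not in names:
                    names.append(n)
    return names


# Trimmed-down hit from the first example.
hit = {
    "matched_queries": ["no bar", "no foo"],
    "inner_hits": {
        "root": {"hits": {"hits": [
            {"matched_queries": ["foo has wrong value", "no bar"]},
        ]}},
    },
}
nested_names = {"no foo", "no bar", "foo has wrong value"}
print(combined_matched_queries(hit, nested_names))
# ['foo has wrong value', 'no bar']
```

Note that this relies on the names in inner_hits being correct, and it only sees the nested documents that inner_hits actually returns.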
System details:
Elasticsearch version 7.3.1 (and also seen on 6.8.2)
JVM version (java -version): 1.8.0_221
OS version: Windows 10 (64-bit)
Update
I've been trying some more things, and I've found some even worse behaviour.
When there are multiple nested fields, and when querying on both, some of the names of the query-part on A end up in the inner hits of field B.
I'm sorry this will be quite a lot of code, but I couldn't get it much more minimal while still showing what I mean.
The problem is right at the bottom, where you see the queries no field4 and no field5 mentioned in the inner hits from the root-field.
Note the difference between the 2 parts: the named queries from otherroot do leak over to the part from root, but not the other way around. I'm assuming order may have something to do with that, but I haven't tested that hypothesis.
New mapping
PUT newindex
{
  "mappings": {
    "properties": {
      "root": {
        "type": "nested",
        "properties": {
          "foo": {
            "type": "keyword"
          },
          "bar": {
            "type": "keyword"
          }
        }
      },
      "baz": {
        "type": "keyword"
      },
      "otherroot": {
        "type": "nested",
        "properties": {
          "field4": {
            "type": "keyword"
          },
          "field5": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
New document
PUT newindex/_doc/1
{
  "root": [
    {
      "foo": "gdhjkl",
      "bar": "gdhjkl2"
    },
    {
      "foo": "not_filled"
    }
  ],
  "baz": "someval",
  "otherroot": [
    {
      "field4": "fwuvesd",
      "field5": "gbsduil"
    },
    {
      "field4": "dnbfjskl"
    }
  ]
}
Query
GET newindex/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "root",
            "query": {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "root.foo"
                          }
                        }
                      ],
                      "_name": "no foo"
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "root.bar"
                          }
                        }
                      ],
                      "_name": "no bar"
                    }
                  },
                  {
                    "match": {
                      "root.foo": {
                        "query": "not_filled",
                        "_name": "foo has wrong value"
                      }
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "exists": {
                  "field": "baz"
                }
              }
            ],
            "_name": "no baz"
          }
        },
        {
          "nested": {
            "path": "otherroot",
            "query": {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "otherroot.field4"
                          }
                        }
                      ],
                      "_name": "no field4"
                    }
                  },
                  {
                    "bool": {
                      "must_not": [
                        {
                          "exists": {
                            "field": "otherroot.field5"
                          }
                        }
                      ],
                      "_name": "no field5"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}
Output
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "newindex",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "root" : [
            {
              "foo" : "gdhjkl",
              "bar" : "gdhjkl2"
            },
            {
              "foo" : "not_filled"
            }
          ],
          "baz" : "someval",
          "otherroot" : [
            {
              "field4" : "fwuvesd",
              "field5" : "gbsduil"
            },
            {
              "field4" : "dnbfjskl"
            }
          ]
        },
        "matched_queries" : [
          "no bar",
          "no foo",
          "no field5",
          "no field4"
        ],
        "inner_hits" : {
          "otherroot" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.0,
              "hits" : [
                {
                  "_index" : "newindex",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "otherroot",
                    "offset" : 1
                  },
                  "_score" : 0.0,
                  "_source" : {
                    "field4" : "dnbfjskl"
                  },
                  "matched_queries" : [
                    "no field5"
                  ]
                }
              ]
            }
          },
          "root" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 0.6931472,
              "hits" : [
                {
                  "_index" : "newindex",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "root",
                    "offset" : 1
                  },
                  "_score" : 0.6931472,
                  "_source" : {
                    "foo" : "not_filled"
                  },
                  "matched_queries" : [
                    "foo has wrong value",
                    "no bar",
                    "no field5",
                    "no field4"
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  }
}
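The leak in this output can also be checked mechanically, which may be handy as a regression test. The sketch below is a hypothetical client-side helper (leaked_names and names_by_path are made-up names, not part of any Elasticsearch client); it assumes the caller knows which query names were defined under which nested path.

```python
def leaked_names(hit, names_by_path):
    """Map each inner_hits key to the names that were not defined under it.

    `names_by_path` (an assumption of this sketch) maps a nested path to the
    set of query names that were defined inside that nested query.
    """
    leaks = {}
    for path, section in hit.get("inner_hits", {}).items():
        allowed = names_by_path.get(path, set())
        for inner in section["hits"]["hits"]:
            extra = [n for n in inner.get("matched_queries", []) if n not in allowed]
            if extra:
                leaks.setdefault(path, []).extend(extra)
    return leaks


# Trimmed-down hit from the output above (only the fields the check needs).
hit = {
    "inner_hits": {
        "otherroot": {"hits": {"hits": [
            {"matched_queries": ["no field5"]},
        ]}},
        "root": {"hits": {"hits": [
            {"matched_queries": [
                "foo has wrong value", "no bar", "no field5", "no field4",
            ]},
        ]}},
    },
}
names_by_path = {
    "root": {"no foo", "no bar", "foo has wrong value"},
    "otherroot": {"no field4", "no field5"},
}
print(leaked_names(hit, names_by_path))
# {'root': ['no field5', 'no field4']}
```

An empty result would mean that every inner hit only reports names from its own nested query; here the root section reports two names that belong to otherroot.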
Pinging @elastic/es-search
Hey, my team just encountered this exact bug.
We would love some input on this from the elastic team.
Thank you