Elasticsearch version : 6.3.1 and below
JVM version : 1.8.0_171
OS version : Ubuntu 16.04 LTS
Expected behaviour :
Correct sort: the family with id=2 should get a sort value of 30 in the example below.
Problem description :
Faulty sort when querying a three-level nested object model and sorting the parent objects on a field from the lowest level. In the example below, the family with id=2 gets a sort value of 10, while it should be 30 (the value 10 doesn't even appear in the document with id=2).
Steps to reproduce:
Create index
PUT tree
{ "settings": {"number_of_shards": 1,"number_of_replicas": 0 } }
Put mapping
PUT tree/family/_mapping
{"properties":{"name":{"type":"keyword"},"members":{"type":"nested","properties":{"firstname":{"type":"keyword"},"color":{"type":"keyword"},"levels":{"type":"nested","properties":{"strength":{"type":"integer"}}}}}}}
Insert data (bulk index API)
POST _bulk
{ "index" : { "_index" : "tree", "_type" : "family", "_id" : "1" } }
{"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":10}},{"firstName":"Serge","color":"brown","levels":{"strength":15}},{"firstName":"Marie","color":"brown","levels":{"strength":20}}]}
{ "index" : { "_index" : "tree", "_type" : "family", "_id" : "2" } }
{"name":"Simpson","members":[{"firstName":"Homer","color":"brown","levels":{"strength":30}},{"firstName":"Lisa","color":"brown","levels":{"strength":40}},{"firstName":"Marge","color":"brown","levels":{"strength":60}}]}
{ "index" : { "_index" : "tree", "_type" : "family", "_id" : "3" } }
{"name":"Simpson","members":[{"firstName":"Bart","color":"yellow","levels":{"strength":70}},{"firstName":"Snowball","color":"yellow","levels":{"strength":80}},{"firstName":"Maggie","color":"yellow","levels":{"strength":90}},{"firstName":"Gandpa","color":"brown","levels":{"strength":95}}]}
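The bulk body above is newline-delimited JSON: one action line followed by one source line per document, with a newline after the last line. As a minimal sketch, the same body could be generated from Python like this. `build_bulk_body` is a hypothetical helper written for this illustration, not part of any Elasticsearch client.

```python
import json

# Hypothetical helper for illustration only: builds the newline-delimited
# JSON body expected by the _bulk endpoint.
def build_bulk_body(index, doc_type, docs):
    lines = []
    for doc_id, source in docs.items():
        # Action line: target index, type and id for the next source line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

# The three sample documents from the reproduction.
docs = {
    "1": {"name": "Doe", "members": [
        {"firstName": "John", "color": "brown", "levels": {"strength": 10}},
        {"firstName": "Serge", "color": "brown", "levels": {"strength": 15}},
        {"firstName": "Marie", "color": "brown", "levels": {"strength": 20}}]},
    "2": {"name": "Simpson", "members": [
        {"firstName": "Homer", "color": "brown", "levels": {"strength": 30}},
        {"firstName": "Lisa", "color": "brown", "levels": {"strength": 40}},
        {"firstName": "Marge", "color": "brown", "levels": {"strength": 60}}]},
    "3": {"name": "Simpson", "members": [
        {"firstName": "Bart", "color": "yellow", "levels": {"strength": 70}},
        {"firstName": "Snowball", "color": "yellow", "levels": {"strength": 80}},
        {"firstName": "Maggie", "color": "yellow", "levels": {"strength": 90}},
        {"firstName": "Gandpa", "color": "brown", "levels": {"strength": 95}}]},
}

bulk_body = build_bulk_body("tree", "family", docs)
print(bulk_body, end="")
```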
Query
GET tree/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "name": {
              "value": "Simpson"
            }
          }
        },
        {
          "nested": {
            "path": "members",
            "query": {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "members.color": {
                        "value": "brown"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "members.levels.strength": {
        "order": "asc",
        "nested": {
          "path": "members",
          "filter": {
            "term": {
              "members.color": {
                "value": "brown"
              }
            }
          },
          "nested": {
            "path": "members.levels"
          }
        }
      }
    }
  ]
}
Results
{
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "tree",
        "_type": "family",
        "_id": "2",
        "_score": null,
        "_source": {
          "name": "Simpson",
          "members": [
            {
              "firstName": "Homer",
              "color": "brown",
              "levels": {
                "strength": 30
              }
            },
            {
              "firstName": "Lisa",
              "color": "brown",
              "levels": {
                "strength": 40
              }
            },
            {
              "firstName": "Marge",
              "color": "brown",
              "levels": {
                "strength": 60
              }
            }
          ]
        },
        "sort": [
          10
        ]
      },
      ...
    ]
  }
}
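For comparison, the expected sort values can be worked out by hand from the sample documents. The sketch below is plain Python, not Elasticsearch code. It assumes the default `min` sort mode for ascending order and models the nested sort filter as "keep only brown members". `expected_sort_value` is a hypothetical helper, and the value for id=3 is derived from the sample data rather than taken from an actual response.

```python
# Sample documents from the reproduction, in the same shape as indexed.
families = {
    "1": {"name": "Doe", "members": [
        {"firstName": "John", "color": "brown", "levels": {"strength": 10}},
        {"firstName": "Serge", "color": "brown", "levels": {"strength": 15}},
        {"firstName": "Marie", "color": "brown", "levels": {"strength": 20}}]},
    "2": {"name": "Simpson", "members": [
        {"firstName": "Homer", "color": "brown", "levels": {"strength": 30}},
        {"firstName": "Lisa", "color": "brown", "levels": {"strength": 40}},
        {"firstName": "Marge", "color": "brown", "levels": {"strength": 60}}]},
    "3": {"name": "Simpson", "members": [
        {"firstName": "Bart", "color": "yellow", "levels": {"strength": 70}},
        {"firstName": "Snowball", "color": "yellow", "levels": {"strength": 80}},
        {"firstName": "Maggie", "color": "yellow", "levels": {"strength": 90}},
        {"firstName": "Gandpa", "color": "brown", "levels": {"strength": 95}}]},
}

# Hypothetical helper: smallest strength among members of the given color,
# looking only inside the family passed in (never at other documents).
def expected_sort_value(family, color="brown"):
    strengths = [
        member["levels"]["strength"]
        for member in family["members"]
        if member["color"] == color
    ]
    return min(strengths) if strengths else None

# Only families named "Simpson" match the term filter of the query.
matching = {doc_id: f for doc_id, f in families.items() if f["name"] == "Simpson"}
expected = {doc_id: expected_sort_value(f) for doc_id, f in matching.items()}
print(expected)  # {'2': 30, '3': 95}
```

The value 10 returned for id=2 only exists in the document with id=1, which the name filter excludes.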
Note that the result of the query above is correct when the documents are indexed with the index API instead of the bulk API, using the commands below :
POST tree/family
{"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":10}},{"firstName":"Serge","color":"brown","levels":{"strength":15}},{"firstName":"Marie","color":"brown","levels":{"strength":20}}]}
POST tree/family
{"name":"Simpson","members":[{"firstName":"Homer","color":"brown","levels":{"strength":30}},{"firstName":"Lisa","color":"brown","levels":{"strength":40}},{"firstName":"Marge","color":"brown","levels":{"strength":60}}]}
POST tree/family
{"name":"Simpson","members":[{"firstName":"Bart","color":"yellow","levels":{"strength":70}},{"firstName":"Snowball","color":"yellow","levels":{"strength":80}},{"firstName":"Maggie","color":"yellow","levels":{"strength":90}},{"firstName":"Gandpa","color":"brown","levels":{"strength":95}}]}
_See following discussion : https://discuss.elastic.co/t/issue-sorting-nested-documents-indexed-via-bulk/139164_
Pinging @elastic/es-search-aggs
@JulienColin thanks for raising this here, and thanks for the great reproduction. I was able to see similar behaviour locally on 6.3.0.
For anybody interested in reproducing this quickly in Console, I put the parts of the above reproduction together in a Console script that is easy to copy and paste: https://gist.github.com/cbuescher/f9c8c2132d2667d3e907a6283d3f171a
What is indeed weird is that, in the case of bulk indexing, the sort value for document "2" seems to get picked up from the smallest "strength" value in document "1". If I e.g. change document "1" to {"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":12}}]} in the bulk example, I get "12" as the sort value of doc "2" in the response.
This might be related to the problem under discussion in https://github.com/elastic/elasticsearch/issues/31554
Not quite the same (no missing fields), but similar symptoms: wrong sort values getting picked up.
Hello, thank you @cbuescher for your help, and @polyfractal for pointing out the similarities.
I am not sure the issues are the same, though: mine is reproducible 100% of the time, whereas #31554 seems to reproduce only under very precise circumstances. Anyway, it would be interesting to see whether the fix proposed there fixes this issue as well.