Elasticsearch: Broken sort on multiple-level nested documents

Created on 17 Jul 2018 · 5 Comments · Source: elastic/elasticsearch

Elasticsearch version : 6.3.1 and below

JVM version : 1.8.0_171

OS version : Ubuntu 16.04 LTS

Expected behaviour :
Correct sort order. The family with id=2 should get a sort value of 30 in the example below.

Problem description :
The sort is wrong when querying a model with three levels of nested objects and sorting the parent objects on a field from the lowest level. In the example below, the family with id=2 gets a sort value of 10 when it should get 30 (the value 10 doesn't even appear in the document with id=2).
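To make the expected values concrete, here is a minimal sketch that recomputes them in plain Python from the member data used in the reproduction steps. It assumes the ascending sort picks the smallest `strength` among the members that pass the `color = brown` filter (the default `min` mode for ascending order). The function names are illustrative only, and the value 95 for the family with id=3 is derived under that same assumption rather than stated in the report.

```python
# Families from the reproduction, reduced to (color, strength) pairs per member.
FAMILIES = {
    "1": {"name": "Doe", "members": [("brown", 10), ("brown", 15), ("brown", 20)]},
    "2": {"name": "Simpson", "members": [("brown", 30), ("brown", 40), ("brown", 60)]},
    "3": {"name": "Simpson",
          "members": [("yellow", 70), ("yellow", 80), ("yellow", 90), ("brown", 95)]},
}


def expected_sort_value(family, color="brown"):
    """Smallest strength among members of the given color, or None if no member matches."""
    strengths = [strength for c, strength in family["members"] if c == color]
    return min(strengths) if strengths else None


def expected_hits(families, name="Simpson", color="brown"):
    """(doc id, sort value) pairs for families matching the query, in ascending sort order."""
    hits = []
    for doc_id, family in families.items():
        value = expected_sort_value(family, color)
        if family["name"] == name and value is not None:
            hits.append((doc_id, value))
    return sorted(hits, key=lambda hit: hit[1])


print(expected_hits(FAMILIES))  # [('2', 30), ('3', 95)]
```

The value 10 can only come from the family with id=1, which the `name = Simpson` filter excludes from the hits.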

Steps to reproduce:

  1. Create index
    PUT tree
    {
      "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
    }

  2. Put mapping
    PUT tree/family/_mapping
    {
      "properties": {
        "name": { "type": "keyword" },
        "members": {
          "type": "nested",
          "properties": {
            "firstname": { "type": "keyword" },
            "color": { "type": "keyword" },
            "levels": {
              "type": "nested",
              "properties": {
                "strength": { "type": "integer" }
              }
            }
          }
        }
      }
    }

  3. Insert data (bulk index API)
    POST _bulk
    { "index" : { "_index" : "tree", "_type" : "family", "_id" : "1" } }
    {"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":10}},{"firstName":"Serge","color":"brown","levels":{"strength":15}},{"firstName":"Marie","color":"brown","levels":{"strength":20}}]}
    { "index" : { "_index" : "tree", "_type" : "family", "_id" : "2" } }
    {"name":"Simpson","members":[{"firstName":"Homer","color":"brown","levels":{"strength":30}},{"firstName":"Lisa","color":"brown","levels":{"strength":40}},{"firstName":"Marge","color":"brown","levels":{"strength":60}}]}
    { "index" : { "_index" : "tree", "_type" : "family", "_id" : "3" } }
    {"name":"Simpson","members":[{"firstName":"Bart","color":"yellow","levels":{"strength":70}},{"firstName":"Snowball","color":"yellow","levels":{"strength":80}},{"firstName":"Maggie","color":"yellow","levels":{"strength":90}},{"firstName":"Gandpa","color":"brown","levels":{"strength":95}}]}

  4. Query
    GET tree/_search
    {
      "query": {
        "bool": {
          "filter": [
            { "term": { "name": { "value": "Simpson" } } },
            {
              "nested": {
                "path": "members",
                "query": {
                  "bool": {
                    "filter": [
                      { "term": { "members.color": { "value": "brown" } } }
                    ]
                  }
                }
              }
            }
          ]
        }
      },
      "sort": [
        {
          "members.levels.strength": {
            "order": "asc",
            "nested": {
              "path": "members",
              "filter": { "term": { "members.color": { "value": "brown" } } },
              "nested": { "path": "members.levels" }
            }
          }
        }
      ]
    }

  5. Results
    {
      "hits": {
        "total": 2,
        "max_score": null,
        "hits": [
          {
            "_index": "tree",
            "_type": "family",
            "_id": "2",
            "_score": null,
            "_source": {
              "name": "Simpson",
              "members": [
                {
                  "firstName": "Homer",
                  "color": "brown",
                  "levels": {
                    "strength": 30
                  }
                },
                {
                  "firstName": "Lisa",
                  "color": "brown",
                  "levels": {
                    "strength": 40
                  }
                },
                {
                  "firstName": "Marge",
                  "color": "brown",
                  "levels": {
                    "strength": 60
                  }
                }
              ]
            },
            "sort": [
              10
            ]
          },
          ...
        ]
      }
    }

Note that the result of the query above is correct if the index API is used instead of the bulk API, with the commands below :

    POST tree/family
    {"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":10}},{"firstName":"Serge","color":"brown","levels":{"strength":15}},{"firstName":"Marie","color":"brown","levels":{"strength":20}}]}
    POST tree/family
    {"name":"Simpson","members":[{"firstName":"Homer","color":"brown","levels":{"strength":30}},{"firstName":"Lisa","color":"brown","levels":{"strength":40}},{"firstName":"Marge","color":"brown","levels":{"strength":60}}]}
    POST tree/family
    {"name":"Simpson","members":[{"firstName":"Bart","color":"yellow","levels":{"strength":70}},{"firstName":"Snowball","color":"yellow","levels":{"strength":80}},{"firstName":"Maggie","color":"yellow","levels":{"strength":90}},{"firstName":"Gandpa","color":"brown","levels":{"strength":95}}]}
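For anyone scripting the comparison, the sketch below builds both payload shapes from the same documents: a single newline-delimited body for `POST _bulk`, and one body per document for `POST tree/family`. It uses only the Python standard library and sends nothing. The helper names (`member`, `bulk_body`, `index_requests`) are hypothetical and not part of any Elasticsearch client.

```python
import json


def member(first_name, color, strength):
    """One entry of the "members" array, shaped like the reproduction data."""
    return {"firstName": first_name, "color": color, "levels": {"strength": strength}}


DOCS = {
    "1": {"name": "Doe", "members": [
        member("John", "brown", 10), member("Serge", "brown", 15),
        member("Marie", "brown", 20)]},
    "2": {"name": "Simpson", "members": [
        member("Homer", "brown", 30), member("Lisa", "brown", 40),
        member("Marge", "brown", 60)]},
    "3": {"name": "Simpson", "members": [
        member("Bart", "yellow", 70), member("Snowball", "yellow", 80),
        member("Maggie", "yellow", 90), member("Gandpa", "brown", 95)]},
}


def bulk_body(docs, index="tree", doc_type="family"):
    """Newline-delimited payload for POST _bulk: an action line, then a source line."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def index_requests(docs, index="tree", doc_type="family"):
    """One (method, path, body) tuple per document, as in the POST tree/family commands."""
    return [("POST", f"{index}/{doc_type}", json.dumps(source)) for source in docs.values()]


body = bulk_body(DOCS)
print(len(body.splitlines()))     # 6
print(len(index_requests(DOCS)))  # 3
```

Besides the number of requests, the two shapes differ in one visible way: the bulk actions set explicit `_id` values, while `POST tree/family` leaves ID generation to Elasticsearch.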

_See following discussion : https://discuss.elastic.co/t/issue-sorting-nested-documents-indexed-via-bulk/139164_

:Search/Search >bug

All 5 comments

Pinging @elastic/es-search-aggs

@JulienColin thanks for raising this here, and thanks for the great reproduction. I was able to see similar behaviour locally on 6.3.0.
For anybody interested in reproducing this quickly in Console, I put the parts of the above reproduction together in a script that is easy to copy and paste: https://gist.github.com/cbuescher/f9c8c2132d2667d3e907a6283d3f171a

What's indeed weird is that in the case of bulk indexing, the sort value for document "2" seems to get picked up from the smallest "strength" value in document "1". If I e.g. change document "1" to {"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":12}}]} in the bulk example, I get "12" as the sort value of doc "2" in the response.
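A small sketch of that check, in case it helps when trying other variations: given the `strength` values indexed per document and a sort value returned for a hit, it reports which documents the value could have come from. The function names are illustrative, and the data is copied by hand from the reproduction rather than read from a live cluster.

```python
# strength values per document id, as indexed in the bulk example
STRENGTHS = {
    "1": [10, 15, 20],
    "2": [30, 40, 60],
    "3": [70, 80, 90, 95],
}


def origin_of(sort_value, strengths):
    """Ids of the documents that actually contain the given strength value."""
    return [doc_id for doc_id, values in strengths.items() if sort_value in values]


def check_hit(doc_id, sort_value, strengths):
    """Return "ok" if the sort value occurs in the hit's own document, else say where it occurs."""
    owners = origin_of(sort_value, strengths)
    if doc_id in owners:
        return "ok"
    return f"sort value {sort_value} not in doc {doc_id}; found in {owners}"


print(check_hit("2", 10, STRENGTHS))  # sort value 10 not in doc 2; found in ['1']

# Variation described above: document "1" reduced to a single member with strength 12.
modified = dict(STRENGTHS, **{"1": [12]})
print(check_hit("2", 12, modified))   # sort value 12 not in doc 2; found in ['1']
```

Note that "ok" only means the value belongs to the hit's own document. It does not verify that the value is the correct minimum after the color filter.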

This might be related to the problem under discussion in https://github.com/elastic/elasticsearch/issues/31554

Not quite the same (no missing fields), but similar symptoms: wrong sort values getting picked up.

Hello, thank you @cbuescher for your help, and @polyfractal for pointing out the similarities.
I am not sure the issues are the same, though: mine reproduces 100% of the time, whereas #31554 seems to reproduce only under very specific circumstances. Anyway, it would be interesting to see whether the fix proposed there fixes this issue as well.
