Elasticsearch: Sorting based on parent/child relationship

Created on 19 Apr 2013 · 72 Comments · Source: elastic/elasticsearch

Currently there is no way to sort documents based on parent child relation. E.g.
Sorting a doc based on child doc field or the opposite.

:Search/Search >feature help wanted high hanging fruit

All 72 comments

@kul At the moment this isn't possible. This feature will be added in the near future.

For now you can use @clintongormley's excellent workaround: http://stackoverflow.com/questions/14504180/elasticsearch-sorting-parents-through-child-values/14519947#14519947

This workaround allows you to sort on child values by using custom_score as child query.

Oh wow! If I can specify a query, it means limitless possibilities for sorting using nested has_child/has_parent clauses.

Thanks

Really looking forward to this feature - we use parent/child relationships extensively and right now have to copy children values on parent object to sort on them. Will give the work-around a try but hopefully we'll see this in the .90 series too ;) Thank you!

Is it true that the work-around requires you to leverage nested mappings instead of a true parent/child relationship for the sorting to work? Thanks!

On 5/14/2013 10:52 AM, Grant Gochnauer wrote:

Is it true that the work-around requires you to leverage nested
mappings instead of a true parent/child relationship for the sorting
to work? Thanks!

http://www.elasticsearch.org/guide/reference/query-dsl/has-child-query/
"The |has_child| also has scoring support from version |0.20.2|. The
supported score types are |max|, |sum|, |avg| or |none|"

Without having seen Clinton's post, but having discussed it a bit on the
list, what I was looking for was the youngest child, so I used "max" to
good effect to find parents with the youngest child ("newest parents").
Originally I was playing with top_children, but has_child was what I
really needed. The field that becomes my score is the date of a child file.

The problem is that a score is a float, so you can have round-off problems
when you convert a 64-bit date long into a 32-bit float. This round-off
can lose seconds, and more often milliseconds. Since my "children" are
actually file instances, I couldn't come up with any brilliant formula
to stay away from the round-off, because two files can have dates very
close together that are not resolvable in the digits of a float.

If the score were a double I would be able to use ~16 (base 10) digits of
accuracy to better effect and rarely have round-off of dates, so I hope
someone changes how a score is stored in the entire Elasticsearch and
Lucene infrastructure from a float to a double :) That is an easy
change, isn't it? :)

-Paul
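
To make the round-off described above concrete, here is a minimal Python sketch that round-trips epoch-millisecond timestamps through a 32-bit float. The timestamp values are made up for illustration and the helper name `to_float32` is hypothetical; it only mimics the precision of a float score and says nothing about how Elasticsearch or Lucene compute scores internally.

```python
import struct


def to_float32(value):
    """Round-trip a number through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# Two hypothetical file dates, one second apart, as epoch milliseconds.
t1 = 1368553920000
t2 = t1 + 1000

# As 32-bit floats the two dates collapse to the same value.
print(to_float32(t1) == to_float32(t2))

# As 64-bit doubles they stay distinct.
print(float(t1) == float(t2))
```

A 32-bit float carries a 24-bit significand, so at this magnitude (between 2^40 and 2^41) neighbouring representable values are 2^17 ms apart, roughly 131 seconds. Any two dates closer than that can end up with identical scores.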

Thanks for the reply P-Hill... We are developing an API that allows for an arbitrary sort on child fields which are different depending on who is leveraging our API. In other words, without built in support for sorting on child document fields, we aren't able to use the custom score very well.

Thanks

+2
My example is just one case where a pretty simple thing like a datetime
doesn't actually work to send through as a score. I'm glad this is coming.
If somehow a result set of parents has a field from a matched child
field, it would seem this could lead to other requested features like
returning the one max/min/avg value or even a list of matching values
(forget sorting them). Since there seemed to be many requests for
various things related to knowledge about the actual child matches of a
parent, this should be a useful API.

On 5/15/2013 6:24 AM, Grant Gochnauer wrote:

We are developing an API that allows for an arbitrary sort on child
fields which are different depending on who is leveraging our API.

+1
Does anyone know of a way to fetch the _parent doc rather than just the _parent uid using the script field?
e.g.
"script" : "_source._parent[\"somefield\"].value"
Thanks! If this is possible, sorting by parent/child could be achieved even if it is not optimized.

This is a very important feature, since it's computationally expensive to update thousands of documents when all you need is to update one field in a big document and then sort by that field. For example, contacts have a property like "last contacted" which changes very frequently, while the rest of the contact does not. The update API doesn't solve my case, since enabling _source will increase my index size a lot.

+1

+1

Hope to see this natively supported in ES!

Though the memory footprint it leaves and the cost of compute are slightly high, this will be one of the most used features if it comes out in ES. Eagerly looking forward to it!

+10

+1 IMHO, it's definitely one of the top missing features, along with:

Are there any plans to support this feature in the foreseeable future?

Once the refactoring in #8134 is in, this is planned to be added. Like with the current refactoring the sorting by child or parent field should be added to the new Lucene query time join first.

This is very exciting. Is there a plan of integrating this in an upcoming release, now that #8134 is solved?

@kul @martijnvg @clintongormley How do I do the workaround mentioned in http://stackoverflow.com/questions/14504180/elasticsearch-sorting-by-nested-documents-values/14519947#14519947 for a parent child relationship? In my script field what should replace "doc['locations.order'].value" to refer to the child document's field?

Thanks in advance

would really like to see this implemented

@martijnvg With #8134 in, do you see a way forward for implementing this?

@clintongormley Yes, I do see a way how this can be implemented. Similar to how the join is implemented, but instead of aggregating child scores per parent the sorting should aggregate sort values instead.
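
As a purely conceptual illustration of "aggregate sort values per parent", the following Python sketch groups child sort values by parent id, reduces them with an aggregation function, and orders the parents by the result. It is an in-memory toy under assumed inputs; the function name `sort_parents_by_child_value` and the sample ids are hypothetical, and it does not reflect how the Lucene join is actually implemented.

```python
from collections import defaultdict


def sort_parents_by_child_value(children, mode=max, descending=True):
    """Order parent ids by an aggregate of their children's sort values.

    children: iterable of (parent_id, sort_value) pairs.
    mode: aggregation applied to each parent's values (e.g. max or min).
    """
    values = defaultdict(list)
    for parent_id, value in children:
        values[parent_id].append(value)
    aggregated = {pid: mode(vals) for pid, vals in values.items()}
    return sorted(aggregated, key=aggregated.get, reverse=descending)


# Hypothetical (parent_id, child_value) pairs.
children = [("p1", 3), ("p2", 10), ("p1", 7), ("p3", 1)]
print(sort_parents_by_child_value(children))
```

The difference from score-based joins is that the aggregated value keeps its original type here, rather than being squeezed into a float score.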

@martijnvg - Do you have an idea on when this might be implemented? I am curious because this is a blocking issue for me for using parent/child relationships, which I would otherwise much prefer over nested documents.

@jaimemarijke No, there is no effort being done yet to get this feature in.

martijnvg commented on 19 Apr 2013
@kul At the moment this isn't possible. This feature will be added _in the near future._

@martijnvg so, what does _in the near future_ mean?

+1

+1

+1

+1

+1

This link at SO actually helped me achieve what we need.

+1

+1

+1

In my case, I don't need to wait for the issue to be solved. Following the workaround from 2013, something can be done as follows.

https://gist.github.com/robinloxley1/7ea7c4f37a3413b1ca16

It will work only on a single field, but what if it's required to sort by 2 fields...

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

My use case is returning search results of parents, but sorted by the date of the most relevant child. As far as I can tell, there isn't a way to achieve this. If there is a workaround, please let me know!

+1

+1

+1

+1

For now, in my situation (sorting by parent.field), I'm working with function_score.

GET /index/child/_search
{
  "query": {
    "has_parent": {
      "parent_type": "parent",
      "score_mode": "score", 
      "query": {
        "function_score": {
          "script_score": {
            "script": "_score * doc['id'].value"
          }
        }
      }
    }
  }
} 

I don't think we can realistically make it work so I am closing it as a won't fix. When this is needed, a possible workaround is to fold sort values into the score as shown by @danipl in the above message.
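
Folding a sort value into the score can be sketched for the opposite direction too (sorting parents by a child field). The Python snippet below only builds a request body modelled on the has_parent example in this thread; the type name `child`, the field name `date`, and the helper `has_child_sort_body` are hypothetical, and parameter names such as `type` and `score_mode` are assumptions that have varied between Elasticsearch versions, so check them against the version in use.

```python
import json


def has_child_sort_body(child_type, field, score_mode="max"):
    """Build a has_child query whose score is taken from a child field.

    All names passed in are placeholders; this mirrors the has_parent
    workaround above rather than any officially documented recipe.
    """
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "score_mode": score_mode,
                "query": {
                    "function_score": {
                        # Assumed script syntax, copied from the thread's example.
                        "script_score": {"script": "doc['%s'].value" % field}
                    }
                },
            }
        }
    }


body = has_child_sort_body("child", "date")
print(json.dumps(body, indent=2))
```

Because the value travels through the score, it is subject to the 32-bit float round-off discussed earlier in this thread, so it suits small integers better than millisecond timestamps.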

@jpountz What has changed? I thought @martijnvg said he saw a way forward?

This is something that could be implemented, but it would require a lot of specialization depending on the types of the fields that are being sorted, and I don't think that would be sustainable in the long term. Moreover, I am concerned about making features that do not scale more appealing (parent/child needs to perform a linear scan in the 2nd phase of its execution in the general case).

@rpedela I did and still do see a way this could be implemented. Implementing this feature does require writing quite some code that will only be used when using has_child/has_parent queries and sorting by a field, and which won't be used when sorting by _score or when other queries are used. Over time I have seen that the workaround provided here is sufficient for most people wanting to sort based on a field in a child document or parent document. I think closing this issue as won't fix is justified, since adding this is far from trivial and most of the time the workaround is good enough.

Fair enough. Could small but complete examples (sort by parent and sort by child) be added to the docs showing how to sort using the workaround? I think without official documentation this will keep coming up based on the amount of interest.

Could small but complete examples (sort by parent and sort by child) be added to the docs showing how to sort using the workaround?

Yes!

Is there any possibility to use this workaround to sort on a text field?

@sop3k I can't think of any normal way to make it work on a string field, but I came up with a method which may or may not work for you depending on your application and tooling.

At index time, convert the string you want to sort into a 63-bit number and index it alongside the string field as a long. Assuming a really naive 8-bit encoding, you could get 7 characters worth of sorting accuracy, or you could pack the bits to fit more if you know the range of values you'll be storing. Then when you want to sort on the string field, sort on this number instead using the function_score technique described here.

I haven't actually tried the above, but it seems likely to work if you're staying in the ASCII range of characters. Of course, this would not work well with UTF8, and would probably not provide enough accuracy to be useful with UTF16.
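
The encoding idea above might look roughly like the following Python sketch, which packs the first 7 ASCII characters of a string into a 56-bit integer that sorts in the same order as the string prefix. The function name `sort_key` and the sample words are hypothetical, and the naive 8-bits-per-character layout is the one assumed in the comment, not a tested recipe.

```python
def sort_key(text, width=7):
    """Pack the first `width` ASCII characters into a big-endian integer.

    Shorter strings are padded with zero bytes so that prefixes sort first.
    Raises UnicodeEncodeError for non-ASCII input.
    """
    data = text.encode("ascii")[:width].ljust(width, b"\x00")
    return int.from_bytes(data, "big")


words = ["pear", "apple", "fig", "banana"]
print(sorted(words, key=sort_key))

# Strings that only differ after the 7th character get the same key.
print(sort_key("abcdefgh") == sort_key("abcdefgi"))
```

One caveat worth keeping in mind: if this number is then passed through the score, the 32-bit float limitation mentioned earlier in the thread applies, so far fewer than 56 bits would survive and only the first few characters would influence the order.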

Thanks @dimfeld for the comment. I was thinking about the same solution, but as you mention, it's very limited with UTF8 and UTF16.

@martijnvg

Hi, any updates for the workaround?

@martijnvg Does the workaround above work for String fields?

+1

+1

+1

+1

+1

Also need workaround for strings

+1
