Elasticsearch-dsl-py: Sorting bucket contents

Created on 6 Jul 2018 · 6 Comments · Source: elastic/elasticsearch-dsl-py

Spent another afternoon failing to figure out what I thought would be a dead-simple task in elasticsearch: What is the idiom to bucket by one key and sort by another key within each bucket?

I've been reading the docs and tests here https://github.com/elastic/elasticsearch-dsl-py/blob/master/test_elasticsearch_dsl/test_aggs.py but to no avail.

Based on the test I was trying this:

A = events.aggs.bucket('by_sn', 'terms', field='sn')
A.bucket('start_bucket_sort', 'bucket_sort', sort=[{'start': {'order': 'desc'}}])
# returns: BucketSort(sort=[{'start': {'order': 'desc'}}])
resp = events.execute()

But I keep getting

elasticsearch.exceptions.RequestError: TransportError(400, 'search_phase_execution_exception', 'No aggregation found for path [start]')

Throw me a bone here? Appreciate the help... thanks.

Most helpful comment

I am sorry but I don't really understand what you are trying to do. You cannot just sort by another field since you are not dealing with individual documents. You would instead have to order the buckets based on another aggregation, which you can do:

s = Search()
s.aggs.bucket('by_sn', 'terms', field='sn', order={'avg_order': 'desc'}).metric('avg_order', 'avg', field='order')

Hope this helps!
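For reference, the aggregation part of the request body that the chain above is expected to produce looks roughly like the following. This is a minimal sketch in plain Python dicts, reusing the names `by_sn`, `avg_order`, `sn`, and `order` from the snippet; the helper name `terms_ordered_by_avg` is made up for illustration.

```python
import json


def terms_ordered_by_avg(group_field, value_field, direction="desc"):
    """Build a terms aggregation whose buckets are ordered by a sub-metric.

    Hypothetical helper, not part of elasticsearch-dsl.
    """
    return {
        "by_sn": {
            "terms": {
                "field": group_field,
                # "order" names the sub-aggregation below, not a document field.
                "order": {"avg_order": direction},
            },
            "aggs": {
                # One average per bucket; this is the value the buckets are ordered by.
                "avg_order": {"avg": {"field": value_field}},
            },
        }
    }


body = {"aggs": terms_ordered_by_avg("sn", "order")}
print(json.dumps(body, indent=2))
```

The point to notice is that the key inside `order` matches the name of a metric defined under the same bucket, which is why ordering by a bare field name does not work here.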

All 6 comments

Yeah, sorry, I must just have a misguided perspective about what is supposed to be going on.

So it is just impossible to get results that consist of grouped hits? I think that is what I was expecting to be possible.

For that, I believe you are looking for the top_hits aggregation, which takes a sort parameter.

s = Search()
s.aggs.bucket('by_sn', 'terms', field='sn').metric('hits', 'top_hits', size=10, sort=[{'order': 'desc'}])
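To get at the grouped hits afterwards, each bucket carries a regular hits object under the name of the top_hits aggregation (here `hits`). Below is a minimal sketch that reads them out of a raw response dict. The helper `grouped_hits` is hypothetical, and `sample_response` is a made-up fragment that only illustrates the nesting; it is not real output.

```python
def grouped_hits(response, agg_name="by_sn", top_hits_name="hits"):
    """Map each bucket key to the _source of its top hits, in returned order."""
    groups = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # The top_hits result is {"hits": {"hits": [...]}} under the agg's name.
        hits = bucket[top_hits_name]["hits"]["hits"]
        groups[bucket["key"]] = [hit["_source"] for hit in hits]
    return groups


# Made-up response fragment, for illustration only.
sample_response = {
    "aggregations": {
        "by_sn": {
            "buckets": [
                {"key": "A1", "doc_count": 2, "hits": {"hits": {"hits": [
                    {"_source": {"sn": "A1", "order": 9}},
                    {"_source": {"sn": "A1", "order": 4}},
                ]}}},
                {"key": "B2", "doc_count": 1, "hits": {"hits": {"hits": [
                    {"_source": {"sn": "B2", "order": 7}},
                ]}}},
            ]
        }
    }
}

groups = grouped_hits(sample_response)
for key, sources in groups.items():
    print(key, [source["order"] for source in sources])
```

Because the aggregation in the snippet is itself named `hits`, the path contains `hits` three times; giving the top_hits aggregation a different name would make that easier to read.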

@HonzaKral Hallelujah, I think that might be it! Thanks!

Just make sure that, if you do something like this, the metric is added with .metric():
s.aggs.bucket(...)\
.metric('my_max_metric', 'max', field='fieldname')\
.bucket('latest_my_max_field', 'bucket_sort', sort=[{'latest_log':{'order':'desc'}}])

My mistake was to use
.bucket('my_max_metric', 'max', field='fieldname')
instead of
.metric('my_max_metric', 'max', field='fieldname')
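As far as I understand elasticsearch-dsl, the difference matters because `.metric()` returns the aggregation it was called on, while `.bucket()` returns the newly created one, so the rest of the chain attaches to a different object. The intended result is a request body in which the `max` metric and the `bucket_sort` sit side by side under the same parent. A minimal sketch in plain dicts follows; the parent name `by_sn` and field `sn` are assumptions, since the parent is elided in the comment above, and the sort key is set to the metric's own name `my_max_metric` on the assumption that the sort path has to name an aggregation in the same bucket, which is what the error in the original question suggests.

```python
import json


def max_then_bucket_sort(group_field, value_field, direction="desc"):
    """Build a terms aggregation with a max metric and a sibling bucket_sort.

    Hypothetical helper; parent name and fields are placeholders.
    """
    parent = {
        "terms": {"field": group_field},
        "aggs": {
            # Metric computed once per bucket.
            "my_max_metric": {"max": {"field": value_field}},
            # Pipeline aggregation at the same level; its sort key refers
            # to the sibling metric by name, not to a document field.
            "latest_my_max_field": {
                "bucket_sort": {"sort": [{"my_max_metric": {"order": direction}}]}
            },
        },
    }
    return {"aggs": {"by_sn": parent}}


body = max_then_bucket_sort("sn", "fieldname")
print(json.dumps(body, indent=2))
```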

Hi,
I want to get the counts in sorted order. My data is a list of dicts, so I am grouping on the keys and summing the counts.
I get data like this:
[{key: "google.com", doc_count: 2, count: {value: 7}},
{key: "0.0.0.0", doc_count: 1, count: {value: 2}},
{key: "0.030", doc_count: 1, count: {value: 5}}]

I want the result to be in sorted order. When I add a new metric to do that, I get data like this:

[{key: "google.com", doc_count: 2, sorted_data: {value: 5}, count: {value: 7}},
{key: "0.0.0.0", doc_count: 1, sorted_data: {value: 2}, count: {value: 2}},
{key: "0.030", doc_count: 1, sorted_data: {value: 5}, count: {value: 5}}]

This is incorrect, because it is sorting before adding. I want the sorting to be performed on the result of the first metric.
My query:
search.aggs.bucket('timeIOCS', 'nested', path='timeIOCS').bucket('aggs_data', 'terms', field='timeIOCS.name', size="100000").metric('count', 'sum', field='timeIOCS.count').metric('sorted_data', 'max', field='timeIOCS.count')
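This question was not answered in the thread. One option consistent with the earlier answer would be to drop the extra `max` metric and order the terms buckets by the `count` sum itself, that is, pass `order={'count': 'desc'}` to the terms bucket. The sketch below shows the request body that approach would aim for, plus a client-side fallback that sorts the buckets after the fact. Both helper names are hypothetical, and the bucket data is copied from the comment above.

```python
import json


def nested_terms_ordered_by_sum():
    """Request body: nested -> terms ordered by the 'count' sum metric."""
    terms = {
        "terms": {
            "field": "timeIOCS.name",
            "size": 100000,
            # Order buckets by the per-bucket sum defined below.
            "order": {"count": "desc"},
        },
        "aggs": {"count": {"sum": {"field": "timeIOCS.count"}}},
    }
    nested = {"nested": {"path": "timeIOCS"}, "aggs": {"aggs_data": terms}}
    return {"aggs": {"timeIOCS": nested}}


def sort_buckets_by_count(buckets):
    """Client-side fallback: sort returned buckets by their summed count."""
    return sorted(buckets, key=lambda b: b["count"]["value"], reverse=True)


body = nested_terms_ordered_by_sum()
print(json.dumps(body, indent=2))

# Buckets as listed in the comment above.
buckets = [
    {"key": "google.com", "doc_count": 2, "count": {"value": 7}},
    {"key": "0.0.0.0", "doc_count": 1, "count": {"value": 2}},
    {"key": "0.030", "doc_count": 1, "count": {"value": 5}},
]
ordered_keys = [b["key"] for b in sort_buckets_by_count(buckets)]
print(ordered_keys)
```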
