Elasticsearch: Ability to retrieve in an aggregations request only pipeline aggs results

Created on 7 Jan 2016 · 9 Comments · Source: elastic/elasticsearch

I have explained this matter in several other issues, but it's time to make it a ticket on its own.
Programmers very often need to post-process the aggregation results computed by _Elasticsearch_. Since October 2015 the pipeline aggregations have been officially available to everyone, so a bunch of use cases can now be handled by just crafting a more elaborate search query.

That's very good, but not enough yet, because clients also need to be relieved of unnecessary network traffic and memory use. For now, they receive, and are forced to load from the network reply, the results of _completely uninteresting_ intermediate aggregations.

Ideally, we should have control at the aggregation level over whether its results are returned or not (a prune property accepted by all aggregations would be just fine). Alternatively, one could also prune from the results the aggregations used as input to pipeline aggregations via a search-wide flag called (say) prunePipelinedAggs with three possible values:

_false_: _default_, for backwards compatibility (but I would vote for _basic_ as default value)
_basic_: suppresses only the results of basic aggregations that serve as source data for pipeline aggs; the results of "unrefined" aggregations remain untouched in the reply
_all_: suppresses the results of all aggregations (both basic and pipeline) that serve as source data for pipeline aggs

Wording may vary, but you get the idea. In particular, "basic" is not an established term.
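
To make the three values concrete, here is a minimal sketch that emulates the proposed behaviour on the client side in Python. The flag does not exist in Elasticsearch; the function names `pipeline_paths` and `prune`, the aggregation names, and the field names are assumptions made purely for illustration, and only top-level aggregations are considered.

```python
import re

# Keys of an aggregation definition that are not the aggregation body itself.
SKIP = {"aggs", "aggregations", "meta"}

def pipeline_paths(definition):
    """Return the buckets_path strings declared by one aggregation definition."""
    paths = []
    for key, params in definition.items():
        if key in SKIP or not isinstance(params, dict):
            continue
        path = params.get("buckets_path")
        if isinstance(path, str):
            paths.append(path)
        elif isinstance(path, dict):   # map form: {"variable": "path"}
            paths.extend(path.values())
        elif isinstance(path, list):
            paths.extend(path)
    return paths

def prune(request_aggs, response_aggs, mode="false"):
    """Emulate the hypothetical prunePipelinedAggs flag on top-level aggs."""
    if mode == "false":
        return dict(response_aggs)
    # First path element names the aggregation a pipeline agg reads from.
    sources = set()
    for definition in request_aggs.values():
        for path in pipeline_paths(definition):
            sources.add(re.split(r"[>.]", path, maxsplit=1)[0])

    def dropped(name):
        if name not in sources or name not in request_aggs:
            return False
        # "basic" keeps sources that are themselves pipeline aggregations.
        return mode == "all" or not pipeline_paths(request_aggs[name])

    return {n: r for n, r in response_aggs.items() if not dropped(n)}

# Hypothetical request: one source agg, one pipeline agg, one unrelated agg.
request = {
    "sales_per_month": {
        "date_histogram": {"field": "date", "interval": "month"},
        "aggs": {"sales": {"sum": {"field": "price"}}},
    },
    "avg_monthly_sales": {"avg_bucket": {"buckets_path": "sales_per_month>sales"}},
    "max_price": {"max": {"field": "price"}},
}
# Made-up reply contents, only the shape matters here.
response = {
    "sales_per_month": {"buckets": []},
    "avg_monthly_sales": {"value": 1.0},
    "max_price": {"value": 2.0},
}

print(sorted(prune(request, response, "basic")))
```

With _basic_, the bulky `sales_per_month` buckets disappear from the reply while the pipeline result and the unrelated `max_price` stay. A server-side implementation would of course avoid serializing the dropped results in the first place, which is the whole point of the request.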

Source

acarstoiu

All 9 comments

@acarstoiu Use response filtering to return just the aggs that you want. See https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#_response_filtering
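
As a rough sketch of that suggestion, the Python snippet below builds a search request whose reply is limited to a single pipeline aggregation through the `filter_path` query parameter described at the link. The host, the index name `sales`, the field names, the aggregation names, and the helper `search_url` are assumptions for illustration; nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Hypothetical request body: a date histogram feeding an avg_bucket pipeline agg.
body = {
    "size": 0,
    "aggs": {
        "sales_per_month": {
            "date_histogram": {"field": "date", "interval": "month"},
            "aggs": {"sales": {"sum": {"field": "price"}}},
        },
        "avg_monthly_sales": {
            "avg_bucket": {"buckets_path": "sales_per_month>sales"}
        },
    },
}

def search_url(base, index, keep):
    """Build a _search URL that keeps only the listed response paths."""
    # Leave commas and wildcards unescaped so the filter stays readable.
    query = urlencode({"filter_path": ",".join(keep)}, safe=",*")
    return f"{base}/{index}/_search?{query}"

url = search_url("http://localhost:9200", "sales",
                 ["aggregations.avg_monthly_sales"])
print(url)
print(json.dumps(body, indent=2))
```

The filtering only trims the reply: the intermediate `sales_per_month` buckets are still computed on the server, they are just not sent back.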

clintongormley on 10 Jan 2016

Yes, I found this on my own in June last year, see this comment. But that's just a workaround, isn't it?!

acarstoiu on 14 Jan 2016

@acarstoiu Me too, I need this feature.
In my case the bucket is 50 MB big, but it's of no use to me; I just want the bucket size (returned by the reducer).

shuangshui on 26 May 2016

Use a pipeline aggregation and follow @clintongormley's link.

acarstoiu on 26 May 2016

@acarstoiu Thanks, this does solve my problem.
But it still needs to sort the buckets, load all of them into the server's memory, and return only part of the result. Isn't this a rather inefficient solution? I don't know if I'm right.

shuangshui on 28 May 2016

Off topic: please stop using a mechanical translator and learn the language. English has _by far the simplest grammar_ among the European languages, it's the best you can get in terms of simplicity (and I know the Chinese grammar is way simpler).

And now to the matter: yes, it is less than optimal, that's why this issue exists (albeit closed - @clintongormley has yet to explain why).

acarstoiu on 28 May 2016

👎2

@acarstoiu kindly refrain from castigating other users about their level of English.

clintongormley on 1 Jun 2016

❤5

Well, I did ponder whether to write that or not, but I honestly believe it helps the guy a lot more than being "politically correct", a term invented in the west. Try using a mechanical translator for _un șut în fund înseamnă un pas înainte_ :v:

acarstoiu on 2 Jun 2016

@acarstoiu @clintongormley Thanks a lot for your kindness. I'll improve my English, although I was not using Google Translate. So embarrassing.
On the topic: when doing an aggregation in ES, it sorts the buckets by doc_count by default.
In my case the query usually involves 1 million distinct values to group by, so it's very slow (about 20 seconds). I'm wondering if there is a way to turn off the ordering/sorting in the aggregation, which I think would speed up my query a lot!

shuangshui on 4 Jun 2016
