I have explained this matter in several other issues, but it's time to make it a ticket on its own.
Very often programmers find themselves needing to post-process the aggregation results computed by _Elasticsearch_. Since October 2015 the pipeline aggregations have been officially available to everyone, so a bunch of use cases can now be handled by just crafting a more elaborate search query.
That's very good, but not enough yet, because clients also need to be off-loaded in terms of network traffic and memory. For now, they receive, and are forced to load from the network reply, the results of _completely uninteresting_ intermediate aggregations.
Ideally, we should have control on the aggregation level whether its results should be returned or not (a _prune_ property accepted by all aggregations would be just fine). Alternatively, one could also prune from the results the aggregations used in pipeline aggregations via a search-wide flag called (say) _prunePipelinedAggs_ with three possible values:
- _false_: _default_, for backwards compatibility (but I would vote for _basic_ as default value)
- _basic_: suppresses only the results of basic aggregations that serve as source data for pipeline aggs; the results of "unrefined" aggregations remain untouched in the reply
- _all_: suppresses the results of all aggregations (both basic and pipeline) that serve as source data for pipeline aggs

Wording may vary, but you get the idea. Particularly "basic" is not an established term.
@acarstoiu Use response filtering to return just the aggs that you want. See https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#_response_filtering
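For readers who have not used response filtering, here is a minimal sketch of how the `filter_path` query parameter could be assembled so that only selected aggregations come back in the reply. The helper name `filtered_search_url`, the host, the index `sales` and the aggregation names are assumptions made up for illustration; only `filter_path` itself and the `_search` endpoint come from Elasticsearch.

```python
from urllib.parse import urlencode


def filtered_search_url(base_url, index, keep_aggs):
    """Build a _search URL whose filter_path keeps only the named aggregations.

    Hypothetical helper: aggregation names are assumed not to contain dots.
    """
    if not keep_aggs:
        raise ValueError("keep_aggs must name at least one aggregation")
    # filter_path accepts a comma-separated list of dot-notation paths.
    filter_path = ",".join(f"aggregations.{name}" for name in keep_aggs)
    # size=0 drops the search hits; safe="," keeps the commas readable.
    query = urlencode({"size": 0, "filter_path": filter_path}, safe=",")
    return f"{base_url}/{index}/_search?{query}"


# Example: keep only a pipeline aggregation named "max_monthly_sales".
url = filtered_search_url("http://localhost:9200", "sales", ["max_monthly_sales"])
print(url)
```

Note that this only trims the reply sent to the client; the intermediate aggregations are still computed on the server.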
Yes, I found this on my own in June last year, see this comment. But that's just a workaround, isn't it?!
@acarstoiu I need this feature too.
In my case the bucket list is 50 MB big, but it's of no use to me; I just want the bucket count (returned by the reducer).
Use a pipeline aggregation and follow the @clintongormley's link.
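A rough sketch of that suggestion for the bucket-count case, assuming a `terms` aggregation as the source and a `stats_bucket` sibling pipeline aggregation reading its `_count` path. The function name `bucket_count_request`, the field `user_id` and the aggregation names are hypothetical; the body is meant to be sent to `_search` together with the returned query parameters.

```python
import json


def bucket_count_request(field, agg_name="groups", stats_name="group_stats", size=10000):
    """Return (body, params) for a search that replies with the bucket count only.

    All names are illustrative assumptions, not taken from the thread.
    """
    body = {
        "size": 0,  # no search hits, aggregations only
        "aggs": {
            # Source aggregation: one bucket per distinct value of `field`.
            agg_name: {"terms": {"field": field, "size": size}},
            # Sibling pipeline aggregation: statistics over the buckets' doc counts.
            # Its "count" is the number of buckets it saw.
            stats_name: {"stats_bucket": {"buckets_path": f"{agg_name}>_count"}},
        },
    }
    # Response filtering: return only the count, not the buckets themselves.
    params = {"filter_path": f"aggregations.{stats_name}.count"}
    return body, params


body, params = bucket_count_request("user_id")
print(json.dumps(body, indent=2))
print(params)
```

As far as I know, the pipeline aggregation only sees the buckets that the `terms` aggregation actually returns, so the count is capped by `size`. The buckets are still built and sorted on the server; only the reply gets smaller.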
@acarstoiu thanks. This does solve my problem,
but it still needs to sort the buckets, load all of them into the server's memory, and return only part of the result. Isn't that a rather inefficient solution? I don't know if I'm right.
Off topic: please stop using a mechanical translator and learn the language. English has _by far the simplest grammar_ among the European languages, it's the best you can get in terms of simplicity (and I know the Chinese grammar is way simpler).
And now to the matter: yes, it is less than optimal, that's why this issue exists (albeit closed - @clintongormley has yet to explain why).
@acarstoiu kindly refrain from castigating other users about their level of English.
Well, I did ponder whether to write that or not, but I honestly believe it helps the guy a lot more than being "politically correct", a term invented in the west. Try using a mechanical translator for _un șut în fund înseamnă un pas înainte_ :v:
@acarstoiu @clintongormley thanks a lot for your kindness. I'll improve my English, although I was not using Google Translate. So embarrassing.
On the topic: when doing an aggregation in ES, it sorts the buckets by doc_count by default.
In my case the query usually involves 1 million distinct values to group by, so it's very slow (about 20 seconds). I'm wondering if there is a way to turn off the ordering/sorting in the aggregation, which I think would speed up my query a lot!