Elasticsearch: Cumulative Sum update to work with Terms Agg

Created on 2 Sep 2019 · 6 Comments · Source: elastic/elasticsearch

Currently, the Cumulative Sum aggregation only works with a bucket path that points into a histogram or date histogram. There are situations where it is desirable for the bucket path to point into a terms aggregation instead. One example is building a Pareto chart, which is what one opportunity is trying to do. This would be a great enhancement for IoT and failure analysis.
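To make the Pareto use case concrete, here is a minimal client-side sketch of the numbers a cumulative sum over terms buckets would produce. It assumes the usual terms aggregation response shape (a list of buckets with `key` and `doc_count`); the function name `pareto_from_terms` and the sample failure categories are made up for illustration, not taken from any real index.

```python
def pareto_from_terms(buckets):
    """Turn terms-agg style buckets into Pareto rows.

    Each row is (key, doc_count, cumulative_count, cumulative_percent).
    Buckets are sorted by doc_count descending, which is also the
    default order of a terms aggregation.
    """
    ordered = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)
    total = sum(b["doc_count"] for b in ordered)
    rows = []
    running = 0
    for b in ordered:
        running += b["doc_count"]
        percent = 100.0 * running / total if total else 0.0
        rows.append((b["key"], b["doc_count"], running, percent))
    return rows


# Hypothetical buckets, shaped like the "buckets" array of a terms agg response.
sample_buckets = [
    {"key": "power_loss", "doc_count": 30},
    {"key": "sensor_fault", "doc_count": 50},
    {"key": "firmware_bug", "doc_count": 20},
]

rows = pareto_from_terms(sample_buckets)
for key, count, cumulative, percent in rows:
    print(f"{key:15} {count:5} {cumulative:5} {percent:6.1f}%")
```

Doing this outside Elasticsearch is exactly the kind of extra step the requested enhancement would remove.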

The workaround is to use Vega. Having done this, I found it semi-complicated. If this were baked into Elasticsearch (and therefore Kibana), it would take a few very simple clicks to create one. It took me multiple hours to build it in Vega.

From talking to @polyfractal, there is a PR to add a gap_policy: none, which relates to this. The caveat is that the pipelines would assume whatever order is used by the terms aggregation... this could be nonsensical, as a derivative across terms doesn't make much sense.
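The ordering caveat can be shown with a small sketch: the running totals depend entirely on the bucket order, and only the final total is order-independent. The buckets below are invented sample data, and `cumulative` is a hypothetical helper, not part of any Elasticsearch client.

```python
from itertools import accumulate

# Hypothetical terms buckets; keys and counts are made up.
buckets = [
    {"key": "a", "doc_count": 5},
    {"key": "b", "doc_count": 20},
    {"key": "c", "doc_count": 10},
]


def cumulative(ordered_buckets):
    """Running sum of doc_count in the order the buckets are given."""
    return list(accumulate(b["doc_count"] for b in ordered_buckets))


# Same buckets, two different orderings a terms agg could be asked for.
by_key = cumulative(sorted(buckets, key=lambda b: b["key"]))
by_count = cumulative(sorted(buckets, key=lambda b: b["doc_count"], reverse=True))

print(by_key)    # running totals when ordered by key ascending
print(by_count)  # running totals when ordered by doc_count descending
```

For a Pareto chart the count-descending order is the meaningful one; for other orderings the intermediate values may not be interpretable.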

CC: @polyfractal

:AnalyticAggregations >enhancement Analytics good first issue

All 6 comments

Pinging @elastic/es-analytics-geo

We chatted about this in our team meeting, and don't see any reason cumulative_sum shouldn't be able to work on terms aggs (and more generally, any agg that has some kind of ordering). We may want to throw an exception if a user sets a gap_policy when targeting a terms agg, or we could possibly just ignore it since terms won't ever have "gaps" anyway.

I don't think this would be an overly large task, mostly a matter of updating what the agg is allowed to work with.

Hi @polyfractal @Alex3k I see this was opened a year back. Is this still relevant? Can I work on this? Also, I may need a little guidance.

We are also looking for this feature to use cumulative sum on terms.
Did you start developing on this? ;)

heya @ashish3805 are you still working on this? If not, I'd like to take a shot.

@Alex3k do you mind sharing what queries you want to run?
What other aggregations are we thinking of supporting aside from terms?
What are the expected queries with gap_policy?
Currently I only have a basic query:

curl -X POST "localhost:9200/kibana_sample_data_flights/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "countries": {
      "terms": { "field": "OriginCountry" }
    },
    "cumulative_flights": {
      "cumulative_sum": {
        "buckets_path": "countries"
      }
    }
  }
}
'

or this

curl -X POST "localhost:9200/kibana_sample_data_flights/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "countries": {
      "terms": { "field": "OriginCountry" },
      "aggs": {
        "cumulative_flights": {
          "cumulative_sum": {
            "buckets_path": "countries"
          }
        }
      }
    }
  }
}
'

@polyfractal just to confirm, looking at trunk, the first thing I'd have to do is modify the conditional blocks in
PipelineAggregationBuilder.ForInsideTree.validateParentAggSequentiallyOrdered, right?

Hi @polyfractal @Alex3k PTAL at related PR https://github.com/elastic/elasticsearch/pull/66241 for this feature, thanks!
