Elasticsearch: Allow bucket_script to operate on SingleBucketAggregation objects

Created on 7 Nov 2015 · 6 Comments · Source: elastic/elasticsearch

Currently, the bucket_script pipeline aggregation only works with MultiBucketAggregations such as histogram or range. But there are a variety of use cases where it would be nice to use it with a SingleBucketAggregation, such as a filter. That currently fails because the bucket_script aggregation attempts to cast the InternalFilter to an InternalMultiBucketAggregation.

{
   "size":0,
   "aggs":{
      "connections":{
         "filter":{ ... },
         "aggs":{
            "users":{
               "cardinality":{
                  "field":"user_id"
               }
            },
            "average_distinct":{
               "bucket_script":{
                  "buckets_path":{
                     "distinct":"users",
                     "total":"_count"
                  },
                  "script":"total / distinct"
               }
            }
         }
      }
   }
}

org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation

You can skirt the issue by using a filters bucket with a single filter, but this is definitely a hack and rather gross:

{
   "size":0,
   "aggs":{
      "connections":{
         "filters":{
            "filters":[
               { ... }
            ]
         },
         "aggs":{
            "users":{
               "cardinality":{
                  "field":"user_id"
               }
            },
            "average_distinct":{
               "bucket_script":{
                  "buckets_path":{
                     "distinct":"users",
                     "total":"_count"
                  },
                  "script":"total / distinct"
               }
            }
         }
      }
   }
}

Other than complicating the code a bit more (it needs to handle both Multi and Single), I don't think there is a technical reason why we couldn't support this. I think we'd just need to implement an internal doReduce() that is overloaded for Multi / Single and handles each as needed.
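
To make the proposal concrete, here is a toy model in Python rather than Elasticsearch's actual Java internals. The classes `Bucket`, `MultiBucketAgg` and `SingleBucketAgg` and the function `apply_bucket_script` are hypothetical names invented for illustration. The sketch only shows the idea of treating a single-bucket aggregation as a one-element bucket list, so the same per-bucket logic serves both cases instead of casting to the multi-bucket type.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Bucket:
    """One bucket: a document count plus named single-value metrics."""
    doc_count: int
    metrics: Dict[str, float] = field(default_factory=dict)


@dataclass
class MultiBucketAgg:
    """Hypothetical stand-in for a multi-bucket aggregation (histogram, range, filters)."""
    buckets: List[Bucket]


@dataclass
class SingleBucketAgg:
    """Hypothetical stand-in for a single-bucket aggregation (filter)."""
    bucket: Bucket


def resolve(bucket: Bucket, path: str) -> float:
    # "_count" is the special path for the bucket's document count.
    return float(bucket.doc_count) if path == "_count" else bucket.metrics[path]


def apply_bucket_script(
    agg: Union[MultiBucketAgg, SingleBucketAgg],
    buckets_path: Dict[str, str],
    script: Callable[..., float],
    name: str,
) -> Union[MultiBucketAgg, SingleBucketAgg]:
    # Normalise both shapes to a list of buckets rather than casting.
    buckets = agg.buckets if isinstance(agg, MultiBucketAgg) else [agg.bucket]
    for bucket in buckets:
        params = {var: resolve(bucket, path) for var, path in buckets_path.items()}
        # Store the script result on the bucket under the pipeline's name.
        bucket.metrics[name] = script(**params)
    return agg


# Mirrors the failing request: total / distinct inside a single filter bucket.
connections = SingleBucketAgg(Bucket(doc_count=100, metrics={"users": 25.0}))
apply_bucket_script(
    connections,
    {"distinct": "users", "total": "_count"},
    lambda total, distinct: total / distinct,
    "average_distinct",
)
print(connections.bucket.metrics["average_distinct"])  # 4.0
```

The script is passed as a Python callable here purely to keep the sketch self-contained; how the real aggregation would evaluate its script is outside the scope of this illustration.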

/cc @colings86

:AnalyticAggregations >bug Analytics high hanging fruit

polyfractal

👍16

All 6 comments

:+1: on support for this.

It would also be great if bucket_script could work across sub-aggregations that are single buckets (e.g. filter). For example, if each document is a pageview event, I'd like to calculate unique, new and returning visitor counts over a time range:

{
  "size": 0,
  "aggs": {
    "visits": {
      "filters": {
        "filters": [
          {
            "bool": {
              "must": [
                {
                  "range": {
                    "time": {
                      "gte": "2015-10-25",
                      "lt": "2015-11-01",
                      "time_zone": "UTC"
                    }
                  }
                },
                {
                  "term": {
                    "eventType": "ProfileVisit"
                  }
                }
              ]
            }
          }
        ]
      },
      "aggs": {
        "unique-visitors-total": {
          "cardinality": {
            "field": "userId"
          }
        },
        "first-time-visits": {
          "filters": {
            "filters": [
              {
                "term": {
                  "firstTimeForVisit": true
                }
              }
            ]
          },
          "aggs": {
            "first-time-visitors-total": {
              "cardinality": {
                "field": "userId"
              }
            }
          }
        },
        "returning-visitors-total": {
          "bucket_script": {
            "buckets_path": {
              "uniqueVisitorsTotal": "unique-visitors-total",
              "firstTimeVisitorsTotal": "first-time-visits>first-time-visitors-total"
            },
            "script": "uniqueVisitorsTotal - firstTimeVisitorsTotal"
          }
        }
      }
    }
  }
}

I can't even use the gross filters hack for this:

buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]

I think this is because first-time-visitors-total is inside a filters aggregation, so Elasticsearch thinks it has multiple bucket values, even though there's only 1 filter.
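
The behaviour described above can be mimicked with a toy model of path resolution. This is a simplified sketch in Python, not Elasticsearch's implementation; the response-shaped dictionaries, the sample values and the `resolve_path` helper are made up for illustration. Crossing a multi-bucket aggregation yields one value per bucket (a list), even when there is only one bucket, whereas crossing a single-bucket aggregation yields a plain number.

```python
from typing import Any, List


def resolve_path(agg: dict, path: List[str]) -> Any:
    """Follow a list of aggregation names through a response-shaped dict."""
    if not path:
        # Reached a single-value metric such as cardinality.
        return agg["value"]
    child_name, rest = path[0], path[1:]
    if "buckets" in agg:
        # Multi-bucket aggregation: one result per bucket, so the result is a list.
        return [resolve_path(bucket[child_name], rest) for bucket in agg["buckets"]]
    # Single-bucket aggregation: sub-aggregations sit directly on the object.
    return resolve_path(agg[child_name], rest)


path = ["first-time-visits", "first-time-visitors-total"]

# Shape produced by a filters aggregation with one anonymous filter.
with_filters = {
    "first-time-visits": {
        "buckets": [{"doc_count": 7, "first-time-visitors-total": {"value": 5}}]
    }
}
# Shape produced by a plain filter aggregation.
with_filter = {
    "first-time-visits": {"doc_count": 7, "first-time-visitors-total": {"value": 5}}
}

print(resolve_path(with_filters, path))  # [5]
print(resolve_path(with_filter, path))   # 5
```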

Maybe there's some way to re-organize this query to work around current limitations, but I can't find it and am pretty stuck. :cry:

It seems like this should work though if the above query just used filter aggregations, since there would just be a single value for both unique-visitors-total and first-time-visits>first-time-visitors-total.

zcox on 23 Nov 2015

Another example use case is clickthrough rate:

        "impressions": {
          "filter": {
            "term": {
              "eventType": "Impression"
            }
          }
        },
        "clickthroughs": {
          "filter": {
            "term": {
              "eventType": "Clickthrough"
            }
          }
        },
        "clickthrough-rate": {
          "bucket_script": {
            "buckets_path": {
              "clickthroughCount": "clickthroughs._count",
              "impressionCount": "impressions._count"
            },
            "script": "clickthroughCount / impressionCount"
          }
        }
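
Until something like this is supported, the ratio can be computed client-side from the two `doc_count` values. A minimal Python sketch, assuming the two `filter` aggregations above sit at the top level of the request and the search response has already been parsed into a dict; the `clickthrough_rate` helper and the sample numbers are illustrative only.

```python
from typing import Optional


def clickthrough_rate(response: dict) -> Optional[float]:
    """Compute clickthroughs / impressions from a parsed search response."""
    aggs = response["aggregations"]
    # A filter aggregation reports its matching documents as doc_count.
    impressions = aggs["impressions"]["doc_count"]
    clickthroughs = aggs["clickthroughs"]["doc_count"]
    if impressions == 0:
        # Avoid dividing by zero when nothing was shown.
        return None
    return clickthroughs / impressions


sample = {
    "aggregations": {
        "impressions": {"doc_count": 200},
        "clickthroughs": {"doc_count": 50},
    }
}
print(clickthrough_rate(sample))  # 0.25
```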

zcox on 24 Nov 2015

@elastic/es-search-aggs

colings86 on 13 Mar 2018

If we decide to implement this we should also allow the bucket_script aggregation to be used at the top level as requested in https://github.com/elastic/elasticsearch/issues/14656

colings86 on 13 Mar 2018

We should also change the bucket_selector aggregation and other sibling pipeline aggregations to work on the top level as well

colings86 on 29 Mar 2018

👍5

I realize this is a very old issue, but I ran across it while looking for a solution to the same problem. The snippet below is adapted to the example above from a different use case, but it should work:

{
  "first-time-visitors-total-single": {
    "sum_bucket": {
      "buckets_path": "first-time-visits>first-time-visitors-total"
    }
  },
  "returning-visitors-total": {
    "bucket_script": {
      "buckets_path": {
        "uniqueVisitorsTotal": "unique-visitors-total",
        "firstTimeVisitorsTotal": "first-time-visits>first-time-visitors-total-single"
      },
      "script": "uniqueVisitorsTotal - firstTimeVisitorsTotal"
    }
  }
}

Basically, you can collapse a filters aggregation with a single filter into a single value via an otherwise pointless sum_bucket aggregation and use that.

Hope that helps.