Currently, the bucket_script pipeline aggregation only works with multi-bucket aggregations such as histogram or range. But there are a variety of use cases where it would be nice to use it with a single-bucket aggregation, like a filter. Today that fails because bucket_script attempts to cast the InternalFilter to an InternalMultiBucketAggregation.
{
  "size":0,
  "aggs":{
    "connections":{
      "filter":{ ... },
      "aggs":{
        "users":{
          "cardinality":{
            "field":"user_id"
          }
        },
        "average_distinct":{
          "bucket_script":{
            "buckets_path":{
              "distinct":"users",
              "total":"_count"
            },
            "script":"total / distinct"
          }
        }
      }
    }
  }
}
org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation
You can skirt the issue by using a filters bucket with a single filter, but this is definitely a hack and rather gross:
{
  "size":0,
  "aggs":{
    "connections":{
      "filters":{
        "filters":[
          { ... }
        ]
      },
      "aggs":{
        "users":{
          "cardinality":{
            "field":"user_id"
          }
        },
        "average_distinct":{
          "bucket_script":{
            "buckets_path":{
              "distinct":"users",
              "total":"_count"
            },
            "script":"total / distinct"
          }
        }
      }
    }
  }
}
Other than complicating the code a bit more (it needs to handle both multi- and single-bucket aggregations), I don't think there is a technical reason why we couldn't support this. I think we'd just need to implement an internal doReduce() that is overloaded for Multi / Single and handles each as needed.
/cc @colings86
:+1: on support for this.
It would also be great if bucket_script could work across sub-aggregations that are single buckets (e.g. filter). For example, if each document is a pageview event, I'd like to calculate unique, new, and returning visitor counts over a time range:
{
  "size": 0,
  "aggs": {
    "visits": {
      "filters": {
        "filters": [
          {
            "bool": {
              "must": [
                {
                  "range": {
                    "time": {
                      "gte": "2015-10-25",
                      "lt": "2015-11-01",
                      "time_zone": "UTC"
                    }
                  }
                },
                {
                  "term": {
                    "eventType": "ProfileVisit"
                  }
                }
              ]
            }
          }
        ]
      },
      "aggs": {
        "unique-visitors-total": {
          "cardinality": {
            "field": "userId"
          }
        },
        "first-time-visits": {
          "filters": {
            "filters": [
              {
                "term": {
                  "firstTimeForVisit": true
                }
              }
            ]
          },
          "aggs": {
            "first-time-visitors-total": {
              "cardinality": {
                "field": "userId"
              }
            }
          }
        },
        "returning-visitors-total": {
          "bucket_script": {
            "buckets_path": {
              "uniqueVisitorsTotal": "unique-visitors-total",
              "firstTimeVisitorsTotal": "first-time-visits>first-time-visitors-total"
            },
            "script": "uniqueVisitorsTotal - firstTimeVisitorsTotal"
          }
        }
      }
    }
  }
}
I can't even use the gross filters hack for this:
buckets_path must reference either a number value or a single value numeric metric aggregation, got: java.lang.Object[]
I think this is because first-time-visitors-total is inside a filters aggregation, so Elasticsearch thinks it has multiple bucket values, even though there's only 1 filter.
Maybe there's some way to re-organize this query to work around current limitations, but I can't find it and am pretty stuck. :cry:
It seems like this should work though if the above query just used filter aggregations, since there would just be a single value for both unique-visitors-total and first-time-visits>first-time-visitors-total.
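A minimal sketch of that reorganization, offered as an untested assumption rather than a confirmed fix: keep the outer visits aggregation as a single-filter filters aggregation (so bucket_script still has a multi-bucket parent) and turn only the inner first-time-visits into a plain filter aggregation, so that the path first-time-visits>first-time-visitors-total resolves to one value per bucket. The field names and filters are copied from the query above.

```json
{
  "size": 0,
  "aggs": {
    "visits": {
      "filters": {
        "filters": [
          {
            "bool": {
              "must": [
                {
                  "range": {
                    "time": {
                      "gte": "2015-10-25",
                      "lt": "2015-11-01",
                      "time_zone": "UTC"
                    }
                  }
                },
                { "term": { "eventType": "ProfileVisit" } }
              ]
            }
          }
        ]
      },
      "aggs": {
        "unique-visitors-total": {
          "cardinality": { "field": "userId" }
        },
        "first-time-visits": {
          "filter": {
            "term": { "firstTimeForVisit": true }
          },
          "aggs": {
            "first-time-visitors-total": {
              "cardinality": { "field": "userId" }
            }
          }
        },
        "returning-visitors-total": {
          "bucket_script": {
            "buckets_path": {
              "uniqueVisitorsTotal": "unique-visitors-total",
              "firstTimeVisitorsTotal": "first-time-visits>first-time-visitors-total"
            },
            "script": "uniqueVisitorsTotal - firstTimeVisitorsTotal"
          }
        }
      }
    }
  }
}
```

Whether this runs may depend on the Elasticsearch version, and it still relies on the single-filter filters wrapper at the outer level.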
Another example use case is clickthrough rate:
"impressions": {
  "filter": {
    "term": {
      "eventType": "Impression"
    }
  }
},
"clickthroughs": {
  "filter": {
    "term": {
      "eventType": "Clickthrough"
    }
  }
},
"clickthrough-rate": {
  "bucket_script": {
    "buckets_path": {
      "clickthroughCount": "clickthroughs._count",
      "impressionCount": "impressions._count"
    },
    "script": "clickthroughCount / impressionCount"
  }
}
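Until single-bucket parents are supported, one possible workaround for this case is to wrap the three aggregations in a filters aggregation with a single filter, as in the hack from the original report. The sketch below has not been run against a cluster. The wrapper name all-events and the match_all filter are assumptions made for illustration, and the paths use the > separator to reach _count inside each filter aggregation.

```json
{
  "size": 0,
  "aggs": {
    "all-events": {
      "filters": {
        "filters": [
          { "match_all": {} }
        ]
      },
      "aggs": {
        "impressions": {
          "filter": {
            "term": { "eventType": "Impression" }
          }
        },
        "clickthroughs": {
          "filter": {
            "term": { "eventType": "Clickthrough" }
          }
        },
        "clickthrough-rate": {
          "bucket_script": {
            "buckets_path": {
              "clickthroughCount": "clickthroughs>_count",
              "impressionCount": "impressions>_count"
            },
            "script": "clickthroughCount / impressionCount"
          }
        }
      }
    }
  }
}
```

The script keeps the bare variable names used elsewhere in this thread; scripting languages that expose the values through a params object would need the names prefixed accordingly.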
@elastic/es-search-aggs
If we decide to implement this we should also allow the bucket_script aggregation to be used at the top level as requested in https://github.com/elastic/elasticsearch/issues/14656
We should also change the bucket_selector aggregation and other sibling pipeline aggregations to work on the top level as well
I realize that this is a very old issue, but I ran across it while trying to find a solution for this. I adapted the example above from a different use case, but it should work:
{
  "first-time-visitors-total-single": {
    "sum_bucket": {
      "buckets_path": "first-time-visits>first-time-visitors-total"
    }
  },
  "returning-visitors-total": {
    "bucket_script": {
      "buckets_path": {
        "uniqueVisitorsTotal": "unique-visitors-total",
        "firstTimeVisitorsTotal": "first-time-visitors-total-single"
      },
      "script": "uniqueVisitorsTotal - firstTimeVisitorsTotal"
    }
  }
}
Basically, you can collapse a filters aggregation that has a single filter into a single value with an otherwise pointless sum_bucket aggregation, and then use that value in the bucket_script.
Hope that helps.
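For context, here is a sketch of how those two fragments might sit inside the earlier visits query, next to the aggregations they reference. It assumes the sum_bucket result is attached to each visits bucket, so the bucket_script refers to it by its own name, first-time-visitors-total-single, rather than through first-time-visits. This has not been verified against a running cluster.

```json
{
  "size": 0,
  "aggs": {
    "visits": {
      "filters": {
        "filters": [
          {
            "bool": {
              "must": [
                {
                  "range": {
                    "time": {
                      "gte": "2015-10-25",
                      "lt": "2015-11-01",
                      "time_zone": "UTC"
                    }
                  }
                },
                { "term": { "eventType": "ProfileVisit" } }
              ]
            }
          }
        ]
      },
      "aggs": {
        "unique-visitors-total": {
          "cardinality": { "field": "userId" }
        },
        "first-time-visits": {
          "filters": {
            "filters": [
              { "term": { "firstTimeForVisit": true } }
            ]
          },
          "aggs": {
            "first-time-visitors-total": {
              "cardinality": { "field": "userId" }
            }
          }
        },
        "first-time-visitors-total-single": {
          "sum_bucket": {
            "buckets_path": "first-time-visits>first-time-visitors-total"
          }
        },
        "returning-visitors-total": {
          "bucket_script": {
            "buckets_path": {
              "uniqueVisitorsTotal": "unique-visitors-total",
              "firstTimeVisitorsTotal": "first-time-visitors-total-single"
            },
            "script": "uniqueVisitorsTotal - firstTimeVisitorsTotal"
          }
        }
      }
    }
  }
}
```

Because first-time-visits has exactly one bucket, summing over its buckets returns that bucket's cardinality unchanged.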