Reported via Discuss forum: https://discuss.elastic.co/t/bucket-selector-aggregation-on-date-histogram--key/80986
The bucket_script and bucket_selector aggregations use the BucketHelpers.resolveBucketValue() method to get the bucket_path values from the buckets. That method requires that the return value is a Double so currently it is required that all the bucket_paths are numeric values. This presents a problem when trying to use the _key of a bucket since the key might be a DateTime (in the case of a date_histogram or date_range aggs) or a String (in the case of a terms agg).
One thing to note here is that all current pipeline aggs except the bucket_script and bucket_selector require the value from the bucket_path to be a double, so whatever the solution to this bug we should maintain a route that guarantees a double to be returned.
I have a few thoughts on how to solve this, but I don't know yet which (if any) of them are a good idea:
1. Change BucketHelpers.resolveBucketValue() to have a generic return type and somehow check that the type is compatible before returning (not sure if this is possible since generics are not available at runtime)
2. Add a public Object resolveBucketObject() method in BucketHelpers that just returns the value it gets from the bucket as long as it's not an instance of InternalAggregation (so that it's an actual value rather than an aggregation)
3. Have the bucket_script and bucket_selector aggs get the bucket path values directly using Bucket.getProperty()
Bleh. I think this is a thing that'd be fixed pretty well with the script contexts we keep talking about. In that case we'd compile the script against one of a couple of interfaces (returning a double, returning a date, returning a string) and then adapt them to something useful for aggs. Or something like that.
Without them (because they aren't coming quickly) we could add a couple more instanceof checks....
I don't really want to add the instanceof checks directly to that method since, as I mentioned, the rest of the pipeline aggregations rely on the value being a number.
@elastic/es-search-aggs
Ran into this and had a look at the BucketHelpers class. I'd go for solution _2_ proposed above by @colings86, namely:
1. Rename the resolveBucketValue() method to resolveBucketNumericalValue() - this should modify all uses, except for the _bucket_selector_ and _bucket_script_ aggregations
2. In BucketHelpers, create a method resolveBucketValue() with the same parameter lists and code, except that it should merrily return anything that's not an aggregation
3. Redirect the resolveBucketNumericalValue() with the longer parameter list to its resolveBucketValue() counterpart, plus an instanceof Double check
Uhm... and I would replace the "high" with a "low" in that label.
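For illustration, the rename-and-delegate proposal above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: `Bucket` and `Aggregation` here are hypothetical stand-ins, not the real Elasticsearch classes, the two-argument signatures are simplified, and gap-policy handling is left out entirely.

```java
import java.time.ZonedDateTime;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; NOT the real Elasticsearch types.
final class BucketValueSketch {

    /** Stand-in for InternalAggregation: marks a value that is itself an aggregation. */
    interface Aggregation {}

    /** Stand-in for a bucket whose properties are looked up by path (cf. Bucket.getProperty()). */
    static final class Bucket {
        private final Map<String, Object> properties;

        Bucket(Map<String, Object> properties) {
            this.properties = properties;
        }

        Object getProperty(String path) {
            return properties.get(path);
        }
    }

    /** Returns whatever the path resolves to, as long as it is not an aggregation. */
    static Object resolveBucketValue(Bucket bucket, String path) {
        Object value = bucket.getProperty(path);
        if (value == null) {
            throw new IllegalArgumentException("buckets_path [" + path + "] not found");
        }
        if (value instanceof Aggregation) {
            throw new IllegalArgumentException("buckets_path [" + path + "] resolves to an aggregation");
        }
        return value;
    }

    /** Numeric-only route for the other pipeline aggs: delegates, then checks for a Double. */
    static double resolveBucketNumericalValue(Bucket bucket, String path) {
        Object value = resolveBucketValue(bucket, path);
        if (value instanceof Double) {
            return (Double) value;
        }
        throw new IllegalArgumentException("buckets_path must reference a number value, got: ["
                + value.getClass().getSimpleName() + "] at [" + path + "]");
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("_key", ZonedDateTime.parse("2020-01-01T12:00:00Z"));
        properties.put("my_metric", 42.0); // hypothetical metric name
        Bucket bucket = new Bucket(properties);

        // bucket_script / bucket_selector would take the permissive route.
        System.out.println(resolveBucketValue(bucket, "_key"));
        // Every other pipeline agg keeps the guarantee of getting a double.
        System.out.println(resolveBucketNumericalValue(bucket, "my_metric"));
    }
}
```

With this split, only the two script-based aggregations ever see a non-numeric key, while asking the numeric variant for `_key` still fails with an error.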
Also related, recent discussions about exposing the string key of geo tiles: https://github.com/elastic/elasticsearch/issues/39957#issuecomment-472178949
Hi! I was just curious if this is something that is still being worked on or if this is working already in ES 7.3?
I'm curious too
Just ran into this issue while trying to calculate durations between events over time, and was wondering whether the following is a valid workaround or whether I'm misunderstanding the result.
Given a set of documents with a @timestamp field:
{"index":{}}
{"@timestamp":"2020-01-01T12:00:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T12:00:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T12:10:30Z"}
{"index":{}}
{"@timestamp":"2020-01-01T13:00:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T13:00:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T13:01:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T13:02:00Z"}
this query fails:
{
  "size": 0,
  "aggs": {
    "permin": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "minute",
        "min_doc_count": 1,
        "keyed": false
      },
      "aggs": {
        "diff": {
          "serial_diff": {
            "buckets_path": "_key",
            "lag": 1
          }
        }
      }
    }
  }
}
with:
{
  "type" : "aggregation_execution_exception",
  "reason" : "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [ZonedDateTime] at aggregation [_key]"
}
while this one, which adds a min aggregation to turn the timestamp into a numeric value that serial_diff can consume, runs without exception:
{
  "size": 0,
  "aggs": {
    "permin": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "minute",
        "min_doc_count": 1,
        "keyed": false
      },
      "aggs": {
        "minpermin": {
          "min": {
            "field": "@timestamp"
          }
        },
        "diff": {
          "serial_diff": {
            "buckets_path": "minpermin",
            "lag": 1
          }
        }
      }
    }
  }
}
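To make the expected result of this workaround easier to reason about, here is a small standalone Java sketch that mimics it on the sample documents above. It does not call Elasticsearch. It assumes that the min of a date field is exposed as epoch milliseconds, that empty minutes produce no bucket (min_doc_count: 1), and that a lag-1 serial_diff subtracts the previous returned bucket's value from the current one. The class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Standalone illustration only; names are hypothetical and nothing here talks to Elasticsearch.
final class SerialDiffSketch {

    /** Minimum timestamp (epoch millis) per one-minute bucket; minutes without documents never appear. */
    static TreeMap<Instant, Long> minPerMinute(List<String> timestamps) {
        TreeMap<Instant, Long> buckets = new TreeMap<>();
        for (String ts : timestamps) {
            Instant instant = Instant.parse(ts);
            Instant bucketKey = instant.truncatedTo(ChronoUnit.MINUTES);
            buckets.merge(bucketKey, instant.toEpochMilli(), Long::min);
        }
        return buckets;
    }

    /** Lag-1 differences between consecutive bucket values; the first bucket has no predecessor. */
    static List<Long> lagOneDiffs(TreeMap<Instant, Long> buckets) {
        List<Long> diffs = new ArrayList<>();
        Long previous = null;
        for (long value : buckets.values()) {
            if (previous != null) {
                diffs.add(value - previous);
            }
            previous = value;
        }
        return diffs;
    }

    public static void main(String[] args) {
        List<String> timestamps = Arrays.asList(
                "2020-01-01T12:00:00Z", "2020-01-01T12:00:00Z", "2020-01-01T12:10:30Z",
                "2020-01-01T13:00:00Z", "2020-01-01T13:00:00Z", "2020-01-01T13:01:00Z",
                "2020-01-01T13:02:00Z");
        // prints [630000, 2970000, 60000, 60000]
        System.out.println(lagOneDiffs(minPerMinute(timestamps)));
    }
}
```

Under those assumptions the differences are durations in milliseconds between the earliest event of each non-empty minute and the earliest event of the previous non-empty minute, for example 630000 ms (10m30s) between 12:00:00 and 12:10:30.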
related to #54110