Elasticsearch: Can't use Date or String values in `bucket_selector` and `bucket_script` pipeline aggregations

Created on 3 Apr 2017 · 10 Comments · Source: elastic/elasticsearch

Reported via Discuss forum: https://discuss.elastic.co/t/bucket-selector-aggregation-on-date-histogram--key/80986

The bucket_script and bucket_selector aggregations use the BucketHelpers.resolveBucketValue() method to get the buckets_path values from the buckets. That method requires the return value to be a Double, so currently all the buckets_path values must be numeric. This presents a problem when trying to use the _key of a bucket, since the key might be a DateTime (in the case of date_histogram or date_range aggs) or a String (in the case of a terms agg).

One thing to note here is that all current pipeline aggs except bucket_script and bucket_selector require the value from the buckets_path to be a double, so whatever the solution to this bug, we should maintain a route that guarantees a double is returned.

I have a few thoughts on how to solve this, but I don't yet know which (if any) of them are a good idea:

  1. Change BucketHelpers.resolveBucketValue() to have a generic return type and somehow check that type is compatible before returning (not sure if this is possible since generics are not available at runtime)
  2. Add a public Object resolveBucketObject() method in BucketHelpers that just returns the value it gets from the bucket as long as it's not an instance of InternalAggregation (so that it's an actual value rather than an aggregation)
  3. Let the bucket_script and bucket_selector aggs get the bucket path values directly using Bucket.getProperty()
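
On the doubt raised in option 1: generic type parameters are erased at runtime, but a type check is still possible if the caller passes a Class token. The following is a minimal sketch of that idea only. TypedBucketValueSketch and its resolveBucketValue(Object, Class) signature are hypothetical and do not exist in Elasticsearch.

```java
import java.time.ZonedDateTime;

class TypedBucketValueSketch {

    /**
     * Hypothetical helper: returns the raw bucket property as the requested type.
     * The Class<T> token carries the expected type to runtime, which a bare
     * generic parameter T cannot do because of type erasure.
     */
    public static <T> T resolveBucketValue(Object rawValue, Class<T> expectedType) {
        if (!expectedType.isInstance(rawValue)) {
            String actual = rawValue == null ? "null" : rawValue.getClass().getSimpleName();
            throw new IllegalArgumentException(
                "expected [" + expectedType.getSimpleName() + "] but got [" + actual + "]");
        }
        return expectedType.cast(rawValue);
    }

    public static void main(String[] args) {
        // A date_histogram key would arrive as a date object, a terms key as a String.
        Object dateKey = ZonedDateTime.parse("2020-01-01T12:00:00Z");
        ZonedDateTime asDate = resolveBucketValue(dateKey, ZonedDateTime.class);
        System.out.println(asDate.getYear());

        // Existing pipeline aggs would keep asking for Double and fail fast otherwise.
        try {
            resolveBucketValue(dateKey, Double.class);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The trade-off is that every call site has to name the type it expects, which may or may not be preferable to the separate-method approach in option 2.
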
:AnalyticAggregations >bug Analytics high hanging fruit

All 10 comments

Bleh. I think this is a thing that'd be fixed pretty well with the script contexts we keep talking about. In that case we'd compile the script against one of a couple of interfaces (returning a double, returning a date, returning a string) and then adapt them to something useful for aggs. Or something like that.

Without them (because they aren't coming quickly) we could add a couple more instanceof checks....

I don't really want to add the instanceof checks directly to that method since, as I mentioned, the rest of the pipeline aggregations rely on the value being a number.

@elastic/es-search-aggs

Ran into this and had a look at the BucketHelpers class. I'd go for solution _2_ proposed above by @colings86, namely:

  • use an IDE to rename the resolveBucketValue() method to resolveBucketNumericalValue() - this should update all uses, which then need reverting only in the _bucket_selector_ and _bucket_script_ aggregations
  • inside BucketHelpers, create method resolveBucketValue() with the same parameter lists and code, except for the fact that it should merrily return anything that's not an aggregation
  • redirect the variant of resolveBucketNumericalValue() with a longer parameter list to its resolveBucketValue() counterpart, plus an instanceof Double check
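
The three steps above can be sketched roughly as follows. This is not the actual BucketHelpers code: the Bucket and Aggregation interfaces are simplified, hypothetical stand-ins for the Elasticsearch types, the two-argument signatures are assumptions, and the numeric route checks for Number rather than strictly Double (also an assumption).

```java
import java.time.ZonedDateTime;
import java.util.HashMap;
import java.util.Map;

class BucketHelpersSketch {

    /** Hypothetical stand-in for an aggregation nested inside a bucket. */
    public interface Aggregation {}

    /** Hypothetical stand-in for a bucket: resolves a path to a raw property. */
    public interface Bucket {
        Object getProperty(String path);
    }

    /** Permissive route: returns any property value that is not an aggregation. */
    public static Object resolveBucketValue(Bucket bucket, String path) {
        Object value = bucket.getProperty(path);
        if (value == null) {
            throw new IllegalArgumentException("buckets_path [" + path + "] resolved to no value");
        }
        if (value instanceof Aggregation) {
            throw new IllegalArgumentException("buckets_path [" + path + "] points at an aggregation, not a value");
        }
        return value;
    }

    /** Numeric route: delegates to the permissive method, then insists on a number. */
    public static double resolveBucketNumericalValue(Bucket bucket, String path) {
        Object value = resolveBucketValue(bucket, path);
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        throw new IllegalArgumentException("buckets_path must reference a number value, got: ["
            + value.getClass().getSimpleName() + "] at aggregation [" + path + "]");
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("_key", ZonedDateTime.parse("2020-01-01T12:00:00Z"));
        props.put("_count", 2L);
        Bucket bucket = props::get;

        // bucket_script / bucket_selector would call the permissive method.
        System.out.println(resolveBucketValue(bucket, "_key"));
        // Every other pipeline agg keeps the guaranteed-double route.
        System.out.println(resolveBucketNumericalValue(bucket, "_count"));
    }
}
```

With this split, only bucket_script and bucket_selector see non-numeric keys, while the other pipeline aggregations still fail with a clear message when the path is not numeric.
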

Uhm... and I would replace the "high" with a "low" in that label.

Also related, recent discussions about exposing the string key of geo tiles: https://github.com/elastic/elasticsearch/issues/39957#issuecomment-472178949

Hi! I was just curious if this is something that is still being worked on or if this is working already in ES 7.3?

I'm curious too

Just ran into this issue while trying to calculate durations between events over time and was wondering if this is a workaround or if I'm misunderstanding the result:


Given a set of documents with a @timestamp field

{"index":{}}
{"@timestamp":"2020-01-01T12:00:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T12:00:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T12:10:30Z"}
{"index":{}}
{"@timestamp":"2020-01-01T13:00:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T13:00:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T13:01:00Z"}
{"index":{}}
{"@timestamp":"2020-01-01T13:02:00Z"}

this query fails:

{
  "size": 0,
  "aggs": {
    "permin": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "minute",
        "min_doc_count": 1,
        "keyed": false
      },
      "aggs": {
        "diff": {
          "serial_diff": {
            "buckets_path": "_key",
            "lag": 1
          }
        }
      }
    }
  }
}

with:

{
  "type" : "aggregation_execution_exception",
  "reason" : "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [ZonedDateTime] at aggregation [_key]"
}

while this one, which adds a min aggregation to turn the timestamp into a numeric value that serial_diff accepts, runs without exception:

{
  "size": 0,
  "aggs": {
    "permin": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "minute",
        "min_doc_count": 1,
        "keyed": false
      },
      "aggs": {
        "minpermin": {
          "min": {
            "field": "@timestamp"
          }
        },
        "diff": {
          "serial_diff": {
            "buckets_path": "minpermin",
            "lag": 1
          }
        }
      }
    }
  }
}
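
To sanity-check what the second query should return, the arithmetic can be reproduced outside Elasticsearch. The sketch below is plain Java, not Elasticsearch code. It assumes that min on a date field yields epoch milliseconds and that, with min_doc_count set to 1, serial_diff sees only the five non-empty minute buckets from the sample data. SerialDiffSketch and serialDiff are hypothetical names.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerialDiffSketch {

    /**
     * Lag-n differencing: result[i] = values[i] - values[i - lag].
     * The first `lag` entries have nothing to subtract, so they are null.
     */
    public static List<Long> serialDiff(List<Long> values, int lag) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            if (i < lag) {
                out.add(null);
            } else {
                out.add(values.get(i) - values.get(i - lag));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Earliest @timestamp in each non-empty one-minute bucket of the sample data.
        List<Long> minPerMinute = Arrays.asList(
            Instant.parse("2020-01-01T12:00:00Z").toEpochMilli(),
            Instant.parse("2020-01-01T12:10:30Z").toEpochMilli(),
            Instant.parse("2020-01-01T13:00:00Z").toEpochMilli(),
            Instant.parse("2020-01-01T13:01:00Z").toEpochMilli(),
            Instant.parse("2020-01-01T13:02:00Z").toEpochMilli());

        // Differences are in milliseconds between consecutive buckets.
        System.out.println(serialDiff(minPerMinute, 1));
        // prints [null, 630000, 2970000, 60000, 60000]
    }
}
```

Under those assumptions, the diff values are the gaps in milliseconds between the earliest events of consecutive non-empty buckets.
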

related to #54110
