Elasticsearch: Make bucketsPath syntax for lower and upper standard deviation bounds in the extended_stats aggregation more obvious

Created on 23 Jun 2016  路  7Comments  路  Source: elastic/elasticsearch

This issue was raised on the forum here: https://discuss.elastic.co/t/accessing-lower-bound-of-extended-stats-bucket/53763

The output of the extended_stats aggregation looks like this:

    "my_stats": {
      "count": 20523877,
      "min": 1,
      "max": 98250000,
      "avg": 165931.3618741722,
      "sum": 3405554861548,
      "sum_of_squares": 2540304986111513000,
      "variance": 96239937025.42874,
      "std_deviation": 310225.6227738591,
      "std_deviation_bounds": {
        "upper": 786382.6074218905,
        "lower": -454519.883673546
      }
    }

but the buckets paths to get the upper and lower standard deviation bounds are my_stats.std_lower and my_stats.std_upper which is not very intuitive given the output above. I think we should change it so the buckets_path (and terms sort path) is my_stats.std_deviation_bounds.lower and my_stats.std_deviation_bounds.upper

:AnalyticAggregations >breaking >enhancement Analytics help wanted

All 7 comments

+1

@colings86 Can I continue to work on this? Thanks in advance. 馃槃

@liketic if you are able to raise a PR for this then that would be great, thanks

@colings86 Thanks for your reply! 馃憤

If I'm not wrong, we expect to access upper and lower bound through my_stats. std_deviation_bounds.lower and my_stats. std_deviation_bounds.upper. However when we parse the path as https://github.com/elastic/elasticsearch/blob/e7d352b489272a87b05019869857656de7fb82fc/core/src/main/java/org/elasticsearch/search/aggregations/support/AggregationPath.java#L86

We didn't support access value more than one levels which means foo.bar is parse to
name=foo, key=bar and foo.bar1.bar2 will be parse to name=foo.bar1, key=bar2. So if this is not the expected behaviour, I think I can convert the key of PathElement to an array, for example:
name=foo, keys=[bar1, bar2].

Is that the correct direction? Any comments are appreciated. 馃槃

@liketic this is actually a bit tricky because the pattern for a valid aggregation name is: https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/search/aggregations/AggregatorFactories.java#L53

So the reason why the line you linked to calls element.lastIndexOf('.') is so we support aggregation names containing dots. There is an argument that we should not support aggregation names containing dots but this is a bigger and potentially controversial change.

Because of this its going to be difficult to change the path to my_stats.std_deviation_bounds.lower and my_stats.std_deviation_bounds.upper since it wouldn't be easy to support both dots in aggregation names and dots in metric names.

I would if we should make a change to stop dots in the aggregation names so we can make this change or if we should instead keep the current limitation of metrics not containing dots and not worry about this issue? @jpountz @clintongormley what do you think?

@colings86 I can't think of a reason for supporting dot in aggregation names, esp as we don't support dots in field names (dots are treated as object paths). I'd be in favour of removing support (deprecating in 6.x and removing in 7.0)

cc @elastic/es-search-aggs

Was this page helpful?
0 / 5 - 0 ratings