We would like to support more complex metrics in Rollup such as cardinality, percentiles and percentile ranks. These are trickier since they are calculated from data sketches rather than simple numerics.
They also introduce issues with backwards compatibility. If the algorithm powering the sketch changes in the future (improvements, bug-fixes, etc) we will likely have to continue supporting the old versions of the algorithm. It's unlikely that these sketches will be "upgradable" to the new version since they are lossy by nature.
I see two approaches to implementing these types of metrics:
In the first approach, we implement new data types in the Rollup plugin. Similar to the hash, geo or completion data types, these would expect input data to adhere to some kind of complex format. Internally it would be stored as a compressed representation from which the sketch can be rebuilt (e.g. a `long[]` that can be used to build an HLL sketch).
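To make that concrete, here is a minimal, hypothetical sketch (the class and method names are illustrative, not an existing Elasticsearch API) of how a dedicated field type might pack HLL registers into a versioned binary payload and later rebuild them:

```java
// Hypothetical codec for a dedicated "HLL sketch" field type.
// A version byte is written up front so old blobs remain decodable.
import java.nio.ByteBuffer;

final class HllFieldCodec {
    static final byte CURRENT_VERSION = 1;

    // Pack a version byte, the precision, and the raw registers into one blob.
    static byte[] encode(int precision, long[] registers) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + registers.length * Long.BYTES);
        buf.put(CURRENT_VERSION);
        buf.putInt(precision);
        for (long register : registers) {
            buf.putLong(register);
        }
        return buf.array();
    }

    // Rebuild the register array; the version byte tells us which decoder applies.
    static long[] decode(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        byte version = buf.get();
        if (version != CURRENT_VERSION) {
            throw new IllegalArgumentException("Unsupported sketch version: " + version);
        }
        int precision = buf.getInt();
        long[] registers = new long[buf.remaining() / Long.BYTES];
        for (int i = 0; i < registers.length; i++) {
            registers[i] = buf.getLong();
        }
        return registers;
    }
}
```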
The pros are strong validation and easier consumption by aggregations. Another large positive is that it allows external clients to provide pre-built sketches, as long as they follow the correct format. For example, edge nodes may be collecting and aggregating data locally and just want to send the sketch.
The cons are the considerably greater amount of work to implement the data types. It may also not be ideal to expose these data structures outside Rollup, since they carry the aforementioned bwc baggage.
Alternatively, we could implement these entirely by convention (like the rest of Rollup). E.g. a `binary` field can be used to hold the appropriate data sketch, and we just use field naming to convey the meaning. Versioning can be done with a secondary field.
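A rough sketch of what the convention-only layout could look like (the field names here are purely illustrative, nothing in Rollup defines them today):

```java
// Hypothetical convention-only approach: a plain binary field holds the serialized
// sketch, and sibling fields carry the metric's meaning and the sketch version.
// Nothing is validated at index time; the convention only matters at search time.
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

final class ConventionRollupDoc {
    static Map<String, Object> hllRollupDoc(String sourceField, byte[] serializedHll) {
        Map<String, Object> doc = new HashMap<>();
        // Field naming conveys meaning, e.g. "<field>.cardinality.hll"
        doc.put(sourceField + ".cardinality.hll",
                Base64.getEncoder().encodeToString(serializedHll));
        // A secondary field records which algorithm version produced the blob.
        doc.put(sourceField + ".cardinality.hll_version", 1);
        return doc;
    }
}
```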
The advantage is much less upfront work: we can just serialize into fields and we're off. It also limits the impact of these data types, since only Rollup will be equipped to deal with the convention (making it less likely for a user to accidentally use one and then run into trouble later).
The big downside is that external clients will have a more difficult time providing pre-built sketches, since the format is just convention and won't be validated until search time. It also feels a bit more fragile, since it is another convention to maintain.
In both cases, Rollup will probably have to maintain a catalog of "old" algorithms so that historical rollup indices can continue to function. Not ideal, but given that these algos don't change super often it's probably an ok burden to bear.
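Such a catalog might amount to little more than a version-to-decoder map. A hypothetical sketch, assuming a version number is stored alongside each blob:

```java
// Hypothetical catalog of historical sketch decoders: one decoder per version,
// and legacy versions stay in the map so old rollup indices keep working.
import java.util.Map;
import java.util.function.Function;

final class SketchCatalog {
    interface CardinalitySketch {
        long estimate();
    }

    private static final Map<Integer, Function<byte[], CardinalitySketch>> DECODERS = Map.of(
        1, SketchCatalog::decodeV1
        // 2, SketchCatalog::decodeV2, ... added if the algorithm ever changes
    );

    static CardinalitySketch decode(int version, byte[] blob) {
        Function<byte[], CardinalitySketch> decoder = DECODERS.get(version);
        if (decoder == null) {
            throw new IllegalArgumentException("Unknown sketch version: " + version);
        }
        return decoder.apply(blob);
    }

    private static CardinalitySketch decodeV1(byte[] blob) {
        // ... rebuild the v1 sketch from the blob (omitted in this sketch)
        return () -> 0L;
    }
}
```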
Pinging @elastic/es-search-aggs
Small update.
Relates: #24468
Hi @polyfractal, do you know when this is slated to go into production?
Hi @painslie, I'm afraid I do not have an update. We'll update this issue when there's more information, or link to it from a PR.
@polyfractal I'm curious how well the Prometheus histogram would line up with what you're thinking?
HDRHistogram is essentially just a clever layout of different-sized intervals: a set of exponentially-sized intervals, with a fixed number of linear intervals inside each exponential "level". But at its heart, it's still just a histogram of counts like Prometheus histos (and unlike algorithms like TDigest, which use weighted centroids, etc).
So it should be possible to translate a Prometheus histogram into an HDRHisto. Prometheus histos have user-definable intervals, which means the accuracy of the translation will depend on how nicely the Prometheus intervals line up with the HDRHisto intervals. I think any Prometheus histo should be convertible; the accuracy of that conversion depends on the exact layout.
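For illustration, here is a rough sketch of what that translation could look like, using the HdrHistogram library directly and assuming the Prometheus bucket bounds have already been scaled to integral units (e.g. microseconds), since HdrHistogram records long values:

```java
// Rough sketch: convert Prometheus-style cumulative buckets into an HdrHistogram.
import org.HdrHistogram.Histogram;

final class PrometheusToHdr {
    // upperBounds[i] is the cumulative "le" bound, cumulativeCounts[i] its cumulative count.
    static Histogram convert(long[] upperBounds, long[] cumulativeCounts) {
        Histogram histo = new Histogram(3); // 3 significant digits of precision
        long previousCount = 0;
        for (int i = 0; i < upperBounds.length; i++) {
            long bucketCount = cumulativeCounts[i] - previousCount;
            if (bucketCount > 0) {
                // All we know is that these observations fell at or below the bound,
                // so record them at the bucket's upper bound; accuracy depends on how
                // finely the Prometheus buckets line up with HdrHistogram's intervals.
                histo.recordValueWithCount(upperBounds[i], bucketCount);
            }
            previousCount = cumulativeCounts[i];
        }
        return histo;
    }
}
```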
Prometheus Summaries are an implementation of Targeted Quantiles and will be much harder to use. The output of a summary is just a percentile estimation at that point in time, which is mostly useless to us. It might be possible to convert the underlying Targeted Quantiles sketch into a TDigest since the algos share some similarities, but I suspect it won't give great accuracy. I've been told summaries aren't as common either compared to Histos, so also probably not a priority.
With all that said, it's still not entirely clear _how_ a user will convert a Prometheus histogram (or any other system's histogram output) into our data structure. I'm kinda thinking an ingest processor would make the most sense, slurping up a Prometheus histo and emitting a compatible HDRHisto field. But I haven't spent a lot of time thinking about the ergonomics of that yet. :)
Hi @polyfractal, is there any ticket for adding weighted-average support in X-Pack rollups?
@polyfractal A quick update here. @kbourgoin and I have implemented a custom field type for serialized HLL rollups in the ES index, along with a corresponding aggregation query that works much like `cardinality`, but de-serializes and merges multiple serialized document-stored HLL blobs. We've built it as a proper Elasticsearch plugin and presented it in NYC at a local Elasticsearch meetup yesterday, and it's almost ready for review by Elastic folks. I'll be writing up my slides into a technical blog post, as well, so people can try it out. It's not quite ready for production, but it's getting there. Would be good to sync up about this, as I'm sure it can help inform the similar approach for HDRHistogram and percentiles.
Excellent. We have had some discussions on our end as well about what the API and implementation could look like for a histogram field for percentile aggregations and an HLL++ field for cardinality aggregations. I suspect both impls will end up looking similar. :) cc @iverase
Small note: histograms have been implemented in #48580 (:tada:). Support in Rollup is still pending... we may want to wait for #42720