This issue contains a list of related feature requests that are not on the near-term roadmap. The feature requests in this issue are all new functions that have been requested. If you want to request a function not already listed please make a comment on this issue, and we will add it to the checklist.
[ ] cumulative distribution function: https://github.com/influxdata/influxdb/issues/7261
[ ] faster, slightly less accurate percentages calculation https://github.com/influxdata/influxdb/issues/513
[x] mode https://github.com/influxdata/influxdb/issues/1823
[x] trigonometric functions (sine, cosine, etc.) https://github.com/influxdata/influxdb/issues/659
[x] dot-product https://github.com/influxdata/influxdb/issues/5095
[x] difference https://github.com/influxdata/influxdb/issues/1825
[x] DIFFERENCE of two fields https://github.com/influxdata/influxdb/issues/761
[x] general support for nested functions https://github.com/influxdata/influxdb/issues/834
fill(previous) without GROUP BY https://github.com/influxdata/influxdb/issues/3633
Is there any timeline for these functions? (I'm very interested in the aggregate integral functions, to calculate kWh from watts.) It seems that a feature request was opened a year ago.
There is no timeline for each specific function/feature. All work on new functions was on hold while the query engine was refactored, and that refactoring was merged into InfluxDB 0.11. We plan to introduce a few functions with each release from now on.
It would also be great to get #3633 added to your list; this feature would be very useful on the client side when graphing data.
Thanks, @timgriffiths, added!
Filed #6208
I would like to see Holt-Winters and some built-in time series anomaly detection functions in InfluxQL.
E.g. https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series
http://robjhyndman.com/hyndsight/yahoo-data/
@dengliu can you open a new issue so I can reference it here?
@beckettsean I have updated the issue id in my previous comment.
@dengliu as referenced in the other issue, anomaly detection functions should go into Kapacitor. The results can be fed back into InfluxDB for visualization, but that doesn't require any new functionality in InfluxDB.
Awesome work. I'm +1 for #813 - I've got data in InfluxDB that I'd like to run it on.
+1 for exponents and logarithms #659
The lack of a mode function is really hurting us in switching from 0.8.9. In particular, we need to be able to select the most common occurrence of a specific string over a group by, which we have to do in post-processing now.
@brandoncazander COUNT(DISTINCT()) will give you the frequencies for each period in the GROUP BY, and you can pull the largest number from that client-side. Not a solution but perhaps a workaround.
The IOT stuff we are doing really needs integrals. Any idea on timeline of when we might get this?
+1 for lag (or some kind of per-series time-shift) so that we can show current and -1week data on the same panel in grafana.
something like:
select mean(value), mean(lag(value, 1w)) from metric where time > now() - 1d group by time(1m)
would return two fields, being the per minute averages for the last 1d and for the same period but one week prior.
If there is already a way to do this, I would appreciate being hit with the clue-stick.
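Until something like lag() exists, one client-side workaround is to run the query twice (once for the current window, once for the same window a week earlier) and shift the older timestamps forward before plotting. Here is a minimal sketch in Python; the function names and the (timestamp, value) tuple layout are assumptions for illustration, not anything InfluxDB or Grafana provides.

```python
from datetime import datetime, timedelta


def shift_series(points, offset):
    """Return a copy of (timestamp, value) points with every timestamp moved by offset."""
    return [(ts + offset, value) for ts, value in points]


def overlay_previous(current, previous, offset):
    """Pair each current point with the value from `previous` that lands on the
    same timestamp after shifting; None where no shifted point matches."""
    shifted = dict(shift_series(previous, offset))
    return [(ts, value, shifted.get(ts)) for ts, value in current]


# Hypothetical per-minute means for "now" and for the same minute one week ago.
current = [(datetime(2016, 3, 8, 12, 0), 5.0), (datetime(2016, 3, 8, 12, 1), 6.0)]
previous = [(datetime(2016, 3, 1, 12, 0), 4.0)]
rows = overlay_previous(current, previous, timedelta(weeks=1))
```

Each row then carries the current value and the week-old value side by side, which is roughly what the two-field result described above would look like.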
@sferrett That's an excellent suggestion. Would you mind creating a new issue for it so we can track that separately? That's a bit beyond the level of adding a simple function.
@gunnaraasen any slick Grafana tricks to accomplish the same goal with current InfluxQL?
Grafana has a per-graph setting to add a time shift (see the screenshot below). I don't think it's possible to shift a single series within the same graph.

+1 for adding more math functions
Off the top of my head, the functions would be:
Would it be possible to create some way to extend InfluxDB with custom functions? Maybe to load some dynamic library at load time? This could allow community to experiment with different functions?
@mitar Kapacitor supports arbitrary User Defined Functions in almost any language, and it can be used to batch-process InfluxDB data.
Hm, but to my understanding, Kapacitor would have to read the whole span of a time-series to be able to compute the custom downsampled version of data, every time I would try to read it at a lower resolution?
@mitar Kapacitor can store its results in InfluxDB, just like a CQ. Instead of running a CQ inside InfluxDB, you would just have Kapacitor do a batch query, process the UDF, and then write the results back into the downsampled measurement in InfluxDB.
But I do not want to do this for every query users come up with, at different timespans and with different downsampling. I like InfluxDB because it can compute downsampled values efficiently without moving data around. Reading data from InfluxDB into another process and then storing it back seems inefficient to me. Or are you saying that there is not much performance difference between InfluxDB doing downsampling on the fly and Kapacitor doing it? Does using Kapacitor to downsample the same amount of data take similar resources to running such a query in InfluxDB?
@mitar my thoughts exactly. The goal should be (at least that's mine) to do as much as possible in the database without sending data over the network. I would like to store the data at the finest granularity available and then derive data from it, and only send the derived data to the client. In this (my) case, derived data are not only aggregated time buckets (e.g. seconds built from ticks), but also computations built on those buckets.
@mitar I cannot think of a way to allow for ad hoc queries using user defined functions. We do accept PRs for new functions, or you can perform the function calculations client-side for now.
@mrecht sending as little data around the network as possible is our goal, too, but opening up core InfluxDB for arbitrarily defined user functions introduces too many challenges around maintaining performance and stability. I don't expect UDFs to be part of InfluxDB at any time in the 1.x version line.
@pauldix are UDFs for InfluxDB anywhere on the timeline yet?
So what about allowing InfluxDB to load some dynamic code (.so, .dll) to do that? Many other databases allow modules which extend the behavior.
So maybe there should just be a way to load a custom module, and then you could say "module:foo" which would simply call that module's function as a function in InfluxDB?
UDFs aren't on the timeline yet. Best bet is to use Kapacitor for that for now.
@mitar @mrecht I encourage you to open a new issue for user-defined functions in InfluxDB. It's not that we think it's a bad idea and are saying no, it's just horribly complex and not a current priority.
Done: #6891
We really need SUM to support nested functions such as MEAN. See issue https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/influxdb/uXdYy9JA6_E/cZRzQo7FBAAJ. This is likely to prevent us from deploying InfluxDB. As noted in that issue, though, this is just one respect in which InfluxDB's query language seems sorely limited!
Regarding @MatMeredith's comment: this is the single biggest missing feature for us as well!
As to the issue #5150, you can get around that with a continuous query that does a
NON_NEGATIVE_DERIVATIVE(MEAN(value), 1s) INTO
a new measurement and then pull a 95th or whatever percentile out of it. This is perfect for a 95th percentile of interface traffic calculation, but alas, Grafana cannot yet draw that as a line since the percentile is really a point, not a series of values.
@beckettsean Can you please add #5345 to this? It's related to the Top aggregate.
@beckettsean also #6723 to make it easier to query on specific patterns of time windows (e.g. Every day 9AM traffic pattern).
A good addition would be exponential moving average (EMA), useful to minimize the MA jumping when a large value enters and exits the window.
https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
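For reference, a minimal Python sketch of what such a function could compute, assuming the common recursive form s_t = alpha * x_t + (1 - alpha) * s_(t-1) seeded with the first value. The function name and the alpha parameter are illustrative only, not an InfluxQL API.

```python
def exponential_moving_average(values, alpha):
    """Return the EMA of `values`, seeded with the first value.

    alpha in (0, 1] is the smoothing factor: higher alpha reacts faster,
    lower alpha smooths more.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ema = []
    for x in values:
        if not ema:
            ema.append(float(x))  # seed: the first output equals the first input
        else:
            ema.append(alpha * x + (1 - alpha) * ema[-1])
    return ema


smoothed = exponential_moving_average([1, 2, 3], alpha=0.5)
```

Because every past value keeps a geometrically decaying weight, a large value fades out gradually instead of dropping out of the window all at once.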
I was also a bit uncertain about the overlap between top/bottom and min/max. It would be nice if the manual mentioned whether there is any performance difference between the two, besides the nicer syntax when your use case fits min/max.
Histograms are for continuous numeric data. What about discrete non-numeric data?
Specifically, an example use case:
I have a measurement which contains some data of unbounded dimension (say, userID). I store it in a field to avoid exploding my series cardinality, and because I don't really need to query on this data for a specific userID. What I really want is to be able to see _how many points occur in this series for each userID_. This could be displayed nicely in a bar graph which shows, say, the 5 userIDs with the highest number of points for this series.
I apologize if this is already possible with current functionality, but I haven't found a way to do it without selecting the field (and thus all the points) directly from the measurement, and transforming it into aggregate counts on the client side.
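For now, the client-side transformation described above could look roughly like this in Python. It is only a sketch: it assumes the field values have already been fetched into a plain list, and top_field_values is a made-up helper name.

```python
from collections import Counter


def top_field_values(values, n=5):
    """Return the n most frequent values as (value, count) pairs, most frequent first."""
    return Counter(values).most_common(n)


# Hypothetical userID field values selected from the measurement.
user_ids = ["u1", "u2", "u1", "u3", "u1", "u2"]
top_two = top_field_values(user_ids, n=2)
```

The drawback is exactly the one mentioned: every point has to cross the network before it can be counted.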
@bal2ag: Yes, I need that as well. I opened something similar for RRDTool as well: https://github.com/oetiker/rrdtool-1.x/issues/261
We have received user requests describing the need for a latest selector function: #7089
Can geometric and harmonic mean functions be also added to the list? They are useful for estimating non-normal distributions.
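To make the request concrete, here is a small Python sketch of the two means, assuming strictly positive input values. The function names are illustrative and not part of InfluxQL.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logs for stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals of positive values."""
    return len(values) / sum(1.0 / v for v in values)


gm = geometric_mean([1, 4])   # square root of 1 * 4
hm = harmonic_mean([1, 4])    # 2 / (1/1 + 1/4)
```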
Casting was implemented however I think this fairly important use case was skipped out on:
select mean(value)::integer from firstmeasurement
Currently it's only possible to do the following: (2 steps)
select mean(value) into secondmeasurement from firstmeasurement
select mean_value::integer into thirdmeasurement from secondmeasurement
Is it possible to add support for this? Otherwise it requires a CQ or Kapacitor to get the same result.
When trying to do so now you receive:
select mean(value)::integer from firstmeasurement
ERR: error parsing query: found ::, expected FROM at line 1, char 15
Modulus division would be greatly appreciated.
ie, the % operator.
@beckettsean I would also like to see #6723 added to your list as well. This would allow for better retrieval of periodic or seasonal data. What's the processing for vetting this feature for your near-term roadmap?
@beckettsean Are there any plans of supporting MySQL pattern of SUM(IF(EXPR, 1, 0)) commonly used for conditional aggregating?
MariaDB [test]> select * from http_requests;
+----+-------------------+---------------+
| id | url | response_code |
+----+-------------------+---------------+
| 1 | http://google.com | 200 |
| 2 | http://google.com | 200 |
| 3 | http://google.com | 200 |
| 4 | http://google.com | 200 |
| 5 | http://google.com | 200 |
| 6 | http://google.com | 200 |
| 7 | http://google.com | 404 |
| 8 | http://google.com | 503 |
| 9 | http://google.com | 500 |
+----+-------------------+---------------+
9 rows in set (0.00 sec)
MariaDB [test]> select url, sum(if(response_code = 200, 1, 0)) as success, sum(if(response_code between 400 and 499, 1, 0)) as failures_client, sum(if(response_code >= 500, 1, 0)) as failures_server from http_requests group by url;
+-------------------+---------+-----------------+-----------------+
| url | success | failures_client | failures_server |
+-------------------+---------+-----------------+-----------------+
| http://google.com | 6 | 1 | 2 |
+-------------------+---------+-----------------+-----------------+
1 row in set (0.00 sec)
My original plan was to create a CQ where I could aggregate a # of requests made in a time frame to calculate aggregate successes and failures in a single measurement.
I know that Kapacitor lambda expressions support an IF operator; however, I was hoping InfluxQL might support this natively. This has the flavor of #4619, except it involves supporting expressions.
Thanks for consideration!
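For comparison, the same conditional aggregation done client-side is short. This Python sketch mirrors the three SUM(IF(...)) columns from the MariaDB query above; the function name is hypothetical and the input is assumed to be a list of integer response codes for one URL.

```python
def summarize_responses(codes):
    """Count response codes per category, like SUM(IF(cond, 1, 0)) per column."""
    return {
        "success": sum(1 for c in codes if c == 200),
        "failures_client": sum(1 for c in codes if 400 <= c <= 499),
        "failures_server": sum(1 for c in codes if c >= 500),
    }


# The nine rows from the http_requests table above.
codes = [200, 200, 200, 200, 200, 200, 404, 503, 500]
summary = summarize_responses(codes)
```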
I feel the point "dot-product #5095" is too narrow in this feature collection. That feature request originally talked about using arithmetic operations inside of aggregates in general, not only the dot-product in particular. The dot-product was just a specific example of the problem, i.e., SUM(a*b), but there are plenty of other similar operations like SUM(a/b+c) or, in my case, the so called "micro_price" in an order book: MAX((bid*bid_vol+ask*ask_vol)/(bid_vol+ask_vol)) (or the low MIN): http://quant.stackexchange.com/a/24510/2792
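To illustrate the kind of expression meant here, this is a Python sketch of MAX over the micro-price formula quoted above, evaluated per point and then aggregated. The tuple layout (bid, bid_vol, ask, ask_vol) and the function names are assumptions for the example.

```python
def micro_price(bid, bid_vol, ask, ask_vol):
    """Per-point expression from the comment: (bid*bid_vol + ask*ask_vol) / (bid_vol + ask_vol)."""
    return (bid * bid_vol + ask * ask_vol) / (bid_vol + ask_vol)


def max_micro_price(quotes):
    """Aggregate: evaluate the expression for every point, then take the maximum."""
    return max(micro_price(*q) for q in quotes)


quotes = [(10.0, 1.0, 12.0, 3.0), (10.0, 3.0, 12.0, 1.0)]
highest = max_micro_price(quotes)
```

The point of the request is that the arithmetic runs per point before the aggregate is applied, which is what SUM(a*b) needs as well.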
I've started work on integral function. Are there any quants or devs watching that have an opinion about default # of slices or a good rule of thumb relating data points to slices?
Is anybody working on histogram() ? It is a crucial function for us unfortunately and doesn't seem to be the focus of any development work at the moment.
I hope the team will put some documents on how to develop such aggregations / operators :)
The aggregation function "Root Mean Square" RMS would be great.
+1 wish point to histogram()
+1 wish point to histogram()
@pgeiger why not follow "trapezoidal numerical integration" such as the one found in MatLab and Python's SciPy ?
@dgomes - I'll check it out, thanks.
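For readers following along, trapezoidal integration over irregularly spaced points needs no slice count at all; the data points themselves define the slices. Here is a minimal Python sketch, assuming (seconds, watts) points sorted by time. This is not the code from any InfluxDB pull request, and the function names are made up.

```python
def integrate_trapezoid(points):
    """Integrate (t_seconds, value) points, sorted by time, with the trapezoidal rule.

    Each pair of neighbouring points contributes the area of one trapezoid:
    width (t1 - t0) times the mean of the two values.
    """
    total = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        total += (t1 - t0) * (v0 + v1) / 2.0
    return total


def watt_seconds_to_kwh(watt_seconds):
    """1 kWh = 1000 W * 3600 s = 3,600,000 watt-seconds."""
    return watt_seconds / 3_600_000.0


# A constant 1000 W load sampled over one hour.
power = [(0, 1000.0), (1800, 1000.0), (3600, 1000.0)]
energy_kwh = watt_seconds_to_kwh(integrate_trapezoid(power))
```

This covers the kWh-from-watts use case mentioned earlier in the thread.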
Hello,
Is it possible to add the function abs: _select abs("value") from ..._ ?
If not, why is it so complicated? There are already mean, max, min, derivative... which are more complicated functions!
If yes, when do you plan to add this feature?
Thank you!
We are waiting for #1115. Right now we have to calculate the average of series with irregular time intervals in our application. This is really problematic if the series has lots of points, which have to be pushed over the network.
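One way to read "average over irregular intervals" is a time-weighted mean, where each value counts for as long as it was in effect. Here is a Python sketch under that assumption: values are held until the next point (step interpolation), and both the function name and the explicit end argument are illustrative.

```python
def time_weighted_mean(points, end):
    """Mean of (t, value) points sorted by time, weighting each value by how long it was held.

    Each value is assumed to stay in effect until the next timestamp;
    the last value is held until `end`.
    """
    if not points:
        raise ValueError("need at least one point")
    duration = end - points[0][0]
    if duration <= 0:
        raise ValueError("end must be after the first timestamp")
    next_times = [t for t, _ in points[1:]] + [end]
    total = sum(v * (t_next - t) for (t, v), t_next in zip(points, next_times))
    return total / duration


# 10 is held for 1 second, 20 for the remaining 9 seconds.
avg = time_weighted_mean([(0, 10.0), (1, 20.0)], end=10)
```

A plain mean of these two points would be 15, while the time-weighted mean is 19, which is why this matters for unevenly sampled data.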
+1 for "Top accepts nested functions". Really missing a top(sum()) atm.
Thanks for the great work!
@pgeiger There is already a pull request for an integral function, we are just discussing what to call it! Feel free to get involved in the discussion.
@gwuillou Adding the abs() function to work as SELECT abs(value) FROM data is indeed simple. I have an example of this (along with about ten other functions) working in my repo.
Unfortunately the InfluxQL parser and query code is very complicated, so it turns out that it is really hard to make statements like the following work as users will expect:
SELECT value, abs(value) FROM data
SELECT max(value), abs(value) FROM data
Until I figure out how to make these things work properly, I doubt I will get the new functions accepted into the InfluxDB codebase.
+1 wish point to histogram()
+1 for Integral()
+1 to #142. InfluxDB needs timeshift like Graphite currently has. Monitoring needs a reference, so it should be possible to add data from 1 day back or 7 days back to the graph (timeshift).
Marking a few of these as done since they are possible using subqueries as shown in #8402.
+1 for timeShift, critical analytic function
+1 for time shift
+1 wish point to histogram()
+1 wish point for histogram()
This is pretty crucial for us, and is really a "killer feature" for any timeseries database.
+1 for timeShift. It's been 4 years since feature request #142
+1 for Harmonic and Geometric
+1 for exponents and logarithms #659
+1 for exponents and logarithms #659
+1 for histogram #3674
+1 for exponents and logarithms #659
+1 for trigonometric functions (cosine)
+1 for Histogram, the only other feature I need at the moment.
+1 for Histogram.
+1 for Timeshift!
+1 for TimeShift!
+1 wish point to histogram()
I'd like to request for a new summary function specifically made for time series visualization: Largest-Triangle-Three-Buckets.
I've been using mean to downsample data, but averaging time series data results in a smooth line which does not accurately represent the underlying data.
Sveinn Steinarsson wrote a thesis titled "Downsampling Time Series for Visual Representation" that describes and tests algorithms for that purpose. The d3fc library makes use of "Largest-Triangle-Three-Buckets" in the front end, but it would be best if InfluxDB could downsample the data to transmit and process as few bytes as possible.
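For anyone unfamiliar with the algorithm, here is a compact Python sketch of Largest-Triangle-Three-Buckets as it is commonly described: keep the first and last point, split the rest into buckets, and from each bucket keep the point that forms the largest triangle with the previously kept point and the average of the next bucket. It is an illustration written for this thread, not code from the thesis, d3fc, or InfluxDB.

```python
import math


def lttb(points, n_out):
    """Downsample time-sorted (x, y) points to n_out points with LTTB."""
    n = len(points)
    if n_out >= n or n_out < 3:
        return list(points)  # nothing to reduce, or too few outputs to bucket

    every = (n - 2) / (n_out - 2)  # bucket width, excluding first and last point
    sampled = [points[0]]          # the first point is always kept
    a = 0                          # index of the most recently kept point

    for i in range(n_out - 2):
        # The average of the next bucket is the third corner of the triangle.
        nxt_start = math.floor((i + 1) * every) + 1
        nxt_end = min(math.floor((i + 2) * every) + 1, n)
        nxt = points[nxt_start:nxt_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        # Candidate points in the current bucket.
        cur_start = math.floor(i * every) + 1
        cur_end = math.floor((i + 1) * every) + 1

        ax, ay = points[a]
        best_area, best_idx = -1.0, cur_start
        for j in range(cur_start, cur_end):
            px, py = points[j]
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) / 2.0
            if area > best_area:
                best_area, best_idx = area, j
        sampled.append(points[best_idx])
        a = best_idx

    sampled.append(points[-1])  # the last point is always kept
    return sampled


# A flat series with a single spike at x = 5.
data = [(x, 10 if x == 5 else 0) for x in range(10)]
reduced = lttb(data, 4)
```

In this example the spike survives the reduction from ten points to four, whereas a mean over the same buckets would flatten it.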
We are new to InfluxDB, but we are betting on using it, because it is robust and efficient. Congratulations and thank you for that.
However, there are some surprising gaps for a TSDB, such as a time-weighted average or returning the previous value at the start time of the query. In short, please improve the support for data recorded on change.
So +1 for #3633 and #7445.
Boolean type cast #7562 seems easy to solve, and very important for use with Grafana for example, which does not support well the use of boolean.
+1 wish point to histogram()
+1 for time shift
+1 for histogram()
+1 for CAST function. Seriously, why isn't this a thing?
Is a user-defined function feature available in InfluxDB? It would be of great help if somebody could point me to a sample in InfluxDB, something similar to stored procedures. Any suggestions? Thanks in advance.
@AnnapoorniS I'd suggest looking at Flux https://github.com/influxdata/platform/tree/master/query
Assuming that because #834 is checked it is seen as completed.
These functions still do not work (v1.5.1) and would be very useful for me:
sum(non_negative_derivative(value))
sum(derivative(value))
Thanks
aggregation with respect to weekday and weeknumber based on the timestamp
Does #3552 fit in here?
+1 for histogram #3674
+1
👍 +1 for timeshift functions - Seems like this has been requested multiple times by a number of users
The checklist implies that both "percentile + derivative" (#5150) and "Top accepts nested functions" (#2467, #5345) are addressed. However, the documentation says that neither percentile nor top supports nested functions (as of 1.7). So... is the checklist wrong? Can you do these without nested function support? Is the doc wrong?
Hey @sfitts! (from the Forte days?!?) ... I believe you can do these now with subqueries. So, while it's not a direct nesting, it can be done that way. The primary focus going forward in terms of extending the query surface area is going to be done via Flux. https://docs.influxdata.com/flux/v0.7/
InfluxQL will, of course, continue to be supported. But there are challenges that we are going to address at the query engine layer and then open up the ability to address so many of these requests via Flux. Have a look, let us know!
@timhallinflux just after I wrote this it dawned on me that subqueries were probably the answer -- thanks for the confirmation. Also hadn't picked up on the fact that a query language replacement was in the works, so I'll definitely check that out.
(and yep -- I date back to Forte 👍)
Good to reconnect! I want to be super clear.... we are not going to "replace" InfluxQL. As we continue forward, InfluxQL will continue to be the primary on-ramp and supported. But, in terms of working with time series data -- we determined that a functional language can be a powerful way to manipulate the functions, results, and simplify developer code (in the end). So many of these requests were part of our design center for Flux itself and ensuring that we can deliver on them. We have been listening, observing, and attempting to address many of these for multiple years now. We started and failed at least twice...a couple of attempts that never saw the light of day and weren't "ship worthy".
With Flux, we are on the brink of breaking through and delivering on this list (and more!) while continuing to support InfluxQL. For example, Histogram is already in. So, we maintain the easy on-ramp via InfluxQL. If that is all you need...great! But if you need more power (and there are a number of time series use cases which certainly do, particularly given this list), Flux will be there. In InfluxDB 1.7, there are two query engines that run in parallel. In 2.0, the Flux engine will be the primary engine and InfluxQL will run in a compatibility mode on top of that engine. Hope that helps clarify!
Makes sense (and thanks for the clarification).
3 years and Boolean cast to Integer (1/0) is not yet implemented...
Thank you for the hard work. I just came here to express that more functions would definitely be very useful.
I am currently held back by the lack of logarithmic mean aggregator and I believe the query language is not flexible enough to allow me to do it from the query itself.
So, +1 for log mean please !
Boolean type cast would be really useful in my work place.
+1 for Histogram
I'd like to have CAST from string to boolean to integer
"true" => true => 1
"false" => false => 0
+1 to CAST from boolean to integer
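Until a cast like this exists in the query language, the string to boolean to integer mapping requested above can be done client-side. Here is a minimal Python sketch; the function names are made up, and accepting only "true" and "false" (case-insensitive) is an assumption.

```python
def string_to_bool(text):
    """Map "true"/"false" (any case, surrounding whitespace ignored) to a bool."""
    normalized = text.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a boolean string: {text!r}")


def bool_to_int(flag):
    """True -> 1, False -> 0, so the value can be graphed or summed."""
    return int(flag)


as_ints = [bool_to_int(string_to_bool(s)) for s in ["true", "false", "True"]]
```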
Casting boolean -> int has been implemented in Flux. This is available as a technical preview in 1.7 and we are just about to release 1.8 which includes some additional, significant updates.
https://docs.influxdata.com/flux/v0.50/introduction/flux-vs-influxql/