Influxdb: Add support for 'having' queries

Created on 15 Jan 2014 · 42 Comments · Source: influxdata/influxdb

Should we support having in queries? Like this:

select count(value) from some_series
group by time(5m)
having count(value) > 23

revisit in the future
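
To pin down what the proposed clause would mean, here is a minimal client-side sketch in Python. It is not InfluxDB code: the function names and sample data are hypothetical, and it only assumes points arrive as (unix seconds, value) pairs. It buckets points by 5 minutes, counts them, and then drops buckets whose count is not above 23, which is the filtering step `having` would do on the server.

```python
from collections import Counter

BUCKET_SECONDS = 5 * 60  # mirrors "group by time(5m)"


def count_per_bucket(points):
    """Count points per 5-minute bucket; points are (unix_seconds, value) pairs."""
    counts = Counter()
    for ts, _value in points:
        counts[ts - ts % BUCKET_SECONDS] += 1
    return counts


def having_count_gt(counts, threshold):
    """Keep only buckets whose count exceeds the threshold (the 'having' step)."""
    return {bucket: n for bucket, n in sorted(counts.items()) if n > threshold}


# Hypothetical data: 30 points in the first bucket, 5 in the second.
points = [(i, 1.0) for i in range(30)] + [(300 + i, 1.0) for i in range(5)]
filtered = having_count_gt(count_per_bucket(points), 23)
print(filtered)
```

The key point is the order of operations: the filter runs on the aggregated counts, not on the raw points, which is what separates `having` from `where`.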

pauldix

👍23 👎1

All 42 comments

Having is particularly useful for top queries as well. For instance:

select * from cpu
group by time(1h), host having top(value, 10)
where time > now() - 1d

That would give you the top 10 hosts in each hour for the last 24 hours. Whereas this query:

select top(value, 10) from cpu
group by time(1h), host

Would give you the top 10 measurements from each host in a 1 hour interval. "Having" is what we need to actually pick up the top hosts for a given group by interval. For instance this query:

select max(value) as max_value, host from cpu
group by time(1h), host
having top(max_value, 10)

Would give you the top 10 hosts per hour.
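
As a rough illustration of the result that last query is meant to produce, here is a client-side sketch in Python. It is an assumption-laden stand-in, not how InfluxDB would implement it: `top_hosts_per_hour` and the sample points are hypothetical, and points are assumed to be (unix seconds, host, value) tuples. It first takes the max per host per hour, then keeps the n hosts with the highest maxima in each hour.

```python
from collections import defaultdict

HOUR = 3600  # mirrors "group by time(1h)"


def top_hosts_per_hour(points, n):
    """Return {hour_start: [(host, max_value), ...]} with the n highest maxima per hour.

    points is an iterable of (unix_seconds, host, value) tuples.
    """
    maxima = defaultdict(dict)  # hour_start -> host -> max value seen
    for ts, host, value in points:
        hour = ts - ts % HOUR
        prev = maxima[hour].get(host)
        if prev is None or value > prev:
            maxima[hour][host] = value

    result = {}
    for hour, per_host in maxima.items():
        # Highest max first; ties broken by host name so the output is stable.
        ranked = sorted(per_host.items(), key=lambda kv: (-kv[1], kv[0]))
        result[hour] = ranked[:n]
    return result


# Hypothetical sample: three hosts in the first hour, two in the second.
points = [
    (10, "a", 0.5), (20, "a", 0.9), (30, "b", 0.7), (40, "c", 0.2),
    (3700, "b", 0.1), (3800, "c", 0.8),
]
top_two = top_hosts_per_hour(points, 2)
print(top_two)
```

Note that the ranking happens across hosts within each hour, after the per-host aggregation, rather than across raw values within each host.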

pauldix on 20 May 2014

@pauldix Could you set an approximate milestone for this? I'd like to reference it when planning our schedule.

chobie on 21 May 2014

@chobie, how important is this one to you? We have a number of other things we're working on at the moment so we were going to push this one out by 30 days or so.

But if it's something you really need, we can reassess. Any chance we can help you through a few of the bits so you can write it yourself and submit a PR?

pauldix on 21 May 2014

I also would like to see this feature. I want to use it for sorted and limited output of frequency counts on column values with many different values. For example, what are the values and counts of the top 10 most frequent values? Without this feature the output would be very large, making it very difficult to sort and limit on the application side.
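
For reference, this is the application-side computation I would like to avoid, sketched in Python with hypothetical sample values. It needs every value shipped to the client first, which is exactly the problem when there are many distinct values.

```python
from collections import Counter


def top_values(values, n):
    """Return the n most frequent values as (value, count) pairs, most frequent first."""
    return Counter(values).most_common(n)


# Hypothetical column values.
sample = ["GET", "POST", "GET", "PUT", "GET", "POST"]
most_frequent = top_values(sample, 2)
print(most_frequent)
```

With a server-side having/top, only the n rows in the result would need to leave the database.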

With the help of some pointers in the code I might be able to help.

brightcode on 22 May 2014

Again,

I attempted this, but I couldn't figure out how to implement the complete having clause. The engine and boolean filters are difficult for me :cry:
I guess the having clause might need some engine API change (or I just haven't found a good solution).

For now, I only use the top and bottom functions for my use case. I'll use this patch until InfluxDB supports the having clause.
https://github.com/chobie/influxdb/compare/master...incomplete-top-bottom-having-clause

chobie on 24 May 2014

Thanks @chobie for giving it a shot. We'll come back to this soon. I think there are some other features that are higher up on our priority list for the next 2-3 weeks, but we'd love to get to this.

pauldix on 26 May 2014

samuraraujo on 28 May 2014

I see. I'll take some time to improve this patch next weekend. I can probably implement it.

I guess mixing aggregate functions and conditions makes things complicated.
So I'd like to choose the syntax below for the having clause for now.

#Query Syntax:
GROUP BY VALUES 
  [HAVING 
         (AGGREGATE_FUNCTION_FOR_HAVING_CLAUSE | CONDITIONS )
  ]

chobie on 28 May 2014

The only other thing I'd note about having is that it should also support where style syntax. For example:

select count(value) as valCount from foo
group by time(10m) having valCount > 3
where time > now() - 6h

pauldix on 29 May 2014

I'd like this functionality also.

I have a series with a lot of grouping, and I want to group all small groups into a single "other" group. I believe I can have 2 queries, one for all groups with sufficiently large group sum, and another one for all "others"... I am not sure if it's even possible with "having" though.
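
What I do on the client today looks roughly like the Python sketch below. The names and numbers are hypothetical, and it assumes the per-group sums have already been fetched. Groups whose sum reaches a threshold are kept, and everything else is folded into one "other" entry.

```python
def fold_small_groups(sums, min_sum, other_key="other"):
    """Keep groups whose sum is at least min_sum; add the rest together under other_key."""
    folded = {}
    other_total = 0
    has_other = False
    for group, total in sums.items():
        if total >= min_sum:
            folded[group] = total
        else:
            other_total += total
            has_other = True
    if has_other:
        folded[other_key] = other_total
    return folded


# Hypothetical per-group sums.
sums = {"web": 120, "db": 80, "cache": 3, "cron": 2}
folded = fold_small_groups(sums, 50)
print(folded)
```

The first branch is what a `having sum(value) >= 50` query would return; the "other" total is the part that would still need a second query or client-side work.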

soichih on 2 Sep 2014

ernoaapa on 11 Sep 2014

any updates on this feature?

droxer on 25 Sep 2014

yep, +1, very useful, and relieves some of the subquery use cases

bhtucker on 19 Nov 2014

RubenDZ on 10 Feb 2015

+1. This can be worked around by creating a continuous query, which does not seem to be a difficult solution.

coudel on 13 Feb 2015

sc0rp10 on 24 Mar 2015

cdimitrov on 26 Mar 2015

abeninskibede on 26 Mar 2015

chakri-nelluri on 30 Mar 2015

jpbetz on 21 Jul 2015

Any news on this feature?

zapient on 1 Jan 2016

This would be very much needed.

ollijm on 8 Jan 2016

:+1: this would be extremely useful!

faxm0dem on 14 Jan 2016

How is this going?

ericx10ng on 31 Jan 2016

+1
This would be very helpful for getting the top and bottom items after aggregation.

cmben on 4 Feb 2016

Hi @pauldix, any update on the status of the having feature?

I noticed you and @chobie had a discussion in the pull request, but it is not clear whether the feature has made it in and if not, do you have any estimate of when it would be available?

The use case you mention in one of your comments at the top of this thread is very powerful and necessary for exploring time series data: showing the top n items instead of the top 10 values for every item.

ram-nadella on 11 Feb 2016

kbespalov on 8 Jul 2016

TechBK on 22 Jul 2016

reinhard-brandstaedter on 9 Aug 2016

👍5

+1. Need this badly! Makes GROUP * much more efficient for querying.

ScottStevenson on 2 Oct 2016

JoCloud007 on 13 Oct 2016

I'm on my first foray with Influx and very quickly ran into multiple cases where I need this kind of behaviour. How are people working around this currently? With continuous queries?