Influxdb: subquery does not join with parent query

Created on 17 Jan 2018  Â·  4Comments  Â·  Source: influxdata/influxdb

System info: CentOS 6.9, influxdb 1.4.2

Problem: influxdb does not join the subquery with the primary query.

The influxdb data is being sourced from collectd. One table is populated by the "filecount" plugin for collectd, the other is an application specific plugin. The "filecount" plugin delivers data every 10 seconds but is not aligned to a specific time boundary. The application ("app") plugin delivers data every 60 seconds and is aligned with minute boundaries.

Naturally, this means that a straight join on time is not possible - it is necessary to use "GROUP BY(60s)" to round out the filecount row times and "mean()" to determine what value should be associated with that time.

The ultimate goal is to be able to do a SELECT "bytes"/"entries" ... with the full query.

__Expected behavior:

> SELECT "entries","bytes" FROM (SELECT mean("value") AS "bytes" FROM "filecount" WHERE "host" =~ /hostname/ AND "instance" =~ /table/ AND type = 'bytes' AND time > now() - 5m GROUP BY time(60s)),"app" WHERE "host" =~ /hostname/ AND "instance" =~ /table/ AND time > now() - 5m
name: lips
time                entries bytes
----                ------- -----
1516174020000000000 180704 51089813129
1516174080000000000 180784 51100341941

__Actual behavior:__ [What actually happened]

> SELECT "entries" FROM "app" WHERE "host" =~ /hostname/ AND "instance" =~ /table/ AND time > now() - 5m
name: lips
time                entries
----                -------
1516174020000000000 180704
1516174080000000000 180784

> SELECT mean("value") AS "bytes" FROM "filecount" WHERE "host" =~ /hostname/ AND "instance" =~ /table/ AND type = 'bytes' AND time > now() - 1m GROUP BY time(60s)
name: filecount
time                bytes
----                -----
1516174020000000000 51089813129
1516174080000000000 51100341941

> SELECT "entries","bytes" FROM (SELECT mean("value") AS "bytes" FROM "filecount" WHERE "host" =~ /hostname/ AND "instance" =~ /table/ AND type = 'bytes' AND time > now() - 5m GROUP BY time(60s)),"app" WHERE "host" =~ /hostname/ AND "instance" =~ /table/ AND time > now() - 5m
name: lips
time                entries bytes
----                ------- -----
1516174020000000000 180704
1516174080000000000 180784

Most helpful comment

influxdb is a time series database, why can't sub queries join based on
time (and nothing else)?

I would have thought that would be the most important thing to do ...
I've got measurements of time interval X in table A and I want to join
them with those in table B from the same time interval.

On 30/1/18 7:35 am, Jonathan A. Sternberg wrote:
>

We currently have plans to transition current InfluxQL queries to work
with IFQL to allow for an easier transition between the two and allow
us to only work on one query engine. The problem is that this would
require advanced join functionality and the syntax isn't currently
present in InfluxQL to do that so we would have to add new syntax.

There /is/ join functionality in InfluxQL right now, but it is
implicit and impossible to configure. Basically, when you do |SELECT
usage_system, usage_guest FROM telegraf..cpu| you are performing a
join between the |usage_system| field and the |usage_guest| field into
a single row. It will always join these two based on having the same
measurement (|cpu|) and the same tag keys and values. IFQL is going to
allow joining based off of other criteria which will allow joining
between different series keys and I think measurements are supposed to
now become an optional tag rather than a required first class citizen.

That's the main difficulty. InfluxQL just doesn't have syntax to do
this and we're unlikely to develop syntax for solving this problem
since IFQL will be doing that.

I hope that helps explain things a little bit.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/influxdata/influxdb/issues/9330#issuecomment-361377685,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AXJ-tnlDqoj2rjdF8yeQO99Q1evHmobxks5tPiukgaJpZM4Rg8E6.

--
Darren Reed
darren.[email protected]
M +61-0427970586

All 4 comments

This functionality is not possible at the moment. More advanced joining functionality is planned for IFQL and I would keep a look out for it to show up in there at some point if it isn't there already.

I think Kapacitor may have this type of functionality and Kapacitor integrates with InfluxDB so you can have Kapacitor process a stream and then write it back to InfluxDB. I apologize that this isn't possible at the moment, but it is a quite complicated addition to the current query engine and big features like this are planned for IFQL rather than InfluxQL.

It seems to me that if IFQL gains new features like this and what has been talked about then the obvious thing to do is implement a translation layer to convert InfluxQL into IFQL.

But maybe I am glossing over details of the InfluxQL parser that make this difficult.

On 30 Jan 2018, at 03:16, Jonathan A. Sternberg notifications@github.com wrote:

This functionality is not possible at the moment. More advanced joining functionality is planned for IFQL and I would keep a look out for it to show up in there at some point if it isn't there already.

I think Kapacitor may have this type of functionality and Kapacitor integrates with InfluxDB so you can have Kapacitor process a stream and then write it back to InfluxDB. I apologize that this isn't possible at the moment, but it is a quite complicated addition to the current query engine and big features like this are planned for IFQL rather than InfluxQL.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.

We currently have plans to transition current InfluxQL queries to work with IFQL to allow for an easier transition between the two and allow us to only work on one query engine. The problem is that this would require advanced join functionality and the syntax isn't currently present in InfluxQL to do that so we would have to add new syntax.

There _is_ join functionality in InfluxQL right now, but it is implicit and impossible to configure. Basically, when you do SELECT usage_system, usage_guest FROM telegraf..cpu you are performing a join between the usage_system field and the usage_guest field into a single row. It will always join these two based on having the same measurement (cpu) and the same tag keys and values. IFQL is going to allow joining based off of other criteria which will allow joining between different series keys and I think measurements are supposed to now become an optional tag rather than a required first class citizen.

That's the main difficulty. InfluxQL just doesn't have syntax to do this and we're unlikely to develop syntax for solving this problem since IFQL will be doing that.

I hope that helps explain things a little bit.

influxdb is a time series database, why can't sub queries join based on
time (and nothing else)?

I would have thought that would be the most important thing to do ...
I've got measurements of time interval X in table A and I want to join
them with those in table B from the same time interval.

On 30/1/18 7:35 am, Jonathan A. Sternberg wrote:
>

We currently have plans to transition current InfluxQL queries to work
with IFQL to allow for an easier transition between the two and allow
us to only work on one query engine. The problem is that this would
require advanced join functionality and the syntax isn't currently
present in InfluxQL to do that so we would have to add new syntax.

There /is/ join functionality in InfluxQL right now, but it is
implicit and impossible to configure. Basically, when you do |SELECT
usage_system, usage_guest FROM telegraf..cpu| you are performing a
join between the |usage_system| field and the |usage_guest| field into
a single row. It will always join these two based on having the same
measurement (|cpu|) and the same tag keys and values. IFQL is going to
allow joining based off of other criteria which will allow joining
between different series keys and I think measurements are supposed to
now become an optional tag rather than a required first class citizen.

That's the main difficulty. InfluxQL just doesn't have syntax to do
this and we're unlikely to develop syntax for solving this problem
since IFQL will be doing that.

I hope that helps explain things a little bit.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/influxdata/influxdb/issues/9330#issuecomment-361377685,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AXJ-tnlDqoj2rjdF8yeQO99Q1evHmobxks5tPiukgaJpZM4Rg8E6.

--
Darren Reed
darren.[email protected]
M +61-0427970586

Was this page helpful?
0 / 5 - 0 ratings

Related issues

scotloach picture scotloach  Â·  63Comments

beckettsean picture beckettsean  Â·  83Comments

srfraser picture srfraser  Â·  90Comments

pauldix picture pauldix  Â·  89Comments

beckettsean picture beckettsean  Â·  68Comments