Influxdb: CQ or Backfill using Top() with tag does not create tag in result

Created on 9 Aug 2016  路  21Comments  路  Source: influxdata/influxdb

System info: 1.0.0-beta3, official docker image

Steps to reproduce:

  1. Create a measurement with some fields and tags
  2. Create a continuous query or a backfill query with INTO clause while doing a select that looks like this:
    select top(summary_field, tag1, 100) as total, tag2, tag3 INTO tops FROM previous_measurement where time < now() GROUP BY tag4, time(1d) fill(none);

Expected behavior:
Output should mimic this:
total, tag2, tag3 should be become fields, due to the way they're selected.
tag1 and tag4 should be tags.

Actual behavior:
Tag1 is created as a field, not a tag.

This behavior is irrelevant if you're just selecting, however when used as a select INTO, because it is not created as a tag, you don't get the data you should.

Observe:

> select... INTO...:
name: result
------------
time    written
0       5181

> select count(total) from tops;
name: tops
---------------------------
time    count
0       100

> show tag keys from tops
name: tops
---------------------------
tagKey
tag4

Because it did not create tag1, the series was not unique enough to accommodate all the data, and the points were considered duplicates, and overwrote one another. This is why there were 5181 points written, but only 100 are available in the destination measurement of the INTO.

I don't believe this issue is specific to my config. It appears as though this section of code hasn't been touched much since Feb 2016, and this situation is not covered by any tests in the code or documentation, or google group.

Let me know if I can provide additional information.

arecontinuous queries arequeries kinenhancement

Most helpful comment

@kostasb seems to have hit the nail on the head. I had found exactly what he documented in my endeavors to similarly render a "top X by Y over time" graph.

select top(field, tag, number) from measurement where timeexpr group by time(interval)

will indeed return the top "number" (one per series) per interval, but the results themselves are a _single_ series. You would then try to add the appropriate tag to the group by:

select top(field, tag, number) from measurement where timeexpr group by time(interval),tag

However rather than getting the former results broken down into "number" buckets (one per tag) per interval the result is that every tag is returned and not just those resulting from the original top(). Instead of the one-value-per-series up to "number" of series, the results become the top "number" of values per series

There likely isn't much that Grafana could do with this since there doesn't see to be a way to get the needed data properly presented from Influxdb.

I've done quite a bit of searching and found that there are _many_ that have tried to do similar, and there have been various attempted alternatives/workarounds not generally with success. There are various similar scenarios trying to graph over time the top five processes by CPU, top five processes by memory, top five interfaces by traffic, top five temperatures by city, etc. And those are just the simple/basic cases.

I too am curious if there has been any further discussion on available or planned solutions for this.

All 21 comments

This is because when you put tag1 into the function, it outputs it as a field and you're writing it as a field. I think if you put tag1 in the GROUP BY instead of the function, you'll get what you want. Can you try that?

Doing that negates the affect of the top() command:

> select... INTO...:
name: result
------------
time                    written
1970-01-01T00:00:00Z    17243

> select count(total) from tops;
name: tops
---------------------------
time                    count
1970-01-01T00:00:00Z    17243

select count(total) from tops where tag4='2' and time > now() - 1d
name: tops
---------------------------
time                            count
2016-08-11T07:01:02.25711371Z   376

As you can see, it passed along all the data, and top returned more than 100 items

You are correct that it outputs tag1 as a field. That is the bug as far as I can see.

To further clarify -- by the way it handles tag4 in the group by, it seems to me it _PROCESSES_ tag1 correctly as a tag instead of a field, its just that the output has it has it incorrectly as a field.

This is another component of my strategy to reduce cardinality: I'm storing limit top() output, with reduced cardinality, to be queried long-term and keep all other measurements/continuous queries leading up to this one as very short retention policies.

Unfortunately, until this is resolved, I cannot reduce cardinality on the step prior to this; and it grows by the day.

Please let me know if there's anything additional I can provide to clarify this issue.

We are currently looking at this and trying to figure out if there is an available solution to your problem. The underlying issue is that this is intended behavior and we reached the point where we aren't allowing breaking changes into the system anymore. When you perform a select and retrieve a tag, it is telling the system that you want that tag to be treated as a string field.

I'll be writing up an issue soon to address some of the problems that are related to this one that have been popping up to see if we can understand the underlying issue better. At the moment, I don't have a solution for you or a workaround. I'm sorry. I'll link that issue when I've finished writing it.

Thanks for the update.

In this instance, the underlying issue may be the top syntax which allows a tag.
It has also created some ambiguity in relation to the 'AS fieldname' -- as it only applies to the first item in the top() function.

For my specific situation, I'd be curious to understand what it would look like for the 'solution' to be that if tag1 _IS_ also present in the 'group by' -- the only affect is that it toggles the top() output to have tag1 as a tag instead of field.

As I noted, currently having tag1 in both top() and group by effectively negates the functionality of the top() function.

So I'm not really sure exactly how having the same tag listed in top() and group by are coded to interact right now; but it would seem to be... not desired / not useful. So to reassign that behavior to a helpful case, may be optimal.

Checking in if there's any update, related issues that could be linked, etc.

This is still a sticking point for us, that I'd like to see resolved.
In the short term, I may need to select the data out, and insert it back in; emulating a continuous query, to get it in the format I need.

The issue is that series context is not preserved in the result of top(measurement,tag,number).

E.g.:

> insert mymeasurement,mytag=one value=1
> insert mymeasurement,mytag=two value=2
> insert mymeasurement,mytag=three value=3
> select top(value,mytag,1) from mymeasurement
name: mymeasurement
time                top mytag
----                --- -----
1488375608559605946 3   three

"mytag" is presented as a field in the output, so series context is lost in the results of top().

Group by doesn't help because it will return one result bucket for every tag value participating in the query, not just for the tag values returned by top():

> select top(value,mytag,1) from mymeasurement group by mytag
name: mymeasurement
tags: mytag=one
time                top mytag
----                --- -----
1488375598767546227 1   one

name: mymeasurement
tags: mytag=three
time                top mytag
----                --- -----
1488375608559605946 3   three

name: mymeasurement
tags: mytag=two
time                top mytag
----                --- -----
1488375603628085662 2   two

I had raised this with grafana originally, however it is the same issue as this one relating to the influx query.
https://github.com/grafana/grafana/issues/7605

@kostasb seems to have hit the nail on the head. I had found exactly what he documented in my endeavors to similarly render a "top X by Y over time" graph.

select top(field, tag, number) from measurement where timeexpr group by time(interval)

will indeed return the top "number" (one per series) per interval, but the results themselves are a _single_ series. You would then try to add the appropriate tag to the group by:

select top(field, tag, number) from measurement where timeexpr group by time(interval),tag

However rather than getting the former results broken down into "number" buckets (one per tag) per interval the result is that every tag is returned and not just those resulting from the original top(). Instead of the one-value-per-series up to "number" of series, the results become the top "number" of values per series

There likely isn't much that Grafana could do with this since there doesn't see to be a way to get the needed data properly presented from Influxdb.

I've done quite a bit of searching and found that there are _many_ that have tried to do similar, and there have been various attempted alternatives/workarounds not generally with success. There are various similar scenarios trying to graph over time the top five processes by CPU, top five processes by memory, top five interfaces by traffic, top five temperatures by city, etc. And those are just the simple/basic cases.

I too am curious if there has been any further discussion on available or planned solutions for this.

yep this would be gold to be able to achive

_There are various similar scenarios trying to graph over time the top five processes by CPU, top five processes by memory, top five interfaces by traffic, top five temperatures by city, etc. And those are just the simple/basic cases._

+1000

Not being able to do this in Grafana/InfluxDB is killing us right now =>

_There are various similar scenarios trying to graph over time the top five processes by CPU, top five processes by memory, top five interfaces by traffic, top five temperatures by city, etc. And those are just the simple/basic cases._

Does anyone have _any_ type of workaround for this? I really don't want to go back to RDBMS/roll-our-own -viz solution - let me know if we can help somehow.

@jsternberg Is there anything happening on this issue?

Here is a proposed approach for addressing this issue. Looking for feedback in terms of the syntax (ok, not ok, good heavens NO!, etc.)

In essence, the proposal is to allow for an optional hint -- in the form of '::tag' to be added. Here is an example:

SELECT top(summary_field, tag1::tag, 100) as total, tag2, tag3 
INTO tops 
FROM previous_measurement where time < now() 
GROUP BY tag4, time(1d) fill(none);

Comments?

Unfortunately, I don't think the casting approach will work. The main problem is that casting is mostly used for telling the query engine how to read a query rather than how to write a query. So it would end up serving a double function that would be incredibly confusing for the query engine.

@rbetts brought up one possibility and I'm going to bring up another. One possibility is just to make a new function so we don't have the ambiguity anywhere in the system. The old function has the old functionality and the new function has the new functionality. This might be the cleanest solution since it completely separates from the old behavior.

Another potential way of doing this which prevents the need to add a new function is to check if any of the tags referenced in the top() or bottom() call are located inside of the groupings. I might just be limited at the moment, but I can't think of a reason why this query would ever be issued and give a reasonable response.

SELECT top(value, host, 2) FROM cpu GROUP BY host

There is no way that query can ever return the top 2 hosts because it tells the query to return the top 2 different hosts in a grouping that only has one possible host. It might be possible to modify things so if a top() query sees that it is being grouped by one of the fields that are being selected, it instead uses the tag rather than converting the tag to a field.

I've been trying to mock this out and here are the current state of my thoughts. My thoughts are chaotic and hard to describe at the moment, so I apologize if any portion of this is difficult to understand.

There are three points where we can possibly insert this behavior.

  1. Within the functions themselves.
  2. Output by the emitter.
  3. A transformation done before inserting the points.

The first possibility is 1. Typically, the tags that are associated with an individual point are matched by the GROUP BY. The groupings also largely decide how points are ordered as they go through the query engine. In order to ensure the points are returned in the correct order, "buckets" exist to separate different groupings. The buckets are based on the outermost grouping within the query so SELECT ... FROM (SELECT ... FROM ... GROUP BY host, region) GROUP BY host is bucketed by the host variable and not the region. This means that the storage engine will return all points from the same host before it moves onto the next host regardless of anything else. These buckets are important to understand because they are what make the query engine work.

Now if we change the query engine so it treats the argument for top() and bottom() as a tag rather than a field, what is the output? If we handle this at the iterator level of the query engine, then we would keep the tag on every point. So if I have the following query with the following data (assume that this is in seconds):

cpu,host=server01 value=2 0
cpu,host=server02 value=4 0
cpu,host=server01 value=8 30
cpu,host=server02 value=7 30

> SELECT top(value, host, 1) FROM cpu GROUP BY time(30s)

We would need to get the following return:

tags: host=server01
time value
---- -----
30   8

tags: host=server02
time value
---- -----
0    4

If you notice the time values, this is a very strange output. First, it's nearly unreadable for a person to interpret this. If you add legitimate GROUP BY statements in there, it would become difficult to impossible to determine exactly what the query is saying.

But there's another concern and that's how we're processing the data. Imagine that host=server01 is the top value in the last interval, but is not included in any interval before that. To determine that this point should be displayed first requires reading all of the possible points that could be chosen first. While we could make this more efficient than having to keep all points in memory, we would still need to keep a minimum number of points in memory. If you have top(value, host, 10) and a GROUP BY time(...) that spans 10 intervals, you would need to keep a maximum of 100 points in memory at a time. The larger the interval, the more points you would have to keep in memory to ensure the ordering of the final output was correct. If you start to group by tags with high cardinalities, it could potentially lead to too many points being kept in the system at a time (especially in regards to subqueries). I can expand on that more if needed since I have skimmed over a lot of the technical aspects of how this would work, but I would have to do that as a separate comment.

The positives of doing it this way would be a more cohesive system. It would be possible to then process the output of this query as if it were a subquery. So if you wanted to count how many times a host is the top cpu consumer, you could do SELECT count(top) FROM (SELECT top(value, host, 1) FROM cpu GROUP BY time(30s), host) GROUP BY host). It also means we don't have to add hacks to other parts of the system to convert a field back to a tag.

But let's explore those options since the query performance of the first method is suspect.

The problem with 2 is that I don't really think it resolves the problems of 1. While it makes it so I don't have to change anything internal to the query engine, the points will still arrive out of order and you will get an output that I consider to be nonsensical. The positives of this one is that it makes it easier to write the data as tags because they will actually be tags.

The third is likely the easiest way to resolve this, but I think it's very hacky and I'm not sure how much I like it. If you perform the conversion before writing by adding some "cast before writing" mechanism, then the ordering doesn't matter since none of the points matter. But it means that you can't do any manipulation of the results with subqueries.

I'm still thinking about the third option though. Something seems wrong to me, but I can't really put it into words. I'm going to start working on a mockup until I can't put those words into feelings. Maybe that will bring up some ideas of why that's not a suitable method of doing this.

For its credit, the third option is the original request. So while I have my reservations, I do think it's possible and that we would be able to do it. Writing this out helps me work through the problem.

For the third option as listed above, any thoughts about something like this as the syntax? Since the third one has to do with only writing the points, I want the syntax to not be possible unless an INTO is included.

SELECT top(value, host, 2) INTO db0.rp0.:MEASUREMENT WITH host AS tag FROM db0.autogen.cpu ...

We're currently exploring an idea with changing the current top() and bottom() implementations that should improve memory performance. May or may not improve speed, but any improvement in speed will likely be from fewer memory allocations rather than any algorithmic change.

For implementing this, we're considering an idea where doing SELECT top(value, host, 2) FROM cpu will have the existing behavior, but adding SELECT top(value, host, 2) INTO ... FROM cpu will cause it to implicitly write the fields as tags instead. Since, based on this issue, the old behavior for SELECT ... INTO ... doesn't practically work anyway, I think this should give everyone what they want.

Before yesterday, I didn't think this idea was feasible due to the ordering of the query if we kept the output as tags instead of converting them to fields. I realized yesterday that, when writing, the ordering of the points didn't matter so it is very likely we can do this when writing even if we cannot do it when using a normal SELECT.

We hope that will work for this issue. If you need to do a normal query and retain the tags with top() for some reason like a subquery, it should be feasible to create a continuous query to write the data and then query that data with the tags.

The optimization I described above is implemented in #8394.

but this still doesnt work with grafana!?

Would someone mind explaining how this is fixed? Or is this only fixed for continuous queries that use the into_clause? I just upgraded to influxdb-1.4.2-1 and i still dont see have tags in the results.

select top("readdata",svm,volume,4) from (select mean("read_data") as "readdata" from vol where time > now() - 6h group by svm,volume)
name: vol
time                           top                 svm           volume
----                           ---                 ---           ------
2017-12-14T11:57:41.577182444Z 0.2910053643834527  testnas1b     vol0
2017-12-14T11:57:41.577182444Z 0.21032861140611472 test_storage1 splunk_vol02
2017-12-14T11:57:41.577182444Z 0.20021185479140136 test_storage1 splunk_vol01
2017-12-14T11:57:41.577182444Z 0.1810521717978208  test_storage1 splunk_vol03

Edit: see below gif for end result and why i need tags in results for grafana:

dec-14-2017 13-51-35

This feature only applies for writing data back into the database. If you don't use INTO, it has the old behavior.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jonira picture jonira  路  4Comments

ricco24 picture ricco24  路  3Comments

deepujain picture deepujain  路  3Comments

allenbunny picture allenbunny  路  3Comments

Raniz85 picture Raniz85  路  3Comments