InfluxDB: Show tags, fields etc. much slower than they used to be.

Created on 6 Apr 2016 · 15 Comments · Source: influxdata/influxdb

Bug report

System info: [Include InfluxDB version, operating system name, and other relevant details]
influxdb-0.12.0-1.x86_64, Centos 7, 120G M, 16 CPUs

Steps to reproduce:

show tag keys
show field keys

Expected behavior: [What you expected to happen]
These queries should (used to) come back pretty quickly.

Actual behavior: [What actually happened]
Since the change to the output of show tags etc., this has become very slow, which is most noticeable when waiting for templates to load in Grafana. I noticed this after the upgrade to 0.11, but it is still relevant in 0.12.

arequeries support

cheribral

All 15 comments

@cheribral thanks for the report. Could you provide more details on example data where this issue is apparent?

e-dard on 6 Apr 2016

@e-dard, sorry, I'm not sure what you are after, so if I'm off, let me know. Picking a database at random which has 25k series, all of which have the host tag, it takes 15 seconds to run show tag values with key=host. If I try it on one that has about 600k series, it takes 2 minutes.

On Grafana dashboards we often have many variables, which used to pop right up. Now, with each list of values taking many seconds to run, they take a long time to load. Before, the load time for this meta information was negligible, and loading the series data was what we waited for, if anything.

cheribral on 7 Apr 2016

Is there any more information that you would need?

cheribral on 14 Apr 2016

We noticed the same problem with our ~500k series database.
Unfortunately this makes the Grafana panel editor nearly unusable for us.
How can we help to resolve this quickly?

ThomasKurz on 18 Apr 2016

👍1

I hope this is close enough to #6464 that this issue will be fixed as a side effect!

ThomasKurz on 29 Apr 2016

This evening I updated to 0.13.0~rc2, but it seems that #6464 / #6533 have not improved the situation at all.
With InfluxDB 0.10, 'SHOW TAG VALUES FROM "v" WITH KEY = "p"' on my database "testdb", which has ~350000 series, used to respond in < 2 seconds;
now it takes several minutes:

time influx -database testdb -execute 'SHOW TAG VALUES FROM "v" WITH KEY = "p";'
name: v
-------
key value
p   Anjar
p   Maharashtra
p   Padalaya
p   Dingudal
p   Karnataka
p   Rajasthan

real    3m20.380s
user    0m0.000s
sys     0m0.000s

Can somebody explain why this deterioration happened?

In case it helps, here is the output of "SHOW STATS"...
show_stats.txt

ThomasKurz on 11 May 2016

@benbjohnson @jsternberg ideas why the SHOW TAG VALUES query would be so much slower with a WHERE clause now?

beckettsean on 12 May 2016

@ThomasKurz If you can pull down a profile dump then that should tell us what's going on.

Here are the steps for retrieving the dump:

1. Download https://github.com/benbjohnson/pprofdump
2. Execute your SHOW TAG VALUES query
3. While the query is running, execute: pprofdump -o dump.tar.gz http://localhost:8086
4. Attach dump.tar.gz to this issue. It shouldn't be very large. Maybe 30KB.
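
As a rough illustration of what such a dump collects, the sketch below builds the URLs of Go's standard net/http/pprof endpoints for a server. It assumes the server mounts those endpoints under /debug/pprof on its HTTP port, which this thread does not confirm, and the helper name profileURLs is hypothetical. It only prints the URLs; fetching them while the slow query runs is left to a tool such as pprofdump or curl.

```go
package main

import (
	"fmt"
	"net/url"
)

// profileURLs returns URLs for a few standard net/http/pprof endpoints.
// Hypothetical helper: it assumes pprof is mounted at /debug/pprof on base.
func profileURLs(base string, seconds int) ([]string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	paths := []string{
		"/debug/pprof/goroutine?debug=1", // goroutine stacks as text
		"/debug/pprof/heap",              // heap profile
		fmt.Sprintf("/debug/pprof/profile?seconds=%d", seconds), // CPU profile
	}
	out := make([]string, 0, len(paths))
	for _, p := range paths {
		ref, err := url.Parse(p)
		if err != nil {
			return nil, err
		}
		// Resolve the absolute path against the server's scheme and host.
		out = append(out, u.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	urls, err := profileURLs("http://localhost:8086", 30)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```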

benbjohnson on 12 May 2016

Here is the profile dump:
dump.tar.gz
Hope it helps.

ThomasKurz on 13 May 2016

@ThomasKurz Thanks! That helps a lot.

It looks like most of the time is spent in row deduplication:
https://rawgit.com/benbjohnson/440e2e1df71f773fd7291897a1c6eee1/raw/78b04b18c4714e046d46c63f9b3bb2fc6963274e/pprof.svg

We recently added influxql.floatFastDedupeIterator for SHOW SERIES and we should be able to reuse it for SHOW TAG VALUES with a little tweaking.
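
To make the deduplication cost concrete, here is a minimal sketch of a set-based dedupe over tag rows. It is not the influxql.floatFastDedupeIterator code, which is not shown in this thread; the tagValue type and the dedupeTagValues function are hypothetical names used only for illustration. The point is that a map-backed set does roughly constant work per row, so hundreds of thousands of series should not turn into minutes of comparison.

```go
package main

import "fmt"

// tagValue is a hypothetical stand-in for one row of SHOW TAG VALUES output.
type tagValue struct {
	Key   string
	Value string
}

// dedupeTagValues keeps the first occurrence of each (key, value) pair and
// preserves input order. The map acts as a set, so each membership check is
// O(1) on average instead of a scan over all rows seen so far.
func dedupeTagValues(rows []tagValue) []tagValue {
	seen := make(map[tagValue]struct{}, len(rows))
	out := make([]tagValue, 0, len(rows))
	for _, r := range rows {
		if _, ok := seen[r]; ok {
			continue // duplicate row, skip it
		}
		seen[r] = struct{}{}
		out = append(out, r)
	}
	return out
}

func main() {
	rows := []tagValue{
		{Key: "p", Value: "Anjar"},
		{Key: "p", Value: "Karnataka"},
		{Key: "p", Value: "Anjar"},
	}
	for _, r := range dedupeTagValues(rows) {
		fmt.Println(r.Key, r.Value)
	}
}
```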

benbjohnson on 13 May 2016

@beckettsean:
I don't feel I can help with porting "floatFastDedupeIterator" to "SHOW TAG VALUES",
so how can we get this moving again?
Is there anything else I can help with?
Send more dumps?
Make a better example setup?

No milestone has been assigned yet. Is it only a problem for me and @cheribral?

Or could you recommend any workaround?

Currently this problem makes InfluxDB (in this constellation) unusable with databases of 350k series :-(
only because "SHOW TAG VALUES" is so slow (3 minutes in 0.13 as opposed to 1 sec in 0.10!).

ThomasKurz on 31 May 2016

I have the same issue with our database of ~500k series. We use "show tag values where key = ..." for our Grafana dashboards. What used to take less than a second now takes about a minute to complete. We upgraded from 0.10.1 -> 0.13 and essentially broke all Grafana dashboards using templating. It's quicker when specifying the measurement name, but it's still a lot slower than before.

jhrv on 31 May 2016

We have the same problem. It's slightly faster if we use a query similar to select tag,value from my_measurement where time > now() - 2m, but then I have to filter out the actual values to get just tags. Grafana helps with this by letting you use a regex on the return values.
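
For anyone who wants to compare the timing of this workaround against SHOW TAG VALUES outside Grafana, the query can be sent to the HTTP API from a small client. The sketch below assumes the /query endpoint with db and q parameters; queryURL is a hypothetical helper, and my_measurement, tag and value are the placeholders from the query above, not real names. It only builds and prints the URL; passing it to http.Get between time.Now and time.Since would give the elapsed time.

```go
package main

import (
	"fmt"
	"net/url"
)

// queryURL builds a query URL for the HTTP API.
// Hypothetical helper: it assumes a /query endpoint taking db and q parameters.
func queryURL(base, db, q string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/query"
	v := url.Values{}
	v.Set("db", db)
	v.Set("q", q)
	u.RawQuery = v.Encode() // escapes spaces, quotes and operators in q
	return u.String(), nil
}

func main() {
	// Placeholder names taken from the workaround; substitute real ones.
	q := "select tag,value from my_measurement where time > now() - 2m"
	u, err := queryURL("http://localhost:8086", "testdb", q)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```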

morganda on 31 May 2016

@benbjohnson @pauldix would be really really nice to get this for 1.0, given the major impact on Grafana users.

beckettsean on 31 May 2016

👍7

Can anyone confirm whether this is an issue for clusters?
PR 6792 states:

The previous implementation of tagValuesIterator still exists because it is used for the distributed SHOW TAG VALUES. This change only affects the single node implementation.