InfluxDB: Show tags, fields etc. much slower than they used to be.

Created on 6 Apr 2016 · 15 Comments · Source: influxdata/influxdb

Bug report

System info: [Include InfluxDB version, operating system name, and other relevant details]
influxdb-0.12.0-1.x86_64, CentOS 7, 120G M, 16 CPUs

Steps to reproduce:

  1. show tag keys
  2. show field keys

Expected behavior: [What you expected to happen]
These queries should (used to) come back pretty quickly.

Actual behavior: [What actually happened]
Since the change to the output of show tags etc., this has become very slow, which is most noticeable when waiting for templates to load in Grafana. I noticed this after the upgrade to 0.11, but it is still relevant in 0.12.

All 15 comments

@cheribral thanks for the report. Could you provide more details on example data where this issue is apparent?

@e-dard, sorry, I'm not sure what you are after, so if I'm off, let me know. Picking a database at random which has 25k series, all of which have the host tag, it takes 15 seconds to run show tag values with key=host. If I try it on one that has about 600k series, it takes 2 minutes.

On Grafana dashboards we often have many variables, which used to pop right up. Now, with each list of values taking many seconds to run, they take a long time to load. Before, the load time for this meta information was negligible, and loading the series data was what we waited for, if anything.

Is there any more information that you would need?

We noticed the same problem with our ~500k series database.
Unfortunately this makes the Grafana panel editor nearly unusable for us.
How can we help to resolve this quickly?

I hope this is close enough to #6464 that this issue will be fixed as a side effect!

This evening I updated to 0.13.0~rc2, but it seems that #6464 / #6533 has not improved the situation at all.
'SHOW TAG VALUES FROM "v" WITH KEY = "p"' on my database "testdb", which has ~350000 series, used to respond in < 2 seconds (with influxdb 0.10);
now it takes several minutes:

time influx -database testdb -execute 'SHOW TAG VALUES FROM "v" WITH KEY = "p";'
name: v
-------
key value
p   Anjar
p   Maharashtra
p   Padalaya
p   Dingudal
p   Karnataka
p   Rajasthan

real    3m20.380s
user    0m0.000s
sys     0m0.000s

Can somebody explain why this deterioration happened?

In case it helps, here is the output of "SHOW STATS"...
show_stats.txt

@benbjohnson @jsternberg ideas why the SHOW TAG VALUES query would be so much slower with a WHERE clause now?

@ThomasKurz If you can pull down a profile dump then that should tell us what's going on.

Here are the steps for retrieving the dump:

  1. Download https://github.com/benbjohnson/pprofdump
  2. Execute your SHOW TAG VALUES query
  3. While the query is running, execute: pprofdump -o dump.tar.gz http://localhost:8086
  4. Attach dump.tar.gz to this issue. It shouldn't be very large. Maybe 30KB.
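
For readers without pprofdump at hand, the steps above can be approximated with a few HTTP requests. This is a minimal sketch, assuming the server exposes Go's standard net/http/pprof endpoints under /debug/pprof/ on the same port as the query API. The profile selection and the helper names (PROFILES, profile_urls, fetch_profiles) are illustrative assumptions, not a description of what pprofdump itself collects.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Standard Go net/http/pprof paths. Assumption: the InfluxDB HTTP service
# serves them; adjust if your build exposes them elsewhere.
PROFILES = {
    "cpu": "debug/pprof/profile?seconds=30",
    "heap": "debug/pprof/heap",
    "goroutine": "debug/pprof/goroutine?debug=1",
    "block": "debug/pprof/block",
}


def profile_urls(base_url):
    """Return a mapping of profile name to full URL for the given server."""
    base = base_url.rstrip("/") + "/"
    return {name: urljoin(base, path) for name, path in PROFILES.items()}


def fetch_profiles(base_url, timeout=60):
    """Download each profile and return a mapping of name to raw bytes.

    The CPU profile blocks for the requested 30 seconds, so the timeout
    must be longer than that.
    """
    dumps = {}
    for name, url in profile_urls(base_url).items():
        with urlopen(url, timeout=timeout) as resp:
            dumps[name] = resp.read()
    return dumps


# Usage (run while the slow SHOW TAG VALUES query is executing):
#   dumps = fetch_profiles("http://localhost:8086")
#   for name, data in dumps.items():
#       with open(name + ".pprof", "wb") as f:
#           f.write(data)
```

The CPU profile is the one most likely to show where a long-running query spends its time, so it is worth starting the download only after the query is already running.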

Here is the profile dump:
dump.tar.gz
I hope it helps.

@ThomasKurz Thanks! That helps a lot.

It looks like most of the time is spent in row deduplication:
https://rawgit.com/benbjohnson/440e2e1df71f773fd7291897a1c6eee1/raw/78b04b18c4714e046d46c63f9b3bb2fc6963274e/pprof.svg

We recently added influxql.floatFastDedupeIterator for SHOW SERIES and we should be able to reuse it for SHOW TAG VALUES with a little tweaking.
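
To illustrate why deduplication can dominate the run time on a database with hundreds of thousands of series, here is a small sketch contrasting a comparison-based dedup with a hash-set dedup. It is not the influxql.floatFastDedupeIterator implementation, and it makes no claim about how the slow code path worked; the row shape and function names are assumptions chosen only to show the general technique.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape: (tag key, tag value).
Row = Tuple[str, str]


def dedupe_naive(rows: Iterable[Row]) -> List[Row]:
    """Quadratic in the worst case: each row is compared against every
    row kept so far, because membership in a list is a linear scan."""
    kept: List[Row] = []
    for row in rows:
        if row not in kept:
            kept.append(row)
    return kept


def dedupe_fast(rows: Iterable[Row]) -> Iterator[Row]:
    """Linear on average: rows already emitted are remembered in a hash
    set, so each membership check is constant time on average."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


rows = [
    ("p", "Anjar"),
    ("p", "Karnataka"),
    ("p", "Anjar"),
    ("p", "Rajasthan"),
    ("p", "Karnataka"),
]

# Both approaches keep the first occurrence of each row, in input order.
unique_rows = list(dedupe_fast(rows))
```

With only a handful of distinct tag values the two behave similarly, but when many distinct rows pass through, the set-based version avoids rescanning everything it has already emitted.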

@beckettsean:
I don't feel I can help with porting "floatFastDedupeIterator" to "SHOW TAG VALUES",
so how can we get this moving again?
Is there anything else I can help with?
Send more dumps?
Make a better example setup?

No milestone has been assigned yet. Is this only a problem for me and @cheribral?

Or could you recommend a workaround?

Currently this problem makes influxdb (in this setup) unusable with databases of 350k series :-(
only because "SHOW TAG VALUES" is so slow (3 minutes in 0.13 as opposed to 1 second in 0.10!).

I have the same issue with our database of ~500k series. We use "show tag values where key = ..." for our Grafana dashboards. What used to take less than a second now takes about a minute to complete. We upgraded from 0.10.1 -> 0.13 and essentially broke all Grafana dashboards using templating. It's quicker when specifying the measurement name, but it's still a lot slower than before.

We have the same problem. It's slightly faster if we use a query similar to "select tag,value from my_measurement where time > now() - 2m", but then I have to filter the returned rows to get just the tag values. Grafana helps with this by letting you use a regex on the return values.
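
Outside Grafana, the same workaround needs the filtering done by hand. The sketch below assumes the JSON layout returned by the InfluxDB HTTP /query endpoint (results, series, columns, values) and pulls the distinct values of one tag column out of a SELECT response. The function name, the "host" tag and the sample payload are made up for illustration. Note that this only finds tag values that appear in the queried time window.

```python
def distinct_tag_values(response, tag):
    """Return the distinct values of column `tag`, in first-seen order,
    from a decoded /query JSON response (assumed layout, see lead-in)."""
    seen = set()
    ordered = []
    for result in response.get("results", []):
        for series in result.get("series", []):
            columns = series.get("columns", [])
            if tag not in columns:
                continue  # this series does not carry the requested column
            idx = columns.index(tag)
            for row in series.get("values", []):
                value = row[idx]
                if value is not None and value not in seen:
                    seen.add(value)
                    ordered.append(value)
    return ordered


# Illustrative payload only; not real output from any server.
sample = {
    "results": [
        {
            "series": [
                {
                    "name": "my_measurement",
                    "columns": ["time", "host", "value"],
                    "values": [
                        ["2016-06-01T00:00:00Z", "web01", 0.5],
                        ["2016-06-01T00:00:10Z", "web02", 0.7],
                        ["2016-06-01T00:00:20Z", "web01", 0.6],
                    ],
                }
            ]
        }
    ]
}

hosts = distinct_tag_values(sample, "host")
```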

@benbjohnson @pauldix would be really really nice to get this for 1.0, given the major impact on Grafana users.

Can anyone confirm whether this is an issue for clusters?
PR 6792 states:

The previous implementation of tagValuesIterator still exists because it is used for the distributed SHOW TAG VALUES. This change only affects the single node implementation.
