InfluxDB: Question: Upper limit for tags

Created on 27 Jul 2015 · 11 comments · Source: influxdata/influxdb

I have a small question. Is there an upper limit to how many tags we can add in InfluxDB?

All 11 comments

From the docs

As a rule of thumb, keep tag cardinality below 100,000. The limit will vary depending on the resources available to InfluxDB, but it is best to keep tag cardinality as low as possible. If you have a value in your data with high cardinality, it should probably be a field, not a tag.

The 100k number is very rough. If you stay below that you should be fine. With tag cardinality in the millions, schema and query design become more important, as it becomes easier to create poor-performance situations.
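As a rough sketch of what "cardinality" means in practice (the numbers below are hypothetical, not InfluxDB internals): the worst-case series cardinality of a measurement is the product of the distinct values of each tag key, which is why a single high-cardinality tag dominates everything else:

```python
# Hypothetical illustration: worst-case series cardinality is the product
# of the number of distinct values for each tag key in a measurement.
tag_value_counts = {
    "host": 100,        # 100 servers
    "region": 5,        # 5 regions
    "user_id": 50_000,  # high-cardinality value -- better stored as a field
}

series = 1
for key, count in tag_value_counts.items():
    series *= count

print(f"worst-case series cardinality: {series:,}")  # 25,000,000
# Dropping user_id to a field leaves only 100 * 5 = 500 series.
```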

We will document this more extensively as performance testing matures.

An extension to the question: is there a limit to the length of a tag?

Not really. There used to be a 64k limit but that was removed with the TSM engine. I suspect 4GB would be the limit now, but anything above a few thousand KB seems like a bad idea, just for throughput concerns.

Remember, the full uncompressed tag set lives in memory as the index. No better way to chew up RAM than with 10KB tag names and values.
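A back-of-envelope sketch of that RAM concern (the per-series sizes below are assumptions for illustration, not the actual TSM index layout):

```python
# Very rough lower bound on in-memory index size: raw tag key/value bytes
# per series, ignoring per-entry overhead (assumed figures, not TSM internals).
def index_estimate_bytes(num_series, tags_per_series, avg_key_len, avg_value_len):
    return num_series * tags_per_series * (avg_key_len + avg_value_len)

# 1M series, 3 tags each, short keys/values (~8 and ~16 bytes)
print(index_estimate_bytes(1_000_000, 3, 8, 16) / 1e6, "MB")           # ~72 MB

# same series count, but 10 KB tag names and values
print(index_estimate_bytes(1_000_000, 3, 10_000, 10_000) / 1e9, "GB")  # ~60 GB
```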

Thanks @beckettsean

Sorry to add to this issue, but by tag cardinality does that include both tag keys and tag values or just one or the other?

@elvarb usually we mean a single tag key with many tag values as the cardinality of a tag. As mentioned above, this is very much just a rule of thumb and is increasingly less relevant.

At the moment we're working on https://github.com/influxdata/influxdb/issues/7151 which will remove these types of restrictions.

@desa thanks for the info. I have been testing InfluxDB for tracking logs, using tags for the log metadata. It's extremely promising, and I'm glad to hear that this potential problem will be removed in the future.

Eek, we just ran into this. Given that retention policies can't go shorter than an hour, we are eating all available RAM (16 GB) within the hour with (fairly) unique tag sets of source/destination IP/port plus volume counters (Cisco NetFlow logging). Aggregating doesn't help because the initial data alone is killing us, let alone our desire to archive aggregated data. We've got a stress-test Python script that simulates our load, in case it's interesting.

-i
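To make the NetFlow case above concrete, here is a small sketch (with made-up flow rates) of why per-flow source/destination IP/port tags translate almost directly into new series:

```python
# Hypothetical NetFlow figures: with source/destination IP and port as tags,
# nearly every flow introduces an unseen tag set, i.e. a brand-new series.
flows_per_second = 5_000
retention_window_s = 3600      # shortest retention policy: one hour
unique_fraction = 0.8          # share of flows with a not-yet-seen 4-tuple

new_series_per_hour = int(flows_per_second * retention_window_s * unique_fraction)
print(f"~{new_series_per_hour:,} new series per hour")  # ~14,400,000
# Moving the 4-tuple (or at least the ports) into fields keeps the series
# count bounded by the number of exporters/interfaces instead of flow volume.
```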

Data newb here. This has been a useful issue to read, but can someone explain cardinality in terms of InfluxDB tags?

