Can we have a query syntax that allows inserting/attaching a new set of tags (alongside existing ones) to values/rows that are already part of a measurement?
My use case: I created a measurement from a PerfMon log, which already has Host=
Then I want to be able to say something like UPDATE <measurement name> ADD <tag name=value> WHERE <some condition based on tags>
+1
This feature would require a fairly major architectural change to the database and is not something we will do this year, if ever. Closing the request for now.
Please consider this during your next architecture review. I think this would be required in many enterprise situations, where you make a mistake while inserting values, and realize it later. Having an option to update would be a life saver!
Hi,
I have two specific use cases for this functionality:
1) Obfuscation of data for testing / demo purposes.
Since the data we deal with is market data, the tags may actually refer to a ticker, and given the other attributes of the data you can work out trade information. I am currently working on a data set which I have been asked to obfuscate for a demo. To achieve this I am currently loading the dataset into memory and writing it back with new tags. (This would be simplified by the INSERT INTO functionality proposed in #211.)
2) Change of the Tag value in the real world
Imagine that one of the tags in your data referenced the name of a department. If that department name changed, it would be good to be able to change it historically, rather than having to keep the two tag values running in tandem.
In the world of Data Warehousing this is known as slowly changing dimensions:
http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html
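For the department-rename case, the fix today has to happen outside the database: dump the series as line protocol, rewrite the tag value, and write it back. A minimal sketch of the rewrite step in Python (`rename_tag_value` is a made-up helper, not an InfluxDB API, and it assumes no escaped commas or spaces in the tag section):

```python
def rename_tag_value(line, tag_key, old_value, new_value):
    """Rewrite one line-protocol line, replacing tag_key=old_value
    with tag_key=new_value. Assumes no escaped commas/spaces."""
    head, _, rest = line.partition(" ")   # measurement+tags precede the first space
    parts = head.split(",")               # [measurement, tag1=v1, tag2=v2, ...]
    for i, part in enumerate(parts[1:], start=1):
        if part == f"{tag_key}={old_value}":
            parts[i] = f"{tag_key}={new_value}"
    return ",".join(parts) + " " + rest

line = "headcount,dept=OldName,site=LDN people=42i 1504201620000000000"
print(rename_tag_value(line, "dept", "OldName", "NewName"))
# headcount,dept=NewName,site=LDN people=42i 1504201620000000000
```

Applied line by line over a dump and POSTed back to /write, this gives the historical rename the SCD links describe, at the cost of a full rewrite of the series.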
+1. For us, events are usually only recognised in the data by a reasoning step after capture, and we want to tag data with its event once it's recognised. I am considering using continuous queries to do some of this tagging after capture, but it seems inefficient and I am not sure it will work for all our use cases.
+1
+1
This will be a surprisingly useful feature! Sometimes our developers send metrics that don't follow the namespace requirement but do follow some pre-defined hardcoded application name structure (the Storm metrics collector, for example), and those metrics have a lot of value for research.
+1. If you aren't going to add this any time soon, could you at least add workaround options to the current documentation, since there is nothing there about appending tags to an existing measurement.
+1 this is definitely needed in many real-world solutions as architectural changes might need additional tags.
+1 for adding this feature, since as of today there is no option other than to dump a measurement to ASCII, hack the ASCII file, and send it back to a new measurement via the wire protocol.
For now, I updated my InfluxDB Fetcher tool (https://github.com/hgomez/influxdb) to transform fields into tags so I could reimport them in the proper format.
+1
+1
+1
+1
+1 please provide ability to update existing tags
+1
@beckettsean Hi Sean, since this request was closed, quite a few folks have expressed their interest in and support for this feature. Do you guys want to reconsider it?
Additional use-cases:
Adding a new tag to an existing schema -- it's crazy that you can't do this.
You might find that you suddenly need to GROUP BY an existing field value, so you must update the records to remove the value as a field and add it as a tag.
Deletes are expensive, so you could update a tag to mark a record as active/disabled instead of deleting it, and include the status tag in queries.
+1
Issue https://github.com/influxdata/influxdb/issues/828 concerns renaming databases, measurements, fields and tags. Some of that functionality is provided by INTO, e.g.
SELECT "old_field_key" AS "new_field_key" INTO new_db..new_measurement FROM old_db..old_measurement GROUP BY "old_tag_key"
renames database, measurement and field key, but cannot be extended to rename the tag key or update with new tags.
Given those limitations, it may be better to dump data, convert to line protocol, go nuts with sed to change/add keys, and then POST back.
@hgomez-sonarsource 's excellent InfluxDB Fetcher makes this quite quick and easy.
So you can do a simultaneous database rename, measurement rename, field rename, tag rename and tag addition with three chunky lines:
java -cp target/influxdb-fetcher-1.0.0-SNAPSHOT.jar com.github.hgomez.influxdb.InfluxDBFetcher http://127.0.0.1:8086 login password old_db "SELECT * from old_measurement GROUP BY *" > dump.wireproto
sed -i -e "s/old_measurement/new_measurement/;s/old_field_key/new_field_key/;s/old_tag_key/new_tag_key/;s/new_tag_key/additional_tag_key=additional_tag,new_tag_key/;s/i,/,/g;s/i / /" dump.wireproto
curl -i -POST "http://127.0.0.1:8086/write?db=new_db" --data-binary @dump.wireproto
I used something very similar to convert 20000 points in a couple of seconds. YMMV.
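For those who'd rather not get the sed incantation exactly right, the tag-addition step can also be scripted over the dumped line protocol. A sketch under the same assumptions as the sed approach (plain line protocol, no escaped commas or spaces; `add_tag` is a made-up helper, not an InfluxDB API):

```python
def add_tag(line, tag_key, tag_value):
    """Append tag_key=tag_value to the tag section of one
    line-protocol line. Assumes no escaped commas or spaces."""
    head, _, rest = line.partition(" ")   # measurement+tags precede the first space
    return f"{head},{tag_key}={tag_value} {rest}"

line = "traffic,host=compass rx_bps=12962.4,tx_bps=13043 1504201620000000000"
print(add_tag(line, "if", "eth0"))
# traffic,host=compass,if=eth0 rx_bps=12962.4,tx_bps=13043 1504201620000000000
```

Run over dump.wireproto line by line, the result can be POSTed back with the same curl command as above.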
+1
Just came across a possible improvement on the workaround I posted above. I haven't investigated it, but influx_inspect export might be a replacement for InfluxDB Fetcher for this purpose?
@hraftery Nice, but you can only use it at the database level, not the measurement level. Output will get very big very soon, and I didn't manage to have it write to stdout (i.e. a pipe). Of course you can work around that with mkfifo etc., but it's still big output...
Of course you can add to the workaround by first "select"ing "into" a new db and so on 😁
+1, this will be a really useful feature
+1, this will be a really useful feature while refactoring an app
+1 : this could also be accomplished if SELECT INTO could alter or replace tag keys/values.
I googled "influxdb add tag" and got here; now I've decided to go back to using PostgreSQL.
+1, I need this.
+1
I wanted a lightweight tool to do what @hgomez-sonarsource 's excellent InfluxDB Fetcher does (with a few drawbacks), so I wrote a simple extractor in PowerShell. https://github.com/AdysTech/ExtractTransformInflux/blob/master/ExtractTransformInflux.ps1 gets the job done.
+1
In the documentation of v1.2, it is stated that: "InfluxDB is a schemaless database which means it's easy to add new measurements, tags, and fields at any time. It's designed to make working with time series data awesome."
But there isn't any reference/API in the documentation for me to do this...
Faint
UPDATE: now I know what schemaless means here: you can insert any column into a table/measurement, but you cannot add new columns to an existing record; you can only overwrite it.
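A toy model (an illustration, not InfluxDB internals) of why tags can't be attached after the fact: a point lives under a series key derived from its full tag set, so re-writing the same data with an extra tag creates a new series instead of updating the old one:

```python
points = {}

def write(measurement, tags, fields, ts):
    # A series is identified by measurement + the full, sorted tag set.
    series = (measurement, tuple(sorted(tags.items())))
    points.setdefault(series, {}).setdefault(ts, {}).update(fields)

write("cpu", {"host": "a"}, {"load": 0.5}, 1000)
# Re-writing with an extra tag does NOT tag the existing point --
# it silently lands in a brand-new series:
write("cpu", {"host": "a", "dc": "eu"}, {"load": 0.5}, 1000)
print(len(points))  # 2 distinct series
```

This is why every workaround in this thread boils down to reading the old series out and writing a new one in: there is no in-place path from one series key to another.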
+1
+1
+1
+1
+1
+1 for the ability to tag measurements with post-processed tags.
+1
@beckettsean Can this be reconsidered at this point? It's been two years since the thread was closed, and it might be more architecturally feasible to consider this now!
This will be extremely useful for maintaining data in medium to long term in an enterprise setting.
+1
Just realized that a CQ by default drops all tags while it downsamples data. The tags in the source table were always the same (network traffic for one host, one interface), so I never noticed the missing tags in the downsampled table. Now that I'm adding a second host, I realize that all existing downsampled entries need the tags added (the same for all entries) before I can add the new host. There are months of data in the table already...
Update:
I found a workaround to add the tags. If anyone has the same problem, this may work for you as well:
> DROP CONTINUOUS QUERY cq_1m ON network
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', database='network')
db_data = client.query('select tx_bps, rx_bps from downsampled_traffic')
data_to_write = [
    {
        'measurement': 'intermediate',
        'tags': {'host': 'compass', 'if': 'eth0'},
        'time': d['time'],
        'fields': {'tx_bps': float(d['tx_bps']), 'rx_bps': float(d['rx_bps'])}
    }
    for d in db_data.get_points()
]
client.write_points(data_to_write)
> DROP MEASUREMENT downsampled_traffic
> SELECT * INTO downsampled_traffic FROM intermediate GROUP BY *
> select * from downsampled_traffic limit 3
name: downsampled_traffic
time host if rx_bps tx_bps
---- ---- -- ------ ------
1504201620000000000 compass eth0 12962.4666667 13043
1504201740000000000 compass eth0 107997.333333 125236.466667
1504201800000000000 compass eth0 50329.7333333 51249.3333333
> DROP MEASUREMENT intermediate
Finally, recreate the CQ, this time with a GROUP BY for the host and interface (if) tags:
> CREATE CONTINUOUS QUERY cq_1m ON network BEGIN SELECT 8 * derivative(mean(rx), 1s) AS rx_bps, 8 * derivative(mean(tx), 1s) AS tx_bps INTO network.autogen.downsampled_traffic FROM network.autogen.traffic GROUP BY time(1m),host,if END
+9000
Having to loop through every single data point to insert or rename a tag is saddening and extremely resource-consuming.
+1
Encountered this issue on a production system. We have around a million entries every day; the current dataset has been accumulating for half a year.
I was going to set up some additional continuous queries, but it turned out that is not possible, as some of the values in the measurements which I wanted to GROUP BY are stored as fields, and there is no way to group by field types and also no way to convert fields to tags.
I've started work on a simple utility to convert fields to tags in the wire-protocol representation:
go get -u github.com/rvolosatovs/influx-taggify
Example usage:
influx_inspect export -database "$db" -datadir "$datadir" -waldir "$waldir" -out /tmp/influx-export
# Delete the measurements you don't need to convert using `sed`/`perl` (i.e. `perl -in -e 'print unless m/^unrelated_measurement.*/' /tmp/influx-export`)
influx-taggify -in /tmp/influx-export -out /tmp/influx-export-tagged fieldFoo fieldBar
# Drop the old database or edit generated file to change the name of the database
influx -import -path /tmp/influx-export-tagged
It worked for my use case, but your mileage may vary.
Try locally on non-critical setup first!
Feel free to try, report issues and contribute! :)
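The core field-to-tag transformation such a tool performs can also be sketched in a few lines of Python, for anyone who wants to script it directly over an exported line-protocol file. This is an illustration only (`field_to_tag` is a hypothetical helper, not part of any tool above), and it assumes no escaped commas or spaces, a trailing timestamp, and at least one field remaining after the move:

```python
def field_to_tag(line, field_key):
    """Move field_key from the field set into the tag set of one
    line-protocol line. Quotes around string field values are
    stripped, since tag values are never quoted."""
    head, _, rest = line.partition(" ")        # measurement + tags
    fields_str, _, ts = rest.rpartition(" ")   # fields, then timestamp
    fields = dict(f.split("=", 1) for f in fields_str.split(","))
    value = fields.pop(field_key).strip('"')
    new_fields = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{head},{field_key}={value} {new_fields} {ts}"

line = 'requests,host=a path="/api",count=3i 1504201620000000000'
print(field_to_tag(line, "path"))
# requests,host=a,path=/api count=3i 1504201620000000000
```

Note that, unlike a field, the promoted value becomes part of the series key, so cardinality grows with the number of distinct values moved.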
+1 @2018
+1
+1
+1
+1
+1
+1
For what it's worth, I've revisited this two years later and explored all the alternative options for exporting/importing data. To save others hours of frustration: it seems my workaround above is still the easiest way to achieve this and all manner of related export/import tasks.
Lesser alternatives:
The influx_inspect tool is complicated, requires sudo, relies on TSM files and dumps the whole db.
The influx tool's -execute flag is promising, but even csv2influx.py can't read the CSV it generates.

We were trying to solve this problem for a customer, where data may have been inserted incorrectly and we then needed to go back and re-tag a series (or several series). We ended up creating a little Node-based migration service. It's solving our problem for now, and our customer has a more friendly means of managing their data. It may not be suitable for all cases. You can find it here.
This would be nice in a CONTINUOUS QUERY, because the data is being transformed and new tags would be the most logical fit.
Example:
We have a few lab rooms and we use a CQ to find how busy each room is. The rooms are not tagged but are inferred from the host name.
It would be ideal if we could tag the room like mc3006, but because of InfluxDB's current limitations we are using a new metric for each room.
SELECT sum("value") INTO "thinclient_active_mc3006" FROM "thinclient_active" WHERE ("host" =~ /3006/) GROUP BY time(1m, -5m)
SELECT sum("value") INTO "thinclient_active_mc3007" FROM "thinclient_active" WHERE ("host" =~ /3007/) GROUP BY time(1m, -5m)
SELECT sum("value") INTO "thinclient_active_mc3008" FROM "thinclient_active" WHERE ("host" =~ /3008/) GROUP BY time(1m, -5m)
Something like the following would be a nice improvement!
SELECT sum("value"), room::tag=mc3008 INTO "thinclient_active_room" FROM "thinclient_active" WHERE ("host" =~ /3008/) GROUP BY time(1m, -5m), room
@hraftery please reconsider this feature request.
@steverweber I'm just a user - you probably want to direct your request to a contributor. But I wouldn't hold your breath. This issue was closed back in 2015. There are at least three workarounds described in this thread, but I agree, something along the lines of a SELECT INTO would be far better.
I sincerely wish that this would be reconsidered. It's a must-have :)
Are there any plans to implement this feature in influxdb 2.0?
+1