Can we have a query syntax that allows inserting/attaching a new set of tags (alongside existing ones) to values/rows that are already part of a measurement?
My use case: I created a measurement from a PerfMon log, which already has Host=
Then I want to be able to say something like UPDATE <measurement name> ADD <tag name=value> WHERE <some condition based on tags>
+1
This feature would require a fairly major architectural change to the database and is not something we will do this year, if ever. Closing the request for now.
Please consider this during your next architecture review. I think this would be required in many enterprise situations, where you make a mistake while inserting values, and realize it later. Having an option to update would be a life saver!
Hi,
I have two specific use cases for this functionality:
1) Obfuscation of data for testing / demo purposes.
Since the data we deal with is market data, the tags may actually refer to a ticker, and given the other attributes of the data you can work out trade information. I am currently working on a data set which I have been asked to obfuscate for a demo. To achieve this I am currently loading the dataset into memory and writing it back with new tags. (This would be simplified by the INSERT INTO functionality proposed in #211.)
2) Change of the Tag value in the real world
Imagine that one of the tags in your data referenced the name of a department. If that department name changed, it would be good to be able to change it historically, rather than having to keep the two tag values running in tandem.
In the world of Data Warehousing this is known as slowly changing dimensions:
http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html
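For the department-rename case, the fix today has to happen outside the database: dump the series as line protocol, rewrite the tag value, and write it back. A minimal sketch of the rewrite step in Python (`rename_tag_value` is a made-up helper, not an InfluxDB API, and it assumes no escaped commas or spaces in the tag section):

```python
def rename_tag_value(line, tag_key, old_value, new_value):
    """Rewrite one line-protocol line, replacing tag_key=old_value
    with tag_key=new_value. Assumes no escaped commas/spaces."""
    head, _, rest = line.partition(" ")   # measurement+tags precede the first space
    parts = head.split(",")               # [measurement, tag1=v1, tag2=v2, ...]
    for i, part in enumerate(parts[1:], start=1):
        if part == f"{tag_key}={old_value}":
            parts[i] = f"{tag_key}={new_value}"
    return ",".join(parts) + " " + rest

line = "headcount,dept=OldName,site=LDN people=42i 1504201620000000000"
print(rename_tag_value(line, "dept", "OldName", "NewName"))
# headcount,dept=NewName,site=LDN people=42i 1504201620000000000
```

Applied line by line over a dump and POSTed back to /write, this gives the historical rename the SCD links describe, at the cost of a full rewrite of the series.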
+1. For us, events are usually only recognised in the data by a reasoning step after capture, and we want to tag data with its event once it's recognised. I am considering using continuous queries to do some of this tagging after capture, but it seems inefficient and I am not sure it will work for all our use cases.
+1
+1
This will be a surprisingly useful feature! Sometimes our developers send metrics that don't follow the namespace requirement but do follow some pre-defined hardcoded application name structure (the Storm metrics collector, for example), and those metrics have a lot of value for research.
+1. If you aren't going to add this any time soon, could you at least add workaround options to the current documentation, since there is nothing there about appending tags to an existing measurement.
+1 this is definitely needed in many real-world solutions as architectural changes might need additional tags.
+1 for adding this feature, since as of today there is no option other than to dump a measurement to ASCII, hack the ASCII file, and send it back to a new measurement via the wire protocol.
For now, I updated my InfluxDB Fetcher tool (https://github.com/hgomez/influxdb) to transform fields into tags so I could reimport them in the proper format.
+1
+1
+1
+1
+1 please provide ability to update existing tags
+1
@beckettsean Hi Sean, since this request was closed, quite a few folks have expressed their interest in and support for this feature. Do you guys want to reconsider it?
Additional use-cases:
Adding a new tag to an existing schema -- it's crazy that you can't do this.
You might find that you suddenly need to GROUP BY an existing field value, so you must update the records to remove the value as a field and add it as a tag.
Deletes are expensive, so you could update a tag to mark a record as active/disabled instead of deleting it, and include the status tag in queries.
+1
Issue https://github.com/influxdata/influxdb/issues/828 concerns renaming databases, measurements, fields and tags. Some of that functionality is provided by INTO, e.g.
SELECT "old_field_key" AS "new_field_key" INTO new_db..new_measurement FROM old_db..old_measurement GROUP BY "old_tag_key"
renames database, measurement and field key, but cannot be extended to rename the tag key or update with new tags.
Given those limitations, it may be better to dump data, convert to line protocol, go nuts with sed to change/add keys, and then POST back.
@hgomez-sonarsource 's excellent InfluxDB Fetcher makes this quite quick and easy.
So you can do a simultaneous database rename, measurement rename, field rename, tag rename and tag addition with three chunky lines:
java -cp target/influxdb-fetcher-1.0.0-SNAPSHOT.jar com.github.hgomez.influxdb.InfluxDBFetcher http://127.0.0.1:8086 login password old_db "SELECT * from old_measurement GROUP BY *" > dump.wireproto
sed -i -e "s/old_measurement/new_measurement/;s/old_field_key/new_field_key/;s/old_tag_key/new_tag_key/;s/new_tag_key/additional_tag_key=additional_tag,new_tag_key/;s/i,/,/g;s/i / /" dump.wireproto
curl -i -POST "http://127.0.0.1:8086/write?db=new_db" --data-binary @dump.wireproto
I used something very similar to convert 20000 points in a couple of seconds. YMMV.
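For those who'd rather not get the sed incantation exactly right, the tag-addition step can also be scripted over the dumped line protocol. A sketch under the same assumptions as the sed approach (plain line protocol, no escaped commas or spaces; `add_tag` is a made-up helper, not an InfluxDB API):

```python
def add_tag(line, tag_key, tag_value):
    """Append tag_key=tag_value to the tag section of one
    line-protocol line. Assumes no escaped commas or spaces."""
    head, _, rest = line.partition(" ")   # measurement+tags precede the first space
    return f"{head},{tag_key}={tag_value} {rest}"

line = "traffic,host=compass rx_bps=12962.4,tx_bps=13043 1504201620000000000"
print(add_tag(line, "if", "eth0"))
# traffic,host=compass,if=eth0 rx_bps=12962.4,tx_bps=13043 1504201620000000000
```

Run over dump.wireproto line by line, the result can be POSTed back with the same curl command as above.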
+1
Just came across a possible improvement on the workaround I posted above. I haven't investigated it, but influx_inspect export might be a replacement for InfluxDB Fetcher for this purpose?
@hraftery Nice, but you can only use it at the database level, not the measurement level. Output will get very big very soon, and I didn't manage to have it write to stdout (i.e. a pipe). Of course you can work around that with mkfifo etc., but it's still big output...
Of course you can add to the workaround by first "select"ing "into" a new db and so on 😁
+1, this will be a really useful feature
+1, this will be a really useful feature while refactoring an app
+1 : this could also be accomplished if SELECT INTO could alter or replace tag keys/values.
I googled "influxdb add tag" and got here; now I've decided to go back to using PostgreSQL.
+1, I need this.
+1
I wanted a lightweight tool to do what @hgomez-sonarsource 's excellent InfluxDB Fetcher does (with a few drawbacks), so I wrote a simple extractor in PowerShell. https://github.com/AdysTech/ExtractTransformInflux/blob/master/ExtractTransformInflux.ps1 gets the job done.
+1
In the documentation of v1.2, it is stated that: "InfluxDB is a schemaless database which means it's easy to add new measurements, tags, and fields at any time. It's designed to make working with time series data awesome."
But there isn't any reference/API in the documentation for me to do this...
Faint
UPDATE: now I know what schemaless means here: you can insert any column into a table/measurement, but you cannot add new columns to an existing record; you can only overwrite it.
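A toy model (an illustration, not InfluxDB internals) of why tags can't be attached after the fact: a point lives under a series key derived from its full tag set, so re-writing the same data with an extra tag creates a new series instead of updating the old one:

```python
points = {}

def write(measurement, tags, fields, ts):
    # A series is identified by measurement + the full, sorted tag set.
    series = (measurement, tuple(sorted(tags.items())))
    points.setdefault(series, {}).setdefault(ts, {}).update(fields)

write("cpu", {"host": "a"}, {"load": 0.5}, 1000)
# Re-writing with an extra tag does NOT tag the existing point --
# it silently lands in a brand-new series:
write("cpu", {"host": "a", "dc": "eu"}, {"load": 0.5}, 1000)
print(len(points))  # 2 distinct series
```

This is why every workaround in this thread boils down to reading the old series out and writing a new one in: there is no in-place path from one series key to another.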
+1
+1
+1
+1
+1
+1 for the ability to tag measurements with post-processed tags.
+1
@beckettsean Can this be reconsidered at this point? It's been two years since the thread was closed, and it might be more architecturally feasible to consider this now!
This will be extremely useful for maintaining data in medium to long term in an enterprise setting.
+1
Just realized that a CQ by default drops all tags while it downsamples data. The tags in the source table were always the same (network traffic for one host, one interface), so I never noticed the missing tags in the downsampled table. Now that I'm adding a second host, I realize that all existing downsampled entries need the tags added (the same for all entries) before I can add the new host. There are months of data in the table already...
Update:
I found a workaround to add the tags. If anyone has the same problem, this may work for you as well:
> DROP CONTINUOUS QUERY cq_1m ON network
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', database='network')
db_data = client.query('select tx_bps, rx_bps from downsampled_traffic')
data_to_write = [
    {
        'measurement': 'intermediate',
        'tags': {'host': 'compass', 'if': 'eth0'},
        'time': d['time'],
        'fields': {'tx_bps': float(d['tx_bps']), 'rx_bps': float(d['rx_bps'])}
    }
    for d in db_data.get_points()
]
client.write_points(data_to_write)
> DROP MEASUREMENT downsampled_traffic
> SELECT * INTO downsampled_traffic FROM intermediate GROUP BY *
> select * from downsampled_traffic limit 3
name: downsampled_traffic
time host if rx_bps tx_bps
---- ---- -- ------ ------
1504201620000000000 compass eth0 12962.4666667 13043
1504201740000000000 compass eth0 107997.333333 125236.466667
1504201800000000000 compass eth0 50329.7333333 51249.3333333
> DROP MEASUREMENT intermediate
Finally, recreate the CQ, this time with a GROUP BY for the host and interface (if) tags:
> CREATE CONTINUOUS QUERY cq_1m ON network BEGIN SELECT 8 * derivative(mean(rx), 1s) AS rx_bps, 8 * derivative(mean(tx), 1s) AS tx_bps INTO network.autogen.downsampled_traffic FROM network.autogen.traffic GROUP BY time(1m),host,if END
+9000
Having to loop through every single data point to insert or rename a tag is saddening and extremely resource-consuming.
+1
Encountered this issue on a production system. We have around a million entries every day; the current dataset has been accumulating for half a year.
I was going to set up some additional continuous queries, but it turned out that is not possible, as some of the values in the measurements which I wanted to GROUP BY are stored as fields, and there is no way to group by field types and also no way to convert fields to tags.
I've started work on a simple utility to convert fields to tags in the wire-protocol representation:
go get -u github.com/rvolosatovs/influx-taggify
Example usage:
influx_inspect export -database "$db" -datadir "$datadir" -waldir "$waldir" -out /tmp/influx-export
# Delete the measurements you don't need to convert using `sed`/`perl` (i.e. `perl -in -e 'print unless m/^unrelated_measurement.*/' /tmp/influx-export`)
influx-taggify -in /tmp/influx-export -out /tmp/influx-export-tagged fieldFoo fieldBar
# Drop the old database or edit generated file to change the name of the database
influx -import -path /tmp/influx-export-tagged
It worked for my use case, but your mileage may vary.
Try locally on non-critical setup first!
Feel free to try, report issues and contribute! :)
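The core field-to-tag transformation such a tool performs can also be sketched in a few lines of Python, for anyone who wants to script it directly over an exported line-protocol file. This is an illustration only (`field_to_tag` is a hypothetical helper, not part of any tool above), and it assumes no escaped commas or spaces, a trailing timestamp, and at least one field remaining after the move:

```python
def field_to_tag(line, field_key):
    """Move field_key from the field set into the tag set of one
    line-protocol line. Quotes around string field values are
    stripped, since tag values are never quoted."""
    head, _, rest = line.partition(" ")        # measurement + tags
    fields_str, _, ts = rest.rpartition(" ")   # fields, then timestamp
    fields = dict(f.split("=", 1) for f in fields_str.split(","))
    value = fields.pop(field_key).strip('"')
    new_fields = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{head},{field_key}={value} {new_fields} {ts}"

line = 'requests,host=a path="/api",count=3i 1504201620000000000'
print(field_to_tag(line, "path"))
# requests,host=a,path=/api count=3i 1504201620000000000
```

Note that, unlike a field, the promoted value becomes part of the series key, so cardinality grows with the number of distinct values moved.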
+1 @2018
+1
+1
+1
+1
+1
+1
For what it's worth, I've revisited this two years later and explored all the alternative options for exporting/importing data. To save others hours of frustration: it seems my workaround above is still the easiest way to achieve this and all manner of related export/import tasks.
Lesser alternatives:
The influx_inspect tool is complicated, requires sudo, relies on TSM files and dumps the whole db.
The influx tool's -execute flag is promising, but even csv2influx.py can't read the CSV it generates.

We were trying to solve this problem for a customer, where data may have been inserted incorrectly and we then needed to go back and re-tag a series (or several series). We ended up creating a little Node-based migration service. It's solving our problem for now, and our customer has a more friendly means of managing their data. It may not be suitable for all cases. You can find it here.
This would be nice in a CONTINUOUS QUERY, because the data is being transformed and new tags would be the most logical fit.
Example:
We have a few lab rooms and we use a CQ to find how busy each room is. The rooms are not tagged but are inferred from the host name.
It would be ideal if we could tag the room like mc3006, but because of InfluxDB's current limitations we are using a new metric for each room.
SELECT sum("value") INTO "thinclient_active_mc3006" FROM "thinclient_active" WHERE ("host" =~ /3006/) GROUP BY time(1m, -5m)
SELECT sum("value") INTO "thinclient_active_mc3007" FROM "thinclient_active" WHERE ("host" =~ /3007/) GROUP BY time(1m, -5m)
SELECT sum("value") INTO "thinclient_active_mc3008" FROM "thinclient_active" WHERE ("host" =~ /3008/) GROUP BY time(1m, -5m)
Something like the following would be a nice improvement!
SELECT sum("value"), room::tag=mc3008 INTO "thinclient_active_room" FROM "thinclient_active" WHERE ("host" =~ /3008/) GROUP BY time(1m, -5m), room
@hraftery please reconsider this feature request.
@steverweber I'm just a user - you probably want to direct your request to a contributor. But I wouldn't hold your breath. This issue was closed back in 2015. There are at least three workarounds described in this thread, but I agree, something along the lines of a SELECT INTO would be far better.
I sincerely wish that this would be reconsidered. It's a must-have :)
Are there any plans to implement this feature in influxdb 2.0?
+1