Influxdb: Old fields and tags show up after dropping measurement and rewriting

Created on 6 Jul 2018 · 59 Comments · Source: influxdata/influxdb

Using version influxdb-1.5.2, if I drop a measurement, and then write again using the same name, the measurement is recreated but it contains all the tags and keys from the previous measurement.

I noticed this after writing a measurement with an identifier as a field, then deciding to make it a tag. I could no longer query the data without a ::tag cast because the database still retained the old field key. I started playing around with arbitrary keys and values, and dropping the measurement. Everything I've sent is retained across every DROP MEASUREMENT.

I've even tried to drop the measurement, stop the database and rebuild the index, but this doesn't work either.

1.x kind/bug

cheribral

👍22

Most helpful comment

Update

Hello everyone affected by this issue. Firstly, I would like to apologise that it's been almost a year since this issue was filed. Yesterday I was able to really dig into what was causing this issue, mainly due to the .influxdb directory that @ragnarkurmwunder sent me a while back.

This is a pretty difficult issue to reproduce without an existing dataset. Triggering the issue relies on duplicates of a new series being inserted into a database using the inmem index _within the same batch_. Further, it looks like they need to sit inside of the WAL so that when the database is restarted they will be replayed in and the problem will continue...

The cause of the issue is that the inmem index, in this rare case, overcounts how many series belong to the measurement (it counts the duplicate points for the same series as different series). Then, when you go to delete the measurement, the index thinks there are still some series around for the measurement and it does not clean up the fields.idx file. This file contains mappings from measurements to field keys, and if it's not cleaned up properly, then those old field keys can be returned in some cases.
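To make the counting argument concrete, here is a toy model in shell. It is only a sketch and not InfluxDB code: the function names are made up for illustration, and the assumption is simply that the index adds one to its series count per point in the batch but removes one per distinct series when the measurement is dropped.

```shell
# Toy model only; none of these functions exist in InfluxDB.
# Input on stdin: one series key per line, one line per point in the batch.
count_per_point()  { wc -l | tr -d ' '; }            # over-counts: a duplicate is counted again
count_per_series() { sort -u | wc -l | tr -d ' '; }  # counts each distinct series once

# Number of series the index would still believe exist after every
# real series of the measurement has been dropped.
leftover_after_drop() {
  batch="$1"
  counted=$(printf '%s\n' "$batch" | count_per_point)
  dropped=$(printf '%s\n' "$batch" | count_per_series)
  echo $((counted - dropped))
}
```

Any result above zero corresponds to the case described above: the measurement looks non-empty to the index, so the fields.idx cleanup is skipped.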

I believe I have fixed this issue in https://github.com/influxdata/influxdb/pull/14266

The fix will be available in the 1.8 release, and also in a future 1.7.8 release.

Operational Mitigation Steps

Here are some operational steps you could take to try and resolve this issue without waiting for 1.8 or 1.7.8.

Use the TSI index

I was unable to reproduce this issue using the TSI index. Even when I triggered the issue on the inmem index, and then upgraded to the TSI index, I saw the issue disappear. Whilst we will of course continue to support the inmem index on the 1.x line, from 2.x onwards the TSI index will be the main index InfluxDB uses, and all our development effort will continue on that.

You can find out more information about how to upgrade to TSI here. In the simplest case, you bring your server down and then do something like:

influx_inspect buildtsi -datadir ~/.influxdb/data -waldir ~/.influxdb/wal

Remove invalid fields.idx files

The bug arises because the fields.idx files (there is one file per shard directory) are not properly rebuilt when the measurement is deleted. However, InfluxDB will rebuild these files if they're missing. If you are currently suffering from fields that are appearing in queries when they shouldn't be, then I recommend that you delete all of the fields.idx files for the problematic database/retention policy. You will need to bring down your server to do this, then:

$ rm -i ~/.influxdb/data/<db_name>/<rp_name>/*/fields.idx
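A more cautious variant of this step, as a sketch only: move each fields.idx aside instead of deleting it, so it can be put back if something goes wrong. The helper name `stash_fields_idx` and the backup directory layout are assumptions for illustration; the only thing taken from the thread is the `<data>/<db_name>/<rp_name>/<shard>/fields.idx` path pattern. As with the rm command, the server should be stopped first.

```shell
# Hypothetical helper, not an InfluxDB tool. Run only while influxd is stopped.
# Moves every fields.idx of one database/retention policy into backup_dir,
# keeping the db/rp/shard layout, and prints how many files were moved.
stash_fields_idx() {
  data_dir="$1"; db="$2"; rp="$3"; backup_dir="$4"
  moved=0
  for f in "$data_dir/$db/$rp"/*/fields.idx; do
    [ -f "$f" ] || continue                 # glob matched nothing
    shard=$(basename "$(dirname "$f")")     # shard directory name
    mkdir -p "$backup_dir/$db/$rp/$shard"
    mv -- "$f" "$backup_dir/$db/$rp/$shard/fields.idx"
    moved=$((moved + 1))
  done
  echo "$moved"
}
```

For example, `stash_fields_idx ~/.influxdb/data <db_name> <rp_name> /tmp/fields-idx-backup` would leave other databases and retention policies untouched.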

e-dard on 5 Jul 2019

👍12 🎉4 👀1

All 59 comments

@cheribral could you provide some steps for us to reproduce this issue? Which index type are you using?

e-dard on 9 Jul 2018

This is using the disk based tsi1 index.
This came right after I deleted the measurement and let it sit for a day before writing again.

I went back to make a test measurement to copy and paste the steps, and I can't reproduce it for some reason. I have no idea what the difference is other than that I don't have writes coming in to the measurement while I delete.
I'll see if I can get it happen again tomorrow.

cheribral on 10 Jul 2018

@cheribral thanks, we will need example data and steps so we can follow along and understand the issue better.

e-dard on 10 Jul 2018

@e-dard
I'm having the same issue. I'm using version 1.5.2. Here is an example:
> drop measurement memory
> show field keys from memory    (returns nothing)
> select * from memory    (returns nothing)

> insert memory,host=test,type=memory value=0
> show field keys from memory
name: memory
fieldKey       fieldType
--------       ---------
buffered       float
cached         float
free           float
heap_usage     float
non_heap_usage float
slab_recl      float
slab_unrecl    float
used           float
value          float
> select * from memory limit 10
name: memory
time                buffered cached free heap_usage host non_heap_usage slab_recl slab_unrecl type   used value
----                -------- ------ ---- ---------- ---- -------------- --------- ----------- ----   ---- -----
1531445331027521830                                 test                                      memory      0

All these fields shown are old fields I am not using anymore but I can't seem to get them to go away. I have gotten them to go away before by running this:
influx_inspect buildtsi -database graphite -datadir /var/lib/influxdb/data/ -waldir /var/lib/influxdb/wal/
This is not fixing this issue this time.

Hope this helps.
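For anyone who wants to check for leftovers after rewriting a measurement, a small filter over the `show field keys` output can list the keys that should no longer be there. This is a sketch under assumptions: `stale_field_keys` is a made-up helper name, and it assumes the column layout shown in the output above.

```shell
# Hypothetical helper. Reads `show field keys` CLI output on stdin and prints
# every field key that is not among the expected keys given as arguments.
stale_field_keys() {
  awk -v expected="$*" '
    BEGIN { n = split(expected, e, " "); for (i = 1; i <= n; i++) ok[e[i]] = 1 }
    /^name:/ || /^fieldKey/ || /^-+/ || NF == 0 { next }   # skip headers and blank lines
    !($1 in ok) { print $1 }                                # first column is the field key
  '
}
```

Usage would be something like `influx -database graphite -execute 'show field keys from memory' | stale_field_keys value`, which should print nothing on a healthy measurement that only has the `value` field.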

dustin96080 on 13 Jul 2018

Thanks @dustin96080,

do you have a set of data to insert that will reproduce this bug?

e-dard on 13 Jul 2018

Also, were you using TSI previous to 1.5.2?

e-dard on 13 Jul 2018

@e-dard We have been using TSI from the beginning (about 1 year). Not sure how much data you want or how I would get that to you, as I don't want it public. I have included a small subset of the data.

memory,host=dustintest01,type=memory buffered=4.272128e+06 1531180067000000000
memory,host=dustintest01,type=memory buffered=4.272128e+06 1531180127000000000
memory,host=dustintest01,type=memory cached=9.96225024e+08 1531179767000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531179827000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531179887000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531179947000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531180007000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531180067000000000
memory,host=dustintest01,type=memory cached=9.9631104e+08 1531180127000000000
memory,host=dustintest01,type=memory free=1.43478784e+09 1531179767000000000
memory,host=dustintest01,type=memory free=1.572323328e+09 1531179827000000000
memory,host=dustintest01,type=memory free=1.572294656e+09 1531179887000000000
memory,host=dustintest01,type=memory free=1.57231104e+09 1531179947000000000
memory,host=dustintest01,type=memory free=1.57231104e+09 1531180007000000000
memory,host=dustintest01,type=memory free=1.572327424e+09 1531180067000000000
memory,host=dustintest01,type=memory free=1.571733504e+09 1531180127000000000
memory,host=dustintest01,type=memory slab_recl=1.668096e+08 1531179767000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531179827000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531179887000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531179947000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531180007000000000
memory,host=dustintest01,type=memory slab_recl=1.66854656e+08 1531180067000000000
memory,host=dustintest01,type=memory slab_recl=1.66846464e+08 1531180127000000000
memory,host=dustintest01,type=memory slab_unrecl=4.1725952e+07 1531179767000000000
memory,host=dustintest01,type=memory slab_unrecl=4.1197568e+07 1531179827000000000
memory,host=dustintest01,type=memory slab_unrecl=4.093952e+07 1531179887000000000
memory,host=dustintest01,type=memory slab_unrecl=4.0833024e+07 1531179947000000000
memory,host=dustintest01,type=memory slab_unrecl=4.0833024e+07 1531180007000000000
memory,host=dustintest01,type=memory slab_unrecl=4.0833024e+07 1531180067000000000
memory,host=dustintest01,type=memory slab_unrecl=4.0833024e+07 1531180127000000000
memory,host=dustintest01,type=memory used=4.396711936e+09 1531179767000000000
memory,host=dustintest01,type=memory used=4.25957376e+09 1531179827000000000
memory,host=dustintest01,type=memory used=4.25986048e+09 1531179887000000000
memory,host=dustintest01,type=memory used=4.259950592e+09 1531179947000000000
memory,host=dustintest01,type=memory used=4.259950592e+09 1531180007000000000
memory,host=dustintest01,type=memory used=4.259934208e+09 1531180067000000000
memory,host=dustintest01,type=memory used=4.26053632e+09 1531180127000000000

dustin96080 on 13 Jul 2018

Also seeing this issue on Influx Cloud 1.5.3-c1.5.3.

shakefu on 28 Jul 2018

I'm running into this behavior of influxdb as well. I'm using logstash to write json into an influx measurement. I'm actually only trying to change the field type from float to int - since this seems to be impossible I tried to DROP MEASUREMENT ... - but the field with the old type reappears. Seems I'm going from one issue to another :-( I'm using the docker image influxdb:1.4-alpine.

cha87de on 6 Oct 2018

Same issue for me - why can't we drop and wipe out a measurement; Influxdb - why are you caching the OLD fields .... grr ....

If I insert the same data into a new table, insert goes through; Otherwise I get this error:

influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: field type conflict: input field \"BLAH_007\" on measurement \"BB2\" is type float, already exists as type integer dropped=1"}

sada-narayanappa on 2 Nov 2018

I have the same issue too.

abbasqamar on 8 Nov 2018

What I've found is that If I do a backup, and then restore this backup somewhere else (in another docker container, for example), the new database will work fine.

derrix060 on 6 Dec 2018

The only way that I managed to deal with this (a very hackish workaround) was creating a backup of the database that I want, dropping this database, and finally restoring the backup.

derrix060 on 6 Dec 2018

👍1

I am still not able to drop measurements, and I have been waiting quite some time for an update that fixes this. How is it even possible that you are releasing versions that are so buggy???
I would be ashamed if I put clients in such position, also totally no support on your community forum.

drop measurement "collectd"
drop series from "collectd"
delete from "collectd"

influxdb-1.7.1-1.x86_64
CentOS Linux release 7.5.1804 (Core)
Linux db1 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

f1-outsourcing on 8 Dec 2018

😄2

Seems I'm going from one issue to another :-( I'm using the docker image influxdb:1.4-alpine.

Yes, it is quite bad here. I think it also has to do with the Go language, such a 'low entry threshold' language compared to something like C and C++. It attracts the 'leftover' programmers who were 'unable' to adapt to a more complex and more demanding language. This difference is also noticeable with something like PHP. But that is a personal opinion from years of experience; maybe I was just not too lucky with my contacts.
Here some rookie accepts this as being normal. I guess it is a sign of the times.
https://community.influxdata.com/t/or-rention-policies-are-not-correctly-dropped-or-there-is-something-wrong-with-the-cli/7461/4?u=f1outsourcing

f1-outsourcing on 8 Dec 2018

Currently experiencing this issue with InfluxCloud - really painful as I can't even try the backup/restore option someone else mentioned

nibynool on 19 Dec 2018

We think this could be fixed with the 1.7.2 release, where we have fixed a few bugs to do with concurrent writes and deletes.

@nibynool please drop a ticket into support where you will be assisted.

e-dard on 19 Dec 2018

@f1-outsourcing would you upgrade to 1.7.2 and see if that resolves your issue? If you have a dataset that we can use to reproduce your issue that would be great too.

e-dard on 19 Dec 2018

This seemed to work, (just posting it here also)

find . -type d -name index -exec rm -Rf {} \;

influx_inspect buildtsi -datadir data/ -waldir wal/

f1-outsourcing on 29 Dec 2018

Running 1.7.2, experiencing the same issue here..

garceri on 9 Jan 2019

👍2

@garceri do you have a definitive way to reproduce this issue on 1.7.2?

e-dard on 11 Jan 2019

@e-dard I'm experiencing this problem too.

I tried to reproduce it on a lean install, but I was not able to.

But I have a database where it consistently happens. It's 75 Mb and contains info that is not confidential, but I'm not comfortable sharing publicly. I can send it your way so you can check what's going on.

A few more details:

I had a type mismatch after changing telegraf config, so I dropped the measurement to start clean
I can only insert new rows using the old types
if I restart the database the measurement does not show up with SHOW MEASUREMENTS until I try to insert a new row. After that it does show up, and SHOW FIELD KEYS shows the incorrect and old types

silviot on 18 Jan 2019

Experiencing this issue as well on 1.7.2.
Tried the index delete & influx_inspect mentioned by @f1-outsourcing but it didn't work for me :-/

I'm unable to reproduce on demand. I just have a screwed up measurement I can't get fixed.

phemmer on 21 Jan 2019

Experiencing the issue on 1.7.2.
@e-dard - I sent you broken db, check mail.

ragnarkurmwunder on 21 Jan 2019

👍1

Maybe connected to this problem: I deleted some (not all) data from a series, and an immediate query didn't show it anymore. After some time, let's say a minute, the deleted data reappears! The trigger for the reappearance may be the next time new data is inserted (I have a one-minute interval).

ragnarkurmwunder on 21 Jan 2019

Also have this issue on 1.7.3. (in my case, deleted fields re-appear, with null values)

oliv3 on 29 Jan 2019

👍5

Just experienced this same problem, on Influx 1.7.3 in the influx-1.7.3-alpine container from dockerhub.
I created a new measurement. Added some data to it. Then realized I need to change the datatype on one of the value fields.
I dropped the series, then dropped the whole measurement.
Created a new measurement with the same name, then tried re-adding the values with a new datatype. Getting an error saying the value field is the wrong type.
Influxdb (the index?) is still holding onto the old datatype from the old measurement, and trying to apply it to the new values in the new measurement.

ashleysommer on 14 Feb 2019

I am also experiencing this bug. It is very annoying during development. Is someone working on solving this ?

lovasoa on 18 Feb 2019

👍6

I hit the same bug. How can it be solved?
influxdb version 1.6.2
index-version = "inmem"
measurement only has fields

EmptyRabbit on 19 Feb 2019

Issue still occurs
InfluxDB version: 1.7.3

Has anyone found a workaround?

ghost on 27 Feb 2019

@kordianslowacki The way I worked around it in my project was to create the new measurement with a slightly different name than the old one. That ensures it does not pick up any cached field properties from the deleted measurement because that only happens when you use the same name.

ashleysommer on 27 Feb 2019

The only workaround seems to be making a snapshot of the database, and restoring it.
https://docs.influxdata.com/influxdb/v1.7/administration/backup_and_restore/

Seeing this kind of issue open for so long, without feedback from the maintainers, is a little worrying for the project as a whole.

lovasoa on 27 Feb 2019

👍4

Same here:

$ influxd version
InfluxDB v1.7.4 (git: 1.7 ef77e72f435b71b1ad6da7d6a6a4c4a262439379)

talek on 18 Mar 2019

I am wondering if this has something to do with telegraf? I have several measurements, a few of them come from telegraf that I am just now experimenting with. It seems like the measurements that were created from telegraf cannot be dropped but I am not certain of that yet. It might just be a coincidence.

sslupsky on 18 Mar 2019

@sslupsky Definitely not telegraf related. We've replicated it on multiple databases (both InfluxCloud and our own) without telegraf data present.

shakefu on 20 Mar 2019

👍3

@shakefu Good to know. Thanks for the feedback. I stopped ingesting data into a few measurements several days ago. I still cannot drop them.

sslupsky on 20 Mar 2019

Issue still occurs
InfluxDB version: 1.7.3

lufeewu on 22 Mar 2019

We're also affected by this.

mnicky on 8 Apr 2019

I just noticed this is still happening on 1.7.5. Dropped a measurement with 6 fields and rewrote it with just 1, and the other 5 came back with null values

fsauer65 on 17 Apr 2019

Just tested on 1.7.6 without any luck. We are storing automated test run results with dozens of specific fields, so before each run we want to have a clean state, but after one insert old fields keys show up again with null values. Tried with DELETE, DROP SERIES and DROP MEASUREMENT, with the same outcome.

gapanyc on 19 Apr 2019

Me too: I dropped a measurement with 5 fields and re-created it with 1 field.
A select shows 6 fields.
InfluxDb version 1.7.5,
InfluxShell version 1.7.4

rinomau on 23 Apr 2019

I can reproduce that behaviour with InfluxDb 1.7.6, I can provide data if anyone's interested, just tell me what you need.

wollew on 3 May 2019

Well, this is quite an obnoxious bug.
As @derrix060 mentioned above, the only hackish workaround is to drop a database and restore it. But it has an interesting twist to it. It has to be restored as a new database (may or may not have the same name).
Thus the routine (on InfluxDB 1.6.0) is as follows:

# Backing up the database
influxd backup -portable -database "${DATABASE}" /path/to/backup

# Dropping the database
influx -username "$USERNAME" -password "$PASSWORD" -execute "DROP DATABASE \"${DATABASE}\""

# Restoring the database as newdb
influxd restore -portable -db "${DATABASE}" -newdb "${DATABASE}" /path/to/backup

BluePat on 3 May 2019

👍1

@wollew do you have shards/dbs that you would be willing to upload to us, along with repro steps? Alternatively, can you reproduce from scratch by inserting, deleting and querying in a certain way?

e-dard on 7 May 2019

I cannot reproduce with a database from scratch but I can upload an existing database and steps to reproduce. Which files do you need, contents of folders wal and data?

wollew on 7 May 2019

@wollew that would be great. Please email me edd@<nameoftherepo>.com and I will provide you with some credentials where you can securely upload data to our company SFTP server. If that doesn't work for you we can figure something else out.

e-dard on 7 May 2019

I have spent quite some time trying to find a sure way to reproduce the problem. It did not work. The same procedure (with little data) only seldom reproduced the problem. However, I have a large number of larger datasets where I almost always found the problem.

Therefore, I have the feeling that this only occurs if large amounts of data are accumulated, e.g. a datapoint every 5 or 10 seconds for several days or weeks (maybe if data is spread over different files or shards?).

Thanks for looking into this. It gives us a huge headache, as sometimes datapoints are written with the wrong datatype and we have to rename the values and keep the old fields. Currently, the only way is to copy the whole measurement without the unwanted fields, drop the measurement, copy it back and then backup and restore the whole database. It can take hours just to get rid of a single field.

drb-germany on 7 May 2019

@wollew that would be great. Please email me edd@<nameoftherepo>.com and I will provide you with some credentials where you can securely upload data to our company SFTP server. If that doesn't work for you we can figure something else out.

I just emailed you, hope that helps.

wollew on 7 May 2019

Any news on this issue?

wollew on 4 Jun 2019

Just got slammed with this in production. Really really frustrating.

kezsto on 28 Jun 2019

Same here with InfluxDB 1.6.6 on Linux

I wrote some fields as strings instead of float. The old schema survives a "drop measurement" if you reuse the measurement name. It is really surprising that a database company:

fails to test a fundamental feature
fails to recreate a simple and dire bug

jfcg on 4 Jul 2019

😕1

We are actively investigating this issue. Thanks for your patience, and to those of you who have provided me with data to reproduce the issue.

e-dard on 4 Jul 2019

👍6

Update

I believe I have fixed this issue in https://github.com/influxdata/influxdb/pull/14266

The fix will be available in the 1.8 release, and also in a future 1.7.8 release.

Operational Mitigation Steps

Here are some operational steps you could take to try and resolve this issue without waiting for 1.8 or 1.7.8.

Use the TSI index

You can find out more information about how to upgrade to TSI here. In the simplest case, you bring your server down and then do something like:

influx_inspect buildtsi -datadir ~/.influxdb/data -waldir ~/.influxdb/wal

Remove invalid fields.idx files

$ rm -i ~/.influxdb/data/<db_name>/<rp_name>/*/fields.idx

e-dard on 5 Jul 2019

👍12 🎉4 👀1

That's some great news!

Will we need to do the manual cleanup if we just update to 1.7.8 when it's released?

1ma on 5 Jul 2019

@1ma that's a great point. We will have to add something to the release notes. You will have to either do a manual cleanup, or the issue will resolve itself if you re-drop the measurement.

The manual cleanup would involve removing the stale fields.idx files.

e-dard on 5 Jul 2019

The official docker image influxdb:1.7.7 still has this issue.
So I had to drop the database.

haodongh on 23 Aug 2019

Any news on this ? I am not sure the issue mentioned by @e-dard above is the same as what everyone is encountering here. The issue is not "hard to reproduce", for me it happens systematically with any measurement that I drop.

lovasoa on 11 Oct 2019

I also experienced this issue on influx v1.7.7.
Just now, I'm experimenting with using telegraf to convert a csv into my influxdb database.
At first the drop was successful, but after 3 attempts the drops suddenly stopped working. The points are gone from the measurement, but every time I run "show measurements", it's still there. Also when I use "show tag keys" and "show field keys", the tags and fields are still there. Since then, the measurement cannot be dropped at all.
I hope the influxdb team continues working to solve this "bug".