Influxdb: Missing data

Created on 15 Sep 2018 · 19 Comments · Source: influxdata/influxdb

We have 2 databases on the same host. From time to time we notice that a huge chunk of data is missing across all measurements. It's as if someone ran a delete where time > X and time < Y.
The first two times we tried to re-create the data in Influx manually because it was in the smaller DB ("only" a few thousand records were missing). By accident I figured out that if we back up our database and just restore it, all the data is back. So the data is not lost, but Influx is not showing it. A few days ago we had an identical situation with our larger database. Everything from July 12, 2018 to August 23, 2018 - gone!
We did a backup/restore (12 GB dump), and everything is back! Any ideas?
We use the default RP, so none/infinite, no CQ; the version is 1.5.2. It also happened before with 1.2.4.
Debian 8 x64

1.x wontfix

CrAsH1101

All 19 comments

I guess I just hit the same issue on 1.5.2. Suddenly old data is missing although we have infinite retention (default RP). Did you find a solution for this problem, @CrAsH1101?

mattpoel on 3 Dec 2018

I did, although it's not a solution.
I dumped the whole DB to a file and just imported the same dump back. Magically, all the missing data was back!???
Hope this helps you too...

P.S. Try importing it first to different DB and see if missing data exists before you drop original DB ;)
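The check suggested in the P.S. could be sketched roughly as follows. This is only an illustration: the function name `window_has_data`, the database names, and the time window are placeholders, and it assumes the 1.x `influx` CLI in CSV mode prints at most a header row when a query matches no points.

```shell
#!/bin/sh
# Sketch: check whether a database returns any points inside a time window.
# Arguments: database name, window start, window end (RFC3339 timestamps).
# Succeeds (exit 0) if at least one data row comes back, fails otherwise.
window_has_data() {
    db="$1"; from="$2"; to="$3"
    # LIMIT 1 keeps the query cheap: one row per measurement is enough.
    influx -database "$db" -format csv \
        -execute "SELECT * FROM /.*/ WHERE time >= '$from' AND time < '$to' LIMIT 1" \
        | awk 'NR > 1 { found = 1 } END { exit !found }'
}
```

For example, `window_has_data restored_copy 2018-07-12T00:00:00Z 2018-08-23T00:00:00Z && echo "window is present in the copy"` would confirm that the imported copy contains the missing range before the original database is dropped.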

CrAsH1101 on 3 Dec 2018

👍2

Thanks for the fast response! I'm already preparing for this ;-) I read some posts about missing data and possible recovery by dumping and importing the data.

mattpoel on 3 Dec 2018

This is how I "fixed" it with my standard installation, just in case somebody else is encountering the same problem.

I don't take responsibility for any of the following steps. I just want to document how I was able to restore all of the data. If you have custom retention policies, continuous queries, etc., please keep in mind that you might have to re-create them manually.

Stop influxdb and take a cold backup

systemctl stop influxdb
cp -rp /var/lib/influxdb /BACKUP_DESTINATION

Extract data

influx_inspect export -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal -out /DUMP/influxdb.dmp -compress

Without the -compress option, dumps can get huge.

Erase existing data

rm -rf /var/lib/influxdb/*

Start influxdb

systemctl start influxdb

Import data

influx -import -compressed -path /DUMP/influxdb.dmp
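Because the erase step is irreversible, it may be worth confirming that the cold backup and the dump look usable before running it. A minimal sketch, assuming the dump was written with `-compress` (gzip output); the function name `preflight_ok` is made up for illustration and the paths are the placeholders from the steps above:

```shell
#!/bin/sh
# Sketch: succeed only if the cold backup directory and the compressed dump both look usable.
# Arguments: backup directory, dump file.
preflight_ok() {
    backup_dir="$1"; dump_file="$2"
    # The cold backup must exist and contain something.
    [ -d "$backup_dir" ] || { echo "backup directory missing: $backup_dir" >&2; return 1; }
    [ -n "$(ls -A "$backup_dir")" ] || { echo "backup directory is empty: $backup_dir" >&2; return 1; }
    # The dump must be non-empty and pass a gzip integrity test.
    [ -s "$dump_file" ] || { echo "dump missing or empty: $dump_file" >&2; return 1; }
    gzip -t "$dump_file" 2>/dev/null || { echo "dump is not valid gzip: $dump_file" >&2; return 1; }
}
```

It could be used as a guard, for example `preflight_ok /BACKUP_DESTINATION/influxdb /DUMP/influxdb.dmp || exit 1`, placed before the erase step. It only checks that the files exist and are intact, not that the dump contains every point.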

mattpoel on 3 Dec 2018

Also seeing a similar issue; it seems to be related to a restart of the service. However, my details on the cause, the restart, and when exactly the issue came up are still limited. I will update when I get more information.

Version Info:

# curl -sL -I localhost:8086/ping
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: 9ce66a5e-612f-11e9-947f-000000000000
X-Influxdb-Build: OSS
X-Influxdb-Version: 1.5.3
X-Request-Id: 9ce66a5e-612f-11e9-947f-000000000000
Date: Wed, 17 Apr 2019 16:41:13 GMT

Our Manual Solution:

Make a backup of the database
Restore backup to a new database
Change the retention policy on the original database
Write the data from the backup database to the original database
Change the retention policy back to what it was
Clean up

Explanation

We restored the backup into a new database and then wrote the data into the original database, so that there are no destructive operations and no downtime on the database we need to operate against.
We change the retention policy because attempting to write data outside the retention policy throws unhandleable errors and causes the restore to fail.
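The temporary retention duration has to reach back at least as far as the oldest point being written. As a rough sketch of how such a value could be derived instead of hard-coded: the function name `retention_hours` and the safety margin are assumptions for illustration, and the oldest timestamp (in epoch seconds) has to be looked up separately.

```shell
#!/bin/sh
# Sketch: compute a retention duration in hours that covers the oldest point.
# Arguments: oldest point (epoch seconds), current time (epoch seconds, default: now),
#            extra margin in hours (default: 24).
retention_hours() {
    oldest="$1"
    now="${2:-$(date +%s)}"
    margin="${3:-24}"
    # Round the age up to whole hours, then add the margin.
    echo "$(( (now - oldest + 3599) / 3600 + margin ))h"
}
```

The printed value (for example `25h`) has the same form as the duration literals used in the `alter retention policy` statements in this comment.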

On the host

sudo -u influxdb bash
export DB_NAME="database_name"
mkdir /tmp/db_backup
influxd backup -portable -database ${DB_NAME} /tmp/db_backup
influxd restore -portable -db ${DB_NAME} -newdb backup_temp /tmp/db_backup
unset DB_NAME

On a terminal with the `influx` CLI

export DB_NAME="database_name"
influx -host example.com -execute "show retention policies on ${DB_NAME}"
export RETENTION="336h"
export RETENTION_TEMP="6000h"
export SHARD="168h"
export SHARD_TEMP="168h"
export POLICY_NAME="autogen"
influx -host example.com -execute "alter retention policy ${POLICY_NAME} on ${DB_NAME} DURATION ${RETENTION_TEMP} shard duration ${SHARD_TEMP} default"
influx -host example.com -execute "SELECT * INTO ${DB_NAME}..:MEASUREMENT FROM /.*/ GROUP BY *" -database="backup_temp"
influx -host example.com -execute "alter retention policy ${POLICY_NAME} on ${DB_NAME} DURATION ${RETENTION} shard duration ${SHARD} default"
influx -host example.com -execute "show retention policies on ${DB_NAME}"
influx -host example.com -execute 'DROP DATABASE backup_temp'

On the host again:

rm -rf /tmp/db_backup

gkman on 17 Apr 2019

We experienced a similar issue (like the one above) twice in the last month. However, after a restart the data was displayed (and fine). It seems that data written to a new shard wasn't returned in queries. InfluxDB version: 1.7.6.

tw-bert on 3 Jul 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 1 Oct 2019

👎6

This issue is not stale until someone responds to concerns, even if it is as simple as there's a fix, the fact is a resolution has not explicitly been found.

gkman on 1 Oct 2019

👍5

This issue is happening every time the server is restarted by force (power outage). I already have scripts in place which export/import database and after that everything is fine and all data is displayed correctly. But it's insane that I have to think about it. To be honest, I'm considering alternatives for my time series database :(

CrAsH1101 on 1 Oct 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] on 30 Dec 2019

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

stale[bot] on 6 Jan 2020

This stale bot is really annoying considering no one has ever responded.

gkman on 16 Jan 2020

It's amazing that this is not getting more attention. I can literally reproduce the issue at will: just kill the server by cutting the power, start the server, and points are missing in the database. I'm still on version 1.5.2.

Igor

CrAsH1101 on 16 Jan 2020

I completely agree, this issue should be looked at.

tw-bert on 16 Jan 2020

To be honest, I'm not too worried about it, as I have a quick solution for when it happens, but we are currently in a project migrating everything from Influx to TimeScale, and this issue is one of the primary reasons :(

CrAsH1101 on 16 Jan 2020

Apologies for the handling of this issue. @CrAsH1101 could you test this with 1.7.10 or 1.8?

dgnorton on 19 May 2020

@dgnorton, I extensively tested InfluxDB version 1.8.0-1 on machine A with data collected by machine B running InfluxDB version 1.7.8, and I unfortunately have to confirm that the problem discussed above still occurs. Although my data is still there (15+ GB of data for some measurement between Feb 2020 and June 2020), I can only retrieve data from April 2020 to June 2020 with, for example, Grafana or the standard Influx CLI tool. Any updates on this issue? The workarounds discussed above work temporarily but are not an option in my situation, as machine A has to reboot quite often.

alexl04 on 24 Jul 2020

Having the same problem with v1.8.0. I'm surprised only a few people are reporting the issue. This makes it really hard to rely on InfluxDB.
I guess a reloading method of the existing shards would be nice, since the data itself is present, but the DB just can't see it.