InfluxDB: Should be able to back up/restore a single database

Created on 22 Dec 2016 · 15 comments · Source: influxdata/influxdb

Currently, when a user runs a restore from a backup, all of the metadata is replaced with the data from the restore. A user should be able to restore a single database while leaving all other databases intact.

Labels: area/backup and restore, area/documentation, kind/feature-request


All 15 comments

+1 on this
I would love to be able to restore just a single database from production onto my testing system, for example, without wiping out or overwriting existing metadata for other databases.

+1
The production users are different from the development environment's; if restoring from a production database leaks user information into the development environment, I think that poses a security risk.

Dumping only one database from production into a development or testing environment, without the metadata, sounds like a good fix in this case.

+1 Migration of databases into existing environments is impossible because of this.

I also think this is a valuable feature. I came into an environment in which my predecessor scattered InfluxDB instances across various systems, and I wanted to unify them in order to maintain just one set of RPs and QRs and a single version.
However, reading this put a dead halt on my undertaking, as I have to transfer the data collected so far.

+1 Same for me - coming across this crucial shortcoming of the open-source version today put a hold on our plan to use InfluxDB as the core of our new application platform. :( IMHO, such an essential feature should not be used to separate the commercial version from the OSS one.

This would be great to have. I'm struggling to migrate from one server to another because of the lack of this feature. :(

FYI, it's possible to export/import a single DB using the following method. It's poorly documented but works for me. TSM databases are required:

Export the database using influx_inspect:

Beware: exported files are in text format and will be huge.

influx_inspect export -database <db name> -datadir <data dir path> -waldir <wal dir path> -out  <export file>

Import the database using influx:

influx -import <export file>

@jonans I actually found that trick after commenting here; it took forever to find. First time I'm really working with Influx, so it may be a rookie mistake :)

But for backup and restore purposes, it's not very user-friendly to have to export databases that way just to be able to restore a single database. For migration purposes, it's valid though!

You can add the -compress option to influx_inspect to help with the file size.

Docs are here: https://docs.influxdata.com/influxdb/v1.3/tools/influx_inspect/#influx-inspect-export
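For example, a compressed round trip might look like this (paths and database names here are placeholders; note the matching -compressed flag on the import side):

influx_inspect export -database mydb -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal -compress -out /tmp/mydb_export.gz
influx -import -path /tmp/mydb_export.gz -compressed

As the next comment shows, though, the export still embeds the source database name, so compression only helps if you restore under the same name.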

So I've analyzed the case of using influx_inspect to dump and influx to restore into another database name.
This script "almost" works (see the problems listed in its comments; it would work if I used sed as described there):

#!/bin/sh
if [ -z "$1" ]
then
  echo "You need to provide source database name as argument"
  exit 1
fi
if [ -z "$2" ]
then
  echo "You need to provide destination database name as argument"
  exit 1
fi
# Problem 1:
# We cannot use compressed output, because the generated file contains the database name.
# We need to change that database name from "$1" to "$2" before restoring - this is needed to do an actual copy instead of just a backup and restore of the same database.
# influx_inspect export -database "$1" -datadir /var/lib/influxdb/data/ -waldir /var/lib/influxdb/wal/ -compress -out "$1.out"
# Problem 2:
# influx_inspect CAN ACCESS the database, bypassing credentials (of course [http] auth-enabled = true is set in the config file and the influxd daemon was restarted!!!)
# Credentials work OK for influx, because influx commands don't allow access without credentials.

influx_inspect export -database "$1" -datadir /var/lib/influxdb/data/ -waldir /var/lib/influxdb/wal/ -out "$1.out" || exit 1

# Problem 3:
# Here we need to edit "$1.out" and change the database name from "$1" to "$2"; this file is 1.4 GB.
# Of course I can call some kind of `sed` to do it, but we still need an uncompressed file, because sed-ing a compressed file would take extra time to decompress and compress again.

influx -username "user_name" -password "password" -import -path "$1.out" || exit 1

rm -f "$1.out"
echo 'OK'
# Problem 4
# It takes 3m 40s, which is quite fast but still 2x slower than use:
# https://github.com/cncf/gha2db/blob/master/cmd/idb_backup/idb_backup.go

The generated output file looks like this:

# INFLUXDB EXPORT: 1677-09-21T00:12:43Z - 2262-04-11T23:47:16Z
# DDL
CREATE DATABASE temp WITH NAME autogen
# DML
# CONTEXT-DATABASE:temp
# CONTEXT-RETENTION-POLICY:autogen
# writing tsm data
# then comes the data

We need to edit it and replace "temp" with "prod" in this case.
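A minimal sed sketch for that rename, assuming GNU sed and the $1/$2 arguments from the script above (untested; it only touches the DDL line and the context line shown in the excerpt):

sed -i "s/^CREATE DATABASE $1 /CREATE DATABASE $2 /; s/^# CONTEXT-DATABASE:$1/# CONTEXT-DATABASE:$2/" "$1.out"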

Summing up, there are serious problems:

  1. The generated output file contains the database name and instructions to create a database with that name. It would be better to support giving the database name via influx -database <name_here> when restoring.
  2. influx_inspect is NOT using auth!! It can access the database without a username and password!! :(
  3. It is still slow, but the time is just about acceptable.

I'll still use my own approach, which copies one InfluxDB database into another 2x faster and uses auth:
https://github.com/cncf/gha2db/blob/master/cmd/idb_backup/idb_backup.go

To sum things up, I've tried 4 approaches:

1) I've tried the recommended way:
Backup + restore: https://docs.influxdata.com/influxdb/v1.3/administration/backup_and_restore/
But you can only back up and restore into the same database name, and restoring requires stopping the InfluxDB server.
So this is not usable for our case.

2) I've tried to copy the entire database into another database using:
SELECT * INTO newdb..:MEASUREMENT FROM /.*/ GROUP BY *
This works, but it takes 1 hour and 30 minutes, while generating the database from scratch takes 12-14 minutes.
Much, much too slow.

3) Tried influx_inspect to dump the database and influx to restore it. It has the following problems:

  • The dump file contains the database name, so it needs to be edited to change that name. This also means we cannot do a compressed dump, because editing a compressed file would require decompressing and compressing it again.
  • influx_inspect bypasses auth, so you can access the database without any credentials.
  • It is quite fast at 3m 40s (3 times faster than regenerating the database) but still 2x slower than approach (4).

4) Finally, I've written a tool in Go that connects to one InfluxDB database, lists all series, and then, for each series, copies its contents into another database. It uses multithreading and takes about 1 minute 50 seconds. This is currently the fastest way of duplicating an InfluxDB database, IMHO: https://github.com/cncf/gha2db/blob/master/cmd/idb_backup/idb_backup.go
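For anyone who can't use the Go tool, here is a rough shell approximation of the same idea via the influx CLI. It copies measurement by measurement rather than series by series, so it is coarser-grained than the tool above; database names, credentials, and the parallelism level are placeholders:

#!/bin/sh
# Copy every measurement from SRC_DB into DST_DB, 4 measurements at a time.
# Assumes simple measurement names (no commas or quotes) and auth enabled.
SRC_DB=mydb
DST_DB=mydb_copy
AUTH='-username user_name -password password'

influx $AUTH -execute "CREATE DATABASE \"$DST_DB\"" || exit 1
# SHOW MEASUREMENTS in CSV format: skip the header, keep the second column.
influx $AUTH -database "$SRC_DB" -execute 'SHOW MEASUREMENTS' -format csv \
  | tail -n +2 | cut -d, -f2 \
  | xargs -P 4 -I {} influx $AUTH -database "$SRC_DB" \
      -execute "SELECT * INTO \"$DST_DB\"..\"{}\" FROM \"{}\" GROUP BY *"

Very large measurements may still need to be chunked by time range (WHERE time >= ... AND time < ...) to avoid query timeouts.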

I mistakenly used influx restore to transfer one database to an online system. As a result, it overwrote the metadata, causing all data from that point in time to be lost. I think the docs should explicitly state the effects of influx restore.

+1

@aanthony1243 -- @silentred's mistake resulted in lost data. Is this still true? What needs to be documented about this scenario?

This is resolved in version 1.5. Users must indicate either the -online or -portable flag to prevent this scenario. Because the legacy, offline method remains for backward compatibility, it would be an improvement to the docs to indicate that this method may result in data loss. Any new users/new implementations should use the -portable method.

Existing users/implementations may find the -online flag useful for importing legacy data without any data loss.
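For reference, the 1.5+ portable flow for a single database looks roughly like this (database and path names are placeholders):

influxd backup -portable -database mydb /tmp/mydb_backup
influxd restore -portable -db mydb -newdb mydb_restored /tmp/mydb_backup

The -newdb flag restores into a different database name on a live server, which covers the original request in this issue.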
