InfluxDB: Should be able to back up/restore a single database

Created on 22 Dec 2016 · 15 comments · Source: influxdata/influxdb

Currently, when a user runs a restore from a backup, all of the metadata is replaced with the data from the restore. A user should be able to restore a single database while leaving all other databases intact.

Labels: area/backup and restore, area/documentation, kind/feature-request


All 15 comments

+1 on this
I would love to be able to restore just a single database from production onto my testing system, for example, without wiping out or overwriting existing metadata for other databases.

+1
The production users are different from the development environment's; if restoring from a production database leaks user information into the development environment, I think that poses a security risk.

Dumping only one database from production into a development or testing environment, without the metadata, sounds like a good fix in this case.

+1 Migration of databases into existing environments is impossible because of this.

I also think this is a valuable feature. I came into an environment in which my predecessor scattered InfluxDB instances across various systems, and I wanted to unify them in order to maintain just one set of RPs and QRs and a single version.
However, reading this put a dead halt on my undertaking, as I have to transfer the data collected so far.

+1 Same for me - coming across this crucial shortcoming of the open-source version today put a hold on our plan to use InfluxDB as the core of our new application platform. :( IMHO, such an essential feature should not be used to separate the commercial version from the OSS one.

This would be great to have. I'm struggling to migrate from one server to another because of the lack of this feature. :(

FYI, it's possible to export/import a single DB using the following method. It's poorly documented but works for me. TSM databases are required:

Export the database using influx_inspect:

Beware: exported files are in text format and will be huge.

influx_inspect export -database <db name> -datadir <data dir path> -waldir <wal dir path> -out  <export file>

Import the database using influx:

influx -import <export file>

@jonans I actually found that trick after commenting here; it took forever to find. First time I'm really working with Influx, so it may be a rookie mistake :)

But for backup and restore purposes, it's not very user-friendly to have to export databases that way just to be able to restore a single database. For migration purposes, it's valid though!

You can add the -compress option to influx_inspect to help with the file size.

Docs are here: https://docs.influxdata.com/influxdb/v1.3/tools/influx_inspect/#influx-inspect-export
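For example, a compressed round trip might look like this (paths and database names here are placeholders; note the matching -compressed flag on the import side):

influx_inspect export -database mydb -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal -compress -out /tmp/mydb_export.gz
influx -import -path /tmp/mydb_export.gz -compressed

As the next comment shows, though, the export still embeds the source database name, so compression only helps if you restore under the same name.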

So I've analyzed the case of using influx_inspect to dump and influx to restore into another database name.
This script "almost" works (see the problems listed in its comments; it would work if I used sed as described there):

#!/bin/sh
if [ -z "$1" ]
then
  echo "You need to provide source database name as argument"
  exit 1
fi
if [ -z "$2" ]
then
  echo "You need to provide destination database name as argument"
  exit 1
fi
# Problem 1:
# We cannot use compressed output, because the generated file contains the database name.
# We need to change that database name from "$1" to "$2" before restoring - this is needed to do an actual copy instead of just a backup and restore of the same database.
# influx_inspect export -database "$1" -datadir /var/lib/influxdb/data/ -waldir /var/lib/influxdb/wal/ -compress -out "$1.out"
# Problem 2:
# influx_inspect CAN ACCESS the database, bypassing credentials (of course [http] auth-enabled = true is set in the config file and the influxd daemon was restarted!!!)
# Credentials work OK for influx, because influx commands don't allow access without credentials.

influx_inspect export -database "$1" -datadir /var/lib/influxdb/data/ -waldir /var/lib/influxdb/wal/ -out "$1.out" || exit 1

# Problem 3:
# Here we need to edit "$1.out" and change the database name from "$1" to "$2"; this file is 1.4 GB.
# Of course I can call some kind of `sed` to do it, but we still need an uncompressed file, because sed-ing a compressed file would take extra time to decompress and compress again.

influx -username "user_name" -password "password" -import -path "$1.out" || exit 1

rm -f "$1.out"
echo 'OK'
# Problem 4
# It takes 3m 40s, which is quite fast but still 2x slower than use:
# https://github.com/cncf/gha2db/blob/master/cmd/idb_backup/idb_backup.go

The generated output file looks like this:

# INFLUXDB EXPORT: 1677-09-21T00:12:43Z - 2262-04-11T23:47:16Z
# DDL
CREATE DATABASE temp WITH NAME autogen
# DML
# CONTEXT-DATABASE:temp
# CONTEXT-RETENTION-POLICY:autogen
# writing tsm data
# then comes the data

We need to edit it and replace "temp" with "prod" in this case.
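A minimal sed sketch for that rename, assuming GNU sed and the $1/$2 arguments from the script above (untested; it only touches the DDL line and the context line shown in the excerpt):

sed -i "s/^CREATE DATABASE $1 /CREATE DATABASE $2 /; s/^# CONTEXT-DATABASE:$1/# CONTEXT-DATABASE:$2/" "$1.out"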

Summing up, there are serious problems:

  1. The generated output file contains the database name and instructions to create a database with that name. It would be better to support giving the database name via influx -database <name_here> when restoring.
  2. influx_inspect is NOT using auth!! It can access the database without a username and password!! :(
  3. It is still slow, but the time is just about acceptable.

I'll still use my own approach, which copies one InfluxDB database into another 2x faster and uses auth:
https://github.com/cncf/gha2db/blob/master/cmd/idb_backup/idb_backup.go

To sum things up, I've tried 4 approaches:

1) I've tried the recommended way:
Backup + restore: https://docs.influxdata.com/influxdb/v1.3/administration/backup_and_restore/
But you can only back up and restore into the same database name, and restoring requires stopping the InfluxDB server.
So this is not usable for our case.

2) I've tried to copy the entire database into another database using:
SELECT * INTO newdb..:MEASUREMENT FROM /.*/ GROUP BY *
This works, but it takes 1 hour and 30 minutes, while generating the database from scratch takes 12-14 minutes.
Much, much too slow.

3) Tried influx_inspect to dump the database and influx to restore it. It has the following problems:

  • The dump file contains the database name, so it needs to be edited to change that name. This also means we cannot do a compressed dump, because editing a compressed file would require decompressing and compressing it again.
  • influx_inspect bypasses auth, so you can access the database without any credentials.
  • It is quite fast at 3m 40s (3 times faster than regenerating the database) but still 2x slower than approach (4).

4) Finally, I've written a tool in Go that connects to one InfluxDB database, lists all series, and then, for each series, copies its contents into another database. It uses multithreading and takes about 1 minute 50 seconds. This is currently the fastest way of duplicating an InfluxDB database, IMHO: https://github.com/cncf/gha2db/blob/master/cmd/idb_backup/idb_backup.go
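For anyone who can't use the Go tool, here is a rough shell approximation of the same idea via the influx CLI. It copies measurement by measurement rather than series by series, so it is coarser-grained than the tool above; database names, credentials, and the parallelism level are placeholders:

#!/bin/sh
# Copy every measurement from SRC_DB into DST_DB, 4 measurements at a time.
# Assumes simple measurement names (no commas or quotes) and auth enabled.
SRC_DB=mydb
DST_DB=mydb_copy
AUTH='-username user_name -password password'

influx $AUTH -execute "CREATE DATABASE \"$DST_DB\"" || exit 1
# SHOW MEASUREMENTS in CSV format: skip the header, keep the second column.
influx $AUTH -database "$SRC_DB" -execute 'SHOW MEASUREMENTS' -format csv \
  | tail -n +2 | cut -d, -f2 \
  | xargs -P 4 -I {} influx $AUTH -database "$SRC_DB" \
      -execute "SELECT * INTO \"$DST_DB\"..\"{}\" FROM \"{}\" GROUP BY *"

Very large measurements may still need to be chunked by time range (WHERE time >= ... AND time < ...) to avoid query timeouts.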

I mistakenly used influx restore to transfer one database to an online system. As a result, it overwrote the metadata, causing all data from that point in time to be lost. I think the docs should explicitly state the effects of influx restore.

+1

@aanthony1243 -- @silentred's mistake resulted in lost data. Is this still true? What needs to be documented about this scenario?

This is resolved in version 1.5. Users must indicate either the -online or -portable flag to prevent this scenario. Because the legacy, offline method remains for backward compatibility, it would be an improvement to the docs to indicate that this method may result in data loss. Any new users/new implementations should use the -portable method.

Existing users/implementations may find the -online flag useful for importing legacy data without any data loss.
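For reference, the 1.5+ portable flow for a single database looks roughly like this (database and path names are placeholders):

influxd backup -portable -database mydb /tmp/mydb_backup
influxd restore -portable -db mydb -newdb mydb_restored /tmp/mydb_backup

The -newdb flag restores into a different database name on a live server, which covers the original request in this issue.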
