Many users ask for a tool for migrating historical data from Prometheus to VictoriaMetrics. There is no such tool yet, though VictoriaMetrics supports data back-filling, i.e. ingesting historical data.
The proposal is to create a tool for historical data migration from Prometheus to VictoriaMetrics. This tool can be based on Prometheus' tsdb package. The basic algorithm for the tool is quite simple:
For each tsdb block in Prometheus storage do:
put protocol.
The data migration may be performed in parallel for multiple Prometheus tsdb blocks in order to reduce the time required for historical data upload.
There are the following projects, which could be useful when creating the data migration tool:
We also need to create such a tool for InfluxDB.
I will try to write a Python script for InfluxDB 1.x.
@laurentL thank you so much
We also need to create such a tool for InfluxDB.
This tool can be based on influx_inspect tool, which can export data from InfluxDB into Influx line protocol format, which can be uploaded to VictoriaMetrics.
@laurentL Did you finally create the influx 2 vm script?
I created a tool to import from influx2 to VictoriaMetrics. It works when series are inserted one by one, but bulk inserts fail without any log output in VictoriaMetrics.
I was just looking into this by using the inspector to create a dump and then replaying the metrics back into VM.
Did you publish your tool?
Hello Guys,
Using existing tools (influx_inspect, telegraf) we can migrate data from InfluxDB to VM.
The main downside of this approach is that the telegraf inputs.tail module does not support reading compressed files, so this can require lots of space.
Export data:
influx_inspect export -datadir /opt/influxdb/data -waldir /opt/influxdb/wal -database xyz -out /tmp/xyz.dat
Import data:
./telegraf -config telegraf.conf
# telegraf.conf
[agent]
hostname = ""
omit_hostname = true
debug = false
quiet = false
[[inputs.tail]]
files = ["/tmp/xyz.dat"]
from_beginning = true
pipe = false
data_format = "influx"
[[outputs.influxdb]]
urls = ["http://localhost:8428"]
database = "xyz"
skip_database_creation = true
Compressed files can be read by Telegraf from named pipes:
influx_inspect export -datadir /opt/influxdb/data -waldir /opt/influxdb/wal -database xyz -compress -out /tmp/xyz.dat.gz
mkfifo /tmp/xyz.dat
zcat /tmp/xyz.dat.gz > /tmp/xyz.dat &
This creates the compressed file xyz.dat.gz and decompresses it into the named pipe /tmp/xyz.dat on demand, as Telegraf reads it.
Note that pipe = true must be passed to [[inputs.tail]] in this case - see inputs.tail docs for details.
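The named pipe steps above can be wrapped in a small helper. This is only a sketch: the function name gz_to_fifo and its arguments are hypothetical, and it uses gzip -dc, which behaves like zcat.

```shell
#!/bin/bash
# Expose a gzipped export through a named pipe so the reader sees plain text.
# gz_to_fifo and its arguments are hypothetical names used for this sketch.
gz_to_fifo() {
  local src="$1" fifo="$2"
  # Create the named pipe unless it already exists.
  [ -p "$fifo" ] || mkfifo "$fifo" || return 1
  # Decompress in the background; the writer blocks until a reader opens the pipe.
  gzip -dc "$src" > "$fifo" &
}

# Usage (paths taken from the commands above):
#   gz_to_fifo /tmp/xyz.dat.gz /tmp/xyz.dat
#   ./telegraf -config telegraf.conf   # with pipe = true in [[inputs.tail]]
```

Nothing is decompressed to disk, so the only extra space needed is the compressed export itself.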
Great addition!
For the record, I just gave this import process a test drive with just one month of data (51GB of gzipped metrics) and Telegraf kept getting errors.
Are there any options that could be set on the VM side to postpone indexing or other costly operations?
Update: I have upgraded the specs of the server that hosts VM and I no longer see the errors, but I think there is still room for optimization.
I have been pushing the same 1 month dataset and it has been running for the past 24 hours.
telegraf output
2019-12-20T12:00:33Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2019-12-20T12:01:03Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2019-12-20T12:01:21Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:01:21Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:01:27Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:01:28Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:01:36Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:01:37Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:01:42Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:01:42Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:01:48Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:01:48Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:01:54Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:01:54Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:02:01Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:02:01Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:02:06Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:02:06Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:02:12Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:02:12Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:02:18Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:02:19Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:02:25Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:02:26Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:02:38Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:02:38Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:03:33Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2019-12-20T12:04:03Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2019-12-20T12:04:33Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2019-12-20T12:05:03Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2019-12-20T12:05:18Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:05:18Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:05:32Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:05:32Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:06:05Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:06:05Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:06:11Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:06:11Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:06:16Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:06:17Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:06:23Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:06:23Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:06:29Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:06:29Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:07:04Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2019-12-20T12:07:25Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:07:25Z E! [agent] Error writing to outputs.influxdb: could not write any address
2019-12-20T12:07:30Z E! [outputs.influxdb] When writing to [http://localhost:8428]: Post http://localhost:8428/write?db=nxms-emea: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-12-20T12:07:31Z E! [agent] Error writing to outputs.influxdb: could not write any address
VictoriaMetrics output
2019-12-20T11:35:30.081Z info VictoriaMetrics@/lib/mergeset/table.go:156 opening table "/storage/indexdb/15E21128FE454190"...
2019-12-20T11:35:30.090Z info VictoriaMetrics@/lib/mergeset/table.go:190 table "/storage/indexdb/15E21128FE454190" has been opened in 8.580727ms; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2019-12-20T11:35:30.091Z info VictoriaMetrics@/lib/mergeset/table.go:156 opening table "/storage/indexdb/15E21128FE45418F"...
2019-12-20T11:35:30.096Z info VictoriaMetrics@/lib/mergeset/table.go:190 table "/storage/indexdb/15E21128FE45418F" has been opened in 5.05554ms; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2019-12-20T11:35:30.099Z info VictoriaMetrics@/app/vmstorage/main.go:65 successfully opened storage "/storage" in 32.445419ms; partsCount: 0; blocksCount: 0; rowsCount: 0; sizeBytes: 0
2019-12-20T11:35:30.101Z info VictoriaMetrics@/app/vmselect/promql/rollup_result_cache.go:50 loading rollupResult cache from "/storage/cache/rollupResult"...
2019-12-20T11:35:30.102Z info VictoriaMetrics@/app/vmselect/promql/rollup_result_cache.go:76 loaded rollupResult cache from "/storage/cache/rollupResult" in 1.075233ms; entriesCount: 0, sizeBytes: 0
2019-12-20T11:35:30.102Z info VictoriaMetrics@/app/victoria-metrics/main.go:31 started VictoriaMetrics in 35.553445ms
2019-12-20T11:35:30.102Z info VictoriaMetrics@/lib/httpserver/httpserver.go:63 starting http server at http://:8428/
2019-12-20T11:35:30.102Z info VictoriaMetrics@/lib/httpserver/httpserver.go:64 pprof handlers are exposed at http://:8428/debug/pprof/
2019-12-20T11:37:02.861Z info VictoriaMetrics@/lib/storage/partition.go:197 creating a partition "2019_10" with smallPartsPath="/storage/data/small/2019_10", bigPartsPath="/storage/data/big/2019_10"
2019-12-20T11:37:02.867Z info VictoriaMetrics@/lib/storage/partition.go:212 partition "2019_10" has been created
2019-12-20T12:01:37.433Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 6283 rows in 14.052773665s at 447 rows/sec to "/storage/data/small/2019_10/6283_5_20191007000412.000_20191013235551.000_15E211488EFD3E0A"; sizeBytes: 12686
2019-12-20T12:01:37.374Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 16 items in 13.507316393s at 1 items/sec to "/storage/indexdb/15E21128FE454190/16_1_15E21132F509D60F"; sizeBytes: 365
2019-12-20T12:01:53.987Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 196 items in 15.671451167s at 12 items/sec to "/storage/indexdb/15E21128FE454190/141_1_15E21132F509D613"; sizeBytes: 1197
2019-12-20T12:02:06.801Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 313 items in 11.936588355s at 26 items/sec to "/storage/indexdb/15E21128FE454190/172_1_15E21132F509D616"; sizeBytes: 1527
2019-12-20T12:02:07.159Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 387425 rows in 41.395829362s at 9359 rows/sec to "/storage/data/small/2019_10/387425_320_20191007000018.000_20191013235943.000_15E211488EFD3E0C"; sizeBytes: 582178
2019-12-20T12:02:30.179Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 116 items in 10.299435997s at 11 items/sec to "/storage/indexdb/15E21128FE454190/75_1_15E21132F509D61A"; sizeBytes: 1118
2019-12-20T12:02:30.701Z error VictoriaMetrics@/lib/httpserver/httpserver.go:421 error in "/write": cannot read influx line protocol data: unexpected EOF
2019-12-20T12:02:31.680Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 6556 rows in 19.98660947s at 328 rows/sec to "/storage/data/small/2019_10/6556_9_20191007000410.000_20191013235550.000_15E211488EFD3E11"; sizeBytes: 13330
2019-12-20T12:02:33.906Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 77456 items in 1m9.765959175s at 1110 items/sec to "/storage/indexdb/15E21128FE454190/69143_151_15E21132F509D611"; sizeBytes: 1109744
2019-12-20T12:02:40.423Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 267522 items in 1m3.672830833s at 4201 items/sec to "/storage/indexdb/15E21128FE454190/249200_518_15E21132F509D612"; sizeBytes: 4570140
2019-12-20T12:06:17.919Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 32443 rows in 11.386647352s at 2849 rows/sec to "/storage/data/small/2019_10/32443_33_20191007000010.000_20191013235729.000_15E211488EFD3E45"; sizeBytes: 46096
2019-12-20T12:06:25.589Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 506413 rows in 10.914599341s at 46397 rows/sec to "/storage/data/small/2019_10/506413_511_20191007000001.000_20191013235939.000_15E211488EFD3E47"; sizeBytes: 812783
2019-12-20T12:09:00.778Z error VictoriaMetrics@/lib/httpserver/httpserver.go:421 error in "/write": cannot read influx line protocol data: unexpected EOF
2019-12-20T12:09:05.533Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 1138 items in 15.06178505s at 75 items/sec to "/storage/indexdb/15E21128FE454190/885_2_15E21132F509D74E"; sizeBytes: 7202
2019-12-20T12:09:09.942Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 423 items in 19.828404195s at 21 items/sec to "/storage/indexdb/15E21128FE454190/185_1_15E21132F509D750"; sizeBytes: 1536
2019-12-20T12:09:21.891Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 66649997 rows in 1m53.009869324s at 589771 rows/sec to "/storage/data/small/2019_10/66649997_29059_20191007000000.000_20191013235959.000_15E211488EFD3E59"; sizeBytes: 66599756
2019-12-20T12:10:25.784Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 18 items in 12.905976031s at 1 items/sec to "/storage/indexdb/15E21128FE454190/18_1_15E21132F509D78E"; sizeBytes: 840
2019-12-20T12:10:36.843Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 2110 rows in 20.104032658s at 104 rows/sec to "/storage/data/small/2019_10/2110_5_20191007000056.000_20191013235642.000_15E211488EFD3E7E"; sizeBytes: 3800
2019-12-20T12:10:38.412Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 50 items in 11.822240944s at 4 items/sec to "/storage/indexdb/15E21128FE454190/50_1_15E21132F509D791"; sizeBytes: 957
2019-12-20T12:10:39.711Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 33177 rows in 24.626717823s at 1347 rows/sec to "/storage/data/small/2019_10/33177_32_20191007000055.000_20191013235642.000_15E211488EFD3E7D"; sizeBytes: 49624
2019-12-20T12:10:40.215Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 666 items in 27.385345649s at 24 items/sec to "/storage/indexdb/15E21128FE454190/357_1_15E21132F509D78F"; sizeBytes: 2681
2019-12-20T12:10:57.076Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 34 items in 23.188011169s at 1 items/sec to "/storage/indexdb/15E21128FE454190/34_1_15E21132F509D792"; sizeBytes: 916
2019-12-20T12:10:57.298Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 215 rows in 14.527440153s at 14 rows/sec to "/storage/data/small/2019_10/215_1_20191012153229.000_20191013083718.000_15E211488EFD3E80"; sizeBytes: 518
2019-12-20T12:10:58.057Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 82 items in 16.632036697s at 4 items/sec to "/storage/indexdb/15E21128FE454190/55_1_15E21132F509D793"; sizeBytes: 988
2019-12-20T12:10:58.588Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 3977 items in 44.63399003s at 89 items/sec to "/storage/indexdb/15E21128FE454190/3901_7_15E21132F509D790"; sizeBytes: 28152
2019-12-20T12:11:22.476Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 114 items in 13.528123424s at 8 items/sec to "/storage/indexdb/15E21128FE454190/114_1_15E21132F509D798"; sizeBytes: 1028
2019-12-20T12:11:39.911Z info VictoriaMetrics@/lib/mergeset/table.go:856 merged 369 items in 25.843964394s at 14 items/sec to "/storage/indexdb/15E21128FE454190/188_1_15E21132F509D79A"; sizeBytes: 1671
2019-12-20T12:11:44.133Z info VictoriaMetrics@/lib/storage/partition.go:1170 merged 7954 rows in 36.328990922s at 218 rows/sec to "/storage/data/small/2019_10/7954_10_20191007000056.000_20191013235643.000_15E211488EFD3E82"; sizeBytes: 12488
I'd suggest trying the following steps:
1) Update VictoriaMetrics to v1.31.2 - it contains patches for improving bulk import performance - 97f70ccda79668e955645befb581a2922370131c and 1825893eef39f36e9ee8bde2980b046ce8f9f628
2) Increase metric_batch_size and metric_buffer_limit values in Telegraf config in order to reduce overhead on data transfer to VictoriaMetrics.
3) Increase timeout value in outputs.influxdb config for Telegraf.
4) Duplicate url values inside outputs.influxdb config, so multiple Telegraf workers could send data to VictoriaMetrics.
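For steps 2 and 3, a Telegraf config fragment could look roughly like the one below. The numbers are illustrative assumptions, not tested recommendations; the only intent is that they are larger than Telegraf's defaults. The urls, database and skip_database_creation lines are copied from the telegraf.conf shown earlier in this thread.

```toml
# telegraf.conf (fragment) - illustrative values only, tune for your host
[agent]
# Send more metrics per write request to reduce per-request overhead.
metric_batch_size = 10000
# Keep the buffer several batches deep so that slow writes do not drop metrics.
metric_buffer_limit = 100000

[[outputs.influxdb]]
urls = ["http://localhost:8428"]
database = "xyz"
skip_database_creation = true
# Allow more time for each write request before Telegraf cancels it.
timeout = "30s"
```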
Another option is to stream gzipped file directly to VictoriaMetrics with curl:
curl -X POST -H 'Content-Encoding: gzip' http://localhost:8428/write -T /tmp/xyz.dat.gz
The performance could be improved further by splitting the file into multiple chunks and importing all these chunks in parallel into VictoriaMetrics. Note that chunks must be split on newlines. Something like the following should work:
zcat /tmp/xyz.dat.gz | split -u -n r/8 --filter='gzip > $FILE.dat.gz' - /tmp/xyz_part_
Then the resulting /tmp/xyz_part_*.dat.gz files must be written to VictoriaMetrics in parallel using curl as shown above.
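One way to run the uploads in parallel is xargs -P around the curl command shown above. This is a minimal sketch: the function name upload_chunks and the VM_URL, PARALLEL and CURL variables are hypothetical knobs introduced for the example, and it assumes an xargs that supports -0 and -P (GNU and BSD xargs do).

```shell
#!/bin/bash
# Upload gzipped Influx line protocol chunks to VictoriaMetrics in parallel.
# VM_URL, PARALLEL and CURL are hypothetical settings introduced for this sketch.
VM_URL="${VM_URL:-http://localhost:8428/write}"
PARALLEL="${PARALLEL:-8}"
CURL="${CURL:-curl}"

upload_chunks() {
  # Nothing to do when no chunk files are passed.
  [ "$#" -gt 0 ] || return 0
  # Pass file names NUL-separated so unusual characters in paths are safe.
  # xargs appends one file name after -T and runs up to $PARALLEL uploads at once.
  printf '%s\0' "$@" |
    xargs -0 -n 1 -P "$PARALLEL" \
      "$CURL" -sS --fail -X POST -H 'Content-Encoding: gzip' "$VM_URL" -T
}

# Usage:
#   upload_chunks /tmp/xyz_part_*.dat.gz
```

With --fail, curl exits non-zero on an HTTP error, so a failed chunk shows up in the exit status of xargs instead of passing silently.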
@valyala Just to confirm: this new method uses far fewer resources and is much faster.
FYI: I've developed a tool to migrate TSDB data from Prometheus to VictoriaMetrics
https://github.com/ryotarai/prometheus-tsdb-dump#how-to-import-tsdb-data-from-prometheus-to-victoriametrics
This tool, prometheus-tsdb-dump, reads a Prometheus TSDB block and writes the data in the format accepted by VictoriaMetrics' /api/v1/import.
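For reference, /api/v1/import accepts one JSON object per line, each holding the labels of a series plus parallel arrays of values and millisecond timestamps. The helper below only illustrates that line format. It is not part of prometheus-tsdb-dump, and the function name import_line and the sample label job are hypothetical.

```shell
#!/bin/bash
# Print one line in the JSON line format accepted by /api/v1/import.
# import_line and the "job" label are hypothetical, used only for illustration.
import_line() {
  # $1 = metric name, $2 = value of the job label,
  # $3 = comma-separated sample values,
  # $4 = comma-separated Unix timestamps in milliseconds
  printf '{"metric":{"__name__":"%s","job":"%s"},"values":[%s],"timestamps":[%s]}\n' \
    "$1" "$2" "$3" "$4"
}

# Usage (sends a single series with two samples to a local single-node instance):
#   import_line up node '1,0' '1549891472010,1549891487724' |
#     curl -X POST http://localhost:8428/api/v1/import -T -
```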
I have just finished composing a small bash script to export InfluxDB metrics following all the above optimizations:
#!/bin/bash
# apk add coreutils
split_files=10
months_back=1
# Skip current month
months_back=$(($months_back + 1))
mkdir -p /opt/influxdb/data/export
for m in $(seq 2 $months_back); do
  month=$(date --date="$(date +'%Y-%m-01') - ${m} month" +'%Y-%m')
  month_start=$(date --date="$(date +'%Y-%m-01') - ${m} month" +'%Y-%m-%dT%H:%M:%SZ')
  month_end=$(date --date="$(date +'%Y-%m-01') - $(( ${m} - 1 )) month - 1 second" +'%Y-%m-%dT%H:%M:%SZ')
  echo "#$m $month / $month_start -> $month_end"
  time influx_inspect export -datadir /opt/influxdb/data/data -waldir /opt/influxdb/data/wal -database nxms -start ${month_start} -end ${month_end} -out /dev/stdout | split -u -n r/$split_files --filter='gzip > $FILE.dat.gz' - /opt/influxdb/data/export/${month}_part_
done
After all the metrics have been exported, the parts can be sent to VM in parallel:
curl -X POST -H 'Content-Encoding: gzip' http://localhost:8428/write -T ${month}_part_xx.dat.gz
@ryotarai, I mentioned https://github.com/ryotarai/prometheus-tsdb-dump in README.md - see cd66d3fc43a10a2d7c1af37c52ef73d3fc2896b6 .
@valyala
Question: I am trying to test this out with the cluster version, but I am getting an error:
unsupported path requested: "/insert/0/influx"
What would be the write url in the cluster version? I am using http://host:8480/insert/0/influx
@syepes , Influx write url for cluster version should be http://<any_vminsert_host>:8480/insert/0/influx/write . See https://github.com/VictoriaMetrics/VictoriaMetrics/blob/cluster/README.md#url-format for more details.
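To make the URL layout concrete, here is a small helper that builds the cluster write URL from a vminsert host and the number that follows /insert/ (0 in the URL above). The function name influx_write_url and the host name vminsert-1 are hypothetical.

```shell
#!/bin/bash
# Build the Influx write URL for the cluster version.
# influx_write_url is a hypothetical helper; port 8480 and the path layout
# come from the URL given above.
influx_write_url() {
  local host="$1" account_id="${2:-0}"
  printf 'http://%s:8480/insert/%s/influx/write\n' "$host" "$account_id"
}

# Usage (vminsert-1 is a placeholder host name):
#   curl -X POST -H 'Content-Encoding: gzip' "$(influx_write_url vminsert-1 0)" -T /tmp/xyz.dat.gz
```

The error quoted above comes from the missing /write suffix, not from the host or port.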
Please see also https://github.com/VictoriaMetrics/vmctl#migrating-data-from-prometheus