VictoriaMetrics: Best way to migrate a big VictoriaMetrics instance from an on-prem cluster to the cloud

Created on 23 Sep 2020 · 9 comments · Source: VictoriaMetrics/VictoriaMetrics

Hi,

we are migrating our VictoriaMetrics instances from an on-prem cluster to the cloud. Smaller ones are easy to migrate using the export and import APIs.

But we have encountered an issue with our big VM instance. It currently holds 2.4TB of data and runs the single-server version.
Attempts to export the data are not working. After raising search.maxUniqueTimeseries to 100,000,000, exporting even 1 second of data takes more than an hour and is still running.
Here is the export API call:
curl --output central.jsonl.gz -H 'Accept-Encoding: gzip' https://victoria-metrics-instance/api/v1/export -d 'match[]={__name__!=""}&start=1600646400&end=1600646401'
After 1 hour and 20 minutes, it has exported only 4MB of data so far.

The instance itself currently uses 4 CPUs and 38GB of RAM and is not close to any limit (it runs in a Kubernetes cluster).

The VM instance in the cloud runs the cluster version, so vmexport and vmimport are not compatible (according to the docs).

We would like to know whether there is a more efficient way to copy the data over. Can we just copy the /storage data directly? I assume that is not going to work?

Any help will be greatly appreciated.

Labels: enhancement, question


The best way to migrate large amounts of data between VictoriaMetrics instances is to export the data via /api/v1/export from one instance and then import it into another instance via /api/v1/import. This approach works for migrating data from a single-node instance to a cluster and vice versa. Note that the cluster version of VictoriaMetrics uses slightly different URLs than the single-node version; see these docs for details.
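As a sketch of the URL difference mentioned above: the ports (8428 for single-node, 8481 for vmselect, 8480 for vminsert) are the usual defaults, the host names are hypothetical, and in cluster URLs the segment after /select/ or /insert/ is the tenant (accountID, or accountID:projectID). The helper names export_url, import_url and migrate are illustrative only, not part of any VictoriaMetrics tooling.

```shell
#!/usr/bin/env bash
# Sketch: URL layouts for single-node vs cluster VictoriaMetrics.
# Host names are hypothetical; ports are the usual defaults.
set -o pipefail

# export_url MODE HOST [TENANT] -> prints the JSON-lines export URL
export_url() {
  local mode=$1 host=$2 tenant=${3:-0}
  case "$mode" in
    single)  echo "http://${host}:8428/api/v1/export" ;;
    cluster) echo "http://${host}:8481/select/${tenant}/prometheus/api/v1/export" ;;
    *)       echo "unknown mode: ${mode}" >&2; return 1 ;;
  esac
}

# import_url MODE HOST [TENANT] -> prints the JSON-lines import URL
import_url() {
  local mode=$1 host=$2 tenant=${3:-0}
  case "$mode" in
    single)  echo "http://${host}:8428/api/v1/import" ;;
    cluster) echo "http://${host}:8480/insert/${tenant}/prometheus/api/v1/import" ;;
    *)       echo "unknown mode: ${mode}" >&2; return 1 ;;
  esac
}

# migrate MATCH: stream the export of a single-node source straight into a
# cluster destination, without writing an intermediate file.
migrate() {
  : "${SRC_HOST:?set SRC_HOST}" "${DST_HOST:?set DST_HOST}"
  curl -sf "$(export_url single "$SRC_HOST")" --data-urlencode "match[]=$1" |
    curl -sf -X POST -T - "$(import_url cluster "$DST_HOST")"
}

# Runs only on request, so sourcing this file has no side effects.
if [ "${RUN_MIGRATION:-0}" = 1 ]; then
  migrate '{__name__!=""}'
fi
```

Usage, with hypothetical hosts: `SRC_HOST=vm-old DST_HOST=vminsert RUN_MIGRATION=1 bash migrate.sh`. On a 2.4TB instance the single match-all selector used here is exactly what hits the memory and speed problems from the question, so in practice it would be combined with the label-based splitting described next.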

As you noted above, this approach may be slow or may use large amounts of RAM during export when the match[] query arg matches all the time series, i.e. {__name__!=""}. The solution is to split the time series into multiple groups by certain label filters and export each group individually. For instance, if the database contains a million unique time series with a deployment_id label whose values start with a random digit, then these time series can be split into 10 groups of roughly 100K time series each with the following match[] filters: {deployment_id=~"0.*"}, {deployment_id=~"1.*"}, ... , {deployment_id=~"9.*"}. Each group can then be migrated independently of the others. I'd suggest investigating the output of the /api/v1/status/tsdb page in order to determine labels that could be used for even data splitting.
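The grouping above can be sketched in bash. deployment_id is the example label from the text; SRC, START and END are hypothetical inputs, and the actual split label should be chosen from the /api/v1/status/tsdb output.

```shell
#!/usr/bin/env bash
# Sketch: split one huge export into 10 smaller ones by the first digit of a
# label value. "deployment_id" is the example label; SRC, START and END
# are hypothetical inputs.

# split_filters LABEL -> prints one match[] selector per leading digit
split_filters() {
  local label=$1 d
  for d in 0 1 2 3 4 5 6 7 8 9; do
    printf '{%s=~"%s.*"}\n' "$label" "$d"
  done
}

# export_groups LABEL START END -> one JSON-lines file per group
export_groups() {
  local label=$1 start=$2 end=$3 i=0 filter
  : "${SRC:?set SRC to the source URL, e.g. http://host:8428}"
  while IFS= read -r filter; do
    curl -sf --output "export-${label}-${i}.jsonl" "${SRC}/api/v1/export" \
      --data-urlencode "match[]=${filter}" -d "start=${start}" -d "end=${end}"
    i=$((i + 1))
  done < <(split_filters "$label")
}

if [ "${RUN_EXPORT:-0}" = 1 ]; then
  export_groups deployment_id "${START:?}" "${END:?}"
fi
```

Each resulting file can then be sent to /api/v1/import on the destination on its own, so a failed group can be retried without redoing the others.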

Can we just copy over /storage data directly? I assume it is not going to work?

Unfortunately this doesn't work, because the cluster and single-node versions of VictoriaMetrics use different data formats: the cluster version additionally stores accountID and projectID (aka tenants).

It is clear that the approach outlined above may be slow and awkward to execute. So let's repurpose this issue as a feature request for fast data migration between VictoriaMetrics instances.

FYI, the next release of VictoriaMetrics will provide an optimized data migration path by exporting data via the native protocol; see the /api/v1/export/native docs for details. It should use less RAM and CPU during export, and it is up to 50x faster than /api/v1/export. Additionally, the exported data in native format occupies up to 100x less storage space than the JSON lines format.
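A sketch of the native migration path: the source's /api/v1/export/native output is piped directly into /api/v1/import/native on the destination, so no intermediate file is written. The default SRC and DST URLs are hypothetical (a single-node source and a cluster vminsert for tenant 0); setting DRY_RUN=1 prints the plan instead of running curl.

```shell
#!/usr/bin/env bash
# Sketch: copy data in the native format by piping /api/v1/export/native on
# the source into /api/v1/import/native on the destination.
# Default URLs are hypothetical (single-node source, cluster vminsert target).
set -o pipefail

SRC=${SRC:-http://vm-single:8428}
DST=${DST:-http://vminsert:8480/insert/0/prometheus}

# native_migrate MATCH; set DRY_RUN=1 to only print what would run.
native_migrate() {
  local match=$1
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "export ${SRC}/api/v1/export/native match[]=${match}"
    echo "import ${DST}/api/v1/import/native"
    return 0
  fi
  # No Accept-Encoding: gzip here: native blocks are already compressed.
  curl -sf "${SRC}/api/v1/export/native" --data-urlencode "match[]=${match}" |
    curl -sf -X POST -T - "${DST}/api/v1/import/native"
}

if [ "${RUN_MIGRATION:-0}" = 1 ]; then
  native_migrate '{__name__!=""}'
fi
```

pipefail makes the pipeline report a failure when the export side fails, not only when the import side does.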

Import performance has also been optimized. Previously it was recommended to run multiple parallel imports in order to improve import performance, since each import process could load only a single CPU core on the VictoriaMetrics side. This restriction has now been removed: VictoriaMetrics achieves the maximum available import performance with a single import process, and this performance should scale with the number of CPU cores available on the VictoriaMetrics side.

It is possible to try these improvements by building VictoriaMetrics from sources:

@valyala ah, that will be amazing! Thank you for adding that. Exporting our largest Victoria (2.5TB) has been running for 3 days now... almost done, but it is a painfully slow process. So that will be a very welcome improvement!

FYI, all the improvements mentioned above have been included in release v1.42.0.

@valyala thank you!
I am trying it out right now. So far I have noticed that the amount of data transferred is MUCH higher (but it still seems faster than the old way):

curl --retry 10 --retry-connrefused --retry-delay 30 --fail --output central-1600905600.jsonl.gz -H 'Accept-Encoding: gzip' https://victoria-metrics.hidden.com/api/v1/export -d 'match[]={__name__=~"bil.*"}&start=1590573600&end=1590577200&max_rows_per_line=100000' || exit 1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 29.2M    0 29.2M  100    84  4040k     11  0:00:07  0:00:07 --:--:-- 6613k

vs native:

curl --retry 10 --retry-connrefused --retry-delay 30 --fail --output central-1600905600.jsonl.gz -H 'Accept-Encoding: gzip' https://victoria-metrics.hidden.com/api/v1/export/native -d 'match[]={__name__=~"bil.*"}&start=1590562800&end=1590566400' || exit 1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  728M    0  728M  100    59  69.6M      5  0:00:11  0:00:10  0:00:01 71.5M

Both use gzip. Old way: 29.2MB, native: 728MB.

@valyala actually, I am wondering whether native export is working correctly.
I am exporting one hour of data at a time. Previously, the gzipped JSON export would produce 200MB of data; the new native export for the same time window and the same metrics selector is now at 3TB and still going. That doesn't seem right, since the native format is supposed to be much more compact. Does it ignore start and end by any chance?

I actually had to switch back to the old way of exporting/importing, since it generates much less data and is actually faster because of that.

VictoriaMetrics stores data for each time series in compressed blocks with up to 8K data points per block. Each block contains data points on a particular time range, and multiple blocks may have overlapping time ranges. Blocks that have at least a single data point with a timestamp on the [start ... end] time range are exported as a whole via /api/v1/export/native, without filtering out data points outside the requested time range. This increases export performance, since no CPU time is spent on block unpacking / filtering / packing. The unpacking and filtering of data points outside the requested time range is performed during data import via /api/v1/import/native. VictoriaMetrics must verify the correctness of imported data, so it has to unpack the data during import in any case.
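A toy model of the behaviour described above, in plain bash arithmetic rather than VictoriaMetrics code: a block is selected when its timestamp range intersects [start, end] (approximated here by the block's min/max timestamps) and is then exported untrimmed. The 15s scrape interval is an assumption; with it, one full block of 8192 points covers about 34 hours, i.e. roughly 34 times the one-hour window that was requested.

```shell
#!/usr/bin/env bash
# Toy model (not VictoriaMetrics source) of block selection for
# /api/v1/export/native: blocks intersecting [start, end] are exported whole.

# block_overlaps START END BLOCK_MIN BLOCK_MAX -> status 0 if the ranges intersect
block_overlaps() {
  local start=$1 end=$2 bmin=$3 bmax=$4
  (( bmin <= end && bmax >= start ))
}

# block_span_seconds POINTS INTERVAL -> seconds between the first and last
# of POINTS samples spaced INTERVAL seconds apart
block_span_seconds() {
  echo $(( ($1 - 1) * $2 ))
}

span=$(block_span_seconds 8192 15)   # 8K points at an assumed 15s interval
window=3600                          # the one-hour export window from the thread
echo "one full block spans ${span}s, about $(( span / window ))x the window"
```

The real ratio depends on how full the blocks are and how many of them per series touch the window, so this is an order-of-magnitude illustration only.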

So it is likely you hit this case: the requested one-hour time range matches many blocks containing data points outside that range, so /api/v1/export/native returned more data than /api/v1/export for the same time range. Try increasing the time range to one month and see whether /api/v1/export/native returns less data than /api/v1/export.
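To try the one-month suggestion, or to walk a whole migration in larger windows, a long range can be cut into month-sized chunks. This sketch approximates a month as 30 days (WINDOW, in seconds) and assumes start/end accept fractional Unix seconds, which the .999 window ends rely on; SRC, the match-all selector and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: walk a long time range in month-sized windows (approximated as
# 30 days) and export each window in the native format. SRC and the output
# file names are hypothetical.

WINDOW=${WINDOW:-2592000}   # 30 days in seconds

# windows START END [STEP] -> prints "from to" pairs that cover [START, END].
# A non-final window ends at .999 so millisecond samples just before the
# next window's start are not skipped.
windows() {
  local start=$1 end=$2 step=${3:-$WINDOW} from to
  for (( from = start; from <= end; from += step )); do
    to=$(( from + step - 1 ))
    if (( to >= end )); then
      echo "$from $end"
    else
      echo "$from ${to}.999"
    fi
  done
}

# export_windows START END -> one native export file per window
export_windows() {
  local from to
  : "${SRC:?set SRC to the source URL}"
  while read -r from to; do
    curl -sf --output "native-${from}.bin" "${SRC}/api/v1/export/native" \
      --data-urlencode 'match[]={__name__!=""}' -d "start=${from}" -d "end=${to}"
  done < <(windows "$1" "$2")
}
```

Comparing the size of one 30-day native file with a JSON export of the same window is the experiment suggested above.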

A few additional notes:

  • The size of data exported via /api/v1/export/native should be slightly bigger than the size of the directory set via -storageDataPath when exporting all the data, since data is exported as is. If you export data from a VictoriaMetrics cluster, then the size of the exported data should be slightly bigger than the sum of the -storageDataPath directories on all vmstorage nodes.
  • /api/v1/export now supports the reduce_mem_usage=1 query arg, which can be used instead of the max_rows_per_line query arg in order to reduce memory usage when exporting time series with a big number of samples. In this case each line will contain up to 8K data points, i.e. the data points from a single block.
  • There is no need to set the Accept-Encoding: gzip request header when exporting data via /api/v1/export/native, since the data in exported blocks is already compressed. The Accept-Encoding: gzip header may slow down data export.
  • vmctl recently gained support for data migration between VictoriaMetrics instances via native export / import. See these docs for details.
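The notes on reduce_mem_usage=1 and on dropping Accept-Encoding: gzip translate into slightly different curl invocations for the two export endpoints. This sketch prints the arguments for each so they can be inspected before running; SRC and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: curl arguments for the JSON and native export endpoints.
# SRC is a hypothetical source URL.

SRC=${SRC:-http://vm-single:8428}

# export_args MODE MATCH START END -> prints curl arguments, one per line
export_args() {
  local mode=$1 match=$2 start=$3 end=$4
  case "$mode" in
    json)
      # reduce_mem_usage=1 instead of max_rows_per_line: one block per line
      printf '%s\n' "${SRC}/api/v1/export" -H 'Accept-Encoding: gzip' \
        --data-urlencode "match[]=${match}" \
        -d "start=${start}" -d "end=${end}" -d 'reduce_mem_usage=1' ;;
    native)
      # no Accept-Encoding header: native blocks are already compressed
      printf '%s\n' "${SRC}/api/v1/export/native" \
        --data-urlencode "match[]=${match}" \
        -d "start=${start}" -d "end=${end}" ;;
    *) echo "unknown mode: ${mode}" >&2; return 1 ;;
  esac
}

# run_export MODE MATCH START END OUTFILE -> runs curl with those arguments
run_export() {
  local -a args
  mapfile -t args < <(export_args "$1" "$2" "$3" "$4")
  (( ${#args[@]} > 0 )) || return 1
  curl -sf --output "$5" "${args[@]}"
}
```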

Ah, that would explain it!
Thank you for the detailed response and for adding native export/import.
We have migrated our VictoriaMetrics instances successfully, so I am going to close this issue now.
