VictoriaMetrics: cluster data storage and replication

Created on 24 May 2019 · 4 Comments · Source: VictoriaMetrics/VictoriaMetrics

Can someone clarify what data writing looks like in cluster mode? Is data actually replicated in the cluster, is it hashed and written to a specific vmstorage node, is it just fanned out so it can land on any vmstorage node, or is it some other setup?
When data is queried, is the query passed to all vmstorage nodes, or does the query layer know where the data lives and ask only the appropriate node?

question

Most helpful comment

Hello, there are 3 services in the cluster version:
vmstorage - persistent storage (stateful)
vmselect - read gateway (stateless)
vminsert - write gateway (stateless)
vminsert knows about all storage nodes and uses consistent hashing to choose one particular node to write to (the hash is computed from the metric name plus its labels in sorted order). If that node is unavailable at the moment, it writes to the next one.
vmselect reads from all nodes and merges the results. If one of them is not reachable, it marks the result as partial.

There is a bit more info here: https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster#cluster-availability
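
To make the write and read paths described above concrete, here is a minimal Python sketch. It is an illustration only, not VictoriaMetrics code: the function names, the SHA-256 hash, and the plain modulo placement are all assumptions (the thread does not say which hash or placement scheme vminsert actually uses). It shows a series key built from the metric name and sorted labels, fall-through to the next node when the hashed one is down, and a read that asks every node and flags the result as partial if any node fails.

```python
import hashlib


def series_key(metric, labels):
    # Canonical key: metric name followed by labels in sorted order,
    # so the same series always yields the same bytes.
    parts = [metric] + [f"{k}={v}" for k, v in sorted(labels.items())]
    return ",".join(parts).encode("utf-8")


def pick_node(metric, labels, nodes, is_up):
    """Pick the node for a series; fall through to the next node if it is down.

    Plain modulo placement is used as a simple stand-in for the real scheme.
    """
    digest = hashlib.sha256(series_key(metric, labels)).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    for offset in range(len(nodes)):
        node = nodes[(start + offset) % len(nodes)]
        if is_up(node):
            return node
    raise RuntimeError("no vmstorage node is reachable")


def query_all(nodes, fetch):
    """Ask every node, merge what comes back, and flag a partial result
    if any node could not be reached."""
    merged, partial = [], False
    for node in nodes:
        try:
            merged.extend(fetch(node))
        except ConnectionError:
            partial = True
    return merged, partial
```

Because the labels are sorted before hashing, `pick_node("cpu", {"host": "a", "dc": "x"}, ...)` and `pick_node("cpu", {"dc": "x", "host": "a"}, ...)` land on the same node. The read side does not need to know where a series lives, since it queries all nodes anyway.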

All 4 comments

A few words about replication, in addition to the info provided by @tenmozes:

We haven't come up with a reliable yet simple replication scheme at the VictoriaMetrics level that could provide data safety and high availability in the event of storage loss. So vmstorage nodes rely on durable replicated disks, such as Google Compute disks, instead of implementing replication themselves.

The most straightforward approach to replication at the VictoriaMetrics level - just put N copies of the data on different vmstorage nodes, where N is the replication factor - requires a complex and fragile data reshuffling scheme in order to restore the required replication factor for data stored on broken disks. Automatic reshuffling may hurt cluster availability and performance due to increased usage of network, disk and CPU resources. Additionally, the reshuffling may fail in edge cases such as temporary unavailability of the network between vmstorage nodes. So we chose the simplest approach: shifting the data safety headache to durable disks.
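
A small Python sketch of the "N copies on different nodes" idea may help show where the reshuffling problem comes from. Everything here is hypothetical: the function names, the SHA-256 hash, and the "next N nodes in list order" placement are assumptions made for illustration, not a scheme described in this thread.

```python
import hashlib


def replica_nodes(series_key, nodes, replication_factor):
    """Choose `replication_factor` distinct nodes for a series by taking
    consecutive nodes starting at the hashed position (assumed placement)."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def surviving_copies(series_key, nodes, replication_factor, lost_nodes):
    """Return the replicas that still hold the series after some nodes are lost."""
    replicas = replica_nodes(series_key, nodes, replication_factor)
    return [node for node in replicas if node not in lost_nodes]
```

With a replication factor of 2, losing one of the two chosen nodes leaves a single surviving copy. Getting back to two copies means copying that series' data to some other node, and doing so for every affected series is the reshuffling work the comment above describes as complex and fragile.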

It is possible to implement replication on the Prometheus level by running multiple VictoriaMetrics clusters in distinct availability zones and writing data in parallel to all these clusters. Then the data may be queried via promxy sitting in front of all the VictoriaMetrics clusters.
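
The multi-cluster setup can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the function names are hypothetical, each cluster is represented by a plain callable, and the deduplication rule (keep the first value seen per series and timestamp) is an assumption rather than a description of how promxy actually merges results.

```python
def write_everywhere(clusters, sample):
    """Send the same sample to every cluster.

    `clusters` maps a cluster name to a write callable (hypothetical).
    Returns the names of clusters whose write failed.
    """
    failed = []
    for name, write in clusters.items():
        try:
            write(sample)
        except ConnectionError:
            failed.append(name)
    return failed


def merged_read(results_per_cluster):
    """Union the samples returned by each cluster, dropping duplicates.

    Each sample is a (series, timestamp, value) tuple; the first value seen
    for a given (series, timestamp) pair wins (assumed rule).
    """
    seen = {}
    for samples in results_per_cluster:
        for series, timestamp, value in samples:
            seen.setdefault((series, timestamp), value)
    return sorted((s, t, v) for (s, t), v in seen.items())
```

If one cluster misses a sample because it was unreachable during the write, the merged read still returns it as long as another cluster received it, which is the availability benefit of this layout.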

Related issue: #118

FYI, release v1.36.0 contains application-level replication support for the cluster version of VictoriaMetrics. See more details about the replication here.

Closing this issue.
