Victoriametrics: RemoteWrite leads 5-6minutes Lag

Created on 15 Jan 2020  路  21Comments  路  Source: VictoriaMetrics/VictoriaMetrics

Describe the bug
We have been using VM for about 2 months, recently we found that the metrics are 5~6 minutes lag than Prometheus. I really need some advices to help solve that. Thanks

Version

$ ./victoria-metrics-prod --version
victoria-metrics-20191224-105655-tags-v1.31.3-0-g5cb8d977
Single Version

Additional context

Prometheus Configurations:

remote_write:
- url: http://xxx:8428/api/v1/write
  remote_timeout: 30s
  queue_config:
    capacity: 20000
    max_shards: 50
    min_shards: 1
    max_samples_per_send: 100000
    batch_send_deadline: 5s
    min_backoff: 30ms
    max_backoff: 100ms

VM Configurations:

./victoria-metrics -storageDataPath=/data1/vm -retentionPeriod=3 -httpListenAddr=:8428

Some metrics of Prometheus

prometheus_remote_storage_pending_samples{queue="0:xxx} 200000

Some metrics of VM

vm_blocks{type="indexdb"} 223635
vm_blocks{type="storage/big"} 86686550
vm_blocks{type="storage/small"} 50965739
vm_merges_total{type="indexdb"} 417663
vm_merges_total{type="storage/big"} 28
vm_merges_total{type="storage/small"} 1180652
vm_metric_rows_skipped_total{name="vmselect"} 13772448590
vm_missing_metric_names_for_metric_id_total 0
vm_missing_tsids_for_metric_id_total 0
vm_new_timeseries_created_total 7585641
vm_nfs_dir_remove_failed_attempts_total 0
vm_opentsdb_request_errors_total{name="write", net="tcp"} 0
vm_opentsdb_request_errors_total{name="write", net="udp"} 0
vm_opentsdb_requests_total{name="write", net="tcp"} 0
vm_opentsdb_requests_total{name="write", net="udp"} 0
vm_parts{type="indexdb"} 21
vm_parts{type="storage/big"} 7
vm_parts{type="storage/small"} 27
vm_pending_rows{type="indexdb"} 0
vm_pending_rows{type="storage"} 199930

Prometheus screenshot.

image

Vm screenshot (By Promxy, 2 VM nodes are delayed, we have using http curl to ensure that)

image

bug question

Most helpful comment

After the configuration tunning, the lag is 1~2 minutes now.
image

All 21 comments

VM monitor screenshots

image

Hi @sundy-li!
VM doesn't look like overloaded, but could you provide screenshots of concurrent inserts and tcp connections panels?
I am wondering what version of Prometheus do you use? Can you also check the following metrics in Prometheus(the graph screenshots are preferable):

  • prometheus_remote_storage_failed_samples_total
  • prometheus_remote_storage_enqueue_retries_total
  • prometheus_remote_storage_dropped_samples_total
  • prometheus_remote_storage_shards
  • histogram_quantile(0.9, rate(prometheus_remote_storage_sent_batch_duration_seconds_bucket[5m]))

@hagen1778 Thanks for your reply.

Here is the screenshots.

image

image

I also think that VM not overloaded.

32Core 128G Top

image

Prometheus version


Version: 2.14.0
GIT log: edeb7a44cbf745f1d8be4ea6f215e79e651bfe19

Prometheus Metrics (Restarts at 2 weeks ago) from /metrics api.

prometheus_remote_storage_failed_samples_total{queue="0:xxx"} 1.4765e+09
prometheus_remote_storage_enqueue_retries_total{queue="0:xxx"} 547
prometheus_remote_storage_dropped_samples_total{queue="0:xxx"} 270
prometheus_remote_storage_shards{queue="0:xxx"} 1
histogram_quantile(0.9, rate(prometheus_remote_storage_sent_batch_duration_seconds_bucket{instance="xxx:9090"}[5m]))
 0.24077551020408164

Looks like prometheus_remote_storage_failed_samples_total is very high, yet I did not notice data lost through the Graph series, just the latest metrics are lagged.

BTW, both VMs are lagged, we use 2 Prometheus to 2 VMs.

@sundy-li , could you verify whether the lag persists when querying VictoriaMetrics directly without Promxy?

Also could you compare local time on hosts that run Prometheus and VictoriaMetrics?

@valyala

could you verify whether the lag persists when querying VictoriaMetrics directly without Promxy?

Yes, as I mentioned on the first comment I have tested it with both curl vm:8428 and grafana dashboard , it's not the problem of Promxy. Both VMs are lagged.

Also could you compare local time on hosts that run Prometheus and VictoriaMetrics?

But they are running on the same node.

cat /etc/timezone
Asia/Hong_Kong

```

I am sorry, I just verfiy it again, not both VM are lagged now (But I am sure they are lagged before).

Let's look at this screenshots, I have created 5 datasources on grafana

Prometheues1_1  which write to  VM1_1   same node deployed
Prometheues1_2 which  write to VM1_2  same node deployed

Proxmy which reads from VM1_1 and VM1_2
Prometheues1_1 and Prometheues1_2 are with same scrape configs

image

VM1 are dropping messages, here are some errors on VM1

2020-01-15T11:04:07.388Z    info    VictoriaMetrics@/lib/storage/partition.go:1170  merged 208693687 rows in 21.485584531s at 9713195 rows/sec to "/data1/vm/data/small/2020_01/208693687_4828214_20200115094501.209_20200115105731.790_15E59A35E24FBB32"; sizeBytes: 374528191
2020-01-15T11:04:17.273Z    error   VictoriaMetrics@/lib/httpserver/httpserver.go:421   error in "/api/v1/write": cannot read prompb.WriteRequest: too big unpacked request; musn't exceed 33554432 bytes
2020-01-15T11:04:17.491Z    error   VictoriaMetrics@/lib/httpserver/httpserver.go:421   error in "/api/v1/write": cannot read prompb.WriteRequest: too big unpacked request; musn't exceed 33554432 bytes
2020-01-15T11:04:18.963Z    error   VictoriaMetrics@/lib/httpserver/httpserver.go:421   error in "/api/v1/write": cannot read prompb.WriteRequest: too big unpacked request; musn't exceed 33554432 bytes
2020-01-15T11:04:19.190Z    error   VictoriaMetrics@/lib/httpserver/httpserver.go:421   error in "/api/v1/write": cannot read prompb.WriteRequest: too big unpacked request; musn't exceed 33554432 bytes
2020-01-15T11:04:25.060Z    error   VictoriaMetrics@/lib/httpserver/httpserver.go:421   error in "/api/v1/write": cannot read prompb.WriteRequest: too big unpacked request; musn't exceed 33554432 bytes
2020-01-15T11:04:34.106Z    error   VictoriaMetrics@/lib/httpserver/httpserver.go:421   error in "/api/v1/write": cannot read prompb.WriteRequest: too big unpacked request; musn't exceed 33554432 bytes
2020-01-15T11:04:34.339Z    error   VictoriaMetrics@/lib/httpserver/httpserver.go:421   error in "/api/v1/write": cannot read prompb.WriteRequest: too big unpacked request; musn't exceed 33554432 bytes
2020-01-15T11:04:39.476Z    error   VictoriaMetrics@/lib/httpserver/httpserver.go:421   error in "/api/v1/write": cannot read prompb.WriteRequest: too big unpacked request; musn't exceed 33554432 bytes


I restart VM1 with maxInsertRequestSize=134217728, now error logs not show, but the lag is still there.

image

Update:

I restart both Prometheus with new configs 3 hours ago.

  queue_config:
    capacity: 20000
    max_shards: 50
    min_shards: 1
    max_samples_per_send: 100000
    batch_send_deadline: 5s
    min_backoff: 30ms
    max_backoff: 100ms

Now the lag of VM1 is 2 minutes, the lag of VM2 if 7 minutes, so weird.

image

Try reducing max_samples_per_send from 100000 to 10000 in Prometheus configs. This should decrease possible lag on Prometheus side by reducing the maximum number of pending data points in the queue before they are sent to VictoriaMetrics.
Then also try reducing max_shards from 50 to 30.

As an additional benefit, lowering both max_samples_per_send and max_shards could reduce Prometheus memory usage - see https://prometheus.io/docs/practices/remote_write/ .

prometheus_remote_storage_failed_samples_total{queue="0:xxx"} 1.4765e+09

Could you show a graph for increase(prometheus_remote_storage_failed_samples_total[1h]) over the last 24 hours? This should show whether Prometheus fails sending data to VictoriaMetrics and whether this correlates with errors in VictoriaMetrics or Prometheus logs.

Note: I restarted Prometheus yesterday to test some configs.

image

This graph shows that Prometheus couldn't write too big chunks of data to VictoriaMetrics, since the size of these chunks exceeded -maxInsertRequestSize value. I'd recommend returning -maxInsertRequestSize to default value after reducing max_samples_per_send from 100K to 10K. Lower -maxInsertRequestSize should reduce memory usage on VictoriaMetrics side.

Ok, I will test that, now there are not any error logs in VMs and Prometheus. I found Prometheus restart took more than 40 minutes, that's really bad for tuning configs. But VM restart just needs some seconds 馃憤

$ sudo du -sh /data1/prometheus/
482G    /data1/prometheus/
$ sudo du -sh /data1/vm
228G    /data1/vm

I found Prometheus restart took more than 40 minutes, that's really bad for tuning configs.

Prometheus configs can be applied without restart by sending SIGHUP signal to Prometheus - see https://www.robustperception.io/reloading-prometheus-configuration .

Prometheus restarts slowly if it has been shut down uncleanly. Clean shutdown for Prometheus is via SIGTERM signal - see https://www.robustperception.io/shutting-down-prometheus . All the other approaches for Prometheus shutdown would lead to recovery from WAL on the next start. This recovery can take too long time for big WAL directory. If you can afford losing recent data, then remove the WAL directory contents before starting Prometheus. See these docs about WAL directory.

@valyala Really thanks for your detail advice. I will try that.

After the configuration tunning, the lag is 1~2 minutes now.
image

In theory the lag shouldn't exceed 5 seconds from the configured batch_send_deadline value in Prometheus config. Needs additional investigation...

Same question. I set prometheus remote write configure as this

  queue_config:
    capacity: 20000
    max_shards: 30
    min_shards: 1
    max_samples_per_send: 10000
    batch_send_deadline: 5s
    min_backoff: 30ms
    max_backoff: 100ms

After reload Prometheus, 20-30 minutes lag reduced to 1-2 minutes. But still exceed 5 seconds.

Prometheus version: 2.13.1
Victoria Metrics version: 1.33.0-cluster

I upgraded the VM version from 1.33.0-cluster to 1.34.2-cluster, and something strange happened.
I found that the old metrics query form vmselect still lag 1-2minutes, but some new metrics do not.
I guess that lag is due to the vmstorage cache?
image

Thanks for the additional info, @Lnek ! I still have no idea for the root cause of the lag.

I guess that lag is due to the vmstorage cache?

The maximum lag between data ingestion to vminsert and data availability via vmselect querying shouldn't exceed a few seconds, which should be enough to propagate the ingested data through various in-memory buffers to persistent storage at vmstorage

vmselect has a command-line flag -search.latencyOffset, which is set to 30 seconds by default. This flag value must be bigger than the maximum lag induced by Prometheus or any other data source, otherwise incomplete data may be returned from queries.

I'd recommend trying to substitute Prometheus with vmagent and see whether this reduces the lag.

Update.
I keep watching the lags, and find the problem may be the high load of Prometheus.

Now, node1 has 3-hour lags, and node2 is normal.
image

Node1 with lags

prometheus_remote_storage_pending_samples{queue="0:http://vm1:8428/api/v1/write"} 1.4241501071e+10

Node2 without lags
prometheus_remote_storage_pending_samples{queue="0:http://vm2:8428/api/v1/write"} 20000

Node1 and Node2 and with the same exporters, data and configurations, but the load is different.

image

So I think the problem is Prometheus not VM, I will try to substitute Prometheus with vmagent .

Thanks for the update! It is strange that identically configured Prometheus instances have such different CPU load.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

genericgithubuser picture genericgithubuser  路  4Comments

EricAntoni picture EricAntoni  路  3Comments

pmitra43 picture pmitra43  路  3Comments

oOHenry picture oOHenry  路  4Comments

dima-vm picture dima-vm  路  3Comments