VictoriaMetrics: Lost data when all vmstorage nodes are unavailable

Created on 13 Nov 2020 · 4 Comments · Source: VictoriaMetrics/VictoriaMetrics

Describe the bug
I set up a VictoriaMetrics cluster with the following layout:

vmagent -> vminsert -> vmstorage -> vmselect

After I stop all vmstorage nodes, vminsert starts showing the following logs:

error when processing imported data: cannot send 293923 bytes to storageNode "10.65.234.19:9011": 2250
rows dropped because the current vsmtorage is unavailable and all the vmstorage nodes are unavailable and reroutedBR has no enough space for storing 293923 bytes; 
only 32924 free bytes left out of 314572800 bytes in reroutedBR

tcpdump shows that vminsert returns HTTP 204 instead of HTTP 503:

sudo tcpdump -i any src host 10.65.248.58 and src port 8480 -Anlps0 | grep HTTP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
vHTTP/1.0 204 No Content
vHTTP/1.0 204 No Content
vHTTP/1.0 204 No Content
vHTTP/1.0 204 No Content
vHTTP/1.0 204 No Content
vHTTP/1.0 204 No Content
wHTTP/1.0 204 No Content
wHTTP/1.0 204 No Content
wHTTP/1.0 204 No Content

I went through the code and found that the error is dropped because the unmarshal work is done asynchronously:

https://github.com/VictoriaMetrics/VictoriaMetrics/blob/f7a6ae3d1173d2c7851abaf5bb440d8dfe5031de/lib/protoparser/promremotewrite/streamparser.go#L141-L147
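To illustrate how an error can get lost in asynchronous processing, here is a minimal Go sketch. It is not the VictoriaMetrics code: processBlock, parseDroppingErrors, parsePropagatingErrors and errStorageUnavailable are hypothetical names invented for this example. The first variant only logs the callback error inside the goroutine, so the caller sees nil; the second records the first error and returns it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStorageUnavailable is a hypothetical stand-in for the failure
// reported when no storage node can accept data.
var errStorageUnavailable = errors.New("storage unavailable")

// processBlock is a hypothetical callback that always fails,
// as it would when every storage node is down.
func processBlock(block []byte) error {
	return errStorageUnavailable
}

// parseDroppingErrors processes each block in its own goroutine and
// only logs callback errors, so the caller always receives nil.
func parseDroppingErrors(blocks [][]byte) error {
	var wg sync.WaitGroup
	for _, b := range blocks {
		wg.Add(1)
		go func(b []byte) {
			defer wg.Done()
			if err := processBlock(b); err != nil {
				// The error is logged here and then lost.
				fmt.Println("error when processing imported data:", err)
			}
		}(b)
	}
	wg.Wait()
	return nil
}

// parsePropagatingErrors also works asynchronously, but remembers the
// first callback error and returns it once all goroutines finish.
func parsePropagatingErrors(blocks [][]byte) error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, b := range blocks {
		wg.Add(1)
		go func(b []byte) {
			defer wg.Done()
			if err := processBlock(b); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(b)
	}
	wg.Wait()
	return firstErr
}

func main() {
	blocks := [][]byte{[]byte("a"), []byte("b")}
	fmt.Println("dropping:", parseDroppingErrors(blocks))
	fmt.Println("propagating:", parsePropagatingErrors(blocks))
}
```

With the second variant, an HTTP handler that calls the parser has an error it can turn into a non-2xx response.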

To Reproduce
Simply stop all vmstorage nodes.

Expected behavior
vminsert returns HTTP 503 and data starts queuing on vmagent.
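The expected client-side behavior can be sketched as follows. This is an assumption-laden illustration, not vmagent's implementation: pendingQueue and flush are hypothetical names, and send stands in for an HTTP request that returns a status code. Blocks stay pending until the receiver answers with a 2xx status, which is why a 204 from a receiver that actually dropped the rows leads to data loss.

```go
package main

import "fmt"

// pendingQueue is a hypothetical client-side queue of unsent blocks.
type pendingQueue struct {
	blocks []string
}

// flush sends queued blocks in order using send, which returns an HTTP
// status code. It stops at the first non-2xx response and keeps the
// remaining blocks pending. It returns the number of blocks sent.
func (q *pendingQueue) flush(send func(block string) int) int {
	sent := 0
	for len(q.blocks) > 0 {
		status := send(q.blocks[0])
		if status < 200 || status > 299 {
			break
		}
		q.blocks = q.blocks[1:]
		sent++
	}
	return sent
}

func main() {
	q := &pendingQueue{blocks: []string{"a", "b", "c"}}
	down := func(string) int { return 503 } // receiver signals unavailability
	up := func(string) int { return 204 }   // receiver accepts the data
	fmt.Println(q.flush(down), len(q.blocks))
	fmt.Println(q.flush(up), len(q.blocks))
}
```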

Version
The line returned when passing --version command line flag to binary. For example:

vminsert-prod --version
vminsert-20201013-142118-tags-v1.44.0-cluster-0-g4727bad12
bug

All 4 comments

@waldoweng, could you build vminsert from commit 22c1e292844b340125c5b6fd84bca1515030c3b5 and verify whether it properly returns a 503 status code to vmagent when all the vmstorage nodes are unavailable?

httpserver.ErrorWithStatusCode should already be returned when vminsert cannot send data to vmstorage - see https://github.com/VictoriaMetrics/VictoriaMetrics/blob/882e2e2099c7e1cf7919d0622b03dc3fb500ddb6/app/vminsert/netstorage/insert_ctx.go#L45-L50

The issue should be fixed by commit 882e2e2099c7e1cf7919d0622b03dc3fb500ddb6. @waldoweng, could you build vminsert from this commit and verify this? Note that vminsert may return a 204 HTTP status code while there is enough space in the in-memory buffers, which are used for sending data to vmstorage nodes in batches. vminsert must start returning a 503 HTTP status code once all these buffers are full.
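The buffer behavior described in this comment can be sketched in Go. This is a simplified model under stated assumptions, not vminsert's code: boundedBuffer, tryPush, insertHandler and postStatus are hypothetical names, the buffer never drains (as if every storage node were down), and its 10-byte limit is chosen only to keep the example short.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// boundedBuffer is a hypothetical stand-in for an in-memory send buffer.
// It never drains here, which models all storage nodes being unavailable.
type boundedBuffer struct {
	mu   sync.Mutex
	used int
	max  int
}

// tryPush reserves n bytes and reports whether they fit in the buffer.
func (b *boundedBuffer) tryPush(n int) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.used+n > b.max {
		return false
	}
	b.used += n
	return true
}

// insertHandler answers 204 while the buffer accepts data
// and 503 once the payload no longer fits.
func insertHandler(buf *boundedBuffer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if !buf.tryPush(len(body)) {
			http.Error(w, "buffer is full", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	}
}

// postStatus sends one payload through the handler in-process
// and returns the resulting HTTP status code.
func postStatus(h http.Handler, payload string) int {
	req := httptest.NewRequest(http.MethodPost, "/insert", strings.NewReader(payload))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := insertHandler(&boundedBuffer{max: 10})
	for i := 0; i < 3; i++ {
		fmt.Println(postStatus(h, "12345")) // 5 bytes per request
	}
}
```

In this model the first two requests fit into the buffer and get 204; the third does not fit and gets 503. This matches the observation that a short test can see only 204 responses even though the storage side is down.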

The issue was resolved by both commits, 22c1e29 and 882e2e2.

I did not wait until the in-memory buffer was full when I checked commit 22c1e29; sorry for the misleading information.

All the commits mentioned above have been included in v1.47.0. Closing the issue as fixed.
