Victoriametrics: 【too big data size】vmstorage can't process specific query

Created on 12 Feb 2020 · 6 Comments · Source: VictoriaMetrics/VictoriaMetrics

Describe the bug

We have been running VictoriaMetrics (cluster mode) on GKE for 5 months.
Recently, we started seeing an error that occurs only for certain queries.

The queries are:

sum(stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_count{forwarding_rule_name=~"k8s.*prd[0-9]+.*",response_code="502"}) / 60

sum(stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_count{forwarding_rule_name=~"k8s-fws-haproxy-.*",response_code="503"}) / 60






{"status":"error","errorType":"422",
"error":"cannot execute \"sum(stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_count{forwarding_rule_name=~\\\"k8s.*prd[0-9]+.*\\\",response_code=\\\"502\\\"}) / 60\": 
error occured during search: cannot perform search on vmstorage production-misc-0-victoria-metrics-v000-vmstorage-0.production-misc-0-victoria-metrics-v000-vmstorage.monitoring.svc.cluster.local:8400: 
cannot execute rpcName=\"search_v3\" on vmstorage \"10.150.67.34:8400\" with timeout 30s: cannot read error message: too big data size: 395841; it mustn\u0027t exceed 65536 bytes"}






# Running on GKE v1.13.11-gke.14
victoriametrics/vminsert:v1.26.0-cluster
victoriametrics/vmselect:v1.26.0-cluster
victoriametrics/vmstorage:v1.26.0-cluster






# vmstorage
flag{name="http.disableResponseCompression", value="false"} 1
flag{name="httpListenAddr", value=":8482"} 1
flag{name="loggerLevel", value="INFO"} 1
flag{name="memory.allowedPercent", value="60"} 1
flag{name="precisionBits", value="64"} 1
flag{name="retentionPeriod", value="24"} 1
flag{name="rpc.disableCompression", value="false"} 1
flag{name="search.maxTagKeys", value="secret"} 1
flag{name="search.maxTagValues", value="10000"} 1
flag{name="search.maxUniqueTimeseries", value="1000000"} 1
flag{name="snapshotAuthKey", value="secret"} 1
flag{name="storageDataPath", value="/storage"} 1
flag{name="version", value="false"} 1
flag{name="vminsertAddr", value=":8401"} 1
flag{name="vmselectAddr", value=":8400"} 1

Additional context

I think the error occurs here:

https://github.com/VictoriaMetrics/VictoriaMetrics/blob/e9db22a551268cfbb2836946f7b756e66c35495e/app/vmstorage/transport/server.go#L475-L481

https://github.com/VictoriaMetrics/VictoriaMetrics/blob/e9db22a551268cfbb2836946f7b756e66c35495e/app/vmstorage/transport/server.go#L382-L384

vmstorage defines this limit as a constant, const maxTagFiltersSize = 64 * 1024.
So we cannot change this value through configuration.
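For illustration, a size check of this kind can be sketched in Go as below. This is a minimal sketch and not the actual vmstorage or vmselect code: the helpers writeFrame, readFrame and roundTrip are hypothetical, and the 8-byte big-endian length prefix is an assumption. Only the 64 KiB limit, the 395841-byte size and the error text are taken from the response above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxFrameSize mirrors the 64 KiB limit quoted in the error message.
const maxFrameSize = 64 * 1024

// writeFrame (hypothetical) writes an 8-byte big-endian length prefix
// followed by the payload. The prefix format is an assumption.
func writeFrame(w io.Writer, payload []byte) error {
	var hdr [8]byte
	binary.BigEndian.PutUint64(hdr[:], uint64(len(payload)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame (hypothetical) reads the length prefix and rejects the frame
// before reading the payload if the declared size exceeds maxSize.
func readFrame(r io.Reader, maxSize uint64) ([]byte, error) {
	var hdr [8]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, fmt.Errorf("cannot read data size: %w", err)
	}
	size := binary.BigEndian.Uint64(hdr[:])
	if size > maxSize {
		return nil, fmt.Errorf("too big data size: %d; it mustn't exceed %d bytes", size, maxSize)
	}
	buf := make([]byte, size)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, fmt.Errorf("cannot read data with size %d: %w", size, err)
	}
	return buf, nil
}

// roundTrip (hypothetical) sends a payload of the given size through an
// in-memory buffer and reads it back with the 64 KiB limit applied.
func roundTrip(payloadSize int) error {
	var conn bytes.Buffer
	if err := writeFrame(&conn, make([]byte, payloadSize)); err != nil {
		return err
	}
	_, err := readFrame(&conn, maxFrameSize)
	return err
}

func main() {
	// 395841 is the size reported in the error response above.
	fmt.Println(roundTrip(395841))
}
```

Because the limit is a compile-time constant in a sketch like this, the reader rejects any oversized frame no matter which flags the process was started with.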

Do you have any workarounds or solutions?

If the above information isn't enough, please let me know.
Thanks for your help.

bug

govargo

All 6 comments

It looks like vmstorage tries to send an error message to vmselect that is too big. I added handling for this case in commit afecb3449159a5a2256b7b7ace594f39e0ae677e. This commit is far ahead of v1.26.0-cluster (it will go into v1.33.1-cluster), so I backported it to v1.26.1-cluster. You can build the VictoriaMetrics cluster components from this tag to obtain the original error message that vmstorage failed to send to vmselect. The steps for building Docker images for v1.26.1-cluster are below:

git clone https://github.com/VictoriaMetrics/VictoriaMetrics
cd VictoriaMetrics
git checkout v1.26.1-cluster
make package

This should build the following local docker images:

victoriametrics/vminsert:v1.26.1-cluster
victoriametrics/vmselect:v1.26.1-cluster
victoriametrics/vmstorage:v1.26.1-cluster

Another option is to wait for the v1.33.1-cluster release and upgrade to it.

valyala on 13 Feb 2020

It is likely the original issue is already fixed in v1.33.*

valyala on 13 Feb 2020

🎉2

The commit that limits the maximum size of the error message that can be sent from vmstorage to vmselect has been included in the v1.33.1 release. @govargo, could you either check v1.26.1 as described above or try v1.33.1?
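As an illustration only, limiting the error message size on the sending side could look like the Go sketch below. It is not the code from commit afecb3449159a5a2256b7b7ace594f39e0ae677e: the helper truncateErrorMessage and the truncation marker are hypothetical, and only the 64 KiB limit and the 395841-byte size come from the error reported in this issue.

```go
package main

import "fmt"

// maxErrorMessageSize is the 64 KiB limit quoted in the reported error.
const maxErrorMessageSize = 64 * 1024

// truncateErrorMessage (hypothetical) shortens msg so that it never
// exceeds maxSize bytes, appending a marker when something was cut.
// It slices by bytes, so a multi-byte UTF-8 character at the cut point
// may be split; a real implementation may want to handle that.
func truncateErrorMessage(msg string, maxSize int) string {
	if len(msg) <= maxSize {
		return msg
	}
	const marker = "...(truncated)"
	if maxSize <= len(marker) {
		return msg[:maxSize]
	}
	return msg[:maxSize-len(marker)] + marker
}

func main() {
	// Simulate an error message as large as the one in this issue.
	huge := string(make([]byte, 395841))
	sent := truncateErrorMessage(huge, maxErrorMessageSize)
	fmt.Println(len(huge), len(sent))
}
```

With a cap like this, the receiver gets the beginning of the original error instead of failing on the size check.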

valyala on 13 Feb 2020

Thank you very much!
I'll check it.

【Added】
We plan to upgrade our VictoriaMetrics version in March.
Sorry, but please keep the issue open until then.

govargo on 14 Feb 2020

I published docker images for v1.26.1-cluster, so you can upgrade to this release in order to detect the original error message.

valyala on 14 Feb 2020

We updated to the latest version (1.33.1-cluster).

We no longer get this error.
Thank you!!

If the issue occurs again, we will open a new issue.

govargo on 17 Feb 2020

🎉1
