Describe the bug
We have been running VictoriaMetrics (cluster mode) on GKE for 5 months.
Recently we have started seeing an error that occurs only for certain queries.
The queries are:
sum(stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_count{forwarding_rule_name=~"k8s.*prd[0-9]+.*",response_code="502"}) / 60
sum(stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_count{forwarding_rule_name=~"k8s-fws-haproxy-.*",response_code="503"}) / 60
{"status":"error","errorType":"422",
"error":"cannot execute \"sum(stackdriver_https_lb_rule_loadbalancing_googleapis_com_https_request_count{forwarding_rule_name=~\\\"k8s.*prd[0-9]+.*\\\",response_code=\\\"502\\\"}) / 60\":
error occured during search: cannot perform search on vmstorage production-misc-0-victoria-metrics-v000-vmstorage-0.production-misc-0-victoria-metrics-v000-vmstorage.monitoring.svc.cluster.local:8400:
cannot execute rpcName=\"search_v3\" on vmstorage \"10.150.67.34:8400\" with timeout 30s: cannot read error message: too big data size: 395841; it mustn\u0027t exceed 65536 bytes"}
# Running on GKE v1.13.11-gke.14
victoriametrics/vminsert:v1.26.0-cluster
victoriametrics/vmselect:v1.26.0-cluster
victoriametrics/vmstorage:v1.26.0-cluster
# vmstorage
flag{name="http.disableResponseCompression", value="false"} 1
flag{name="httpListenAddr", value=":8482"} 1
flag{name="loggerLevel", value="INFO"} 1
flag{name="memory.allowedPercent", value="60"} 1
flag{name="precisionBits", value="64"} 1
flag{name="retentionPeriod", value="24"} 1
flag{name="rpc.disableCompression", value="false"} 1
flag{name="search.maxTagKeys", value="secret"} 1
flag{name="search.maxTagValues", value="10000"} 1
flag{name="search.maxUniqueTimeseries", value="1000000"} 1
flag{name="snapshotAuthKey", value="secret"} 1
flag{name="storageDataPath", value="/storage"} 1
flag{name="version", value="false"} 1
flag{name="vminsertAddr", value=":8401"} 1
flag{name="vmselectAddr", value=":8400"} 1
Additional context
I think the error occurs in vmstorage:
vmstorage defines the constant maxTagFiltersSize = 64 * 1024,
so we cannot change this value.
Do you have any workarounds or solutions?
If the above information isn't enough, please let me know.
Thanks for your help.
It looks like vmstorage is trying to send an error message to vmselect that is too big. I added handling for this case in commit afecb3449159a5a2256b7b7ace594f39e0ae677e. This commit is far ahead of v1.26.0-cluster (it will go into v1.33.1-cluster), so I backported it to v1.26.1-cluster. You can build the VictoriaMetrics cluster components from this tag to obtain the original error message that vmstorage failed to send to vmselect. The steps for building Docker images for v1.26.1-cluster are below:
git clone https://github.com/VictoriaMetrics/VictoriaMetrics
cd VictoriaMetrics
git checkout v1.26.1-cluster
make package
This should build the following local docker images:
victoriametrics/vminsert:v1.26.1-cluster
victoriametrics/vmselect:v1.26.1-cluster
victoriametrics/vmstorage:v1.26.1-cluster
Another option is to wait for the v1.33.1-cluster release and upgrade to it.
It is likely the original issue is already fixed in v1.33.*
The commit that limits the maximum size of the error message that can be sent from vmstorage to vmselect has been included in the v1.33.1 release. @govargo, could you either check v1.26.1 as described above or try v1.33.1?
Thank you very much!
I'll check it.
【Added】
We plan to upgrade VictoriaMetrics in March.
I'm sorry, but please keep the issue open until then.
I published docker images for v1.26.1-cluster, so you can upgrade to this release in order to detect the original error message.
We updated to the latest version (1.33.1-cluster),
and we no longer get this error.
Thank you!!
If the issue occurs again, we will open a new issue.