VictoriaMetrics: Bad metrics / labels when using -promscrape.streamParse on vmagent

Created on 17 Nov 2020 · 4 comments · Source: VictoriaMetrics/VictoriaMetrics

This was mostly fixed in https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825#issuecomment-723430240
However, garbage metrics are still present in small amounts. For example, labels from random time series appeared as metric names:

machine_dc=\dc1\
machine_group=\group1\}
\nodejs_common_name\

All of these should be labels of some time series, but they became names of new time series.
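The garbage names above have one thing in common: they contain characters such as `=`, `\` or `}` that cannot appear in a metric name under the Prometheus data model, where names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`. Below is a minimal Python sketch for spotting such names in a list of `__name__` values. It assumes that pattern is the right yardstick for this setup. The helper names are hypothetical and are not part of vmagent or VictoriaMetrics.

```python
import re

# Prometheus data-model rule for metric names, assumed here to define
# what a "well-formed" name looks like.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True only if the whole string matches the metric-name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def find_suspicious_names(names):
    """Return the names that cannot be valid metric names, in input order."""
    return [n for n in names if not is_valid_metric_name(n)]


if __name__ == "__main__":
    # Sample names taken from the report above (backslashes escaped for Python).
    sample = [
        "process_cpu_seconds_total",
        "machine_dc=\\dc1\\",
        "machine_group=\\group1\\}",
        "\\nodejs_common_name\\",
    ]
    for name in find_suspicious_names(sample):
        print("suspicious:", name)
```

Running it against the full output of the `__name__` values endpoint gives a quick count of how many garbage names are left.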

Label: bug

wf1nder on 17 Nov 2020

All 4 comments

This may be related to a data race similar to this one, but I couldn't reproduce it on v1.47.0 yet :( The Go race detector doesn't catch anything in vmagent / VictoriaMetrics.

@wf1nder, could you provide more details on your setup? I.e., which config options are used for vmagent / VictoriaMetrics, and which versions? Which queries are used to obtain the invalid metric names? Perhaps these names come from historical data collected before the issue from https://github.com/VictoriaMetrics/VictoriaMetrics/issues/825#issuecomment-723430240 was fixed.

valyala on 18 Nov 2020

I'm sorry, but I can't reproduce it again either :(

I set up a test stand with the latest version of each VM component, started vmagent without -promscrape.streamParse, and started vmstorage with empty storage. It scrapes all of our metrics, the same as the main VM cluster. The list of metric names was obtained from VM with the request http://<vmselect>/select/0/prometheus/api/v1/label/__name__/values.
Then I enabled the -promscrape.streamParse option on vmagent, repeated the query to obtain the metric list, and compared the two lists. This time they were the same.
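The comparison described above can be sketched in Python. This is a hypothetical helper, not a VictoriaMetrics tool. It assumes the endpoint returns the Prometheus-compatible JSON body `{"status": "success", "data": [...]}`, and that `vmselect_base_url` points at vmselect (for example `http://vmselect:8481`, a placeholder host). The demo uses canned responses so it runs offline.

```python
import json
import urllib.request

# Path used in the thread; "0" is the tenant (accountID) part of the vmselect URL.
LABEL_VALUES_PATH = "/select/0/prometheus/api/v1/label/__name__/values"


def parse_label_values(body: bytes) -> list:
    """Extract the value list from a Prometheus-style API response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"unexpected response status: {payload.get('status')!r}")
    return payload["data"]


def fetch_metric_names(vmselect_base_url: str, timeout: float = 30.0) -> list:
    """Fetch all metric names from vmselect (base URL is an assumption)."""
    url = vmselect_base_url.rstrip("/") + LABEL_VALUES_PATH
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_label_values(resp.read())


def diff_names(before, after):
    """Return (names only in `before`, names only in `after`), each sorted."""
    b, a = set(before), set(after)
    return sorted(b - a), sorted(a - b)


if __name__ == "__main__":
    # Offline demo: canned responses instead of live vmselect calls.
    without_stream = parse_label_values(b'{"status":"success","data":["go_goroutines","up"]}')
    with_stream = parse_label_values(b'{"status":"success","data":["go_goroutines","up","garbage{"]}')
    only_before, only_after = diff_names(without_stream, with_stream)
    print("missing after enabling streamParse:", only_before)
    print("new after enabling streamParse:", only_after)
```

With a live cluster, call `fetch_metric_names` once before and once after enabling -promscrape.streamParse, then pass both lists to `diff_names`. Two empty results mean the lists match, as they did in this test.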

I suspect the garbage metrics appeared last time because I didn't clean vmagent's caches after upgrading it from v1.46.0 to v1.47.0. It could have read garbage metrics from the cache, including bad metrics scraped before the upgrade that hadn't been sent yet, and then sent them to VM.

I'll keep an eye on it, but it looks good so far.
Thanks, and I'm sorry for bothering you.

wf1nder on 19 Nov 2020


After a while there are still no bad metrics, so it looks like the problem is solved. Thank you!

wf1nder on 28 Nov 2020


Closing the bug as resolved, then.

valyala on 29 Nov 2020
