Hi all,
I am using Kafka and Telegraf to import data into VictoriaMetrics.
I am also using vmalert with the standard config and datasource.url pointing at VictoriaMetrics.
vmalert uses a simple rule: sum(counter) > 20.
The problem/question:
There is sometimes a lag on the import from Telegraf, so data is imported 30-120 seconds after the current timestamp. If there is such a lag, vmalert clears/deletes the currently firing or pending alarm. When data is later imported without any lag, a new alert is created.
Is this a bug or just normal behavior? I would expect vmalert to wait until new data is gathered before clearing the alert (especially when using the for parameter in the alert rule).
Regards
CWollinger
Hi @CWollinger!
I would expect vmalert will wait until new data is gather before clearing the alert
vmalert is a separate stateless service and it doesn't know whether data has already been delivered to VM. The logic behind any alert is simple: if the query returns at least one result, the alert fires; if no results are returned, the alert is moved to inactive.
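To make the consequence for delayed data concrete, here is a minimal Python sketch of the rule described above. It is a simplified model, not vmalert's actual implementation: the names AlertState and evaluate, and the handling of the for duration, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    # One of "inactive", "pending" or "firing" (illustrative names).
    state: str = "inactive"
    # Timestamp (seconds) of the first evaluation that returned results.
    active_since: Optional[float] = None


def evaluate(prev: AlertState, results: list, now: float,
             for_seconds: float = 0.0) -> AlertState:
    """Return the next alert state for one evaluation round."""
    if not results:
        # An empty query result resets the alert, whatever its previous state.
        return AlertState()
    # Keep the original activation time while results keep coming back.
    since = prev.active_since if prev.active_since is not None else now
    if now - since >= for_seconds:
        return AlertState("firing", since)
    return AlertState("pending", since)
```

In this model a single evaluation with an empty result (for example because data arrived late) drops a firing alert to inactive, and the next non-empty evaluation starts a new pending period from scratch.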
To deal with the issue you describe, I'd propose the following actions:
- Use a query such as sum_over_time(counter[5m]) > 20. Such a query checks not only the latest datapoint but all datapoints in the 5m interval. This should mitigate cases when data is delayed.
- Use the -datasource.lookback flag: "Lookback defines how far to look into past when evaluating queries. For example, if datasource.lookback=5m then param "time" with value now()-5m will be added to every query."
Please let me know your thoughts regarding this.
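As a rough illustration of what the lookback description implies, the Python sketch below builds an instant-query URL whose time parameter is shifted into the past. It assumes the Prometheus-compatible /api/v1/query endpoint; the function name instant_query_url and the host and port in the usage note are hypothetical, and this is not how vmalert itself is implemented.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, expr: str, now: float,
                      lookback_seconds: float) -> str:
    """Build an instant-query URL evaluated at now - lookback."""
    params = {
        "query": expr,
        # Shift the evaluation time into the past by the lookback window.
        "time": f"{now - lookback_seconds:.0f}",
    }
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"
```

For example, instant_query_url("http://localhost:8428", "sum(counter) > 20", now, 300) evaluates the rule as of five minutes ago, by which time delayed samples have usually been ingested.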
Hi @hagen1778
Thank you very much for the clarification.
In my case the lookback didn't work. But I figured out that sometimes VictoriaMetrics doesn't return the data for this query. In the HTTP response the "result" is empty, and the metric key is missing. After a few retries with the same timestamp in the query (1 hour back), VM returns the data again.
So, I think vmalert works as expected and the empty HTTP response from VictoriaMetrics is the problem.
I can use or absent(counter) in the alert query as a workaround. Is there a known issue for the empty HTTP response?
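For anyone who wants to check for the same symptom, here is a small Python sketch that tests whether an instant-query response body carries an empty result. It assumes the Prometheus-style JSON response format; the function name and the sample payloads are illustrative and were not captured from the setup described here.

```python
import json


def is_empty_instant_result(body: str) -> bool:
    """Return True if a successful query response contains no series."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload.get("data", {}).get("result", [])) == 0


# Illustrative payloads in the Prometheus-style response format.
EMPTY = '{"status":"success","data":{"resultType":"vector","result":[]}}'
NON_EMPTY = (
    '{"status":"success","data":{"resultType":"vector",'
    '"result":[{"metric":{},"value":[1602600000,"42"]}]}}'
)
```

Running the same query repeatedly with a fixed time parameter and logging is_empty_instant_result for each response shows whether the result is intermittently empty for identical requests.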
Similar to #382, but using the latest version, victoria-metrics-20201013-140135-tags-v1.44.0-0-g94978af9b.
In my case the lookback didn't work.
Could you pls provide the exact flag value or command you use to start vmalert?
So, I think vmalert works like expected and the empty http response from Victoria metrics is the problem.
Interesting. I'd try to reproduce this case and get back to you.
Could you pls provide the exact flag value or command you use to start vmalert?
-rule=/alert.rules, -datasource.url=http://x.x.x.x:8086, -notifier.url=http://x.x.x.x:9093, -datasource.lookback=1h
Hi @CWollinger!
So, I think vmalert works like expected and the empty http response from Victoria metrics is the problem.
Related issue for empty results from instant queries: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/845. The issue was fixed in v1.45.0. Can you verify whether you still observe the described behaviour?
Hi @hagen1778
After upgrading to v1.45.0, I have no problems with vmalert and I can't find any empty response from vm in the tcpdump.
So, thank you very much. Fix in #845 solved the issue.