VictoriaMetrics: [vmalert] Incompatible parsing: rule fails in vmalert but not in Prometheus

Created on 7 Oct 2020 · 8 Comments · Source: VictoriaMetrics/VictoriaMetrics

Describe the bug
The following rule:

      - alert: ClusterMemoryRequestsHigh
        expr: (sum(kube_pod_container_resource_requests_memory_bytes) by (kubernetes_cluster_name) / sum(kube_node_status_allocatable_memory_bytes) by (kubernetes_cluster_name)) > 0.8
        for: 1d
        labels:
          severity: critical
        annotations:
          msgfwd_mail_to: [email protected]
          summary: 'Workload on cluster {{ $labels.kubernetes_cluster_name }} has a memory request > 80%'
          helper_url: 'https://grafana.pnet.ch/d/U_gA5P2mz/kubernetes-cluster-resources?orgId=1&var-Cluster={{ $labels.kubernetes_cluster_name }}'
          description: "Check the cluster resources and if feasible extend the cluster"

fails with vmalert with the following error:

cannot unmarshal !!str `1d` into time.Duration

To Reproduce
Create a rule with the above alert.

Expected behavior
Since people validate rules with promtool, a rule that passes promtool should also be valid for vmalert.

Version
1.43.0

bug vmalert

All 8 comments

Confirmed the bug. Temporary workaround is to substitute 1d with 24h.

@zbindenren , the bug should be fixed in the commit f4e8687c88cf11ffef514acd9b3a4cc5b9348483 . Could you try building vmalert from this commit according to these docs and verify whether it works as expected?

> Confirmed the bug. Temporary workaround is to substitute 1d with 24h.

Yes I am already doing this :smile:

@valyala , the parsing error is gone with vmagent-20201009-051113-heads-master-0-gf4e8687c, but now I get a panic:

Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: 2020-10-09T05:15:07.828Z        info        lib/cgroup/cpu.go:37        updating GOMAXPROCS to 8 according to cgroup CPU quota
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: 2020-10-09T05:15:07.828Z        info        app/vmagent/main.go:84        starting vmagent at ":8880"...
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: 2020-10-09T05:15:07.833Z        info        lib/memory/memory.go:43        limiting caches to 15059465011 bytes, leaving 10039643341 bytes to the OS according to -memory.allowedPercent=60.000000
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: 2020-10-09T05:15:07.833Z        error        lib/persistentqueue/persistentqueue.go:151        cannot open persistent queue at "vmagent-remotewrite-data/persistent-queue/1_8C2800B1F469896A": cannot create directory "vmagent-remotewrite-data/persistent-queue/1_8C2800B1F469896A": mkdir vmagent-remotewrite-data: permission denied; cleaning it up and trying again
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: 2020-10-09T05:15:07.833Z        panic        lib/persistentqueue/persistentqueue.go:155        FATAL: cannot create directory "vmagent-remotewrite-data/persistent-queue/1_8C2800B1F469896A": mkdir vmagent-remotewrite-data: permission denied
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: panic: FATAL: cannot create directory "vmagent-remotewrite-data/persistent-queue/1_8C2800B1F469896A": mkdir vmagent-remotewrite-data: permission denied
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: goroutine 1 [running]:
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logMessage(0xad7b9a, 0x5, 0xc0002781b0, 0x90, 0x4)
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/lib/logger/logger.go:203 +0xca5
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevelSkipframes(0x1, 0xad7b9a, 0x5, 0xadb498, 0x9, 0xc00011fb40, 0x1, 0x1)
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/lib/logger/logger.go:125 +0xd1
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevel(...)
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/lib/logger/logger.go:117
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.Panicf(...)
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/lib/logger/logger.go:113
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue.mustOpen(0xc0000241c0, 0x3c, 0xc000022070, 0xc, 0x20000080, 0x2000000, 0x0, 0xac4120)
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/lib/persistentqueue/persistentqueue.go:155 +0x371
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue.MustOpen(...)
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/lib/persistentqueue/persistentqueue.go:139
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/lib/persistentqueue.MustOpenFastQueue(0xc0000241c0, 0x3c, 0xc000022070, 0xc, 0xc8, 0x0, 0x3c)
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/lib/persistentqueue/fastqueue.go:43 +0x92
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.newRemoteWriteCtx(0x0, 0xc0000240d0, 0x22, 0xc8, 0xc000022070, 0xc, 0xc)
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/app/vmagent/remotewrite/remotewrite.go:208 +0x1c7
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: github.com/VictoriaMetrics/VictoriaMetrics/app/vmagent/remotewrite.Init()
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/app/vmagent/remotewrite/remotewrite.go:93 +0x2bf
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]: main.main()
Oct 09 07:15:07 t1-vmselect-alsu101 vmalert[365251]:         /tmp/VictoriaMetrics/app/vmagent/main.go:86 +0x398

which I don't get with 1.43.0.

Try removing vmagent-remotewrite-data directory. It looks like the directory has been created by another user or permissions for this directory have been changed during vmagent upgrade.

I copied the new binary to the location of the old binary. Tried to restart the service and the above error came. I put the old binary in place and everything worked again. I did not change permissions.

> I copied the new binary to the location of the old binary. Tried to restart the service and the above error came. I put the old binary in place and everything worked again. I did not change permissions.

@hagen1778 , could you look into this issue?

FYI, the bugfix for proper handling of for: 1d config has been included in v1.44.0.

Everything works.
