How to use Prometheus to monitor Velero?
I couldn't find this in the documentation. Is there a link to the relevant page?
Hi @my1990,
If you are using the Helm chart, it already includes a ServiceMonitor for the Prometheus Operator. Take a look here: https://github.com/vmware-tanzu/helm-charts
Also, if you are using the Prometheus Operator, you can use these rules to alert on backup failures:
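For reference, here is a rough sketch of the chart values that should turn the ServiceMonitor on. The key names are written from memory of the chart's `values.yaml`, so please check them against the chart version you install. The `release: prometheus` label is only a hypothetical example of a label your Prometheus instance might select on.

```yaml
# values.yaml fragment for the Velero Helm chart (a sketch, not verified against every chart version)
metrics:
  enabled: true            # expose the Velero metrics endpoint
  serviceMonitor:
    enabled: true          # create a ServiceMonitor for the Prometheus Operator
    additionalLabels:
      release: prometheus  # hypothetical: must match your Prometheus serviceMonitorSelector
```

If your Prometheus selects ServiceMonitors by a different label, replace `release: prometheus` accordingly, or drop `additionalLabels` if it selects all of them.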
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: velero
spec:
  groups:
  - name: velero-failures
    rules:
    # Fires when more than 25% of a schedule's backup attempts partially failed.
    - alert: VeleroBackupPartialFailures
      annotations:
        message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} partially failed backups.
      expr: |-
        velero_backup_partial_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
      for: 15m
      labels:
        severity: warning
    # Fires when more than 25% of a schedule's backup attempts failed outright.
    - alert: VeleroBackupFailures
      annotations:
        message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} failed backups.
      expr: |-
        velero_backup_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
      for: 15m
      labels:
        severity: warning
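The rules above only fire when backups are attempted and fail. If you also want to catch a schedule that has silently stopped producing backups, something like the following sketch may help. It assumes your Velero version exports a `velero_backup_last_successful_timestamp` metric with a `schedule` label (check the output of the metrics endpoint first). The rule name `velero-staleness`, the alert name, and the 24-hour threshold are all hypothetical choices to adapt to your schedules.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: velero-staleness   # hypothetical name
spec:
  groups:
  - name: velero-staleness
    rules:
    # Assumption: velero_backup_last_successful_timestamp holds a Unix timestamp per schedule.
    - alert: VeleroNoRecentBackup
      annotations:
        message: Velero schedule {{ $labels.schedule }} has had no successful backup for more than 24 hours.
      expr: |-
        time() - velero_backup_last_successful_timestamp{schedule!=""} > 86400
      for: 15m
      labels:
        severity: warning
```

The threshold of 86400 seconds suits daily schedules; a weekly schedule would need a correspondingly larger value.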
@my1990 You can find the troubleshooting steps on our docs website at https://velero.io/docs/v1.4/troubleshooting/#velero-is-not-publishing-prometheus-metrics
I am going to close this issue as the question has been answered.