Velero: How to use Prometheus to monitor Velero?

Created on 20 Jul 2020 · 3 comments · Source: vmware-tanzu/velero

How can I use Prometheus to monitor Velero?
I couldn't find this topic in the documentation. Is there a link to the relevant docs?

Labels: Metrics, Question

All 3 comments

Hi @my1990,

If you are using the Helm chart, it already provides a ServiceMonitor for the Prometheus Operator. Take a look here: https://github.com/vmware-tanzu/helm-charts
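For reference, a minimal values override that turns on the chart's metrics endpoint and its ServiceMonitor might look like the sketch below. The key names (`metrics.enabled`, `metrics.serviceMonitor.enabled`, `metrics.serviceMonitor.additionalLabels`) are assumed from the chart's `values.yaml` and may differ between chart versions, and the `release: prometheus` label is a hypothetical stand-in for whatever label your Prometheus Operator's `serviceMonitorSelector` matches.

```yaml
# values-metrics.yaml (hypothetical file name)
# Key names assumed from the vmware-tanzu velero chart; verify them against your chart version.
metrics:
  enabled: true              # expose the Velero metrics endpoint (port 8085 by default)
  serviceMonitor:
    enabled: true            # render a ServiceMonitor object for the Prometheus Operator
    additionalLabels:
      release: prometheus    # hypothetical: must match your Prometheus serviceMonitorSelector
```

It could then be applied with something like `helm upgrade velero vmware-tanzu/velero -n velero --reuse-values -f values-metrics.yaml`, where the release name, repository alias, and namespace are placeholders for your own. `--reuse-values` keeps the values from the existing release so that only the metrics settings change.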

Also, if you use the Prometheus Operator, you can use these rules to alert on backup failures:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: velero
spec:
  groups:
  - name: velero-failures
    rules:
    - alert: VeleroBackupPartialFailures
      annotations:
        message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} partially failed backups.
      expr: |-
        velero_backup_partial_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
      for: 15m
      labels:
        severity: warning
    - alert: VeleroBackupFailures
      annotations:
        message: Velero backup {{ $labels.schedule }} has {{ $value | humanizePercentage }} failed backups.
      expr: |-
        velero_backup_failure_total{schedule!=""} / velero_backup_attempt_total{schedule!=""} > 0.25
      for: 15m
      labels:
        severity: warning
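These expressions divide lifetime counters, so a schedule with a long history of successful backups can take a while to cross the 25% threshold, and the counters reset whenever the Velero pod restarts. One possible variant computes the ratio over a sliding window with `increase()`. It is sketched below under the assumption that every schedule runs at least once per 24 hours. The rule name, group name, alert name, and `24h` window are hypothetical choices, not part of the rules above, and the same change would apply to the partial-failure rule.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: velero-windowed                  # hypothetical name
spec:
  groups:
  - name: velero-failures-windowed       # hypothetical group name
    rules:
    - alert: VeleroBackupFailuresWindowed
      annotations:
        message: Velero schedule {{ $labels.schedule }} had {{ $value | humanizePercentage }} failed backups in the last 24h.
      # increase() only counts what happened inside the window, so old successes no longer dilute the ratio.
      # If a schedule made no attempts in the window, 0/0 yields NaN, the comparison is false, and nothing fires.
      expr: |-
        increase(velero_backup_failure_total{schedule!=""}[24h])
          / increase(velero_backup_attempt_total{schedule!=""}[24h]) > 0.25
      for: 15m
      labels:
        severity: warning
```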

@my1990, you can find the troubleshooting steps for metrics on our docs website: https://velero.io/docs/v1.4/troubleshooting/#velero-is-not-publishing-prometheus-metrics
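As a rough sketch of what those checks look for, the Velero server pod should expose its metrics port (8085 by default). If your Prometheus discovers pods by annotation rather than through a ServiceMonitor, the pod should also carry the `prometheus.io/*` annotations. The fragment below shows only the relevant parts of a Deployment pod template. The port name `metrics` is an assumption. The `prometheus.io/*` annotations are a convention honored only by scrape configs written to read them, not by Prometheus itself.

```yaml
# Fragment of the Velero Deployment; unrelated fields omitted.
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"     # read only by annotation-based scrape configs
        prometheus.io/port: "8085"       # Velero's default metrics port
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: velero
        ports:
        - name: metrics                  # assumed port name
          containerPort: 8085
          protocol: TCP
```

To confirm the endpoint responds, you can port-forward to the pod (for example `kubectl -n velero port-forward <velero-pod> 8085:8085`, assuming Velero runs in the `velero` namespace) and request `http://localhost:8085/metrics`. Then look for the `velero_backup_` series used in the rules above.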

I am going to close this issue as the question has been answered.
