Once Ark exposes Prometheus metrics (#84 / #531), it would be great to have an example Grafana dashboard for visualizing the state of Ark.
In the simplest form, this could be checked into the Ark repo. It could also be published to the Grafana dashboard community (https://grafana.com/dashboards).
Variables:
Potential end-user graphs:
@ashish-amarnath if you're working on #531 and find yourself creating a Grafana dashboard to test with, here's an initial take at what some useful graphs might be.
Good idea!
Hi! Does a Grafana dashboard for Velero/Ark exist yet? I managed to have Prometheus operator scrape the Velero metrics but I don't know how to use them. Thanks
@vitobotta we don't have a sample one, but maybe another user has something they can share.
see also #1136


Hi, I created a Velero dashboard, but the following metrics are missing:
velero_restore_duration_seconds_bucket
velero_restore_tarball_size_bytes
Maybe it will be useful as a blueprint. Any suggestions are welcome.
https://gist.github.com/HaveFun83/57b41e85fde4249daab74a9850885f6a#file-kubernetes-_-addons-_-velero-stats-1568113703354-json
Hi @HaveFun83, thanks for the dashboard!
I see you calculate 'Active backup' with sum(rate(velero_backup_attempt_total[15m])) / sum(rate(velero_backup_success_total[15m]))
Could you shed some light on that? Thanks in advance!
The graph should represent @rosskukulinski's suggestion:
Gauge showing number of active backups
But you are right, this expression makes no sense. I changed it, but currently only active scheduled backups are counted.
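To illustrate why the quoted expression does not count active backups: dividing attempts by successes over a window yields a ratio, and the more conventional direction (successes over attempts) yields a success rate between 0 and 1. Below is a minimal sketch of that arithmetic, assuming the two inputs are the increases of velero_backup_attempt_total and velero_backup_success_total over the same window. The function name `success_ratio` is hypothetical, and this is not the expression the dashboard was changed to.

```python
from typing import Optional


def success_ratio(attempt_increase: float, success_increase: float) -> Optional[float]:
    """Return successes / attempts for one time window.

    Both arguments are assumed to be counter increases over the same window
    (for example, what a 15m rate multiplied by the window length would give).
    Returns None when there were no attempts, to avoid dividing by zero.
    """
    if attempt_increase <= 0:
        return None
    return success_increase / attempt_increase


if __name__ == "__main__":
    # Four attempts, three successes: a ratio, not a count of running backups.
    print(success_ratio(4, 3))
    # No attempts in the window: no meaningful ratio.
    print(success_ratio(0, 0))
```

Either direction of the division says nothing about how many backups are currently running, which matches the observation above that the expression had to be changed.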
Hi @HaveFun83! Thanks for the dashboard, very useful. What is "Backup Time"? I thought it was the duration, but it only shows a flat line at zero for me. Also, what does "Backup Success" show when there are failed backups? Thanks!
Backup Time shows the "velero_backup_duration_seconds_bucket" metric. Can you check in your Prometheus whether any data is available?
The Backup Success rate must be 1; if it is below that, something is wrong.
Failed backups should be visible in "Backup Total Count".
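One way to check whether Prometheus holds any samples for that metric is its instant-query HTTP endpoint (`/api/v1/query`). The sketch below assumes Prometheus is reachable at `http://localhost:9090` (for example through a port-forward); that address and the helper names `build_query_url`, `has_samples`, and `metric_has_data` are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus is reachable here, e.g. via a local port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def has_samples(response: dict) -> bool:
    """Return True if a decoded query response contains at least one series."""
    if response.get("status") != "success":
        return False
    return bool(response.get("data", {}).get("result"))


def metric_has_data(base_url: str, expr: str) -> bool:
    """Query Prometheus and report whether the expression returned any data."""
    with urlopen(build_query_url(base_url, expr), timeout=10) as resp:
        return has_samples(json.load(resp))


if __name__ == "__main__":
    metric = "velero_backup_duration_seconds_bucket"
    print(metric, "has data:", metric_has_data(PROMETHEUS_URL, metric))
```

If this reports no data, the flat line in the "Backup Time" panel most likely comes from the metric not being scraped or not yet having samples, rather than from the panel itself.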
@mtritabaugh You can work on this if you would like. I am not finding a way to assign this to you either :)
@nrb @ashish-amarnath Please assign this to me, thank you.
Hello @mtritabaugh, thanks for working on this topic!
May I ask whether everything is going well?
If you already have a draft, I would be glad to test it!