Prometheus is quickly becoming the de facto standard for monitoring and alerting. Its pull model (instead of push) would fit well with the salt-master/salt-minion topology. Exposing metrics in salt-minion and salt-master would allow more flexibility in monitoring large Salt environments.
Here is a list of other software that exposes metrics:
https://prometheus.io/docs/instrumenting/exporters/
As an example of metric data that could be exposed by the minion:
salt_minion_last_run_state_completion_time
salt_minion_last_run_state_executed
salt_minion_last_run_state_error
salt_minion_last_run_state_success
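To make the idea concrete, here is a minimal sketch of how these four metrics could look in the Prometheus text exposition format. It uses only the Python standard library. The helper names (`render_gauges`, `minion_state_gauges`) and the meaning given to each metric (a Unix timestamp plus three counts) are assumptions for illustration, not anything Salt provides today.

```python
def render_gauges(gauges):
    """Render {name: (help_text, value)} as Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(gauges.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def minion_state_gauges(completed_at, executed, errors):
    """Build the four gauges from the result of the last state run.

    Assumed semantics (hypothetical): completed_at is a Unix timestamp,
    executed and errors are counts of states, and every executed state
    that did not fail counts as a success.
    """
    return {
        "salt_minion_last_run_state_completion_time": (
            "Unix time at which the last state run finished", completed_at),
        "salt_minion_last_run_state_executed": (
            "States executed in the last run", executed),
        "salt_minion_last_run_state_error": (
            "States that failed in the last run", errors),
        "salt_minion_last_run_state_success": (
            "States that succeeded in the last run", executed - errors),
    }


if __name__ == "__main__":
    # Example values only; a real implementation would read them from the minion.
    print(render_gauges(minion_state_gauges(1700000000, 12, 1)), end="")
```

Prometheus would scrape this text from an HTTP endpoint on each minion, so no dry run is needed to produce it.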
There is already an exporter out there: https://github.com/BonnierNews/saltstack_exporter
But I don't see a reason not to add an engine that can do this inside of Salt.
Marked as a feature request.
:+1: I would really appreciate internal metrics.
@gtmanfred Yes, there is an exporter, but it has to run regular dry-run highstates to get the data you want. The default is every 5 minutes, which also makes sense if you want to see the changes that would happen on your next highstate before you run it yourself.
But in large environments, running dry-run highstates every 5 minutes causes many side effects (blocked minions, high load on the master, ...) that will affect your system in a negative way.
Real internal metrics which are being collected continuously are a whole different and much more reliable story.
Additionally, I would like to have internal metrics not only for the minion but also for the master service, to get an idea of how many minions are connected, how many states are being run over time, and what the success ratio is.
Does it make sense to start a list of the metrics we want from such an internal metrics endpoint before somebody starts implementing?
This sounds like a job for a custom engine. It would be super awesome if it were contributed back to Salt, but it is probably something that the community will need to do.
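For anyone who wants to pick this up, here is a rough sketch of what such an engine could look like. A Salt engine is a Python module with a long-running `start()` function; everything else here is an assumption for illustration: the port `9175`, the `collect_metrics` placeholder, and the `make_server` helper are hypothetical, and only the standard library is used.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def collect_metrics():
    """Hypothetical placeholder: a real engine would gather live values here."""
    return (
        "# TYPE salt_minion_last_run_state_success gauge\n"
        "salt_minion_last_run_state_success 0\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve metrics only on the conventional /metrics path.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep scrape requests out of stderr.
        pass


def make_server(host="127.0.0.1", port=0):
    """Create the HTTP server; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer((host, port), MetricsHandler)


def start(host="127.0.0.1", port=9175):
    """Engine entry point: block forever, answering Prometheus scrapes."""
    make_server(host, port).serve_forever()
```

Because Prometheus pulls on its own schedule, the engine only has to answer scrapes with whatever values it has collected; it never needs to trigger a highstate itself.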
@gtmanfred Does this exist in SaltStack Enterprise? It sounds like it must...
I don't really know anything about enterprise. I also no longer work for salt.
Oh sorry @gtmanfred didn't realize. Hope you are well though!
Any progress?
This is very much needed for our use case. We have a few multi-master SaltStack deployments, each with a few hundred minions connected through automation. Monitoring based on Prometheus, an exporter, and Grafana would help a lot.
I would like to suggest some possible salt-master side metrics to expose.
salt_master_keys{key_state="accepted"}
salt_master_gitfs_lock
salt_master_number_of_scheduled_jobs
salt_master_number_of_threads
salt_master_number_of_jobs_active
salt_master_number_of_minions_return
salt_master_running_process
salt_syndic_running_process
salt_syndic_master_sync
I think these are some of the things I would like to be able to track and possibly alert on. For the masters, part of it is knowing that the master is healthy, but also being able to track load over time as more minions are added to it. These types of metrics can help figure out the right balance of resources and scale.
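As a small illustration of the labeled form used for `salt_master_keys` above, here is a sketch of how such samples could be rendered in the Prometheus text exposition format. The helper names and all of the numbers are made up, and the `pending` and `rejected` label values are my assumption; only `accepted` appears in the list above.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def format_sample(name, value, labels=None):
    """Format one sample line, with optional labels in sorted order."""
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


if __name__ == "__main__":
    # Example values only; a real master would count keys and jobs itself.
    key_counts = {"accepted": 250, "pending": 3, "rejected": 1}
    for state, count in key_counts.items():
        print(format_sample("salt_master_keys", count, {"key_state": state}))
    print(format_sample("salt_master_number_of_jobs_active", 4))
```

Using one metric name with a `key_state` label, rather than a separate metric per state, makes it easy to sum or compare key states in a single query.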