Docker exposes a RestartCount stat.
It tracks how many times a container has been restarted, and an increase in this count may indicate a problem (such as an OOM kill).
It would be great if cAdvisor collected this metric as well, so we could export it to Prometheus and alert on unexpected restarts.
Thanks!
How does Docker expose this value? We currently monitor only through cgroups, so this may be a large change if we need to use Docker's API.
Hi @dashpole.
Actually, it is exposed through the API, not through cgroups. I was trying to find where in the code cAdvisor gets its data.
I don't think it's limited to cgroups, given the 'Labels' and image retrieval, but I couldn't figure out how to add the restartCount metric as well :)
It is probably specified somewhere around here: https://github.com/google/cadvisor/blob/d7a44cb1a2c66e1688ccdc5d09e56069eecb659a/info/v1/container.go#L128
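For reference, here is a minimal sketch of reading the value from a container inspect response, where Docker reports it as a top-level `RestartCount` field. The sample payload and the `restart_count` helper are hypothetical and trimmed for illustration; this is not how cAdvisor itself collects the data.

```python
import json

# Hypothetical, heavily trimmed inspect payload. A real response from the
# Docker API carries many more fields; only RestartCount matters here.
SAMPLE_INSPECT = """
{
  "Id": "abc123",
  "Name": "/web",
  "RestartCount": 3,
  "State": {"Status": "running", "OOMKilled": false}
}
"""


def restart_count(inspect_json: str) -> int:
    """Return RestartCount from an inspect document, or 0 if it is absent."""
    doc = json.loads(inspect_json)
    return int(doc.get("RestartCount", 0))


if __name__ == "__main__":
    print(restart_count(SAMPLE_INSPECT))
```

Since the field only comes from the API, any collector would need a Docker client call per container rather than a cgroup read.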
Thanks!
@dashpole FYI, I made a PR using DockerHandler and the API :)
Hope this is right!
Thanks!
This was closed in PR #1649
Any chance of a follow-up where this would be a separate metric, instead of or in addition to being a metric label?
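To illustrate what a separate metric could look like, here is a rough sketch that renders restart counts in the Prometheus text exposition format. The metric name `container_restart_count`, the `name` label, and the `render_restart_metric` helper are assumptions made for this example, not names cAdvisor uses; a gauge is chosen only because the count can drop back to zero when a container is recreated.

```python
from typing import Dict


def render_restart_metric(counts: Dict[str, int]) -> str:
    """Render per-container restart counts as Prometheus text exposition.

    Assumes container names need no label escaping (hypothetical helper).
    """
    lines = [
        "# HELP container_restart_count Number of times the container has been restarted.",
        "# TYPE container_restart_count gauge",
    ]
    # Sort for stable output across scrapes.
    for name in sorted(counts):
        lines.append(f'container_restart_count{{name="{name}"}} {counts[name]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_restart_metric({"web": 3, "db": 0}), end="")
```

With the count as a sample value, a restart only changes the value of an existing series, whereas with the count as a label value each restart starts a new series, which makes alerting on increases harder.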