cAdvisor: Prometheus metric for container healthcheck status

Created on 6 Feb 2019 · 11 Comments · Source: google/cadvisor

Hi,

As far as I know, no metrics are available for the healthcheck status of a container.

I see a metric about the "up" state of a container (container_last_seen), but nothing about what Docker reports in State.Health.Status.

This status isn't really a metric because it is a string, but I would guess that a boolean for each possible value would be useful (running, healthy, unhealthy are the ones I know).
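A minimal sketch of the "one boolean per possible value" idea, assuming the health states Docker reports in State.Health.Status are starting, healthy, and unhealthy ("running" belongs to State.Status instead). The metric name container_health_status and the function health_to_gauges are hypothetical; cAdvisor does not expose them.

```python
# Hypothetical metric name; cAdvisor does not expose this metric.
METRIC = "container_health_status"

# Health states assumed for State.Health.Status.
HEALTH_STATES = ("starting", "healthy", "unhealthy")


def health_to_gauges(container_name, status):
    """Return one exposition-format line per known state.

    The line for the current state has value 1, all others 0.
    An unknown status (for example "none") yields all zeros.
    """
    lines = []
    for state in HEALTH_STATES:
        value = 1 if status == state else 0
        lines.append(
            f'{METRIC}{{name="{container_name}",status="{state}"}} {value}'
        )
    return lines


for line in health_to_gauges("foo", "unhealthy"):
    print(line)
```

Each container then contributes a fixed number of series (three here), which is the kind of per-container stream growth to weigh before adding such a metric.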

kind/enhancement

replicajune

👍20

All 11 comments

Does an equivalent exist for all container runtimes cAdvisor supports (mesos, containerd, rkt, docker)?

We usually try and stay away from spec-based metrics, as they tend to be runtime-specific, and generate large numbers of metric streams for each container.

dashpole on 6 Feb 2019

I'm not familiar with all the specifications that exist at this time. I'm under the impression (and could be wrong) that the OCI has proposed, or will propose, something standard for this.

So unfortunately, I have no idea.

What I need is a metric about the work done by a container, rather than the state of a process (container_tasks_state) or whether a container is up or not.

The healthcheck instruction and its related status in Docker really help to figure out whether a container is actually doing what it should, and I don't see any metrics about that for now.

replicajune on 6 Feb 2019

👍5

This would be one very useful addition.

xavs on 29 May 2019

👍7

Has anyone found a workaround?

BulatSaif on 21 Oct 2019

I am also looking to accomplish this.

fontanacalifornia on 21 Nov 2019

The kubelet does have this kind of metric: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/prober/prober_manager.go#L38.
Those metrics are registered at /metrics/probes on the kubelet's port.

But that doesn't help anyone not using Kubernetes...

I'm not sure if cAdvisor should take on metrics collection on probes, as it isn't performing them. I believe we currently only fetch the container from docker at container creation time, so this would require us to poll the runtime for the information. I'm not sure we can provide accurate cumulative probe metrics based on sampling the state. It seems like we are bound to miss probe failures.
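To illustrate the sampling concern, here is a small simulation under assumed numbers: a probe runs every 5 seconds, while a poller reads only the most recent probe result every 15 seconds. The function name and the probe timeline are made up for the example.

```python
def observed_failures(probe_results, probe_interval_s, poll_interval_s):
    """Count failures visible to a poller that samples only the latest
    probe result once every poll_interval_s seconds."""
    total_s = len(probe_results) * probe_interval_s
    seen = 0
    for t in range(0, total_s, poll_interval_s):
        # Result of the probe that was current at time t.
        latest = probe_results[t // probe_interval_s]
        if not latest:
            seen += 1
    return seen


# One probe every 5 s for a minute; True means the probe passed.
probes = [True, False, True, True, False, False,
          True, True, True, False, True, True]

actual = probes.count(False)
sampled = observed_failures(probes, probe_interval_s=5, poll_interval_s=15)
print(f"actual failures: {actual}, failures seen by polling: {sampled}")
```

With this timeline the poller sees 1 of the 4 failures, so a cumulative failure counter built from sampled state would undercount.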

dashpole on 23 Nov 2019

Hi Team,

Any advice, update, or workaround here would be very helpful for everyone. We need this "health_check" very badly.

anil4u-04 on 2 Dec 2019

👍3

Hi everybody! sum(time() - container_last_seen) by (name) works as a workaround for me, but sometimes it works really badly.
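The arithmetic behind that expression can be sketched as follows, assuming container_last_seen holds a Unix timestamp in seconds. The 60-second threshold and both function names are hypothetical.

```python
import time


def seconds_since_last_seen(last_seen_ts, now=None):
    """Mirror of time() - container_last_seen for a single container."""
    now = time.time() if now is None else now
    return now - last_seen_ts


def is_stale(last_seen_ts, threshold_s=60, now=None):
    """True when the container has not been seen for more than threshold_s.

    threshold_s is an assumed value; tune it to the scrape interval.
    """
    return seconds_since_last_seen(last_seen_ts, now) > threshold_s


# Fixed timestamps keep the example deterministic.
print(seconds_since_last_seen(100.0, now=130.0))
print(is_stale(100.0, threshold_s=60, now=200.0))
```

This only works while the series is still being exported; once the series disappears there is no value left to subtract from.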

serhiiromaniuk on 18 Feb 2020

Also, for alerts, sum(rate(container_last_seen{name=~".+"}[5m])) by (container_label_com_docker_compose_service) < 1 with 15s scrapes helps me stop crying all day.
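A rough sketch of why this rate sits near 1 for a live container: container_last_seen is a timestamp, so it advances about one second per second while the container is being seen. The helper below is a simplified slope between the first and last sample; it ignores the extrapolation and counter-reset handling that Prometheus rate() performs, and it assumes a stopped container's series stays exported with a frozen value. The sample values are invented.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) pair.

    Returns None when there are fewer than two samples or no time elapsed.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        return None
    return (v1 - v0) / (t1 - t0)


# Scrapes every 15 s; values are container_last_seen timestamps.
live = [(0, 1000), (15, 1015), (30, 1030)]
stopped = [(0, 1000), (15, 1015), (30, 1015)]  # value frozen after 15 s

print(simple_rate(live))
print(simple_rate(stopped))
```

Here the live container yields 1.0 and the stopped one 0.5, which is what the `< 1` comparison in the alert picks up.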

serhiiromaniuk on 20 Feb 2020

👍1

It's hard to create alerts based on metrics that disappear, and it also goes against Prometheus best practices. I still don't understand why we can't just use absent() and move on, but you can read more about it here:

https://www.robustperception.io/existential-issues-with-metrics

Recently, a coworker discovered this exporter:

https://github.com/prometheus-net/docker_exporter

It exposes a very valuable metric, docker_container_running_state, which won't disappear when the container stops!

Here's an example:

$ sudo docker run \
    --name docker_exporter \
    --detach \
    --restart always \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    --publish 9417:9417 \
    prometheusnet/docker_exporter
$ sudo docker create --name foo -it ubuntu sleep 10
$ sudo docker start foo
$ curl -s localhost:9417/metrics | grep state
docker_container_running_state{name="foo"} 1
docker_container_running_state{name="docker_exporter"} 1
# wait ten seconds
$ curl -s localhost:9417/metrics | grep state
docker_container_running_state{name="foo"} 0
docker_container_running_state{name="docker_exporter"} 1
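Since the series stays present with value 0, a stopped container can be detected directly from the scraped text. The sketch below parses output shaped exactly like the transcript above; the regular expression assumes a single name label, which may not match the exporter's full output, and stopped_containers is a hypothetical helper.

```python
import re

# Assumes lines look exactly like the transcript: one "name" label only.
LINE_RE = re.compile(
    r'^docker_container_running_state\{name="([^"]+)"\}\s+(\d+(?:\.\d+)?)$'
)


def stopped_containers(metrics_text):
    """Return names of containers whose running_state value is 0."""
    stopped = []
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and float(match.group(2)) == 0:
            stopped.append(match.group(1))
    return stopped


sample = """\
docker_container_running_state{name="foo"} 0
docker_container_running_state{name="docker_exporter"} 1
"""

print(stopped_containers(sample))
```

In Prometheus itself the equivalent check would be a comparison of docker_container_running_state against 0, with no need for absent().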