I have enabled "container_cpu_load_average_10s" for my cluster and it's working well. Now when I view this metric in my Prometheus browser, it gives values like 1000 or 1100.
As I understand it, the load average is the number of processes waiting to be run, and if that definition is correct then I really doubt the calculated value reflects the actual situation.
I want to understand what these metric values indicate and whether there is anything I can do to make them useful.
I'm seeing values I don't understand, too. A quick look with ps inside my container shows only one or two processes in the R or D state:
root@nginx-7f679d96bc-9t7zr:/# ps -eLo state,tid,args | awk '$1 ~ /^(R|D)/'
R 591 stress-ng -c 1 -l 100
R 750 ps -eLo state,tid,args
However, container_cpu_load_average_10s is approximately 800.
I'm not sure whether CPU limits play a part in the cAdvisor calculation, but this container has a limit of 250m on a VM with 4 vCPUs, and I'm pushing the CPU with stress-ng to test my Prometheus setup (perhaps CPU throttling has an impact on this metric?).
@dashpole any ideas here? I am still getting these strange values.
How can I make these values useful for my monitoring?
Thanks
From https://github.com/google/cadvisor/blob/master/info/v1/container.go#L320:
"Smoothed average of number of runnable threads x 1000. We multiply by thousand to avoid using floats, but preserving precision. Load is smoothed over the last 10 seconds. Instantaneous value can be read from LoadStats.NrRunning."
Finally, a reason why this load metric becomes so high on containers.