After upgrading to Kubernetes 1.8, the kubelet log is flooded with the following error:
helpers.go:468] PercpuUsage had 0 cpus, but the actual number is 2; ignoring extra CPUs
It looks like this is related to https://github.com/google/cadvisor/pull/1711. Should those extra zeroes be filtered out instead of logging an error?
We're running CoreOS beta (1520.4.0) in AWS:
Linux ip-10-150-16-116.ap-southeast-2.compute.internal 4.13.3-coreos #1 SMP Wed Sep 20 22:17:11 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz GenuineIntel GNU/Linux
Also seeing this on GCE with CentOS 7.4 and Kubernetes 1.8:
Oct 10 05:29:39 staging-head-2 kubelet[1513]: E1010 05:29:39.001380 1513 helpers.go:468] PercpuUsage had 0 cpus, but the actual number is 1; ignoring extra CPUs
Linux staging-head-2 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
I understood from #1711 that this should only affect kernels 4.7+, so I'm not entirely sure it's the same issue, but Red Hat backports a lot of changes from newer kernels.
cc @euank
@dada941 How are you running the kubelet? Can you also post the output of cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu (or cpu or cpuacct depending on which exist) from the Kubelet's perspective (e.g. if it's in some sort of container, from within the container)?
We should probably treat '0' as a special value and shortcut out early, since len(s.CpuStats.CpuUsage.PercpuUsage) == 0 likely indicates an error condition, such as cgroups not being mounted or that cgroup feature not being available.
After I understand why this is happening I can make a patch for that.
Thanks for looking into this @euank
Yes, the kubelet runs in an rkt container on CoreOS, but I've also tried running the binary outside of a container and get the same result.
Here's the output of the command, run from the rkt container:
# cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu
87983471343 95801030681 0 0 0 0 0 0 0 0 0 0 0 0 0
I was able to reproduce that log message.
I've so far only been able to observe it once per cgroup creation at most due to a race right around creation.
In my reproduction, it doesn't have any ill effect beyond the noisy logline. stats/summary for the kubelet still shows usageCoreNanoSeconds correctly, which I believe is directly calculated from the sum of per cpu usage.
I created a PR (#1769) to avoid the noisy error log.
Thanks for reporting the issue!