After upgrading to Kubernetes 1.8, the kubelet log is flooded with the following error:
helpers.go:468] PercpuUsage had 0 cpus, but the actual number is 2; ignoring extra CPUs
It looks like this is related to https://github.com/google/cadvisor/pull/1711. Should those extra zeroes be filtered out instead of logging an error?
We're running CoreOS beta (1520.4.0) in AWS:
Linux ip-10-150-16-116.ap-southeast-2.compute.internal 4.13.3-coreos #1 SMP Wed Sep 20 22:17:11 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz GenuineIntel GNU/Linux
Also seeing this on GCE with CentOS 7.4 and Kubernetes 1.8:
Oct 10 05:29:39 staging-head-2 kubelet[1513]: E1010 05:29:39.001380 1513 helpers.go:468] PercpuUsage had 0 cpus, but the actual number is 1; ignoring extra CPUs
Linux staging-head-2 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
I understood from #1711 that this should only affect kernels 4.7+, so I'm not entirely sure it's the same issue, but Red Hat backports a lot of changes from newer kernels.
cc @euank
@dada941 How are you running the kubelet? Can you also post the output of cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu (or cpu or cpuacct depending on which exist) from the Kubelet's perspective (e.g. if it's in some sort of container, from within the container)?
We should probably treat '0' as a special value and shortcut out early, since len(s.CpuStats.CpuUsage.PercpuUsage) == 0 likely indicates an error condition, such as cgroups not being mounted or that cgroup feature not being available.
After I understand why this is happening I can make a patch for that.
Thanks for looking into this @euank
Yes, the kubelet runs in an rkt container on CoreOS, but I've also tried running the binary outside of a container and get the same result.
Here's the output of the command, run from the rkt container:
# cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu
87983471343 95801030681 0 0 0 0 0 0 0 0 0 0 0 0 0
I was able to reproduce that log message.
I've so far only been able to observe it once per cgroup creation at most due to a race right around creation.
In my reproduction, it doesn't have any ill effect beyond the noisy logline. stats/summary for the kubelet still shows usageCoreNanoSeconds correctly, which I believe is directly calculated from the sum of per cpu usage.
I created a PR (#1769) to avoid the noisy error log.
Thanks for reporting the issue!