Cadvisor: machine.go reaching code on aarch64 it shouldn't

Created on 26 Apr 2019 · 13 Comments · Source: google/cadvisor

I'm seeing odd behaviour with cadvisor-0.33.1 (rev 7e9ea00b) bundled with kubelet-1.14.1 on arm64. In the logs I see messages like:

Apr 25 22:56:31 kubemaster kubelet: E0425 22:56:31.381303    9928 machine.go:288] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache/index0/size: no such file or directory

This is odd because lines 253-256 in machine.go contain code that should prevent reaching line 288. (Note: This was introduced with https://github.com/google/cadvisor/pull/2114)

The host, kubemaster, is an aarch64 host. I wrote something to mimic what cadvisor does to determine the architecture, and the output is as follows:

[root@kubemaster ~]# go run aarch64.go 
getMachineArch() err = <nil>
getMachineArch() sysname = Linux
getMachineArch() nodename = kubemaster
getMachineArch() release = 4.20.0-1090-ayufan-gd1277c20e10d
getMachineArch() version = #ayufan SMP PREEMPT Sun Feb 24 11:51:32 UTC 2019
getMachineArch() machine = aarch64
isAArch64() arch = aarch64, err = <nil>
am I aarch64 => true?

The hardware is a Rock64 arm64 board, and isAArch64 returns true in my test, so I'm confused as to why the code isn't continuing.
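For readers who want to repeat the check, here is a minimal sketch of a uname-based architecture test. It is not the aarch64.go used above and not cadvisor's code; `machineArch`, `isAArch64Sketch` and `charsToString` are hypothetical names, and the sketch assumes Linux and Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"syscall"
)

// charsToString converts a NUL-terminated utsname field to a Go string.
// The element type is int8 on some Linux architectures and uint8 on others,
// so the helper is generic over both.
func charsToString[T int8 | uint8](field []T) string {
	buf := make([]byte, 0, len(field))
	for _, c := range field {
		if c == 0 {
			break
		}
		buf = append(buf, byte(c))
	}
	return string(buf)
}

// machineArch returns the "machine" field reported by uname(2).
func machineArch() (string, error) {
	var uts syscall.Utsname
	if err := syscall.Uname(&uts); err != nil {
		return "", err
	}
	return charsToString(uts.Machine[:]), nil
}

// isAArch64Sketch reports whether the given machine string names aarch64.
func isAArch64Sketch(arch string) bool {
	return arch == "aarch64"
}

func main() {
	arch, err := machineArch()
	fmt.Printf("machine = %s, err = %v\n", arch, err)
	fmt.Printf("am I aarch64 => %v\n", isAArch64Sketch(arch))
}
```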

All 13 comments

cc @lubinszARM

@lisa @dashpole I see. Let me check it. Thanks.

Hi @lisa
It should not be a problem.
Firstly, lines 253-256 in machine.go do not prevent reaching line 288,
because they belong to different code paths.
So, on the Arm platform, line 288 can be reached.

The second point is that one of the biggest problems with the Arm platform is that the various hardware models are not uniform.
Some Arm boards expose this cache information and some do not.
For example, on a Cavium ThunderX1 board there is no cache information in /sys/devices/system/cpu/cpu0/cache/index0/
But on a Cavium ThunderX2 board, we can get the information:
root@entos-thunderx2-02:/sys/devices/system/cpu/cpu0/cache/index0# uname -m
aarch64
root@entos-thunderx2-02:/sys/devices/system/cpu/cpu0/cache/index0# cat size
32K

@lisa
So can k8s run smoothly on your Rock64 board?

@lubinsz I'm not sure how I missed the logic difference! Yes, it should reach line 288 in all cases; perhaps the presence of those interesting file(s) should be checked before reading?

The answer to your second question is yes, for the most part, Kubernetes runs smoothly on my Rock64 cluster. There are some kernel issues that I live with, but I think they are unrelated to cadvisor.

@lisa
I see.
If you have any problems in k8s related to Arm platform, please let me know.
Thanks.

I am also seeing this error in the journal on my RPi with Raspbian. It would be great if the error could be silenced for platforms where this does not apply.

Yes, @lubinsz it appears that not all CPUs have cache, so it should not be logged as an error. At best it should perhaps be logged as a warning once per process, but not once per poll cycle.
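A rough sketch of "warn once per process" in Go, using sync.Once. The `onceWarner` type is hypothetical and only illustrates the idea; it does not reflect how cadvisor or its logging library is structured.

```go
package main

import (
	"log"
	"sync"
)

// onceWarner is a hypothetical wrapper that forwards only the first
// warning to the underlying log function and drops all later ones.
type onceWarner struct {
	once sync.Once
	logf func(format string, args ...interface{})
}

// Warnf logs the message the first time it is called and is a no-op after.
func (w *onceWarner) Warnf(format string, args ...interface{}) {
	w.once.Do(func() { w.logf(format, args...) })
}

func main() {
	cacheWarn := &onceWarner{logf: log.Printf}

	// Simulate several poll cycles that all hit the same missing file.
	for cycle := 0; cycle < 3; cycle++ {
		cacheWarn.Warnf("no cache information for node %d (cycle %d)", 0, cycle)
	}
	// Only the message from the first cycle is logged.
}
```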

Hi @brandond @alexellis
I have changed the Errorf to Warningf in this PR: https://github.com/google/cadvisor/pull/2456

thanks @lubinsz, just one question on the change - does that mean that we'll still get the message filling up the systemd journal with the word "warning" instead of "error", or will it be silenced/hidden by default?

Alex

@alexellis
Unfortunately, the message will probably still show up in the syslog.
I will submit another PR to address this.
Thank you for using arm.

cc @dashpole @dims

@lubinsz Thanks for the PR!
