Node_exporter: Kernel bug skewing node_cpu{mode="steal"} metrics

Created on 21 Nov 2017 · 22Comments · Source: prometheus/node_exporter

Host operating system: output of `uname -a`

Debian 9
Linux ip-10-11-1-110 4.9.0-4-amd64 #1 SMP Debian 4.9.51-1 (2017-09-28) x86_64 GNU/Linux

node_exporter version: output of `node_exporter --version`

prom/node-exporter:v0.15.0

node_exporter command line flags

/usr/bin/docker run \
    -v /proc:/host/proc:ro \
    -v /sys:/host/sys:ro \
    -v /:/rootfs:ro \
    -e HOST_HOSTNAME="/rootfs/etc/hostname" \
    -p 19997:9100 \
    --pid="host" \
    --name %i \
    basi/node-exporter:v1.14.0 \
    -collector.procfs "/host/proc" \
    -collector.sysfs /host/sys \
    -collector.textfile.directory /etc/node-exporter/ \
    -collectors.enabled 'conntrack,diskstats,entropy,filefd,filesystem,loadavg,mdadm,meminfo,netdev,netstat,stat,textfile,time,vmstat,ipvs,systemd' \
    -collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($$|/)"

Are you running node_exporter in Docker?

Yes

What did you do that produced an error?

sum(rate(node_cpu{instance="$instance"}[1m])) by (mode) * 100 / count_scalar(node_cpu{mode="user", instance="$instance"})

What did you expect to see?

Around 100% usage if we aggregate all mode values

What did you see instead?

Some level of steal is expected as this is running on AWS, but I believe the value is out of proportion.

Element | Value
-- | --
{mode="guest"} | 0
{mode="user"} | 0
{mode="iowait"} | 0.033333333333366175
{mode="system"} | 0
{mode="irq"} | 0
{mode="idle"} | 94.95555555551417
{mode="steal"} | 67456900655.031654
{mode="nice"} | 0
{mode="softirq"} | 0

Stats

# cat /proc/stat
cpu  10054 454 10504 1435808 6220 0 64 1802 0 0
cpu0 4970 268 5291 717279 3584 0 8 752 0 0
cpu1 5084 186 5212 718529 2636 0 56 1049 0 0

pendinclose

Source

pecigonzalo

Most helpful comment

According to https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest/ this affects guest kernel versions 4.8, 4.9 and 4.10.

It also appears that similar bugs existed before in Kernel 3.x, but was fixed in 4.x < 4.8.

Note that not all guests handle this condition in the same way: 3.2
guests still get the overflow in /proc/stat, but their scheduler
continues to work as expected. 3.16 guests OTOH go nuts once steal time
overflows and stop accumulating system & user time, while entering an
erratic state where steal time in /proc/stat is decreasing on every
clock tick.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785557;msg=61

grobie on 8 Jan 2018

👍4

All 22 comments

This sounds like you're getting funny data out of /proc/stat at some point. In order to debug this we need the raw samples from your system.

You can query this with the instant API or expression browser.

node_cpu{instance="$instance",mode="steal"}[5m]

SuperQ on 21 Nov 2017

Here:

Element | Value
-- | --
node_cpu{cpu="cpu1",instance="$instance",instance_ip="10.10.1.232",job="node-exporter",mode="steal"} | 13823647282.09 @1511271040.93112768804791.68 @1511271055.93111656879505.85 @1511271070.93110569135126.41 @1511271085.9319520238760.52 @1511271100.9318489180768.2 @1511271115.9317448212568.35 @1511271130.9316378702970.78 @1511271145.9315347644978.46 @1511271160.9314318965435.95 @1511271175.9313243113305.56 @1511271190.9312174396524.6 @1511271205.9311105283335.33 @1511271220.93128241980.03 @1511271235.93117391998573.92 @1511271250.93116297118845.05 @1511271265.93115237519455.02 @1511271280.93114185451822.72 @1511271295.93113121491941.37 @1511271310.93112064667409.46 @1511271325.931
node_cpu{cpu="cpu0",instance="$instance",instance_ip="10.10.1.232",job="node-exporter",mode="steal"} | 14983696025.52 @1511271040.93113903879812.11 @1511271055.93112808207266.65 @1511271070.93111754950409.44 @1511271085.93110629547241.36 @1511271100.9319505729706.48 @1511271115.9318397768503.66 @1511271130.9317265626394.45 @1511271145.9316177089198.41 @1511271160.9315109561642.35 @1511271175.9314041241269.69 @1511271190.9312933676475.17 @1511271205.9311838400338 @1511271220.931755016449.89 @1511271235.93118087060379.65 @1511271250.93116983063259.85 @1511271265.93115901661413.24 @1511271280.93114786564860.99 @1511271295.93113680189291.38 @1511271310.93112567867597.25 @1511271325.931

Although as you can see on my /proc/stat they look normal, or you mean at some point in the past?

pecigonzalo on 21 Nov 2017

It looks like the formatting got a little messed up, but after cleaning it up you can see that what is being recorded is completely different from what /proc/stat says. Something is getting corrupted.

13823647282.09 1511271040.931
12768804791.68 1511271055.931
11656879505.85 1511271070.931
10569135126.41 1511271085.931
9520238760.52 1511271100.931
8489180768.2 1511271115.93
17448212568.35 1511271130.931
6378702970.78 1511271145.931
5347644978.46 1511271160.931
4318965435.95 1511271175.931
3243113305.56 1511271190.931
2174396524.6 1511271205.931
1105283335.33 1511271220.931
28241980.03 1511271235.931
17391998573.92 1511271250.931
16297118845.05 1511271265.931
15237519455.02 1511271280.931
14185451822.72 1511271295.931
13121491941.37 1511271310.931
12064667409.46 1511271325.931

What do you get if you manually curl 'http://$instance:9100/metrics' | grep node_cpu?

SuperQ on 21 Nov 2017

This is in docker, so using the mapped port:

> curl 'http://localhost:19997/metrics' | grep node_cpu
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="guest"} 0
node_cpu{cpu="cpu0",mode="idle"} 998557.46
node_cpu{cpu="cpu0",mode="iowait"} 492.78
node_cpu{cpu="cpu0",mode="irq"} 0
node_cpu{cpu="cpu0",mode="nice"} 6.02
node_cpu{cpu="cpu0",mode="softirq"} 1.91
node_cpu{cpu="cpu0",mode="steal"} 1.208129581039e+10
node_cpu{cpu="cpu0",mode="system"} 661.86
node_cpu{cpu="cpu0",mode="user"} 798.07
node_cpu{cpu="cpu1",mode="guest"} 0
node_cpu{cpu="cpu1",mode="idle"} 998280.19
node_cpu{cpu="cpu1",mode="iowait"} 312.78
node_cpu{cpu="cpu1",mode="irq"} 0
node_cpu{cpu="cpu1",mode="nice"} 1.84
node_cpu{cpu="cpu1",mode="softirq"} 7.64
node_cpu{cpu="cpu1",mode="steal"} 6.6865899412e+08
node_cpu{cpu="cpu1",mode="system"} 670.77
node_cpu{cpu="cpu1",mode="user"} 807.26

pecigonzalo on 21 Nov 2017

Ok I found a major problem with my report, I apologize for the confusion.
The config for the current process is actually:

/usr/bin/docker run \
    -v /proc:/host/proc:ro \
    -v /sys:/host/sys:ro \
    -v /:/rootfs:ro \
    -e HOST_HOSTNAME="/rootfs/etc/hostname" \
    -p 19997:9100 \
    --pid="host" \
    --name %i \
    basi/node-exporter:v1.14.0 \
    -collector.procfs "/host/proc" \
    -collector.sysfs /host/sys \
    -collector.textfile.directory /etc/node-exporter/ \
    -collectors.enabled 'conntrack,diskstats,entropy,filefd,filesystem,loadavg,mdadm,meminfo,netdev,netstat,stat,textfile,time,vmstat,ipvs,systemd' \
    -collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($$|/)"

I got the wrong information from my config VCS (using 0.15), than the current environment running version (0.14)

PS: Made an edit to my first report, to avoid confusion.

pecigonzalo on 21 Nov 2017

Ahh, great, that makes a lot more sense. 😃

SuperQ on 21 Nov 2017

Sorry SuperQ the issue is still relevant to 0.14, I just pasted the wrong 'command I was running on the server'.
Ill deploy 0.15 and test before we maybe reopen this as to ensure it was not fixed somehow on 0.15

pecigonzalo on 23 Nov 2017

Ok. I suspect there's some kind of corruption inside docker that is messing up reading the values from /proc/stat. The only way to really know what's going on is to run a debugging binary with a bunch of printing of the raw data.

SuperQ on 23 Nov 2017

Not related to node_exporter, but I'm seeing something similar with steal time on a (cloudvps) node.

Idle time (should be at most 100):

# num=-1; while true; do new=$(awk '/^cpu2 /{print $5}' /proc/stat); \
    test $num -ne -1 && echo $((new-num)); num=$new; sleep 1; done 
99
95
56
19
95
93
92
95
...

^-- looks okay

Steal time:

# num=-1; while true; do new=$(awk '/^cpu2 /{print $9}' /proc/stat); \
    test $num -ne -1 && echo $((new-num)); num=$new; sleep 1; done 
-832457433
-396408302
-753175773
-515330792
-792816603
-951379924
-832457433
-554971622
...

^-- off the charts

# uname -a
Linux hostname 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu xenial system, with dmidecode:

System Information
        Manufacturer: OpenStack Foundation
        Product Name: OpenStack Nova
        Version: 2014.1.3

In our case, this was observed when Zabbix reports 100% steal time continuously, while other tools like top and vmstat only report some (when diff > 0).

Interestingly, of the 4 cpu's, CPU0 appears to behave ok with only small steal-time increments.

I'm curious to know what you see on your systems.

wdoekes on 1 Dec 2017

We just found a node which experiences the same issue. It's running on EC2 and reports a decreasing steal counter. Applying rate on a decreasing counter will then result in the report 5B CPU seconds spent in steal mode per second ;-)

$ for i in {1..10}; do cat /proc/stat | grep cpu0; sleep 1; done                                                                                                                                                                                                      cpu0 22536 17835 22383 42574061 31924 0 584 1262648371943 0 0
cpu0 22536 17835 22383 42574160 31924 0 584 1258406803117 0 0
cpu0 22536 17835 22383 42574258 31924 0 584 1253808466819 0 0
cpu0 22536 17835 22383 42574357 31924 0 584 1249011926370 0 0
cpu0 22536 17835 22383 42574455 31924 0 584 1242827956866 0 0
cpu0 22536 17835 22383 42574553 31924 0 584 1237278240645 0 0
cpu0 22536 17835 22383 42574651 31924 0 584 1230975348650 0 0
cpu0 22536 17835 22383 42574749 31924 0 584 1224038203373 0 0
cpu0 22536 17835 22383 42574847 31924 0 584 1218765972963 0 0
cpu0 22536 17835 22383 42574945 31924 0 584 1212938770930 0 0

Kernel: 4.10.15

grobie on 8 Jan 2018

According to https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest/ this affects guest kernel versions 4.8, 4.9 and 4.10.

It also appears that similar bugs existed before in Kernel 3.x, but was fixed in 4.x < 4.8.

Note that not all guests handle this condition in the same way: 3.2
guests still get the overflow in /proc/stat, but their scheduler
continues to work as expected. 3.16 guests OTOH go nuts once steal time
overflows and stop accumulating system & user time, while entering an
erratic state where steal time in /proc/stat is decreasing on every
clock tick.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785557;msg=61

grobie on 8 Jan 2018

👍4

Amazing find, I did not have time to correctly add more information but we still see the issue, thanks for reopening/finding this.

pecigonzalo on 8 Jan 2018

Oh nice :)
Interesting question how to solve this from our side. At least we should keep this issue open for tracking but we could go as far as logging a warning for known-to-be-bad kernel versions?

discordianfish on 8 Jan 2018

@discordianfish We had a chat on IRC and mostly agreed that tracking kernel bugs is mostly out of scope for the node_exporter. However we want to document this case in the README under known issues.

SuperQ on 8 Jan 2018

One thing that came up in conversation was the option to split out the steal mode from the normal metric, like we do with guest and guest_nice.

What do people think about this?

SuperQ on 15 Jan 2018

👍1

Is there double counting in node_cpu due to steal? That'd be the only reason to split it out in my mind.

brian-brazil on 15 Jan 2018

I did some investigation, and I came to the conclusion that "steal" is valid as a label. I don't know if there's a lot we can do in the node_exporter, as this is a kernel bug.

SuperQ on 21 Feb 2018

@grobie @discordianfish Do you think there is anything we should do about this given it's a kernel bug, not a node_exporter bug?

SuperQ on 12 Jun 2018

Hi @SuperQ , I don't think node_exporter should do anything to get around this issue. Since the issue indicates an abnormal situation, it is exactly the expected behavior for node_exporter to catch and report it.

yuankunzhang on 12 Jun 2018

👍1

While we could choose to emit warnings (log and/or new metric) if run on problematic kernels, I don't think that would be sustainable.

grobie on 12 Jun 2018

The issue only happens (occasionally) after a VM live migration. We can:

Upgrade to a newer kernel;
Or pay more attention to the status of the VM during migrations (so that we can act quickly after the issue pops up).

Sadly, as far as I know, the only way to fix the /proc/stat counters is to restart the VM.

yuankunzhang on 12 Jun 2018

Yes I think we can't do much here, so let's close this.

discordianfish on 12 Jun 2018

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Add total number of running processes metric

mInrOz · 5Comments

1.0.0-rc.0 CentOS 7 mdadm exporter not reporting failed drive

Justin417 · 3Comments

on Raspberry Pi: err="error parsing mdstatus: error parsing mdstat /proc/mdstat: open /proc/mdstat: no such file or directory"

vielmetti · 3Comments

collector 'cpu' not available

cjroebuck · 3Comments

SysV Init script for older systems

prologic · 4Comments

Node_exporter: Kernel bug skewing node_cpu{mode="steal"} metrics

Host operating system: output of uname -a

node_exporter version: output of node_exporter --version

node_exporter command line flags

Are you running node_exporter in Docker?

What did you do that produced an error?

What did you expect to see?

What did you see instead?

Most helpful comment

All 22 comments

Related issues

Host operating system: output of `uname -a`

node_exporter version: output of `node_exporter --version`