Origin: inode usage metrics seem incorrect

Created on 12 Dec 2017 · 39 comments · Source: openshift/origin

cadvisor exposes metrics related to inode usage in containers (cc @stevekuznetsov), but at a glance some of these metrics appear to be incorrect for some containers.

For example, the query `container_fs_inodes_free == 0` returns containers that actually have quite low inode usage, e.g.:

```
container_fs_inodes_free{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="n1-standard-2",beta_kubernetes_io_os="linux",container_name="POD",device="/dev/sda1",failure_domain_beta_kubernetes_io_region="us-central1",failure_domain_beta_kubernetes_io_zone="us-central1-a",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod9abca0ce_c014_11e7_86d0_42010a800002.slice/docker-de9b645a537f1702cb32222dd1428a4af3c23a03f197032be2494cdd41b3b912.scope",image="openshift/origin-pod:v3.7.0-rc.0",instance="origin-ci-ig-m-11v4",job="kubernetes-cadvisor",kubernetes_io_hostname="origin-ci-ig-m-11v4",name="k8s_POD_registry-console-1-w927b_default_9abca0ce-c014-11e7-86d0-42010a800002_1",namespace="default",pod_name="registry-console-1-w927b",role="infra",subrole="master"}
```

```
$ oc exec -it registry-console-1-w927b -n default -- df -i
Filesystem       Inodes  IUsed    IFree IUse% Mounted on
overlay        78641672 173610 78468062    1% /
tmpfs            936810     18   936792    1% /dev
tmpfs            936810     16   936794    1% /sys/fs/cgroup
/dev/sda1      78641672 173610 78468062    1% /etc/hosts
shm              936810      1   936809    1% /dev/shm
tmpfs            936810     11   936799    1% /run/secrets/kubernetes.io/serviceaccount
```

This means we cannot reliably build alerts on top of these metrics. For example, we just realized that one of our Jenkins masters has been out of inodes for some days now.
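For context, the per-filesystem numbers that `df -i` prints above come straight from statfs(2); that is the level at which "free inodes" exists, and it is what `container_fs_inodes_free` ought to mirror. A minimal Go sketch of the same check (the path is illustrative, not taken from the cluster above):

```go
// Minimal sketch: report per-filesystem inode totals and free counts via
// statfs(2), the same numbers that `df -i` prints in the output above.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	path := "/" // illustrative mount point; pass a different one as the first argument
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		fmt.Fprintf(os.Stderr, "statfs %s: %v\n", path, err)
		os.Exit(1)
	}

	// Files is the total inode count of the filesystem, Ffree how many are still free.
	fmt.Printf("%s: inodes=%d free=%d used=%d\n", path, st.Files, st.Ffree, st.Files-st.Ffree)
}
```

If these numbers look healthy from inside the container while the metric reports zero free inodes, the metric rather than the filesystem is what is off, which is the premise of this report.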

@openshift/sig-pod

component/metrics kind/bug lifecycle/rotten sig/pod ¯\_(ツ)_/¯

All 39 comments

/cc @derekwaynecarr @sjenning

@ingvagabund PTAL

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

/remove-lifecycle rotten
/cc @sjenning @ingvagabund

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/remove-lifecycle stale
/cc @sjenning @ingvagabund @openshift/sig-pod

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/shrug

@RobertKrawitz could you take a look?

@kargakis is there a reproducible case (or a running container I can look at)?

Ask in forum-testplatform in Slack; this may not be an issue anymore.


@RobertKrawitz you could run the query in the parent on the api.ci cluster and see if you can reproduce it

How would I go about doing that (specific commands to run)?

Running out of inodes means that the entire filesystem in question is out of inodes. I would need to see the actual filesystem and container that's exhibiting this in order to investigate what's going on here. That would certainly include shell access to the container that's showing the problem, and perhaps on the node as well.

The issue is that the cadvisor report, which is exposed via Prometheus, is not accurate. You should be able to run the Prometheus query from the OP and see whether any of the containers actually have inodes left.
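For anyone wanting specific commands: one way to run that query outside the console is against the Prometheus HTTP API (`GET /api/v1/query`). A rough sketch in Go; the `PROM_URL` and `PROM_TOKEN` environment variables are placeholders for the cluster's Prometheus endpoint and a token allowed to query it, not anything provisioned by this issue:

```go
// Rough sketch: run the PromQL query from this issue against the Prometheus
// HTTP API and print the raw JSON response. PROM_URL and PROM_TOKEN are
// placeholders for the cluster's Prometheus endpoint and a token that is
// allowed to query it.
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"net/url"
	"os"
)

func main() {
	base := os.Getenv("PROM_URL") // e.g. the cluster's Prometheus route
	token := os.Getenv("PROM_TOKEN")

	params := url.Values{}
	params.Set("query", `container_fs_inodes_free == 0`)

	req, err := http.NewRequest("GET", base+"/api/v1/query?"+params.Encode(), nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// The result is an instant vector of every series currently reporting
	// zero free inodes; compare it against `df -i` inside the listed pods.
	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```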

Reproduced on api.ci.openshift.org. Determined that all pods show zero available inodes; the kubelets (which are running in containers) are showing correct information.
I suspect the issue is in vendor/github.com/google/cadvisor/fs/fs.go:getDMStats, which does not populate the inode field. I will need to investigate on a running cluster where I can watch at a low level what's going on.

@stevekuznetsov are we still running with the devicemapper storage driver on api.ci? If so, any particular reason?

By definition, a container backed by devicemapper has no inodes free; the alert should be silenced on clusters configured with devicemapper. I think we should migrate to overlay2.

Very interesting, that would make sense of what we were seeing.

/cc @smarterclayton

/close

Thanks @derekwaynecarr @sjenning @RobertKrawitz

@stevekuznetsov: Closing this issue.

In response to this:

/close

Thanks @derekwaynecarr @sjenning @RobertKrawitz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wait, we should be on overlay on api.ci

/reopen

@stevekuznetsov: Reopening this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

```
$ gcloud compute --project "openshift-ci-infra" ssh --zone "us-east1-c" origin-ci-ig-n-jxzw
Warning: Permanently added 'compute.7789015054248891544' (ECDSA) to the list of known hosts.
[skuznets@origin-ci-ig-n-jxzw ~]$ sudo docker info
Containers: 581
 Running: 22
 Paused: 0
 Stopped: 559
Images: 200
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
```

We are on Overlay2

Reviewing the data I collected from the api.ci cluster.

Reproduced on 4.0 on AWS.

```
/home/rkrawitz/go/src/github.com/openshift/installer/bin/openshift-install v0.5.0-master-7-g4ec4ca69c2b49b3c339e11241babb5cd84047b74
ostree 47.184
Installer 4ec4ca69c2b49b3c339e11241babb5cd84047b74
```

Have a reproducible case that I can perturb with kubelet changes.

Amazing!

The source of the problem is that cadvisor doesn't support retrieving filesystem information (see vendor/github.com/google/cadvisor/container/common/fsHandler.go), so containers that are managed by cadvisor don't report free inodes (or free disk space either). I've verified this by perturbing the number of inodes reported in use (adding a large number to it); the number reported within Prometheus tracks the perturbed value. I haven't yet traced exactly how that data bubbles up, but I'm quite confident that the lack of filesystem data within cadvisor is ultimately what results in no inode-free data being reported in Prometheus.
Not all containers are managed by cadvisor; in particular, some system containers (static pods, by the look of it) have their data managed by vendor/k8s.io/kubernetes/pkg/volume/metrics_du.go, which does retrieve filesystem information. Those containers can report free inodes and disk space.
I will investigate possibilities for addressing this. However, there's a broader issue: reporting free inodes (or free disk space) for a _container_ is meaningless; those (unlike inodes or disk space in use) are properties of filesystems, not of containers.
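To make the distinction concrete, here is a minimal sketch contrasting the two measurements: inode *usage* can be counted per directory tree (a du/find-style walk like the volume metrics code performs), whereas *free* inodes only exist per filesystem, via statfs(2). The path below is illustrative, not taken from the kubelet code:

```go
// Sketch: inode usage can be measured per directory tree (roughly what a
// du/find-style walk reports), but inode-free counts only exist per
// filesystem, via statfs(2). Only the latter can say "out of inodes", and it
// is a property of the filesystem, not of any one container's directories.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// inodesUsed counts every file, directory, and symlink under root, i.e. the
// number of inodes the tree consumes (hard links are counted once per path).
func inodesUsed(root string) uint64 {
	var n uint64
	filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return nil // skip unreadable entries rather than aborting the walk
		}
		n++
		return nil
	})
	return n
}

func main() {
	root := "/var/lib/containers" // illustrative path
	if len(os.Args) > 1 {
		root = os.Args[1]
	}

	var st syscall.Statfs_t
	if err := syscall.Statfs(root, &st); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	fmt.Printf("inodes used under %s: %d\n", root, inodesUsed(root))
	fmt.Printf("inodes free on the backing filesystem: %d of %d\n", st.Ffree, st.Files)
}
```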

If it's meaningless, maybe it should be removed from the metrics; trying to alert on it would also be worthless.

It depends upon what you're looking at. If you're looking at an individual _volume_ then the free metrics are useful. But they cannot be meaningfully aggregated for an entire container or pod.

BTW, note that as things stand you can determine the number of free inodes for the filesystems on each node with:

```
container_fs_inodes_free{container_name=""}
```

```
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.1.128:10250",job="kubelet",node="ip-10-0-1-128.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.11.53:10250",job="kubelet",node="ip-10-0-11-53.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.136.4:10250",job="kubelet",node="ip-10-0-136-4.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.141.162:10250",job="kubelet",node="ip-10-0-141-162.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.155.31:10250",job="kubelet",node="ip-10-0-155-31.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.28.147:10250",job="kubelet",node="ip-10-0-28-147.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.1.128:10250",job="kubelet",node="ip-10-0-1-128.us-west-1.compute.internal",service="kubelet"} | 62697945
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.11.53:10250",job="kubelet",node="ip-10-0-11-53.us-west-1.compute.internal",service="kubelet"} | 62698529
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.136.4:10250",job="kubelet",node="ip-10-0-136-4.us-west-1.compute.internal",service="kubelet"} | 8157043
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.141.162:10250",job="kubelet",node="ip-10-0-141-162.us-west-1.compute.internal",service="kubelet"} | 8141007
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.155.31:10250",job="kubelet",node="ip-10-0-155-31.us-west-1.compute.internal",service="kubelet"} | 8156287
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.28.147:10250",job="kubelet",node="ip-10-0-28-147.us-west-1.compute.internal",service="kubelet"} | 62696488
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.1.128:10250",job="kubelet",node="ip-10-0-1-128.us-west-1.compute.internal",service="kubelet"} | 1021563
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.11.53:10250",job="kubelet",node="ip-10-0-11-53.us-west-1.compute.internal",service="kubelet"} | 1021563
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.136.4:10250",job="kubelet",node="ip-10-0-136-4.us-west-1.compute.internal",service="kubelet"} | 1021564
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.141.162:10250",job="kubelet",node="ip-10-0-141-162.us-west-1.compute.internal",service="kubelet"} | 1021564
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.155.31:10250",job="kubelet",node="ip-10-0-155-31.us-west-1.compute.internal",service="kubelet"} | 1021564
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.28.147:10250",job="kubelet",node="ip-10-0-28-147.us-west-1.compute.internal",service="kubelet"} | 1021564
```
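Building on that, the alert the original report wanted could presumably be expressed at the node level rather than per container, e.g. something along the lines of `container_fs_inodes_free{container_name=""} / container_fs_inodes_total{container_name=""} < 0.05`. Treat the exact expression and threshold as a sketch, not a tested alerting rule; it assumes `container_fs_inodes_total` is scraped with the same labels as the series above.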

If devicemapper is used as the storage backend, then free metrics for writable layers are meaningful, as they are independent filesystems.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
