Origin: inode usage metrics seem incorrect

Created on 12 Dec 2017 · 39 comments · Source: openshift/origin

cadvisor exposes metrics related to inode usage in containers (cc @stevekuznetsov), but at a glance some of these metrics appear to be incorrect for some containers.

For example, the query `container_fs_inodes_free == 0` returns containers that actually have quite low inode usage, e.g.:

```
container_fs_inodes_free{beta_kubernetes_io_arch="amd64",beta_kubernetes_io_instance_type="n1-standard-2",beta_kubernetes_io_os="linux",container_name="POD",device="/dev/sda1",failure_domain_beta_kubernetes_io_region="us-central1",failure_domain_beta_kubernetes_io_zone="us-central1-a",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod9abca0ce_c014_11e7_86d0_42010a800002.slice/docker-de9b645a537f1702cb32222dd1428a4af3c23a03f197032be2494cdd41b3b912.scope",image="openshift/origin-pod:v3.7.0-rc.0",instance="origin-ci-ig-m-11v4",job="kubernetes-cadvisor",kubernetes_io_hostname="origin-ci-ig-m-11v4",name="k8s_POD_registry-console-1-w927b_default_9abca0ce-c014-11e7-86d0-42010a800002_1",namespace="default",pod_name="registry-console-1-w927b",role="infra",subrole="master"}
```

```
$ oc exec -it registry-console-1-w927b -n default -- df -i
Filesystem       Inodes  IUsed    IFree IUse% Mounted on
overlay        78641672 173610 78468062    1% /
tmpfs            936810     18   936792    1% /dev
tmpfs            936810     16   936794    1% /sys/fs/cgroup
/dev/sda1      78641672 173610 78468062    1% /etc/hosts
shm              936810      1   936809    1% /dev/shm
tmpfs            936810     11   936799    1% /run/secrets/kubernetes.io/serviceaccount
```

This means we cannot reliably build alerts on top of these metrics. For example, we just realized that one of our Jenkins masters has been out of inodes for some days now.
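For context, the per-filesystem numbers that `df -i` prints above come straight from statfs(2); that is the level at which "free inodes" exists, and it is what `container_fs_inodes_free` ought to mirror. A minimal Go sketch of the same check (the path is illustrative, not taken from the cluster above):

```go
// Minimal sketch: report per-filesystem inode totals and free counts via
// statfs(2), the same numbers that `df -i` prints in the output above.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	path := "/" // illustrative mount point; pass a different one as the first argument
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		fmt.Fprintf(os.Stderr, "statfs %s: %v\n", path, err)
		os.Exit(1)
	}

	// Files is the total inode count of the filesystem, Ffree how many are still free.
	fmt.Printf("%s: inodes=%d free=%d used=%d\n", path, st.Files, st.Ffree, st.Files-st.Ffree)
}
```

If these numbers look healthy from inside the container while the metric reports zero free inodes, the metric rather than the filesystem is what is off, which is the premise of this report.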

@openshift/sig-pod

component/metrics kind/bug lifecycle/rotten sig/pod ¯\_(ツ)_/¯

All 39 comments

/cc @derekwaynecarr @sjenning

@ingvagabund PTAL

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

/remove-lifecycle rotten
/cc @sjenning @ingvagabund

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/remove-lifecycle stale
/cc @sjenning @ingvagabund @openshift/sig-pod

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

/shrug

@RobertKrawitz could you take a look?

@kargakis is there a reproducible case (or a running container I can look at)?

Ask in forum-testplatform in Slack; this may not be an issue anymore.


@RobertKrawitz you could run the query in the parent on the api.ci cluster and see if you can reproduce it

How would I go about doing that (specific commands to run)?

Running out of inodes means that the entire filesystem in question is out of inodes. I would need to see the actual filesystem and container that's exhibiting this in order to investigate what's going on here. That would certainly include shell access to the container that's showing the problem, and perhaps on the node as well.

The issue is that the cadvisor report, which is exposed via Prometheus, is not accurate. You should be able to run the Prometheus query from the OP and see whether any of the containers actually have inodes left.
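For anyone wanting specific commands: one way to run that query outside the console is against the Prometheus HTTP API (`GET /api/v1/query`). A rough sketch in Go; the `PROM_URL` and `PROM_TOKEN` environment variables are placeholders for the cluster's Prometheus endpoint and a token allowed to query it, not anything provisioned by this issue:

```go
// Rough sketch: run the PromQL query from this issue against the Prometheus
// HTTP API and print the raw JSON response. PROM_URL and PROM_TOKEN are
// placeholders for the cluster's Prometheus endpoint and a token that is
// allowed to query it.
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"net/url"
	"os"
)

func main() {
	base := os.Getenv("PROM_URL") // e.g. the cluster's Prometheus route
	token := os.Getenv("PROM_TOKEN")

	params := url.Values{}
	params.Set("query", `container_fs_inodes_free == 0`)

	req, err := http.NewRequest("GET", base+"/api/v1/query?"+params.Encode(), nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// The result is an instant vector of every series currently reporting
	// zero free inodes; compare it against `df -i` inside the listed pods.
	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```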

Reproduced on api.ci.openshift.org. Determined that all pods show zero available inodes; the kubelets (which are running in containers) are showing correct information.
I suspect the issue is in vendor/github.com/google/cadvisor/fs/fs.go:getDMStats, which does not populate the inode field. I will need to investigate on a running cluster where I can watch at a low level what's going on.

@stevekuznetsov are we still running with the devicemapper storage driver on api.ci? If so, any particular reason?

By definition, a container backed by devicemapper has no inodes free; the alert should be silenced on clusters configured with devicemapper. I think we should migrate to overlay2.

Very interesting, that would make sense of what we were seeing.

/cc @smarterclayton

/close

Thanks @derekwaynecarr @sjenning @RobertKrawitz

@stevekuznetsov: Closing this issue.

In response to this:

/close

Thanks @derekwaynecarr @sjenning @RobertKrawitz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wait, we should be on overlay on api.ci

/reopen

@stevekuznetsov: Reopening this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

```
$ gcloud compute --project "openshift-ci-infra" ssh --zone "us-east1-c" origin-ci-ig-n-jxzw
Warning: Permanently added 'compute.7789015054248891544' (ECDSA) to the list of known hosts.
[skuznets@origin-ci-ig-n-jxzw ~]$ sudo docker info
Containers: 581
 Running: 22
 Paused: 0
 Stopped: 559
Images: 200
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
```

We are on Overlay2

Reviewing the data I collected from the api.ci cluster.

Reproduced on 4.0 on AWS.

```
/home/rkrawitz/go/src/github.com/openshift/installer/bin/openshift-install v0.5.0-master-7-g4ec4ca69c2b49b3c339e11241babb5cd84047b74
ostree 47.184
Installer 4ec4ca69c2b49b3c339e11241babb5cd84047b74
```

Have a reproducible case that I can perturb with kubelet changes.

Amazing!

The source of the problem is that cadvisor doesn't support retrieving filesystem information (see vendor/github.com/google/cadvisor/container/common/fsHandler.go), so containers that are managed by cadvisor don't report free inodes (or free disk space either). I've verified this by perturbing the number of inodes reported in use (adding a large number to it); the number reported within Prometheus tracks the perturbed value. I haven't yet traced exactly how that data bubbles up, but I'm quite confident that the lack of filesystem data within cadvisor is ultimately what results in no inode-free data being reported in Prometheus.
Not all containers are managed by cadvisor; in particular, some system containers (static pods, by the look of it) have their data managed by vendor/k8s.io/kubernetes/pkg/volume/metrics_du.go, which does retrieve filesystem information. Those containers can report free inodes and disk space.
I will investigate possibilities for addressing this. However, there's a broader issue: reporting free inodes (or free disk space) for a _container_ is meaningless; those (unlike inodes or disk space in use) are properties of filesystems, not of containers.
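To make the distinction concrete, here is a minimal sketch contrasting the two measurements: inode *usage* can be counted per directory tree (a du/find-style walk like the volume metrics code performs), whereas *free* inodes only exist per filesystem, via statfs(2). The path below is illustrative, not taken from the kubelet code:

```go
// Sketch: inode usage can be measured per directory tree (roughly what a
// du/find-style walk reports), but inode-free counts only exist per
// filesystem, via statfs(2). Only the latter can say "out of inodes", and it
// is a property of the filesystem, not of any one container's directories.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// inodesUsed counts every file, directory, and symlink under root, i.e. the
// number of inodes the tree consumes (hard links are counted once per path).
func inodesUsed(root string) uint64 {
	var n uint64
	filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return nil // skip unreadable entries rather than aborting the walk
		}
		n++
		return nil
	})
	return n
}

func main() {
	root := "/var/lib/containers" // illustrative path
	if len(os.Args) > 1 {
		root = os.Args[1]
	}

	var st syscall.Statfs_t
	if err := syscall.Statfs(root, &st); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	fmt.Printf("inodes used under %s: %d\n", root, inodesUsed(root))
	fmt.Printf("inodes free on the backing filesystem: %d of %d\n", st.Ffree, st.Files)
}
```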

If it's meaningless, maybe it should be removed from the metrics; trying to alert on it would also be worthless.

It depends upon what you're looking at. If you're looking at an individual _volume_ then the free metrics are useful. But they cannot be meaningfully aggregated for an entire container or pod.

BTW, note that as things stand you can determine the number of free inodes for the filesystems on each node with:

```
container_fs_inodes_free{container_name=""}
```

```
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.1.128:10250",job="kubelet",node="ip-10-0-1-128.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.11.53:10250",job="kubelet",node="ip-10-0-11-53.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.136.4:10250",job="kubelet",node="ip-10-0-136-4.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.141.162:10250",job="kubelet",node="ip-10-0-141-162.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.155.31:10250",job="kubelet",node="ip-10-0-155-31.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda1",endpoint="https-metrics",id="/",instance="10.0.28.147:10250",job="kubelet",node="ip-10-0-28-147.us-west-1.compute.internal",service="kubelet"} | 153273
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.1.128:10250",job="kubelet",node="ip-10-0-1-128.us-west-1.compute.internal",service="kubelet"} | 62697945
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.11.53:10250",job="kubelet",node="ip-10-0-11-53.us-west-1.compute.internal",service="kubelet"} | 62698529
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.136.4:10250",job="kubelet",node="ip-10-0-136-4.us-west-1.compute.internal",service="kubelet"} | 8157043
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.141.162:10250",job="kubelet",node="ip-10-0-141-162.us-west-1.compute.internal",service="kubelet"} | 8141007
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.155.31:10250",job="kubelet",node="ip-10-0-155-31.us-west-1.compute.internal",service="kubelet"} | 8156287
container_fs_inodes_free{device="/dev/xvda2",endpoint="https-metrics",id="/",instance="10.0.28.147:10250",job="kubelet",node="ip-10-0-28-147.us-west-1.compute.internal",service="kubelet"} | 62696488
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.1.128:10250",job="kubelet",node="ip-10-0-1-128.us-west-1.compute.internal",service="kubelet"} | 1021563
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.11.53:10250",job="kubelet",node="ip-10-0-11-53.us-west-1.compute.internal",service="kubelet"} | 1021563
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.136.4:10250",job="kubelet",node="ip-10-0-136-4.us-west-1.compute.internal",service="kubelet"} | 1021564
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.141.162:10250",job="kubelet",node="ip-10-0-141-162.us-west-1.compute.internal",service="kubelet"} | 1021564
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.155.31:10250",job="kubelet",node="ip-10-0-155-31.us-west-1.compute.internal",service="kubelet"} | 1021564
container_fs_inodes_free{device="tmpfs",endpoint="https-metrics",id="/",instance="10.0.28.147:10250",job="kubelet",node="ip-10-0-28-147.us-west-1.compute.internal",service="kubelet"} | 1021564
```
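Building on that, the alert the original report wanted could presumably be expressed at the node level rather than per container, e.g. something along the lines of `container_fs_inodes_free{container_name=""} / container_fs_inodes_total{container_name=""} < 0.05`. Treat the exact expression and threshold as a sketch, not a tested alerting rule; it assumes `container_fs_inodes_total` is scraped with the same labels as the series above.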

If devicemapper is used as the storage backend, then free metrics for writable layers are meaningful, as they are independent filesystems.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
