What did you expect to see?
Metrics for PersistentVolumes (disk usage, size, free space, I/O, ...)
Environment
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:21:54Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}
Unsure if this is the right place to ask for that feature though... (should I ask on prometheus/node_exporter instead?)
Basically it would be nice to be able to monitor and alert when PVs are running out of storage.
I haven't actually looked into this in terms of PVs, but there are a bunch of container_fs_* metrics exposed by cAdvisor that seem like what you are looking for. As far as I can tell, though, it's hard to make the connection as to which concrete PV a particular Pod has mounted.
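For example, to peek at what's there (just a sketch; depending on the setup these may be served from the kubelet's read-only port, as used later in this thread, or from cAdvisor's own port):

curl -s http://IP_node:10255/metrics | grep '^container_fs_'   # all cAdvisor filesystem metrics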
@brancz thanks a lot for pointing me in an initial direction, I'll have a look as soon as I've got some spare minutes.
I guess as long as I can trigger an alert at all that's a good start - even if no concrete PV can be resolved... will keep this thread up to date!
Awesome! Thanks a lot for all your insight and contributions @hartmut-pq !
Hi @hartmut-pq and @brancz,
I've been struggling with the same problem myself: our PVs get filled and I haven't found any way to detect and alert on it.
Checking the container_fs_* metrics, there's no info at all about the filesystems (PVs) mounted in the containers, which is very weird (I give an example at the end).
Also, checking the node_exporter metrics (node_filesystem_size, for example) of the cluster nodes themselves, there's no info either, because the PVs are not mounted by the nodes.
I think at the moment this is missing functionality in K8s 1.6 (of course not related to prometheus-operator). I have read that in 1.7 the plan was to provide PV-related metrics directly from the kubernetes controller-manager (and not from the kubelet), but I'm not 100% sure.
But if any of you find a way to get the usage (total space, used space, free space) of the PVs, that would be great, because at the moment I feel we are blind.
The only valid info I have found is directly in the kubelet, here:
http://IP_node:10255/stats/summary
There we can see the persistent volume information, but that's not translated to metrics anywhere yet, as far as I have seen.
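For reference, a quick way to pull the per-PV numbers out of that endpoint (just a sketch, assuming the read-only port is reachable and jq is installed; the field names follow the Summary API's volume stats):

curl -s http://IP_node:10255/stats/summary | \
  jq '.pods[].volume[]? | {name, usedBytes, capacityBytes, availableBytes}'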
Example1 (via kubelet metrics):
Pod with one container and the following mounted "disks" (df -h run directly from a bash session inside the container):
overlay 154G 12G 136G 9% /
/dev/xvdbk 20G 11G 7.7G 59% /kafka (THIS IS THE PERSISTENT VOLUME)
/dev/xvda1 154G 7.2G 140G 5% /etc/hosts
container_fs_usage_bytes information about that pod/container has only:
container_fs_usage_bytes{container_name="kafka",device="/dev/xvda1",.....}
which actually represents the physical disk of the k8s node hosting the container.
But there's nothing else about /dev/xvdbk, and that's what I would be looking for.
Is there a way to configure kubelet to report that as well?
My conclusion is that we have metrics about the physical disk of the k8s node repeated many times at many levels, but there's no info about the persistent volumes (or their associated devices) anywhere :(
Thanks and sorry for my long explanation!
@eedugon thanks for the extended insight, very valuable! It was my suspicion that we wouldn't get PV metrics from cAdvisor.
My suggestion on where to go from here:
Let me know what you think.
Hi!
My view is that somehow both ways that you mention should be available, because:
But take a look at this, because maybe they are solving this in 1.7:
https://kubernetes.io/docs/concepts/cluster-administration/controller-metrics/
"Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and Openstack. These metrics can be used to monitor health of persistent volume operations"
Anyway, I'm surprised cAdvisor is not giving the info from the container's point of view, so starting that discussion would be interesting in my opinion :)
Agreed, cAdvisor should be exposing them either way. I actually reviewed the cloud provider metrics upstream; IIRC they were only about API requests and attach/detach durations, rather than the state of the PVs themselves. I contacted the author regarding the cloudprovider metrics.
Hi, nice to see so much instant movement here!
Agreed -> cAdvisor, too. Metrics/monitoring for PVs on-prem (non-cloud) would be quite important for us as well...
If you want to open an issue or start the discussion with the cAdvisor devs, feel free to tag me in an issue, so I can follow up.
Yes @brancz, I can do it, but it will be my first time... should I raise the topic directly on the google/cadvisor project? Or is there another cAdvisor project closer to kubernetes/prometheus?
Regards!
Edu
Yep, I'd suggest just opening an issue on the cadvisor repo and describing that you expected the volume metrics to show up. I'm sure they won't bite, and if you tag me I'll be able to comment as well if necessary ;)
Please tag me too - or at least reference this issue and the new one - happy to help / contribute!
Done!
I have also included some extra information, and I now suspect the problem might be in how node-exporter and cAdvisor are configured, because the k8s node has all the info available (the disk is listed by lsblk and is also mounted for the container), so it's very weird.
Let's see what they say... For the moment I can check the usage of the persistent volumes in a very ugly way, by SSHing to each node and running df -h | grep "aws-ebs" (in my case all persistent volumes are AWS EBS volumes):
For example, this is how I get the info about the 6 PVs I have:
$ ./check_space_pvs.sh
# Connecting to x.x.x.x
ip-172-20-38-36
/dev/xvdbg 20G 46M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2a/vol-0bafa0be2b20de87a
# Connecting to x.x.x.x
ip-172-20-33-58
/dev/xvdbm 20G 46M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2a/vol-017461b1e0f9d3bb4
/dev/xvdbr 20G 20G 0 100% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2a/vol-00413c4320b63fd31
# Connecting to x.x.x.x
ip-172-20-38-77
# Connecting to x.x.x.x
ip-172-20-34-13
# Connecting to x.x.x.x
ip-172-20-43-180
# Connecting to x.x.x.x
ip-172-20-47-213
/dev/xvdbh 20G 46M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-04e5b51c65dc374b2
/dev/xvdbk 20G 20G 0 100% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0a6f49ef759d2af4b
# Connecting to x.x.x.x
ip-172-20-43-132
/dev/xvdbg 20G 20G 0 100% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0be6ec486bcea1d06
In the script I just find my AWS instances with awscli and then SSH to each of them with a simple "sudo df -h | grep aws-ebs", roughly as sketched below. But that's definitely not the way to go...
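For completeness, the script is roughly this (a sketch under those assumptions; the SSH user "admin" is hypothetical):

#!/usr/bin/env bash
# check_space_pvs.sh (sketch): report EBS-backed PV usage across all instances.
# Assumptions: awscli is configured for the right account/region, and the
# nodes are reachable via SSH as user "admin" (hypothetical).
for ip in $(aws ec2 describe-instances \
    --query 'Reservations[].Instances[].PublicIpAddress' --output text); do
  echo "# Connecting to ${ip}"
  ssh "admin@${ip}" 'hostname; sudo df -h | grep aws-ebs'
done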
For a slightly better hack, I'd suggest making use of node-exporter's textfile collector functionality, with which you can just write the metrics you gather to a file, and node-exporter adds them to its metrics output.
Awesome, I didn't know about that feature! Very interesting for integrating custom checks into metrics, thanks a lot!
@brancz would you have a reference to documentation or an example? Sounds quite interesting. That would then need to run as a cron job directly on the node?!
I'd only suggest using it if there is no other way; as we already concluded above, it would be best if cAdvisor exposed these metrics in the first place. A description of how to use the textfile collector can be found here: https://github.com/prometheus/node_exporter#textfile-collector
@hartmut-pq yes, that would have to run in a cron-like fashion.
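For illustration, a minimal sketch of such a cron'd script (the directory and the metric names are assumptions, nothing node-exporter ships; the directory has to match node-exporter's --collector.textfile.directory flag):

#!/usr/bin/env bash
# Sketch: expose PV disk usage through the node-exporter textfile collector.
# Assumption: node-exporter runs with --collector.textfile.directory=/var/lib/node_exporter.
TEXTFILE_DIR=/var/lib/node_exporter
TMP=$(mktemp "${TEXTFILE_DIR}/pv_usage.prom.XXXXXX")
# Emit one (hypothetical) metric pair per EBS-backed PV mount, matching the
# /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/... paths shown above.
df --block-size=1 --output=used,size,target | grep aws-ebs | \
  while read -r used size mount; do
    echo "pv_fs_used_bytes{mountpoint=\"${mount}\"} ${used}"
    echo "pv_fs_size_bytes{mountpoint=\"${mount}\"} ${size}"
  done > "${TMP}"
mv "${TMP}" "${TEXTFILE_DIR}/pv_usage.prom"  # atomic rename so a scrape never sees a partial file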
As a side note, I spoke to the creator of the existing PV metrics (those that describe the operations on PVs) and there is also a proposal in the works for metrics of PVs themselves (currently targeted for 1.8).
FWIW a very high level metric has been merged into kube-state-metrics: https://github.com/kubernetes/kube-state-metrics/pull/179
Do you have any information on the integration of v1.0.0 of kube-state-metrics with prometheus-operator?
Some very high-level metrics landed in the v1.0.0 release: https://github.com/kubernetes/kube-state-metrics/pull/179 ... Getting the v1.0.0 manifests into kube-prometheus is on our list, but we haven't gotten to it yet.
> As a side note, I spoke to the creator of the existing PV metrics (those that describe the operations on PVs) and there is also a proposal in the works for metrics of PVs themselves (currently targeted for 1.8).
Hi @brancz, do you have any update on whether that's still on track? Is there a github issue?
Looks like they're landing in 1.8: https://github.com/kubernetes/kubernetes/pull/51553
However, as already mentioned, the PR has some problems (particularly what Piotr pointed out), so I wouldn't count on them just yet.
Thanks for the update and the PR!
@brancz is this feature rolled out with 1.8?
@vivek-jain I don't have a 1.8+ cluster handy that actually uses volumes. I believe this did land in 1.8, but I would have to check myself. Could you verify and report back here?
On Kubernetes v1.9.4-gke.1 there are metrics for persistent volumes available, e.g.:
curl -s localhost:10255/metrics | grep kafka-0
kubelet_volume_stats_available_bytes{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 1.27256612864e+11
kubelet_volume_stats_capacity_bytes{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 1.34208294912e+11
kubelet_volume_stats_inodes{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 8.388608e+06
kubelet_volume_stats_inodes_free{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 8.388593e+06
kubelet_volume_stats_inodes_used{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 15
kubelet_volume_stats_used_bytes{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 6.2959616e+07
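With these in place, spotting nearly-full PVs becomes a simple query; e.g. a sketch listing PVCs with less than 10% of their capacity left (PROM_URL and the threshold are assumptions):

# Assumption: Prometheus scrapes the kubelets and is reachable at ${PROM_URL}.
PROM_URL="${PROM_URL:-http://localhost:9090}"
QUERY='kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.10'
curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}" | \
  jq -r '.data.result[].metric.persistentvolumeclaim'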
Great! I think there is ultimately no issue with the Prometheus Operator here, so I will close this issue, but feel free to keep discussing if anything comes up.