What did you expect to see?
Metrics for PersistentVolumes (disk usage, size, free space, I/O, ...)
Environment
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:21:54Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}
Unsure if this is the right place to ask for that feature though... (should I ask on prometheus/node_exporter instead?)
Basically it would be nice to be able to monitor and alert when PVs are running out of storage.
I haven't actually looked into this in terms of PVs, but there are a bunch of container_fs_* metrics exposed by cAdvisor that seem like what you are looking for. As far as I can tell, though, it's hard to make the connection as to which concrete PV a particular Pod has mounted.
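For example, to peek at what's there (just a sketch; depending on the setup these may be served from the kubelet's read-only port, as used later in this thread, or from cAdvisor's own port):

curl -s http://IP_node:10255/metrics | grep '^container_fs_'   # all cAdvisor filesystem metrics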
@brancz thanks a lot for pointing me in an initial direction, I'll have a look as soon as I've got some spare minutes.
I guess as long as I can trigger an alert at all that's a good start - even if no concrete PV can be resolved... will keep this thread up to date!
Awesome! Thanks a lot for all your insight and contributions @hartmut-pq !
Hi @hartmut-pq and @brancz,
I've been struggling with the same problem myself: our PVs get filled and I haven't found any way to detect and alert on it.
Checking the container_fs_* metrics, there's no info at all about the filesystems (PVs) mounted in the containers, which is very weird (I give an example at the end).
Also, checking the node_exporter metrics (node_filesystem_size, for example) of the cluster nodes themselves, there's no info either, because the PVs are not mounted by the nodes.
I think at the moment this is missing functionality in K8s 1.6 (of course not related to prometheus-operator). I have read that in 1.7 the plan was to provide PV-related metrics directly from the kubernetes controller-manager (and not from the kubelet), but I'm not 100% sure.
But if any of you find a way to get the usage (total space, used space, free space) of the PVs, that would be great, because at the moment I feel we are blind.
The only valid info I have found is directly in the kubelet, here:
http://IP_node:10255/stats/summary
There we can see the persistent volume information, but that's not translated to metrics anywhere yet, as far as I have seen.
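For reference, a quick way to pull the per-PV numbers out of that endpoint (just a sketch, assuming the read-only port is reachable and jq is installed; the field names follow the Summary API's volume stats):

curl -s http://IP_node:10255/stats/summary | \
  jq '.pods[].volume[]? | {name, usedBytes, capacityBytes, availableBytes}'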
Example1 (via kubelet metrics):
Pod with one container and the following mounted "disks" (df -h run directly from a bash session inside the container):
overlay 154G 12G 136G 9% /
/dev/xvdbk 20G 11G 7.7G 59% /kafka (THIS IS THE PERSISTENT VOLUME)
/dev/xvda1 154G 7.2G 140G 5% /etc/hosts
container_fs_usage_bytes information about that pod/container has only:
container_fs_usage_bytes{container_name="kafka",device="/dev/xvda1",.....}
which actually represents the physical disk of the k8s node hosting the container.
But there's nothing else about /dev/xvdbk, and that's what I would be looking for.
Is there a way to configure kubelet to report that as well?
My conclusion is that we have metrics about the physical disk of the k8s node repeated many times at many levels, but there's no info about the persistent volumes (or their associated devices) anywhere :(
Thanks and sorry for my long explanation!
@eedugon thanks for the extended insight, very valuable! It was my suspicion that we wouldn't get PV metrics from cAdvisor.
My suggestion on where to go from here:
Let me know what you think.
Hi!
My view is that somehow both ways that you mention should be available, because:
But take a look at this, because maybe they are solving this in 1.7:
https://kubernetes.io/docs/concepts/cluster-administration/controller-metrics/
"Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage operations for GCE, AWS, Vsphere and Openstack. These metrics can be used to monitor health of persistent volume operations"
Anyway, I'm surprised cAdvisor is not giving the info from the container's point of view, so starting that discussion would be interesting in my opinion :)
Agreed, cAdvisor should be exposing them either way. I actually reviewed the cloud provider metrics upstream; IIRC they were only about API requests and attach/detach durations, rather than the state of the PVs themselves. I contacted the author regarding the cloudprovider metrics.
Hi, nice to see so much instant movement here!
Agreed -> cAdvisor, too. Metrics/monitoring for PVs on-prem (non-cloud) would be quite important for us as well...
If you want to open an issue or start the discussion with the cAdvisor devs, feel free to tag me in an issue, so I can follow up.
Yes @brancz, I can do it, but it will be my first time... should I raise the topic directly on the google/cadvisor project? Or is there another cAdvisor project closer to kubernetes/prometheus?
Regards!
Edu
Yep, I'd suggest just opening an issue on the cadvisor repo and describing that you expected the volume metrics to show up. I'm sure they won't bite, and if you tag me I'll be able to comment as well if necessary ;)
Please tag me too - or at least reference this issue and the new one - happy to help / contribute!
Done!
I have also included some extra information, and I now suspect the problem might be in how node-exporter and cAdvisor are configured, because the k8s node has all the info available (the disk is listed by lsblk and is also mounted for the container), so it's very weird.
Let's see what they say... For the moment I can check the usage of the persistent volumes in a very ugly way, by SSHing to each node and running df -h | grep "aws-ebs" (in my case all persistent volumes are AWS EBS volumes):
For example, this is how I get the info about the 6 PVs I have:
$ ./check_space_pvs.sh
# Connecting to x.x.x.x
ip-172-20-38-36
/dev/xvdbg 20G 46M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2a/vol-0bafa0be2b20de87a
# Connecting to x.x.x.x
ip-172-20-33-58
/dev/xvdbm 20G 46M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2a/vol-017461b1e0f9d3bb4
/dev/xvdbr 20G 20G 0 100% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2a/vol-00413c4320b63fd31
# Connecting to x.x.x.x
ip-172-20-38-77
# Connecting to x.x.x.x
ip-172-20-34-13
# Connecting to x.x.x.x
ip-172-20-43-180
# Connecting to x.x.x.x
ip-172-20-47-213
/dev/xvdbh 20G 46M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-04e5b51c65dc374b2
/dev/xvdbk 20G 20G 0 100% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0a6f49ef759d2af4b
# Connecting to x.x.x.x
ip-172-20-43-132
/dev/xvdbg 20G 20G 0 100% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-2b/vol-0be6ec486bcea1d06
In the script I just find my AWS instances with awscli and then SSH to each of them with a simple "sudo df -h | grep aws-ebs", roughly as sketched below. But that's definitely not the way to go...
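For completeness, the script is roughly this (a sketch under those assumptions; the SSH user "admin" is hypothetical):

#!/usr/bin/env bash
# check_space_pvs.sh (sketch): report EBS-backed PV usage across all instances.
# Assumptions: awscli is configured for the right account/region, and the
# nodes are reachable via SSH as user "admin" (hypothetical).
for ip in $(aws ec2 describe-instances \
    --query 'Reservations[].Instances[].PublicIpAddress' --output text); do
  echo "# Connecting to ${ip}"
  ssh "admin@${ip}" 'hostname; sudo df -h | grep aws-ebs'
done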
For a slightly better hack, I'd suggest making use of node-exporter's textfile collector functionality, with which you can just write the metrics you gather to a file, and node-exporter adds them to its metrics output.
Awesome, I didn't know about that feature! Very interesting for integrating custom checks into metrics, thanks a lot!
@brancz would you have a reference to documentation or an example? Sounds quite interesting. That would then need to run as a cron job directly on the node?!
I'd only suggest using it if there is no other way; as we already concluded above, it would be best if cAdvisor exposed these metrics in the first place. A description of how to use the textfile collector can be found here: https://github.com/prometheus/node_exporter#textfile-collector
@hartmut-pq yes, that would have to run in a cron-like fashion.
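For illustration, a minimal sketch of such a cron'd script (the directory and the metric names are assumptions, nothing node-exporter ships; the directory has to match node-exporter's --collector.textfile.directory flag):

#!/usr/bin/env bash
# Sketch: expose PV disk usage through the node-exporter textfile collector.
# Assumption: node-exporter runs with --collector.textfile.directory=/var/lib/node_exporter.
TEXTFILE_DIR=/var/lib/node_exporter
TMP=$(mktemp "${TEXTFILE_DIR}/pv_usage.prom.XXXXXX")
# Emit one (hypothetical) metric pair per EBS-backed PV mount, matching the
# /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/... paths shown above.
df --block-size=1 --output=used,size,target | grep aws-ebs | \
  while read -r used size mount; do
    echo "pv_fs_used_bytes{mountpoint=\"${mount}\"} ${used}"
    echo "pv_fs_size_bytes{mountpoint=\"${mount}\"} ${size}"
  done > "${TMP}"
mv "${TMP}" "${TEXTFILE_DIR}/pv_usage.prom"  # atomic rename so a scrape never sees a partial file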
As a side note, I spoke to the creator of the existing PV metrics (those that describe the operations on PVs) and there is also a proposal in the works for metrics of PVs themselves (currently targeted for 1.8).
FWIW a very high level metric has been merged into kube-state-metrics: https://github.com/kubernetes/kube-state-metrics/pull/179
Do you have any information on the integration of v1.0.0 of kube-state-metrics with prometheus-operator?
Some very high-level metrics landed in the v1.0.0 release: https://github.com/kubernetes/kube-state-metrics/pull/179 ... Getting the v1.0.0 manifests into kube-prometheus is on our list, but we haven't gotten to it yet.
> As a side note, I spoke to the creator of the existing PV metrics (those that describe the operations on PVs) and there is also a proposal in the works for metrics of PVs themselves (currently targeted for 1.8).
Hi @brancz, do you have any update on whether that's still on track? Is there a github issue?
Looks like they're landing in 1.8: https://github.com/kubernetes/kubernetes/pull/51553
However, as already mentioned, the PR has some problems (particularly what Piotr pointed out), so I wouldn't count on them just yet.
Thanks for the update and the PR!
@brancz is this feature rolled out with 1.8?
@vivek-jain I don't have a 1.8+ cluster handy that actually uses volumes. I believe this did land in 1.8, but I would have to check myself. Could you verify and report back here?
On Kubernetes v1.9.4-gke.1 there are metrics for persistent volumes available, e.g.:
curl -s localhost:10255/metrics | grep kafka-0
kubelet_volume_stats_available_bytes{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 1.27256612864e+11
kubelet_volume_stats_capacity_bytes{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 1.34208294912e+11
kubelet_volume_stats_inodes{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 8.388608e+06
kubelet_volume_stats_inodes_free{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 8.388593e+06
kubelet_volume_stats_inodes_used{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 15
kubelet_volume_stats_used_bytes{namespace="default",persistentvolumeclaim="datadir-kafka-0"} 6.2959616e+07
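With these in place, spotting nearly-full PVs becomes a simple query; e.g. a sketch listing PVCs with less than 10% of their capacity left (PROM_URL and the threshold are assumptions):

# Assumption: Prometheus scrapes the kubelets and is reachable at ${PROM_URL}.
PROM_URL="${PROM_URL:-http://localhost:9090}"
QUERY='kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.10'
curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}" | \
  jq -r '.data.result[].metric.persistentvolumeclaim'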
Great! I think there is ultimately no issue with the Prometheus Operator here, so I will close this issue, but feel free to keep discussing if anything comes up.