uname -a
Linux foo1.example.com 4.19.67 #1 SMP Thu Aug 22 16:06:16 EDT 2019 x86_64 GNU/Linux
node_exporter --version
node_exporter, version 0.18.1 (branch: release-0.18, revision: 0037b4808adeaa041eb5dd699f7427714ca8b8c0)
build user: foo
build date: 20191030-22:24:38
go version: go1.13.3
/usr/sbin/node_exporter --collector.zfs
No
N/A
We'd like to collect the per-dataset contents of /proc/spl/kstat/zfs/ZPOOL NAME/objset-*
E.g.:
root@foo:~$ cat /proc/spl/kstat/zfs/ZPOOL NAME/objset-0x4c3
49 1 0x01 7 1904 882869614188 7661358045725488
name type data
dataset_name 7 ZPOOL NAME/DATASET NAME
writes 4 162659962
nwritten 4 169357302418427
reads 4 19860562
nread 4 20787773826774
nunlinks 4 5326
nunlinked 4 5326
The contents of this file are not collected by node_exporter
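To make the request concrete, here is a minimal sketch of how one objset file could be parsed, based only on the sample above. The function name `parse_objset_kstat` is hypothetical, and the handling of the type column (type 7 carries the dataset name, type 4 carries integer counters) is an assumption drawn from that sample, not from node_exporter's code.

```python
def parse_objset_kstat(text):
    """Parse the text of one objset-* kstat file into (dataset_name, counters)."""
    lines = text.strip().splitlines()
    dataset_name = None
    counters = {}
    # Line 0 is the kstat header and line 1 is the "name type data" column header.
    for line in lines[2:]:
        # Split into at most three fields so dataset names containing spaces stay intact.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, kstat_type, data = parts
        if name == "dataset_name":
            dataset_name = data.strip()
        elif kstat_type == "4":  # assumption: type 4 rows hold integer counters
            counters[name] = int(data)
    return dataset_name, counters


SAMPLE = """\
49 1 0x01 7 1904 882869614188 7661358045725488
name type data
dataset_name 7 ZPOOL NAME/DATASET NAME
writes 4 162659962
nwritten 4 169357302418427
reads 4 19860562
nread 4 20787773826774
nunlinks 4 5326
nunlinked 4 5326
"""

dataset, stats = parse_objset_kstat(SAMPLE)
print(dataset, stats["nread"])
```

A real collector would read every file matching /proc/spl/kstat/zfs/*/objset-* and run each one through a parser like this.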
Is that available as unprivileged user? Or does it require root permissions to read?
What version of ZFS is that? I'm not seeing it on my 0.7.5.
I tried reading it from a non root user and it worked.
My ZFS version is : 0.8.3-1~bpo10+1
EDIT: Also, can I work on this issue?
I have some ideas about the metric and labels names to accomplish this task.
The existing structure for querying a metric from the io file is:
node_zfs_zpool_nread{instance="localhost:9100",job="node",zpool="myZpool1"}
My proposals are very similar. I have two alternatives; please indicate which is better:
node_zfs_zpool_nread{instance="localhost:9100",job="node",zpool="myZpool1", dataset="datasetName"}
Pros:
If the dataset label is left blank, users can understand that they are querying per-pool metrics. When they specify the dataset, they know they are querying per-pool, per-dataset metrics.
Cons:
A bit confusing, because some metric names are the same for per-pool metrics and per-pool, per-dataset metrics. E.g., the nread header is present in both.
Not all metric names in the io file are present in the objset file. For example, wlentime is in the io file but not in the objset file. Similarly, nunlinks is in the objset file but not in the io file.
node_zfs_zpool_dataset_nread{instance="localhost:9100",job="node",zpool="myZpool1", dataset="datasetName"}
Pros:
The dataset string in the metric name leaves no room for confusion.
Cons:
The dataset string appears in both the metric name and the label. Is it necessary to be so explicit about it?
I like option 2 better; option 1 might break existing alerts, notifications, etc.
But I would like to support this issue, since I was just looking for exactly that.
The metrics are available starting with ZFS 0.8
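For illustration, option 2 from the proposal above could produce sample lines like the one printed by this sketch. The helper names `escape_label_value` and `render_dataset_metric` are hypothetical, and the label value "myZpool1/datasetName" is a made-up example. The instance and job labels are left out on purpose, since Prometheus attaches them at scrape time rather than the exporter.

```python
def escape_label_value(value):
    # The text exposition format escapes backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_dataset_metric(stat, zpool, dataset, value):
    """Render one per-dataset sample using the option 2 naming scheme."""
    name = "node_zfs_zpool_dataset_" + stat
    return '%s{dataset="%s",zpool="%s"} %d' % (
        name,
        escape_label_value(dataset),
        escape_label_value(zpool),
        value,
    )


line = render_dataset_metric("nread", "myZpool1", "myZpool1/datasetName", 20787773826774)
print(line)
# node_zfs_zpool_dataset_nread{dataset="myZpool1/datasetName",zpool="myZpool1"} 20787773826774
```

The existing node_zfs_zpool_nread series stays untouched under this scheme, which is why option 2 does not affect existing alerts.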
Thanks for the response @Thoro.
I would also like to let you know that I've been working on the solution for quite some time. My teammate raised the issue, and we will be using this feature even if it's not ultimately included in this repo. :)
Hello moderators Brian and Johannes!
Please do give your feedback on the two alternatives proposed when possible. :)
I would also like to let you know that the second alternative didn't work as planned. I'm guessing it's because of this. The number of elements appearing in the drop-down menu in the Prometheus UI was significantly smaller than expected. I think some of the data is being overwritten. I tried adding a unique job tag as suggested here, but that didn't work either. FYI, I'm using a similar code structure to this.
However, the query structure below worked like a charm: node_zfs_zpool_poolName_datasetName_nread{instance="localhost:9100",job="node",zpool="myZpool1", dataset="datasetName"}
The above results in a plethora of metric names and doesn't really seem practical. How should I proceed?
FYI, I figured out what the problem was. Like it says in the docstring, the combination of metric name, label name, help string that I was using was getting overwritten and behaving erratically.
Credit goes to @mknapphrt for helping me debug this.
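A toy sketch of the kind of consistency rule described above, assuming that every registration of a given metric name must use the same help string and the same label names. `ToyRegistry` is a hypothetical stand-in written for this example; it is not the Prometheus client API, and the help strings are made up.

```python
class ToyRegistry:
    """Toy stand-in for a metrics registry; NOT the Prometheus client API."""

    def __init__(self):
        self._descs = {}

    def register(self, name, help_text, label_names):
        desc = (help_text, tuple(label_names))
        # Keep the first descriptor seen for a name and compare later ones against it.
        existing = self._descs.setdefault(name, desc)
        if existing != desc:
            raise ValueError(
                "inconsistent descriptor for %s: %r vs %r" % (name, existing, desc)
            )


registry = ToyRegistry()
# Same name, same help, same label names: accepted for every dataset.
registry.register("node_zfs_zpool_dataset_nread", "kstat nread", ["zpool", "dataset"])
registry.register("node_zfs_zpool_dataset_nread", "kstat nread", ["zpool", "dataset"])

try:
    # Same name but a help string that varies per dataset: rejected.
    registry.register("node_zfs_zpool_dataset_nread", "nread for tank/a", ["zpool", "dataset"])
    conflict = False
except ValueError:
    conflict = True
print(conflict)
```

Under that assumption, the varying parts (pool and dataset) belong in label values, while the metric name, help string, and label names stay fixed.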
Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?
In that case we don't need both, so we could either drop node_zfs_zpool_nread or make node_zfs_zpool_nread not include the aggregate but only the per dataset metrics.
Does that make sense?
> Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?
Actually they don't match because node_zfs_zpool_dataset_nread counts any read that happens on the dataset, including those served from ARC cache. On the other hand node_zfs_zpool_nread only counts reads that hit disk - including zpool scrubs, which don't show in any dataset.
Makes sense, then go with option 2
I'm a bit curious about the node_zfs_zpool_dataset_nread naming. Would dataset there be the whole path to the dataset, which in my case can be up to 70 characters long, or would it just be the "basename", with the full path distinguishable by the dataset label?
Edit: just realised it's probably "dataset" since it's dataset statistics..
While we're at it, it would be nice to make the new metric names follow Prometheus conventions.
> I'm a bit curious about the node_zfs_zpool_dataset_nread naming. Would dataset there be the whole path to the dataset, which in my case can be up to 70 characters long, or would it just be the "basename", with the full path distinguishable by the dataset label?
> Edit: just realised it's probably "dataset" since it's dataset statistics..
Using the example in my initial comment, it's 'ZPOOL NAME/DATASET NAME'.
@pmb311 yeah, I just wondered if the dataset name would end up in the metric name since the dataset name could be something like "Remote systems/backups/webservers/web0001/customerdata/customer0200/staticdata" for example.
This functionality was added in #1632. Perhaps this issue can be closed?
Yes, this feature has been added! @aqw