Node_exporter: Expose per dataset ZFS metrics

Created on 11 Feb 2020 · 17 Comments · Source: prometheus/node_exporter

Host operating system: output of `uname -a`

Linux foo1.example.com 4.19.67 #1 SMP Thu Aug 22 16:06:16 EDT 2019 x86_64 GNU/Linux

node_exporter version: output of `node_exporter --version`

node_exporter, version 0.18.1 (branch: release-0.18, revision: 0037b4808adeaa041eb5dd699f7427714ca8b8c0)
build user: foo
build date: 20191030-22:24:38
go version: go1.13.3

node_exporter command line flags

/usr/sbin/node_exporter --collector.zfs

Are you running node_exporter in Docker?

What did you do that produced an error?

N/A

What did you expect to see?

We'd like to collect the per-dataset contents of /proc/spl/kstat/zfs/ZPOOL NAME/objset-*
E.g.:
root@foo:~$ cat /proc/spl/kstat/zfs/ZPOOL NAME/objset-0x4c3
49 1 0x01 7 1904 882869614188 7661358045725488
name type data
dataset_name 7 ZPOOL NAME/DATASET NAME
writes 4 162659962
nwritten 4 169357302418427
reads 4 19860562
nread 4 20787773826774
nunlinks 4 5326
nunlinked 4 5326
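
For illustration, a minimal Python sketch of how such a file could be parsed (node_exporter itself is written in Go, so this is not its implementation). The function names `parse_objset` and `collect_objsets` are hypothetical, and the handling of the type column is an assumption based only on the example above: type 7 is treated as a string, everything else as an integer.

```python
from pathlib import Path

# Assumption: kstat type code 7 marks a string value; the other rows in the
# example above (type 4) hold unsigned integers.
KSTAT_STRING_TYPE = "7"

SAMPLE = """\
49 1 0x01 7 1904 882869614188 7661358045725488
name type data
dataset_name 7 ZPOOL NAME/DATASET NAME
writes 4 162659962
nwritten 4 169357302418427
reads 4 19860562
nread 4 20787773826774
nunlinks 4 5326
nunlinked 4 5326
"""


def parse_objset(text):
    """Parse the contents of one objset-* kstat file into a dict."""
    stats = {}
    # Skip the kstat header line and the "name type data" column header.
    for line in text.strip().splitlines()[2:]:
        # Split at most twice: the data field may itself contain spaces.
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue
        name, kind, data = parts
        stats[name] = data if kind == KSTAT_STRING_TYPE else int(data)
    return stats


def collect_objsets(root="/proc/spl/kstat/zfs"):
    """Return parsed stats for every <root>/<pool>/objset-* file."""
    results = []
    for path in sorted(Path(root).glob("*/objset-*")):
        stats = parse_objset(path.read_text())
        # The pool name is taken from the directory that holds the file.
        stats["zpool"] = path.parent.name
        results.append(stats)
    return results


if __name__ == "__main__":
    print(parse_objset(SAMPLE))
```

Splitting each row at most twice keeps a dataset name that contains spaces, such as the placeholder in the example, in one piece.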

What did you see instead?

The contents of this file are not collected by node_exporter

accepted enhancement

pmb311

👍9 ❤1

Most helpful comment

Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?

Actually they don't match because node_zfs_zpool_dataset_nread counts any read that happens on the dataset, including those served from ARC cache. On the other hand node_zfs_zpool_nread only counts reads that hit disk - including zpool scrubs, which don't show in any dataset.

gerardba on 17 Apr 2020

👍3

All 17 comments

Is that available as unprivileged user? Or does it require root permissions to read?

discordianfish on 25 Feb 2020

What version of ZFS is that? I'm not seeing it on my 0.7.5.

brian-brazil on 25 Feb 2020

I tried reading it as a non-root user and it worked.
My ZFS version is: 0.8.3-1~bpo10+1

EDIT: Also, can I work on this issue?

Sudhar287 on 25 Feb 2020

I have some ideas about the metric and labels names to accomplish this task.

The existing structure for querying a metric from the io file is:
node_zfs_zpool_nread{instance="localhost:9100",job="node",zpool="myZpool1"}

My proposals are very similar. I have two alternatives; please indicate which is better:

Option 1: node_zfs_zpool_nread{instance="localhost:9100",job="node",zpool="myZpool1", dataset="datasetName"}

Pros:

Adhering to existing standards. Just including an extra label: dataset

Cons:

A bit confusing, because some metric names would be shared between the per-pool metrics and the per-pool, per-dataset metrics. E.g., nread appears in both the per-pool io file and the per-dataset objset file.
- Or maybe this is actually an advantage. When the dataset label is left blank, users can understand that they are querying the metrics per pool. When they specify the dataset, they know they are querying metrics per pool, per dataset.
Not all metric names in the io file exist in the objset file. For example, wlentime is in the io file but not in the objset file. Similarly, nunlinks is in the objset file but not in the io file.

Option 2: node_zfs_zpool_dataset_nread{instance="localhost:9100",job="node",zpool="myZpool1", dataset="datasetName"}

Pros:

Explicit dataset string in metric name will lead to no confusion.
Easier to query the prometheus metric names with dataset.

Cons:

Has the dataset string in both the metric name and a label. Is it necessary to be so explicit about it?
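
To make the second alternative concrete, here is a small Python sketch that renders one sample line in the Prometheus text exposition format. The helper names `escape_label_value` and `format_sample` are hypothetical and the value is taken from the objset example in the issue description; the instance and job labels are omitted because Prometheus attaches them at scrape time.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels, value):
    """Render one sample line of the form: name{key="value",...} value."""
    rendered = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{rendered}}} {value}"


if __name__ == "__main__":
    # Option 2: "dataset" appears in the metric name and as a label.
    print(
        format_sample(
            "node_zfs_zpool_dataset_nread",
            {"zpool": "myZpool1", "dataset": "datasetName"},
            20787773826774,
        )
    )
```

Because the dataset is carried as a label value, the metric name stays fixed no matter how many datasets a pool contains.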

Sudhar287 on 2 Mar 2020

I like option 2 better, option 1 might break existing alerts / notifications / etc

But I would like to support this issue, since I was just looking for exactly that.

The metrics are available starting with ZFS 0.8

Thoro on 4 Mar 2020

Thanks for the response @Thoro.
I would also like to let you know that I've been programming the solution for quite some time. My teammate raised the issue, and we would be using this feature even if it's not finally included in this repo. :)

Sudhar287 on 4 Mar 2020

Hello moderators Brian and Johannes!
Please do give your feedback on the two alternatives proposed when possible. :)

I would also like to let you know that the second alternative didn't work as planned. I'm guessing it's because of this. The number of elements appearing in the drop-down menu in the Prometheus UI was significantly lower than expected. I think some of the data is being overwritten. I tried adding a unique job tag as suggested here, but that didn't work either. FYI, I'm using a code structure similar to this.

However, the query structure below worked like a charm: node_zfs_zpool_poolName_datasetName_nread{instance="localhost:9100",job="node",zpool="myZpool1", dataset="datasetName"}
The above results in a plethora of metric names and doesn't really seem very practical. How should I proceed?

Sudhar287 on 5 Mar 2020

FYI, I figured out what the problem was. As the docstring says, the combination of metric name, label name, and help string that I was using was causing data to be overwritten and to behave erratically.
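
The exact bug is not shown in this thread, so the following Python sketch only illustrates the general rule behind it: a Prometheus series is identified by its metric name plus its full set of label pairs, and two samples with the same identity collide. The `series_key` helper and the sample values are hypothetical.

```python
def series_key(name, labels):
    """Identity of a series: metric name plus its full set of label pairs."""
    return (name, tuple(sorted(labels.items())))


samples = [
    ("node_zfs_zpool_dataset_nread", {"zpool": "myZpool1", "dataset": "a"}, 10),
    ("node_zfs_zpool_dataset_nread", {"zpool": "myZpool1", "dataset": "b"}, 20),
]

# With the dataset label included, each sample keeps its own series.
with_dataset = {series_key(name, labels): value for name, labels, value in samples}

# Hypothetical mistake: dropping the dataset label gives both samples the
# same identity, so the second one overwrites the first.
without_dataset = {
    series_key(name, {"zpool": labels["zpool"]}): value
    for name, labels, value in samples
}

if __name__ == "__main__":
    print(len(with_dataset), len(without_dataset))
```

Keeping every distinguishing dimension in the label set is what lets a single metric name cover all datasets.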
Credit goes to @mknapphrt for helping me debug this.

Sudhar287 on 10 Mar 2020

Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?
In that case we don't need both, so we could either drop node_zfs_zpool_nread or make node_zfs_zpool_nread not include the aggregate but only the per dataset metrics.

Does that make sense?

discordianfish on 17 Apr 2020

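Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?

Actually they don't match because node_zfs_zpool_dataset_nread counts any read that happens on the dataset, including those served from ARC cache. On the other hand node_zfs_zpool_nread only counts reads that hit disk - including zpool scrubs, which don't show in any dataset.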

gerardba on 17 Apr 2020

👍3

Makes sense, then go with option 2

discordianfish on 20 Apr 2020

I'm a bit curious about the node_zfs_zpool_dataset_nread naming. Would dataset there be the whole path to the dataset, which in my case can be up to 70 characters long, or would it just be the "basename", with the full path distinguishable by the dataset tag?

Edit: just realised it's probably "dataset" since it's dataset statistics..

thulle on 26 May 2020

While we're at it, it would be nice to make the new metric names follow Prometheus conventions.

SuperQ on 26 May 2020

I'm a bit curious about the node_zfs_zpool_dataset_nread naming. Would dataset there be the whole path to the dataset, which in my case can be up to 70 characters long, or would it just be the "basename", with the full path distinguishable by the dataset tag?

Edit: just realised it's probably "dataset" since it's dataset statistics..

Using the example in my initial comment, it's 'ZPOOL NAME/DATASET NAME'.

pmb311 on 27 May 2020

@pmb311 yeah, I just wondered if the dataset name would end up in the metric name since the dataset name could be something like "Remote systems/backups/webservers/web0001/customerdata/customer0200/staticdata" for example.
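
As a rough Python illustration of why a dataset path like this belongs in a label value rather than in the metric name, the check below uses the classic Prometheus metric-name character set, `[a-zA-Z_:][a-zA-Z0-9_:]*`. The function name `is_valid_metric_name` is hypothetical.

```python
import re

# Classic Prometheus metric-name pattern: no spaces or slashes allowed.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name):
    """Return True if the whole name matches the classic pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


DATASET = (
    "Remote systems/backups/webservers/web0001/"
    "customerdata/customer0200/staticdata"
)

if __name__ == "__main__":
    # A fixed metric name passes the check.
    print(is_valid_metric_name("node_zfs_zpool_dataset_nread"))
    # Embedding the dataset path in the name does not.
    print(is_valid_metric_name("node_zfs_zpool_" + DATASET + "_nread"))
```

Label values have no such restriction, so the full path can be carried unchanged in the dataset label.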

thulle on 28 May 2020

This functionality was added in #1632. Perhaps this issue can be closed?