Zfs: Publish extra zpool metrics to sysfs?

Created on 10 Jun 2019 · 13Comments · Source: openzfs/zfs

I'm working on collecting metrics from ZFS disk pools, and using them in a system like prometheus' node exporter; things like number of spares available and in use, number of disks currently being repaired, number of corrupted files, etc.

As far as I can tell, there's no standard way to retrieve these metrics and use them, apart from running shell commands and parsing their output.

An idea I had to solve this issue was to publish these metrics to sysfs and have node exporter read them. If such a feature does not exist, I would be happy to implement it into ZFS.

Stale

Source

dipack95

All 13 comments

sysfs really doesn't work well for this data. Most of the stats are currently
available via procfs and node_exporter can get them.

For the pool configuration information and stats, as seen via the zpool
command, they are gathered via an ioctl. I'm considering adding
https://github.com/richardelling/zpool_prometheus
to the ZFS repo under contrib, once it stabilizes a bit. Check it out and see
if that does what you want.

richardelling on 10 Jun 2019

Also, all the sysfs symbols are GPL'd

tonyhutter on 10 Jun 2019

Thanks everyone for your feedback!

@richardelling If I understand correctly, the repo you've linked essentially collects the metrics printed by zpool status, and prints them to STDOUT in a Prometheus friendly fashion?

dipack95 on 10 Jun 2019

In general, think of it as a zpool command replacement for TSDBs.
It collects the metrics you'll see in all zpool status and zpool iostat at a minimum.

NB, it is not feasible to fully screen scrape zpool status.

richardelling on 11 Jun 2019

I've found this repo that exposes a lot of metrics that I require, and then some.

Is it acceptable if I implement similar metric fetch calls and then publish them to /proc/spl/kstat/zfs, so that node_exporter can read them from there?

I think this method makes the most sense, rather than running a separate binary and relying on its output to collect metrics.

dipack95 on 11 Jun 2019

As I tried to explain in the zpool_prometheus readme, it is not suitable to put an ioctl
reader in generic collectors like node_exporter because they can block forever. So it
should remain an external, single purpose program that can hang without impacting
other things.

That said, if what you really want is a single program serving a prometheus-style
endpoint, then :

C is a terrible language for writing REST servers, but it is the language of core ZFS
python can easily do this, but the pyzfs included in the ZoL repo doesn't have everything
go can easily do this, but there is no go code in ZoL repo

Solving 2 & 3 adds technical debt as the C-to-python and C-to-go interfaces must now
be maintained in coordination with core C changes, while many of the core C devs aren't
also delivering python and go projects.

I'm open to working on this sort of project and I've scoped some of the work to do all of
the above. Meanwhile, does zpool_prometheus do what you want and can it be an
interim solution?

richardelling on 11 Jun 2019

@richardelling If I understand this correctly, the information that I need, needs to be collected using an ioctl call, which could potentially block forever, and thus it does not make sense for it to be included as part of the ZFS repo?

If that's the case, then I agree that a stand-alone binary approach that collects the metrics would be the better approach. The repo that I linked above already collects most of the metrics I need, and is written in Go, so I think I will use that in conjunction with node_exporter.

Thank you!

dipack95 on 11 Jun 2019

Cool. I've got some updates for node_exporter to collect more ZFS stats, I'll try
to send a PR soon. These will be similar to https://github.com/richardelling/telegraf/tree/zfs_linux_4/plugins/inputs/zfs
which include objset performance.

Similarly, cstor commit https://github.com/openebs/cstor/commit/4c1ad8131d1c7c38b2cd8e39b4901832427d1cc7
adds a zpool dump command to output the config as raw json. This is definitely not user friendly, but is simple to implement.

richardelling on 11 Jun 2019

👍1

@richardelling After a lot of deliberation, I've come to the conclusion that the approach that would make the most sense for me would be to extend pyzfs, and create a interface for libzfs in Python, similar to how libzfs_core has currently been implemented using CFFI.

Using the new Python interface for libzfs, I then plan to expose the metrics I require to a text file, and have node_exporter pick them up from there, with the script being run periodically (60s interval).

Obviously, I will create a PR to merge the Python iface for libzfs into pyzfs.

dipack95 on 18 Jun 2019

I'm still deliberating on how best to do this :-)

I think you have a good approach. If the daemon hangs, then it won't affect node_exporter.

Others can add info on pyzfs, but IIRC it was originally intended to be a libzfs_core consumer only.
Those are stable interfaces, whereas libzfs is not stable. Once you get it to work, let's discuss
what would be required to add a libzfs_core stable interface for those metrics.

richardelling on 18 Jun 2019

🚀1

@richardelling I've gotten quite far into creating an interface for libzfs in Python 3 using CFFI. I've also created a corresponding node exporter using the Prometheus client libs.

However, I've run into an issue whereby I cannot get the list of all the zpools + datasets using libzfs. Do you know of any way to do so?

dipack95 on 21 Jun 2019

it is difficult to do design in github issues, can you contact me directly at richard.[email protected]?

richardelling on 22 Jun 2019

👍1

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.