Node_exporter: btrfs stats collector

Created on 7 Oct 2018 · 16 Comments · Source: prometheus/node_exporter

Enhancement

Please develop a collector for btrfs device stats. They look like this:

$ sudo btrfs dev stats /mnt/btrfs/
[/dev/mmcblk0p3].write_io_errs   0
[/dev/mmcblk0p3].read_io_errs    0
[/dev/mmcblk0p3].flush_io_errs   0
[/dev/mmcblk0p3].corruption_errs 0
[/dev/mmcblk0p3].generation_errs 0
[/dev/sda].write_io_errs   72430
[/dev/sda].read_io_errs    76151
[/dev/sda].flush_io_errs   61
[/dev/sda].corruption_errs 0
[/dev/sda].generation_errs 0

Documentation: https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-device#DEVICE_STATS
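As an illustration of how these counters might map onto Prometheus metrics, here is a minimal sketch that parses the text output shown above and renders it in the Prometheus text exposition format. It only demonstrates the metric shape. A real collector could not shell out to btrfs, because of the no-subprocess requirement raised later in the thread. The metric name node_btrfs_device_errors_total, its labels and the helper functions are all hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as "[/dev/sda].write_io_errs   72430".
LINE_RE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

# Hypothetical metric name chosen for this sketch.
METRIC = "node_btrfs_device_errors_total"


def parse_dev_stats(text: str) -> List[Tuple[str, str, int]]:
    """Return (device, counter, value) tuples from `btrfs dev stats` output."""
    stats = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats.append((match["device"], match["counter"], int(match["value"])))
    return stats


def to_exposition(stats: List[Tuple[str, str, int]]) -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        f"# HELP {METRIC} Error counters reported by btrfs device stats.",
        f"# TYPE {METRIC} counter",
    ]
    for device, counter, value in stats:
        # Doubled braces emit literal { } around the label set.
        lines.append(f'{METRIC}{{device="{device}",type="{counter}"}} {value}')
    return "\n".join(lines) + "\n"
```

The counters only ever increase until they are reset, so the sketch declares them as a Prometheus counter.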

accepted enhancement

All 16 comments

Do you know where in /proc or /sys this information is available?

I previously looked into writing an exporter for btrfs but got sidetracked. The information exposed in /sys/fs/btrfs/ is quite exhaustive, but AFAICT the device stats/errors shown above are currently (as of kernel 4.18.12) not exposed in sysfs and would require ioctls.
I vaguely remember seeing some patches to expose them in sysfs, but I just checked again and it seems they never made it upstream.
The easiest way to accomplish this - and everything else - will probably be to use python-btrfs to make a standalone btrfs_exporter in python instead of recreating all the kernel/userspace/ioctl mappings in golang.
I will raise the issue on the btrfs mailing list and see if I can scrape together willing participants.
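Here is a rough illustration of what is already readable from sysfs. This sketch walks /sys/fs/btrfs/<fsid>/allocation/{data,metadata,system}/ and reads the total_bytes and bytes_used attributes. It assumes that directory layout; which attributes exist varies between kernel versions, so missing files are skipped. The device error counters are deliberately absent, since they are not in sysfs. The root directory is a parameter so the function can be pointed at a copy of the tree. read_allocation is a hypothetical name.

```python
import os
from typing import Dict

SYSFS_BTRFS = "/sys/fs/btrfs"
# Block group types listed under <fsid>/allocation/.
ALLOC_TYPES = ("data", "metadata", "system")
# A small subset of the per-type attribute files.
ALLOC_FIELDS = ("total_bytes", "bytes_used")


def read_allocation(root: str = SYSFS_BTRFS) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Return {fsid: {type: {field: value}}} for every filesystem under root."""
    result: Dict[str, Dict[str, Dict[str, int]]] = {}
    if not os.path.isdir(root):
        return result
    for fsid in sorted(os.listdir(root)):
        alloc_dir = os.path.join(root, fsid, "allocation")
        if not os.path.isdir(alloc_dir):
            continue  # e.g. the "features" directory is not a filesystem
        per_type: Dict[str, Dict[str, int]] = {}
        for alloc_type in ALLOC_TYPES:
            fields: Dict[str, int] = {}
            for field in ALLOC_FIELDS:
                path = os.path.join(alloc_dir, alloc_type, field)
                try:
                    with open(path) as f:
                        fields[field] = int(f.read().strip())
                except (OSError, ValueError):
                    continue  # attribute missing or unreadable on this kernel
            if fields:
                per_type[alloc_type] = fields
        result[fsid] = per_type
    return result
```

Reading sysfs needs no subprocess and, for these files, no extra privileges. This makes the approach a natural fit for a library like prometheus/procfs.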

I see a few Go libraries for handling btrfs, but they seem focused on subvolume management.

Just a reminder about implementation requirements here:

  • No subprocesses.
  • No extra privileges.

At least the dev stats/error information currently requires privileged access.

Reading device stats does not require root; the only part that does is the reset. See https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/tree/fs/btrfs/ioctl.c?h=v4.18#n4692

Hi David,

> Reading device stats does not require root; the only part that does is the reset.

...and yet it doesn't work for me on kernel 4.18.12 with btrfs-progs v4.17.1. End of strace:

openat(AT_FDCWD, "/mnt/backup", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|0755, st_size=276, ...}) = 0
ioctl(4, BTRFS_IOC_FS_INFO, {max_id=1, num_devices=1, fsid=d163af2f-6e03-4972-bfd6-30c68b6ed312, nodesize=16384, sectorsize=4096, clone_alignment=4096}) = 0
ioctl(4, BTRFS_IOC_TREE_SEARCH, {key={tree_id=BTRFS_CHUNK_TREE_OBJECTID, min_objectid=BTRFS_ROOT_TREE_OBJECTID, max_objectid=BTRFS_ROOT_TREE_OBJECTID, min_offset=1, max_offset=UINT64_MAX, min_transid=0, max_transid=UINT64_MAX, min_type=BTRFS_DEV_ITEM_KEY, max_type=BTRFS_DEV_ITEM_KEY, nr_items=30}}) = -1 EPERM (Operation not permitted)
close(4)                                = 0
write(2, "ERROR: ", 7ERROR: )                  = 7
write(2, "getting device info for /mnt/bac"..., 67getting device info for /mnt/backup failed: Operation not permitted) = 67
write(2, "\n", 1

Both btrfs_ioctl_tree_search and btrfs_ioctl_tree_search_v2 unconditionally check for CAP_SYS_ADMIN.
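The tree search ioctls need CAP_SYS_ADMIN, so a collector that must run without extra privileges might want to check which capabilities it actually holds, for example to log why a code path would fail. This minimal sketch assumes the Linux /proc/<pid>/status format, where CapEff is a hexadecimal bitmask of effective capabilities, and that CAP_SYS_ADMIN is bit 21. Both function names are hypothetical.

```python
# CAP_SYS_ADMIN is capability bit 21 in <linux/capability.h>.
CAP_SYS_ADMIN = 21


def has_capability(status_text: str, cap: int) -> bool:
    """Check the effective capability set (CapEff) in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool((mask >> cap) & 1)
    return False  # no CapEff line: treat the process as unprivileged


def running_with_sys_admin() -> bool:
    """Apply has_capability to the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return has_capability(f.read(), CAP_SYS_ADMIN)
```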

Something is calling the tree search ioctl, which is not accessible without privileges. Running btrfs dev stats /mnt/path works for me here.

For the peanut gallery:

I tracked the privilege issue down to a difference in the behaviour of btrfs-progs when querying a mount point vs. querying a device (and can now reproduce it reliably).
The mailing list thread has it all, but the tl;dr is that getting stats directly from a single device works without CAP_SYS_ADMIN, whereas querying the mount point of a filesystem, even one with only a single device, fails without privileges.

However, invoking the ioctl programmatically on a mount point works (here /tmp/test is a filesystem on loop0):

$ python3.6
Python 3.6.6 (default, Oct  1 2018, 11:15:11) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import btrfs
>>> fs = btrfs.FileSystem("/tmp/test")
>>> btrfs.utils.pretty_print(fs.dev_stats(1)) 
devid 1 write_errs 0 read_errs 0 flush_errs 0 generation_errs 0 corruption_errs 0

So at least for accessing the device stats/errors programmatically no privilege is necessary
after all. Mystery solved!
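The same calls can also be made without python-btrfs. This sketch uses only the Python standard library. It assumes the generic Linux ioctl number encoding (x86, arm) and the struct layouts from <linux/btrfs.h> as I understand them. It first reads max_id with BTRFS_IOC_FS_INFO, which succeeded unprivileged in the strace above, and then issues BTRFS_IOC_GET_DEV_STATS once per devid. The helper names are hypothetical, and this is not python-btrfs's API.

```python
import errno
import fcntl
import os
import struct


def _ioc(direction: int, ioc_type: int, nr: int, size: int) -> int:
    # Generic Linux ioctl encoding (asm-generic/ioctl.h); powerpc, mips and
    # sparc lay out these bits differently, so the numbers below would not apply there.
    return (direction << 30) | (size << 16) | (ioc_type << 8) | nr


_IOC_WRITE, _IOC_READ = 1, 2
BTRFS_IOCTL_MAGIC = 0x94

# struct btrfs_ioctl_fs_info_args is 1024 bytes; only max_id (offset 0) is used here.
FS_INFO_SIZE = 1024
BTRFS_IOC_FS_INFO = _ioc(_IOC_READ, BTRFS_IOCTL_MAGIC, 31, FS_INFO_SIZE)

# struct btrfs_ioctl_get_dev_stats: devid, nr_items, flags, values[5], unused[121].
DEV_STATS_SIZE = 8 * (3 + 5 + 121)  # 1032 bytes
BTRFS_IOC_GET_DEV_STATS = _ioc(_IOC_READ | _IOC_WRITE, BTRFS_IOCTL_MAGIC, 52, DEV_STATS_SIZE)

# Same order as enum btrfs_dev_stat_values in <linux/btrfs.h>.
DEV_STAT_NAMES = ("write_errs", "read_errs", "flush_errs", "corruption_errs", "generation_errs")


def pack_dev_stats_request(devid: int) -> bytearray:
    buf = bytearray(DEV_STATS_SIZE)
    # flags = 0 only reads the counters; BTRFS_DEV_STATS_RESET would need CAP_SYS_ADMIN.
    struct.pack_into("=QQQ", buf, 0, devid, len(DEV_STAT_NAMES), 0)
    return buf


def unpack_dev_stats_reply(buf) -> dict:
    _devid, nr_items, _flags = struct.unpack_from("=QQQ", buf, 0)
    values = struct.unpack_from("=5Q", buf, 24)
    n = min(nr_items, len(DEV_STAT_NAMES))
    return dict(zip(DEV_STAT_NAMES[:n], values[:n]))


def all_dev_stats(mountpoint: str) -> dict:
    """Map devid -> error counters for every device of the filesystem at mountpoint."""
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        info = bytearray(FS_INFO_SIZE)
        fcntl.ioctl(fd, BTRFS_IOC_FS_INFO, info)
        (max_id,) = struct.unpack_from("=Q", info, 0)
        result = {}
        # devids can have gaps (e.g. after a device was removed), hence the ENODEV skip.
        for devid in range(1, max_id + 1):
            buf = pack_dev_stats_request(devid)
            try:
                fcntl.ioctl(fd, BTRFS_IOC_GET_DEV_STATS, buf)
            except OSError as exc:
                if exc.errno == errno.ENODEV:
                    continue
                raise
            result[devid] = unpack_dev_stats_reply(buf)
        return result
    finally:
        os.close(fd)
```

On a mounted btrfs filesystem, all_dev_stats("/tmp/test") should report the same five counters per devid as the python-btrfs session above. A Go collector would presumably follow the same two-step pattern.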

It would be useful if it could expose metrics relating to the usage too:
data_ratio:          1.00
device_allocated:    21.02GiB
device_missing:      0.00B
device_size:         50.00GiB
device_unallocated:  28.98GiB
free_estimated:      33.76GiB
free_estimated_min:  19.27GiB
global_reserve:      88.55MiB
global_reserve_used: 0.00B
metadata_ratio:      2.00
used:                12.02GiB
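Prometheus conventions favour base units, so human-readable values like these would need converting to bytes before being exported. This sketch assumes the values use the binary suffixes seen in the list (B, KiB, MiB, GiB, ...) and accepts either "key: value" on one line or a "key:" line followed by its value. The printed figures are rounded to two decimals, so the results are only approximate; an exporter would be better off reading exact byte counts. size_to_bytes and parse_usage are hypothetical helpers.

```python
import re

# Binary suffixes as they appear in the usage list.
_UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}
_SIZE_RE = re.compile(r"^([0-9]+(?:\.[0-9]+)?)\s*([KMGTP]iB|B)$")


def size_to_bytes(text: str) -> int:
    """Convert e.g. "21.02GiB" to bytes; inexact because the input is rounded."""
    match = _SIZE_RE.match(text.strip())
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * _UNITS[unit])


def _to_number(value: str):
    # Sizes become integer bytes; plain numbers such as ratios stay floats.
    return size_to_bytes(value) if _SIZE_RE.match(value) else float(value)


def parse_usage(text: str) -> dict:
    """Parse "key: value" lines, or a "key:" line followed by its value line."""
    result, pending = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, rest = line.partition(":")
        if sep and rest.strip():
            result[name.strip()] = _to_number(rest.strip())
        elif sep:
            pending = name.strip()
        elif pending is not None:
            result[pending] = _to_number(line)
            pending = None
    return result
```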

+1 for @Andysimcoe's list of btrfs metrics.

Any idea when this might start to emerge?

More procfs/sysfs parsing should first be added to https://github.com/prometheus/procfs. Once we have parsing in the library, we can add it to the exporter.

After having a btrfs system crash because the metadata allocation ran out of space, I'd suggest also exposing the data from btrfs fi df /mnt/. Is that available through python-btrfs?

Update: this looks like what I'm looking for: https://github.com/knorrie/python-btrfs/blob/master/examples/btrfs-fi-df.py

I might look into creating a separate exporter for this.
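As far as I know, the numbers behind btrfs fi df come from the BTRFS_IOC_SPACE_INFO ioctl, which reports total and used bytes per block group type and profile. This plain-Python sketch assumes the generic Linux ioctl encoding and the layout from <linux/btrfs.h>: struct btrfs_ioctl_space_args (space_slots, total_spaces) followed by btrfs_ioctl_space_info entries (flags, total_bytes, used_bytes). It calls the ioctl twice: first with space_slots = 0 to learn how many entries exist, then with a buffer large enough to hold them. The helper names are hypothetical, and this is not the code from the python-btrfs example.

```python
import fcntl
import os
import struct


def _ioc(direction: int, ioc_type: int, nr: int, size: int) -> int:
    # Generic Linux ioctl encoding; powerpc, mips and sparc use a different layout.
    return (direction << 30) | (size << 16) | (ioc_type << 8) | nr


BTRFS_IOCTL_MAGIC = 0x94
ARGS_HEADER = struct.Struct("=QQ")   # space_slots, total_spaces
SPACE_ENTRY = struct.Struct("=QQQ")  # flags, total_bytes, used_bytes
# _IOWR(BTRFS_IOCTL_MAGIC, 20, struct btrfs_ioctl_space_args); the trailing
# flexible array does not count towards the struct size.
BTRFS_IOC_SPACE_INFO = _ioc(3, BTRFS_IOCTL_MAGIC, 20, ARGS_HEADER.size)

# Block group flag bits (BTRFS_BLOCK_GROUP_* in <linux/btrfs_tree.h>).
TYPE_BITS = {1 << 0: "Data", 1 << 1: "System", 1 << 2: "Metadata"}
PROFILE_BITS = {
    1 << 3: "RAID0", 1 << 4: "RAID1", 1 << 5: "DUP", 1 << 6: "RAID10",
    1 << 7: "RAID5", 1 << 8: "RAID6", 1 << 9: "RAID1C3", 1 << 10: "RAID1C4",
}
SPACE_INFO_GLOBAL_RSV = 1 << 49


def describe(flags: int) -> tuple:
    """Return (type, profile) roughly as btrfs fi df labels them."""
    if flags & SPACE_INFO_GLOBAL_RSV:
        return "GlobalReserve", "single"
    kind = "+".join(name for bit, name in TYPE_BITS.items() if flags & bit) or "unknown"
    profile = next((name for bit, name in PROFILE_BITS.items() if flags & bit), "single")
    return kind, profile


def decode_space_info(buf) -> list:
    slots, total = ARGS_HEADER.unpack_from(buf, 0)
    count = min(slots, total, (len(buf) - ARGS_HEADER.size) // SPACE_ENTRY.size)
    rows = []
    for i in range(count):
        flags, total_bytes, used_bytes = SPACE_ENTRY.unpack_from(
            buf, ARGS_HEADER.size + i * SPACE_ENTRY.size)
        kind, profile = describe(flags)
        rows.append({"type": kind, "profile": profile,
                     "total_bytes": total_bytes, "used_bytes": used_bytes})
    return rows


def space_info(mountpoint: str) -> list:
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # First call with space_slots = 0 only reports how many entries exist.
        header = bytearray(ARGS_HEADER.size)
        fcntl.ioctl(fd, BTRFS_IOC_SPACE_INFO, header)
        _, total = ARGS_HEADER.unpack(header)
        # Second call: header plus room for `total` entries.
        buf = bytearray(ARGS_HEADER.size + total * SPACE_ENTRY.size)
        ARGS_HEADER.pack_into(buf, 0, total, 0)
        fcntl.ioctl(fd, BTRFS_IOC_SPACE_INFO, buf)
        return decode_space_info(buf)
    finally:
        os.close(fd)
```

For the metadata-full scenario described above, one candidate signal would be used_bytes / total_bytes of the Metadata row, read together with how much unallocated device space remains.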

Update on this:

> The easiest way to accomplish this - and everything else - will probably be to use python-btrfs (https://github.com/knorrie/python-btrfs) to make a standalone btrfs_exporter in python instead of recreating all the kernel/userspace/ioctl mappings in golang.

I started a standalone btrfs_exporter based on the official python client and python-btrfs, but stopped because it was horrible to develop. I don't really know python, and the official client library is not only super weird and fragile, it also leaks memory. It took me less than three days to get somewhere in C++ (using prometheus-cpp), and I already have more working than before. It is also faster, doesn't leak memory and uses a tenth of the disk space and memory.
No timeline yet, since I just got dragged into a surprise work contract.

@hhoffstaette Which is the official python client?

https://github.com/prometheus/client_python

See #1200; apparently some stats are available in procfs. Ideally somebody would add support for that to procfs so that we could have a 'real' collector.
