Please develop a collector of BTRFS stats. They look like this:
$ sudo btrfs dev stats /mnt/btrfs/
[/dev/mmcblk0p3].write_io_errs 0
[/dev/mmcblk0p3].read_io_errs 0
[/dev/mmcblk0p3].flush_io_errs 0
[/dev/mmcblk0p3].corruption_errs 0
[/dev/mmcblk0p3].generation_errs 0
[/dev/sda].write_io_errs 72430
[/dev/sda].read_io_errs 76151
[/dev/sda].flush_io_errs 61
[/dev/sda].corruption_errs 0
[/dev/sda].generation_errs 0
Documentation: https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-device#DEVICE_STATS
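The output format is regular enough to parse mechanically. Below is a minimal Python sketch. It assumes every line has the form `[<device>].<counter> <value>`, as in the sample above. `parse_dev_stats` is a hypothetical helper, not part of btrfs-progs.

```python
import re
from typing import Dict

# Matches lines such as "[/dev/sda].write_io_errs 72430".
_LINE_RE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text: str) -> Dict[str, Dict[str, int]]:
    """Parse `btrfs device stats` output into {device: {counter: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        match = _LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unexpected line: {raw!r}")
        device_stats = stats.setdefault(match["device"], {})
        device_stats[match["counter"]] = int(match["value"])
    return stats
```

Shelling out to `btrfs` is not an option for the collector itself, so this is mostly useful for tests or a textfile-style stopgap. The counters themselves would still have to come from the kernel.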
Do you know where in /proc or /sys this information is available?
I previously looked into writing an exporter for btrfs but got sidetracked. Regarding the information exposed
in /sys/fs/btrfs/: device stats are currently (as of 4.18.12) not exposed in sysfs and would require ioctls.
I vaguely remember seeing some patches to expose them in sysfs but just checked again and it
seems they never made it upstream.
The easiest way to accomplish this - and everything else - will probably be to use python-btrfs to make a standalone btrfs_exporter in python instead of recreating all the kernel/userspace/ioctl mappings in golang.
I will raise the issue on the btrfs list and see if I can scrape together willing participants.
I see a few Go libraries for handling btrfs. They seem focused on subvolume management.
Just a reminder about implementation requirements here:
- No subprocesses.
- No extra privileges.
At least the dev stats/error information currently requires privileged access.
Reading device stats does not require root; the only part that does is the reset: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/tree/fs/btrfs/ioctl.c?h=v4.18#n4692
Hi David,
> Reading device stats does not require root, the only part that does is the reset.
...and yet it doesn't work for me on 4.18.12 with btrfs-progs v4.17.1. End of strace:
openat(AT_FDCWD, "/mnt/backup", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
fstat(4, {st_mode=S_IFDIR|0755, st_size=276, ...}) = 0
ioctl(4, BTRFS_IOC_FS_INFO, {max_id=1, num_devices=1, fsid=d163af2f-6e03-4972-bfd6-30c68b6ed312, nodesize=16384, sectorsize=4096, clone_alignment=4096}) = 0
ioctl(4, BTRFS_IOC_TREE_SEARCH, {key={tree_id=BTRFS_CHUNK_TREE_OBJECTID, min_objectid=BTRFS_ROOT_TREE_OBJECTID, max_objectid=BTRFS_ROOT_TREE_OBJECTID, min_offset=1, max_offset=UINT64_MAX, min_transid=0, max_transid=UINT64_MAX, min_type=BTRFS_DEV_ITEM_KEY, max_type=BTRFS_DEV_ITEM_KEY, nr_items=30}}) = -1 EPERM (Operation not permitted)
close(4) = 0
write(2, "ERROR: ", 7ERROR: ) = 7
write(2, "getting device info for /mnt/bac"..., 67getting device info for /mnt/backup failed: Operation not permitted) = 67
write(2, "\n", 1
Both btrfs_ioctl_tree_search{_v2} unconditionally check for CAP_SYS_ADMIN.
Something is calling the tree search ioctl, which is not accessible without privileges. btrfs dev stats /mnt/path works for me here.
For the peanut gallery:
I tracked the privilege issue down to a difference in behaviour of btrfs-progs
when querying a mount point vs. querying a device (and can now reproduce it reliably).
The mailing list thread has it all, but the tl;dr is that getting stats
directly from a single device works without CAP_SYS_ADMIN, whereas querying the mount point
of a filesystem (even one with only a single device) fails without privileges.
However, invoking the ioctl programmatically on a mount point will work (here /tmp/test is loop0):
$ python3.6
Python 3.6.6 (default, Oct 1 2018, 11:15:11)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import btrfs
>>> fs = btrfs.FileSystem("/tmp/test")
>>> btrfs.utils.pretty_print(fs.dev_stats(1))
devid 1 write_errs 0 read_errs 0 flush_errs 0 generation_errs 0 corruption_errs 0
So at least for accessing the device stats/errors programmatically no privilege is necessary
after all. Mystery solved!
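Building on that session, here is a sketch of how a collector might walk all devices of a mounted filesystem with python-btrfs. It rests on two assumptions about python-btrfs. First, that `FileSystem.fs_info()` returns an object with `max_id`, the value `BTRFS_IOC_FS_INFO` reports (that ioctl succeeded unprivileged in the strace above). Second, that the `DevStats` object carries the field names printed by `pretty_print`. `iter_dev_stats` is a hypothetical name.

```python
import errno
from typing import Dict, Iterator, Tuple

# Field names as printed by btrfs.utils.pretty_print(fs.dev_stats(1)) above.
# Assumption: they are also attribute names on python-btrfs' DevStats object.
FIELDS = ("write_errs", "read_errs", "flush_errs", "corruption_errs", "generation_errs")


def iter_dev_stats(fs) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (devid, counters) for every existing device id up to max_id.

    `fs` is expected to behave like btrfs.FileSystem; only fs_info() and
    dev_stats(devid) are used (assumed python-btrfs API).
    """
    max_id = fs.fs_info().max_id
    for devid in range(1, max_id + 1):
        try:
            stats = fs.dev_stats(devid)
        except OSError as e:
            # Device ids can have gaps (e.g. after removing a device);
            # the kernel answers ENODEV for ids that don't exist.
            if e.errno == errno.ENODEV:
                continue
            raise
        yield devid, {name: getattr(stats, name) for name in FIELDS}
```

With python-btrfs installed, this would be called as `iter_dev_stats(btrfs.FileSystem("/tmp/test"))`. Taking `fs` as a parameter keeps the loop testable without a real btrfs mount.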
It would be useful if it could expose metrics relating to the usage too:
- data_ratio: 1.00
- device_allocated: 21.02GiB
- device_missing: 0.00B
- device_size: 50.00GiB
- device_unallocated: 28.98GiB
- free_estimated: 33.76GiB
- free_estimated_min: 19.27GiB
- global_reserve: 88.55MiB
- global_reserve_used: 0.00B
- metadata_ratio: 2.00
- used: 12.02GiB
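To become Prometheus samples, these human-readable values have to be turned back into numbers. Below is a small Python sketch. It assumes only the IEC suffixes shown above. `parse_size` is a hypothetical helper. The ratios are plain decimals and can go through `float()` directly.

```python
import re

# IEC (binary) suffixes as they appear in the usage output above.
_UNITS = {
    "B": 1,
    "KiB": 1024,
    "MiB": 1024 ** 2,
    "GiB": 1024 ** 3,
    "TiB": 1024 ** 4,
    "PiB": 1024 ** 5,
    "EiB": 1024 ** 6,
}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGTPE]iB|B)\s*$")


def parse_size(text: str) -> int:
    """Convert a human-readable size such as "21.02GiB" to a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    number, unit = match.groups()
    # The input was already rounded to two decimals, so this is approximate.
    return round(float(number) * _UNITS[unit])
```

The strings are rounded to two decimals. A real collector would therefore rather take exact byte counts from the kernel than parse this output.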
+1 for @Andysimcoe's btrfs metrics list.
Any idea when this might start to emerge?
More procfs/sysfs parsing should be added to https://github.com/prometheus/procfs. Once we have parsing in the library, we can add it to the exporter.
After having a btrfs system crash because the metadata allocation filled up, I'd suggest also exposing the data from btrfs fi df /mnt/. Is that available through python-btrfs?
Update: this looks like what I'm looking for: https://github.com/knorrie/python-btrfs/blob/master/examples/btrfs-fi-df.py
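Roughly the same per-type numbers that `btrfs fi df` shows (allocated vs. used for data, metadata and system) can also be read from sysfs under `/sys/fs/btrfs/<fsid>/allocation/`, which is world-readable. Below is a minimal Python sketch. It assumes a non-mixed filesystem where the `data`, `metadata` and `system` subdirectories each contain `total_bytes` and `bytes_used`. Values are summed over RAID profiles, and the helper names are hypothetical.

```python
import os
from typing import Dict, List

SYSFS_BTRFS = "/sys/fs/btrfs"
BLOCK_GROUP_TYPES = ("data", "metadata", "system")


def _read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def list_fsids(root: str = SYSFS_BTRFS) -> List[str]:
    """Filesystem UUID directories; skips entries like "features" that lack allocation/."""
    return sorted(
        entry for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry, "allocation"))
    )


def read_allocation(fsid: str, root: str = SYSFS_BTRFS) -> Dict[str, Dict[str, int]]:
    """Allocated (total_bytes) and used (bytes_used) bytes per block group type."""
    stats = {}
    for bg_type in BLOCK_GROUP_TYPES:
        base = os.path.join(root, fsid, "allocation", bg_type)
        stats[bg_type] = {
            "total_bytes": _read_int(os.path.join(base, "total_bytes")),
            "bytes_used": _read_int(os.path.join(base, "bytes_used")),
        }
    return stats


def metadata_fill_ratio(stats: Dict[str, Dict[str, int]]) -> float:
    """Used/allocated metadata. A value near 1.0 is only critical when no
    unallocated device space is left for new metadata chunks."""
    meta = stats["metadata"]
    return meta["bytes_used"] / meta["total_bytes"] if meta["total_bytes"] else 0.0
```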
I might look into creating a separate exporter for this.
Update on this:
> The easiest way to accomplish this - and everything else - will probably be to use [python-btrfs](https://github.com/knorrie/python-btrfs) to make a standalone btrfs_exporter in python instead of recreating all the kernel/userspace/ioctl mappings in golang.
I started a standalone btrfs_exporter based on the official python client and python-btrfs, but stopped because it was horrible to develop. I don't really know python, and the official client library is not only super weird and fragile, it also leaks memory. It took me less than three days to get somewhere in C++ (using prometheus-cpp), and I already have more working than before; plus it is faster, doesn't leak memory and uses 1/10th of the disk space/memory.
No timeline yet since I just got dragged into a surprise work contract.
@hhoffstaette Which is the official python client?
See #1200; apparently some stats are available in procfs. Ideally somebody would add support for that to procfs and we could have a 'real' collector.