Current master as of 67d3010 has problems on a kernel without CONFIG_XFS_QUOTA:
ERRO[0005] ERROR: xfs collector failed after 0.000211s: failed to retrieve XFS stats: incorrect number of values for XFS quota stats: 9 source="collector.go:132"
Not sure when exactly this happened but it works fine with the last release 0.18.1. The last commit to xfs_linux.go was d8e47a9 which (to me) doesn't seem to be the cause since it only added new unrelated metrics. Kernels without XFS quota support are rather common, so it's probably a good idea to catch this.
This probably broke it: https://github.com/prometheus/procfs/commit/0d6861afbc9f5dbff3877c51e42e843578d47d3e
@SuperQ We should definitely fix this before the next release.
@hhoffstaette can you share your /proc/fs/xfs/stat? There doesn't seem to be much documentation on the quota (qm) line.
Sure:
$ cat /proc/fs/xfs/stat
extent_alloc 171938 3485146 164127 3408129
abt 0 0 0 0
blk_map 40371558 6977480 529266 172696 297031 49232513 0
bmbt 0 0 0 0
dir 1360150 394508 381598 4388407
trans 593 3547247 0
ig 1262569 394431 0 868138 0 61794 775922
log 45377 1717324 1491 40824 9967
push_ail 3554222 0 4788823 277498 0 96922 8 417015 0 7222
xstrat 153418 0
rw 6722103 59805018
attr 1553479 3 679 6767
icluster 139407 55955 436245
vnodes 806344 0 0 0 325081 325081 325081 0
buf 13880262 43934 13836331 23338 14937 43931 0 122902 39893
abtb2 337628 4736190 88564 92185 0 0 399323 183894 7628 6878 21 27 21 27 42161828
abtc2 675928 9117153 250928 254550 0 0 13929 3587 5255 7792 17 24 17 24 98605085
bmbt2 24 171 11 1 0 0 0 0 0 0 0 0 0 0 393
ibt2 639803 7277206 0 7 0 0 0 0 0 0 0 0 0 0 277
fibt2 916788 7421473 207977 209337 6 7 2322 10 1290 706 13 21 19 28 49359764
rmapbt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
refcntbt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
qm 0 0 0 0 0 0 0 0 0
xpc 14172860416 42919401002 239118175922
debug 0
Not much to see except a bunch of zeros. ¯\_(ツ)_/¯
What kernel are you running? I expected that with CONFIG_XFS_QUOTA set to no, the qm line would have only a single field, but this one has 9. I will need to take a peek at the XFS code.
Latest mainline with inhouse patches, but those are completely unrelated to XFS quota:
$ cat /proc/version
Linux version 5.4.2 (root@ragnarok) (gcc version 9.2.0 (Gentoo 9.2.0-r2 p3)) #1 SMP Wed Dec 4 01:09:17 CET 2019
$ grep CONFIG_XFS /etc/kernels/kernel-config-x86_64-5.4.2
CONFIG_XFS_FS=y
# CONFIG_XFS_QUOTA is not set
CONFIG_XFS_POSIX_ACL=y
# CONFIG_XFS_RT is not set
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_XFS_ONLINE_REPAIR=y
# CONFIG_XFS_WARN is not set
# CONFIG_XFS_DEBUG is not set
It's been like this forever, except for scrubbing which only recently landed.
It looks like back in 2012, commit https://github.com/torvalds/linux/commit/48776fd22344ad80adcbac0abc9c0da60c6481d2 ("use common code for quota statistics") also introduced a stat called xs_qm_dquot_unused, but due to an off-by-one that counter was never actually exposed, so qm would only display 8 out of the 9 counters. In 2018, commit https://github.com/torvalds/linux/commit/26ca39015ef210d728df53d66c1ae85e8b48b2f3, which uses offsetof() in place of offset macros for __xfsstats, fixed the off-by-one; that fix was included as part of kernel 4.20. Unfortunately, when I was working on this, my reference hosts ran 4.3 and 4.14.
This should be fairly easy to deal with. I'll open a pull request soonish.
I was able to reproduce this on RHEL 8 and also saw 9 fields instead of the expected 8. Looking at the kernel code there seem to only be 8 fields (https://github.com/torvalds/linux/blob/master/fs/xfs/xfs_stats.h) so I'm not sure where the extra one is coming from.
I created a new PR (https://github.com/prometheus/procfs/pull/245) to just ignore the extra fields if they exist to fix this for now.
Edit: oops, wrote this before reading comment from @skreuzer. Feel free to open a new PR in procfs if you want, and I can close mine.
I just opened a new PR (prometheus/procfs#249) which will include the xs_qm_dquot_unused counter if qm has 9 attributes.
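For readers following along, here is a rough Go sketch of the idea: accept a qm line with either 8 or 9 values and only fill in the extra counter when it is present. This is not the code from the procfs PR. The quotaStats type and parseQM function are hypothetical names made up for illustration, and the sketch assumes xs_qm_dquot_unused is the last value on the line.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// quotaStats is a hypothetical type for this sketch, not the procfs struct.
type quotaStats struct {
	Counters       [8]uint64 // the 8 counters seen on older kernels
	DquotUnused    uint64    // assumed to be the 9th value (xs_qm_dquot_unused)
	HasDquotUnused bool      // true only when the line carried 9 values
}

// parseQM parses a single "qm ..." line and accepts 8 or 9 values.
func parseQM(line string) (quotaStats, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 || fields[0] != "qm" {
		return quotaStats{}, fmt.Errorf("not a qm line: %q", line)
	}
	vals := fields[1:]
	if len(vals) != 8 && len(vals) != 9 {
		return quotaStats{}, fmt.Errorf("incorrect number of values for XFS quota stats: %d", len(vals))
	}

	var qs quotaStats
	for i, s := range vals {
		v, err := strconv.ParseUint(s, 10, 64)
		if err != nil {
			return quotaStats{}, fmt.Errorf("invalid value %q in qm line: %w", s, err)
		}
		if i < 8 {
			qs.Counters[i] = v
		} else {
			// Only reached when the line has 9 values.
			qs.DquotUnused = v
			qs.HasDquotUnused = true
		}
	}
	return qs, nil
}

func main() {
	for _, line := range []string{
		"qm 0 0 0 0 0 0 0 0",   // 8 values, as on the older reference hosts
		"qm 0 0 0 0 0 0 0 0 0", // 9 values, as in the output posted above
	} {
		qs, err := parseQM(line)
		fmt.Println(qs.HasDquotUnused, err)
	}
}
```

Keeping the strict check for any other field count means a truly malformed line still produces an error instead of being silently accepted.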
I've integrated the procfs change, I think this should be fixed. Can anyone confirm?
Built & ran master, curled metrics, no errors and valid xfs metrics. :)
Nice, thanks for confirming.