The disk plugin should export stats as it has done in the past
No disk stats are exported at all
With 4.19, Linux gained extra fields in /proc/diskstats, indicating r/w stats for Discard operations on SSDs. From linux/Documentation/iostats.txt:
Field 12 -- # of discards completed
This is the total number of discards completed successfully.
Field 13 -- # of discards merged
See the description of field 2
Field 14 -- # of sectors discarded
This is the total number of sectors discarded successfully.
Field 15 -- # of milliseconds spent discarding
This is the total number of milliseconds spent by all discards (as
measured from __make_request() to end_that_request_last()).
Collectd simply quits parsing if the number of fields is not exactly what it expects. I think that may be overly conservative, since new fields are usually appended without rearranging the existing ones.
I recommend changing the field-count check from X!=10 to X<10. That way we don't lose stats in cases like this, and can take our time implementing parsing of the extra stats if and when 4.19 becomes stable.
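To illustrate the "at least N fields" idea, here is a minimal sketch in Python rather than collectd's C. It is not collectd's parser: the function name and the counter names are made up for illustration, and it assumes the common layout of three leading columns (major, minor, device name) followed by the 11 counters documented as fields 1-11 in iostats.txt. Anything after those 11 counters, such as the discard fields, is simply ignored.

```python
# Illustrative names for the 11 counters documented as fields 1-11 in iostats.txt.
STAT_FIELDS = (
    "reads_completed", "reads_merged", "sectors_read", "ms_reading",
    "writes_completed", "writes_merged", "sectors_written", "ms_writing",
    "ios_in_progress", "ms_doing_io", "weighted_ms_doing_io",
)


def parse_diskstats_line(line):
    """Hypothetical helper: parse one /proc/diskstats line, tolerating extra fields.

    Returns a dict with the device name and the first 11 counters,
    or None if the line has fewer columns than expected.
    """
    parts = line.split()
    # Require a minimum number of columns instead of an exact count,
    # so fields appended by newer kernels do not cause the line to be dropped.
    if len(parts) < 3 + len(STAT_FIELDS):
        return None
    counters = [int(v) for v in parts[3:3 + len(STAT_FIELDS)]]
    stats = dict(zip(STAT_FIELDS, counters))
    stats["name"] = parts[2]
    return stats
```

With this check, a 14-column line and the same line with four discard counters appended yield the same result, while a line that is too short is still rejected.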
Hi @klausman, thanks for reporting this! Could you give #2952 a try and see if it fixes the issue?
Yes, that change seems to make disk stats work again. Thanks! I'll patch my local install until the change makes it into a release.
Correction: it is not fixed.
But now I think the breakage is in the udev name mangling. Specifically, after a few loops of reading, output_name would be, say, sda4 on line 843, just before the libudev-specific block. Just before the continue in the _next_ libudev block (line 860), output_name is garbage, e.g. 脨X, indicating uninitialized memory. This name of course doesn't match the whitelist, and thus every disk gets skipped.
My provider for libudev is eudev 3.2.6 (https://github.com/gentoo/eudev). I'll try without having libudev enabled and see if that gives me metrics.
Yes, not using libudev makes the disks appear again. My C is unfortunately not good enough to tell whether the uninitialized memory is a bug in eudev or in collectd.
Could you run this with debugging enabled and look for a message like this?
disk plugin: renaming foo => bar
I see this, once per poll interval:
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sda => /dev/sda
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sda1 => /dev/sda1
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sda2 => /dev/sda2
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sda3 => /dev/sda3
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sda4 => /dev/sda4
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sda5 => /dev/sda5
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sdb => /dev/sdb
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sdb1 => /dev/sdb1
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sdb2 => /dev/sdb2
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sdb3 => /dev/sdb3
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sdb4 => /dev/sdb4
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming sdb5 => /dev/sdb5
2018 Oct 12 15:37:00 skade collectd(41043) disk plugin: renaming md0 => /dev/md0
However:
# curl -s localhost:9103|grep -i sd
#
It looks to me like the udev name mangling is working and output_name points to the same memory as alt_name. Just before the continue, alt_name is freed after which output_name is expected to point to "random" memory.
Can you check output_name just before ignorelist_match() is called?
Just before the continue in the next libudev block (line 860), output_name is garbage, e.g. 脨X, indicating uninitialized memory.
As @octo said, that is because alt_name has already been freed. But that happens after the ignorelist is checked.
This name of course doesn't match the whitelist, and thus every disk gets skipped.
The name doesn't match because of the renaming. I think you have Disk sdb in your config while the renamed value is /dev/sdb, so you have to use Disk "/dev/sdb" in your config.
so you have to use Disk "/dev/sdb/" in your config.
With a leading and trailing slash, this is interpreted as the dev/sdb regular expression, which may match more than you intended. /dev/sdb will match literally though.
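The difference between the two spellings can be sketched as follows. This is an illustration of the rule described above, not collectd's ignorelist code: the helper name is hypothetical, and Python's re module stands in for whatever regex engine collectd uses.

```python
import re


def disk_entry_matches(entry, name):
    """Hypothetical helper mimicking the literal-vs-regex rule for a Disk entry."""
    if len(entry) > 2 and entry.startswith("/") and entry.endswith("/"):
        # Slash-delimited entry: the text between the slashes is an
        # unanchored regular expression, so it can match anywhere in the name.
        return re.search(entry[1:-1], name) is not None
    # Any other entry is compared literally against the whole name.
    return entry == name
```

Under these assumptions, "/dev/sdb/" becomes the regex dev/sdb and also selects /dev/sdb1, whereas "/dev/sdb" selects only the device named exactly /dev/sdb.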
2018 Oct 15 09:52:56 skade collectd(59813) Just before ignorelist_match(), output_name is '/dev/sda'
In my config I have (note that this is the default config as shipped, with the disk plugin config section uncommented for debugging):
<Plugin disk>
Disk "/^[hs]d[a-z]+[0-9]?$/"
IgnoreSelected false
UdevNameAttr "DEVNAME"
</Plugin>
So it seems the combination of the last line of that section (UdevNameAttr, which turns device names such as sda into /dev/sda) with the regex on the first line results in nothing being matched. Subtle. If I comment out the first (regex) line, I get device names (prometheus metrics are easiest to check for me):
collectd_disk_disk_ops_read_total{disk="dev_sda",instance="skade"} 6253762 1539590200508
collectd_disk_disk_ops_read_total{disk="dev_sda1",instance="skade"} 72 1539590200508
collectd_disk_disk_ops_read_total{disk="dev_sda2",instance="skade"} 550 1539590200509
collectd_disk_disk_ops_read_total{disk="dev_sda3",instance="skade"} 1369542 1539590200509
collectd_disk_disk_ops_read_total{disk="dev_sda4",instance="skade"} 560857 1539590200509
collectd_disk_disk_ops_read_total{disk="dev_sda5",instance="skade"} 4322683 1539590200510
With only the udev line commented out, I get short names:
collectd_disk_disk_ops_read_total{disk="sda",instance="skade"} 6253762 1539590300181
collectd_disk_disk_ops_read_total{disk="sda1",instance="skade"} 72 1539590300182
collectd_disk_disk_ops_read_total{disk="sda2",instance="skade"} 550 1539590300182
collectd_disk_disk_ops_read_total{disk="sda3",instance="skade"} 1369542 1539590300182
collectd_disk_disk_ops_read_total{disk="sda4",instance="skade"} 560857 1539590300182
collectd_disk_disk_ops_read_total{disk="sda5",instance="skade"} 4322683 1539590300182
collectd_disk_disk_ops_read_total{disk="sdb",instance="skade"} 6175453 1539590300182
collectd_disk_disk_ops_read_total{disk="sdb1",instance="skade"} 72 1539590300183
collectd_disk_disk_ops_read_total{disk="sdb2",instance="skade"} 295 1539590300183
collectd_disk_disk_ops_read_total{disk="sdb3",instance="skade"} 1529108 1539590300183
collectd_disk_disk_ops_read_total{disk="sdb4",instance="skade"} 695123 1539590300183
collectd_disk_disk_ops_read_total{disk="sdb5",instance="skade"} 3950795 1539590300183
So this second part was operator error (me thinking I could just blindly uncomment the whole disk plugin section). I hope this comment helps anyone else who stumbles across the same thing.
As far as I am concerned, this issue can be closed now. Thanks, everybody!
With a leading and trailing slash,
Trailing slash is a typo, thanks ! ))
Disk "/^[hs]d[a-z]+[0-9]?$/"
Please note that this is a regex and is anchored at the beginning of the line. That means that the regex will match sda1. The udev name mangling will convert that to /dev/sda1, which does not match.
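A quick way to see the effect of the anchor is to test the pattern against both forms of the name. The sketch below uses Python's re module as a stand-in for collectd's matching. The second pattern, with a /dev/ prefix, is a hypothetical adjustment that was not proposed in this thread; it is only one possible way to keep a regex while UdevNameAttr "DEVNAME" is active.

```python
import re

# Pattern from the shipped example config (without the surrounding slashes).
SHIPPED = re.compile(r"^[hs]d[a-z]+[0-9]?$")

# Hypothetical variant that expects the /dev/ prefix added by the udev renaming.
WITH_DEV_PREFIX = re.compile(r"^/dev/[hs]d[a-z]+[0-9]?$")


def selected(pattern, name):
    """Return True if the disk name matches the given compiled pattern."""
    return pattern.search(name) is not None


# Try the kernel name and the renamed name against both patterns.
for name in ("sda1", "/dev/sda1"):
    print(name, selected(SHIPPED, name), selected(WITH_DEV_PREFIX, name))
```

The shipped pattern accepts sda1 but rejects /dev/sda1 because of the leading ^, which is why no disks were selected once the names had been rewritten.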
I'll take the thumbs up to mean that this issue is resolved. (If not, please re-open!) Thanks for the heads up regarding Linux 4.19, @klausman!