Node_exporter: Add disk UUIDs as labels.

Created on 15 Sep 2016  ·  28 Comments  ·  Source: prometheus/node_exporter

In short:

Today:
node_disk_sectors_written{device="sdj"}

Suggestion:
node_disk_sectors_written{device="sdj", uuid="e7821b62-64a0-4f24-a19a-85ed74da0c14"}

Reason for this request is that we have external USB devices that we want monitored, but dashboards and so forth break when devices are occasionally mapped to new device-names. My understanding is that UUIDs are persistent, or maybe not?

enhancement requirfeedback

All 28 comments

To answer my own question, UUIDs seem to be persistent as they are generated from device metadata. Ref: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/persistent_naming-uuid_and_others.html

:+1: I think this would be useful. The difficulty is getting the UUIDs based on the name of the device read from /proc/diskstats.

Since the UUIDs are generated from metadata, I guess it's just a matter of figuring out how the generation is done and doing the same?

Sorta, it's complicated. Many of the UUIDs come from reading the filesystem metadata, not from generating them programmatically.

For example:

# ls -l /dev/disk/by-uuid/ | grep sda1
lrwxrwxrwx 1 root root 10 Jul 24 08:29 1196ae70-dca7-4c89-8ea7-52456bf23052 -> ../../sda1
# tune2fs -l /dev/sda1 | grep UUID
Filesystem UUID:          1196ae70-dca7-4c89-8ea7-52456bf23052

Investigating this further shows that you can in fact manipulate the UUIDs. This would break any logic in node_exporter if it were to auto-generate them. I guess the easiest thing to do would be to read /dev/disk/by-uuid, but that is not optimal. For example, my zfs array is not visible there on my storage server. blkid, however, manages to fetch everything. Perhaps there is something in the blkid source worth looking at?
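
For reference, reading /dev/disk/by-uuid could look roughly like the sketch below. It is only an illustration: the function name uuids_by_device is made up here, and it assumes the directory holds symlinks of the form shown in the ls output above. As noted, devices that udev does not link there (such as the zfs array) would be missing from the result.

```python
import os


def uuids_by_device(by_uuid_dir="/dev/disk/by-uuid"):
    """Return {device_name: uuid} built from the symlinks in by_uuid_dir.

    Hypothetical helper for illustration; not part of node_exporter.
    """
    mapping = {}
    try:
        entries = os.listdir(by_uuid_dir)
    except FileNotFoundError:
        return mapping  # directory absent (no udev, or not a Linux host)
    for uuid in entries:
        link = os.path.join(by_uuid_dir, uuid)
        if not os.path.islink(link):
            continue
        # Links look like "../../sda1"; only the last component is needed.
        target = os.readlink(link)
        mapping[os.path.basename(target)] = uuid
    return mapping


if __name__ == "__main__":
    for device, uuid in sorted(uuids_by_device().items()):
        print(device, uuid)
```

The mapping is keyed by device name because that is what /proc/diskstats provides.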

It wouldn't be appropriate to have both labels on the disk metrics, as each uniquely identifies a disk. This may be something best handled by the textfile collector.

/dev/sdx does not uniquely identify a disk. If, for example, you plug in random USB devices, you're going to get burned if you're using node_exporter to monitor these disks. The device label alone is not a good solution. UUIDs, on the other hand, do in fact uniquely identify a disk.

That depends entirely on your use case.

From my point of view, I could argue that it makes more sense to have the UUID as a label than the /dev/ name. It offers superior identification, as it can even identify disks that are moved across servers/instances. I'm just sharing a suggestion that will make node_exporter perform better in storage environments. If it's more hassle than it's worth, then I'll just close this issue and do my own workaround.

I agree that device label is insufficient to uniquely identify devices. I'd prefer both device and UUID.

Device label is sufficient, and labels should be minimal. You either get UUID or device as a label.

Respectfully disagree that device label is sufficient. It's not a unique identifier for a drive or a partition.

You can't have two /dev/sda, thus it is unique. It uniquely identifies a controller. These are per-device stats, not per-partition stats.

@brian-brazil Sorry, but that's not how hardware works. Device label is an indication of where it is connected, and UUID is an indication of what is connected. We want both.

I am well aware of how hardware works. I consider the UUID to be an annotation, so it doesn't belong on these metrics. I'd expect the vast majority of our users not to care about UUIDs, and device name order is pretty consistent these days, particularly in cloud environments.

If you're looking for this then it should come as other annotations do, via another metric taking the machine roles approach.

And that's the difference, I don't think they're annotations. They exist as separate unique identifying dimensions of the block device. "Where" and "Which".

Generally when this happens you choose one to avoid having more labels to work with, and have the other via the machine roles approach. Of the two I believe the device name is what most users would want.

@raypettersen So here's some data we found on how to get UUID information from udev.

Get the udev info from /sys

$ cat  /sys/class/block/sda1/uevent 
MAJOR=8
MINOR=1
DEVNAME=sda1
DEVTYPE=partition

Then you can get the current udev data from /run.

$ cat  /run/udev/data/b8\:1 
S:disk/by-uuid/1196ae70-dca7-4c89-8ea7-52456bf23052
S:disk/by-id/wwn-0x5001b449ce83154a-part1
S:disk/by-id/ata-SanDisk_SD5SG2256G1052E_132119402826-part1
S:disk/by-path/pci-0000:00:1f.2-ata-1-part1
W:18
I:1643451
E:ID_ATA=1
E:ID_ATA_DOWNLOAD_MICROCODE=1
E:ID_ATA_FEATURE_SET_APM=1
E:ID_ATA_FEATURE_SET_APM_CURRENT_VALUE=128
E:ID_ATA_FEATURE_SET_APM_ENABLED=1
E:ID_ATA_FEATURE_SET_HPA=1
E:ID_ATA_FEATURE_SET_HPA_ENABLED=1
E:ID_ATA_FEATURE_SET_PM=1
E:ID_ATA_FEATURE_SET_PM_ENABLED=1
E:ID_ATA_FEATURE_SET_SECURITY=1
E:ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E:ID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=18
E:ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=2
E:ID_ATA_FEATURE_SET_SECURITY_FROZEN=1
E:ID_ATA_FEATURE_SET_SMART=1
E:ID_ATA_FEATURE_SET_SMART_ENABLED=1
E:ID_ATA_ROTATION_RATE_RPM=0
E:ID_ATA_SATA=1
E:ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E:ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E:ID_ATA_WRITE_CACHE=1
E:ID_ATA_WRITE_CACHE_ENABLED=1
E:ID_BUS=ata
E:ID_MODEL=SanDisk_SD5SG2256G1052E
E:ID_MODEL_ENC=SanDisk\x20SD5SG2256G1052E\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E:ID_PART_TABLE_TYPE=dos
E:ID_PART_TABLE_UUID=00052f59
E:ID_PATH=pci-0000:00:1f.2-ata-1
E:ID_PATH_TAG=pci-0000_00_1f_2-ata-1
E:ID_REVISION=10.04.01
E:ID_SERIAL=SanDisk_SD5SG2256G1052E_132119402826
E:ID_SERIAL_SHORT=132119402826
E:ID_TYPE=disk
E:ID_WWN=0x5001b449ce83154a
E:ID_WWN_WITH_EXTENSION=0x5001b449ce83154a
E:ID_FS_UUID=1196ae70-dca7-4c89-8ea7-52456bf23052
E:ID_FS_UUID_ENC=1196ae70-dca7-4c89-8ea7-52456bf23052
E:ID_FS_VERSION=1.0
E:ID_FS_TYPE=ext2
E:ID_FS_USAGE=filesystem
E:ID_PART_ENTRY_SCHEME=dos
E:ID_PART_ENTRY_UUID=00052f59-01
E:ID_PART_ENTRY_TYPE=0x83
E:ID_PART_ENTRY_FLAGS=0x80
E:ID_PART_ENTRY_NUMBER=1
E:ID_PART_ENTRY_OFFSET=2048
E:ID_PART_ENTRY_SIZE=497664
E:ID_PART_ENTRY_DISK=8:0
G:systemd
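
The two steps above could be scripted along these lines. This is a minimal sketch, not how node_exporter does it: the function names are invented for illustration, and the file layout is assumed to match the output shown above (KEY=VALUE lines in uevent, "E:" property records in the udev data file).

```python
def udev_data_path(uevent_text):
    """Build the /run/udev/data path from the contents of a uevent file."""
    fields = dict(
        line.split("=", 1) for line in uevent_text.splitlines() if "=" in line
    )
    # Block devices are stored as "b<MAJOR>:<MINOR>".
    return f"/run/udev/data/b{fields['MAJOR']}:{fields['MINOR']}"


def parse_udev_properties(text):
    """Collect the "E:KEY=VALUE" property records of a udev data file."""
    props = {}
    for line in text.splitlines():
        if not line.startswith("E:"):
            continue  # skip the S:, W:, I: and G: records
        key, sep, value = line[2:].partition("=")
        if sep:
            props[key] = value
    return props


if __name__ == "__main__":
    with open("/sys/class/block/sda1/uevent") as handle:
        path = udev_data_path(handle.read())
    with open(path) as handle:
        props = parse_udev_properties(handle.read())
    print(props.get("ID_FS_UUID"), props.get("ID_SERIAL"))
```

Using .get() for the lookups matters because not every device carries every property; a disk without a filesystem has no ID_FS_UUID, for instance.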

I completely agree with @raypettersen: the UUID is the only way of identifying a volume or partition. The /dev point is actually irrelevant for metrics. One is monitoring a disk or partition, not a mount point. Mount points can change for various reasons.

I'd like to remind ye that these are per-block device stats we're talking about in this issue, not volumes, filesystems or mount points.

We could probably discuss this for hours. I do not agree with your logic, Brian. Perhaps the solution is to create a new metric instead of messing with labels. I get that you want to keep labels to a minimum, but you should recognize that without a _true_ identification of disks and partitions, metrics are at risk of becoming corrupt. Let's say you're monitoring backup storage that is mounted each night, and throughput performance is what you're after, along with a couple of alarms. Somehow the backup media is mounted under a new device name and the data you're getting is false. This would never happen if we had a way of pinpointing the device with the help of the UUID. This is a real-life scenario where I'm coming from.

As SuperQ nailed it:

They exist as separate unique identifying dimensions of the block device. "Where" and "Which".

That is, as I see it, the key to this argument.

What @brian-brazil is suggesting is that this be solved by having a metric that carries the UUID and device labels, and using PromQL to join them:

node_disk_sectors_written * on (device) group_left (uuid) node_disk_info
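
To make the join concrete: node_disk_info would be a series with the constant value 1, so multiplying by it leaves the original value untouched and only copies the uuid label across. The sketch below renders one such sample in the Prometheus text exposition format. The helper names are invented for illustration and the label set (device, uuid) is just the one used in the query above.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline for the text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def disk_info_line(device, uuid):
    """Render one node_disk_info sample; the value is always 1."""
    labels = {"device": device, "uuid": uuid}
    rendered = ",".join(
        f'{name}="{escape_label_value(value)}"' for name, value in labels.items()
    )
    return f"node_disk_info{{{rendered}}} 1"


if __name__ == "__main__":
    print(disk_info_line("sdj", "e7821b62-64a0-4f24-a19a-85ed74da0c14"))
```

The device value has to match the device label on the node_disk_* series exactly, otherwise on (device) finds nothing to join.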

I consider the block device UUID to be more important than the device name, and I think the device and UUID are separate metric dimensions that should always be included, but I understand where Brian is coming from.

The one plus side to the info metric is that basically everything in the udev info can be included as labels; this allows for very flexible matching without having to include every possible info option in the source metric.

Sounds like a good solution/workaround.

I agree that technically the UUID shouldn't be a label, because the metric is about a device, not a volume. A UUID is also unbounded. Not sure if this has practical implications, but I could imagine systems building/mounting images, which would always cause a new time series to be created.

I think a join as @SuperQ describes is the right way. But I think it would be nice if we could provide node_disk_info in the collector instead of having users manage that via the textfile collector.

If we can agree on this, I'd close the issue and create a new one for adding such a metric.

No complaints from me.

Yes, let's make this a text file collector for now. It could possibly be triggered/managed by udev infrastructure.
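
A textfile-collector script for this would mostly need to write its output safely. The sketch below is one way to do that, under stated assumptions: the function name and the disk_info.prom filename are made up, and the directory is whatever the textfile collector has been pointed at. It writes to a temporary file first and renames it into place, so the collector never reads a half-written file.

```python
import os
import tempfile


def write_prom_file(directory, filename, lines):
    """Write metric lines to directory/filename via an atomic rename."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("\n".join(lines) + "\n")
        # mkstemp creates the file as 0600; make it readable for the exporter.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical directory; use the one configured for the textfile collector.
    write_prom_file(
        "/var/lib/node_exporter/textfile",
        "disk_info.prom",
        ['node_disk_info{device="sda1",uuid="1196ae70-dca7-4c89-8ea7-52456bf23052"} 1'],
    )
```

The temporary file is created in the target directory on purpose: a rename is only atomic within one filesystem.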
