Beats: Inconsistent representation of diskio between Windows and Linux

Created on 23 Mar 2017 · 22 Comments · Source: elastic/beats

For confirmed bugs, please report:

  • Version: metricbeat 5.2.2
  • Operating System: windows server 2012 r2, ubuntu server 16.04

The diskio fields reported from Linux and from Windows machines are inconsistent. From Linux hosts we get cumulative values, whereas from Windows hosts we get "normal" (non-cumulative) values. This applies to:

system.diskio.read.count
system.diskio.write.count
system.diskio.read.bytes
system.diskio.write.bytes
system.diskio.read.time
system.diskio.write.time
system.diskio.io.time
:Windows Metricbeat Integrations bug module

All 22 comments

Pinging @andrewkroh.

Yeah, it does look inconsistent.

On Windows the third party lib is using this WMI query to get the data: SELECT * FROM Win32_PerfFormattedData_PerfDisk_LogicalDisk

Then it just writes the averaged values into the fields that are supposed to be counters. https://github.com/shirou/gopsutil/blob/v2.0.0/disk/disk_windows.go#L144-L152

From Windows docs:

Classes derived from Win32_PerfRawData contain raw, or "uncooked" performance data, and are supported by the Performance Counter provider. In contrast, classes derived from Win32_PerfFormattedData contain "cooked", or formatted data...

Maybe we can find a table that has raw values. Anyone want to research this?

I also think these fields do not work correctly. I did a small experiment: I copied files to a Windows machine, and this activity was not reflected in the Elastic index.

The averaging done by Windows basically acts as a low-pass filter, so those values are not very useful: you won't be able to observe high-frequency signals in the data. Getting at the raw data (with a relatively high sampling rate) would solve this.
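To make the low-pass effect concrete, here is a minimal sketch with made-up numbers. The per-second samples and the helper names `windowAverage` and `peak` are hypothetical and are not taken from Metricbeat or gopsutil; the point is only that a single averaged value per interval hides a short burst.

```go
package main

import "fmt"

// windowAverage returns the mean of the samples, which is roughly what an
// averaged ("cooked") value reports for the whole interval.
func windowAverage(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// peak returns the largest sample in the interval.
func peak(samples []float64) float64 {
	var m float64
	for _, s := range samples {
		if s > m {
			m = s
		}
	}
	return m
}

func main() {
	// Hypothetical writes per second over a 10 s window with one 1 s burst.
	perSecond := []float64{0, 0, 0, 500, 0, 0, 0, 0, 0, 0}
	fmt.Println("average:", windowAverage(perSecond))
	fmt.Println("peak:", peak(perSecond))
}
```

With these numbers the interval average is 50 writes per second even though the burst reached 500, which is the kind of detail that gets lost when only averaged values are shipped.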

Can I hope to see a fix for this? We have a mixed infrastructure (Linux + Windows), and this bug is preventing us from adopting Beats.

Hi @andrewkroh. Now that our perfmon metricset is merged, what do you think of using it as an internal API to collect this data? The next feature I would add to the metricset is retrieving raw data values.

@maddin2016 If it's possible to retrieve the raw values through perfmon then I think that would be a great solution to this issue.

@andrewkroh, can you please add this to #3828 so we don't miss this?

I would not add it to #3828, as it is not really a cleanup of the perfmon metricset but an enhancement of another metricset. I think it is perfectly tracked here, since this is the issue that will be closed if a PR is opened for it.

FYI: Another report for diskio problems on Windows due to the usage of WMI. https://discuss.elastic.co/t/system-diskio-io-time-0-on-windows/85508

What do you think of adding a helper that collects these counters through the perfmon API? Or is it possible to add a diskio_windows.go file?

It should be possible to have a separate implementation for windows by using build tags (i.e. diskio_windows.go).

Was this bug fixed in 6.0?

@andrewkroh, I'm currently implementing a solution to retrieve the raw values with perfmon. Is there an example of how the Linux and Windows outputs differ?

There is a sample document in the diskio docs: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-system-diskio.html#_fields_57

I don't have a sample from Windows, but the main difference is that the values coming from WMI are averages rather than cumulative. On *nix we get an ever-increasing counter (until rollover) that contains the total for all time. So the difference is basically that on *nix we have counters and on Windows we have gauges (counters vs. gauges).
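As an illustration of why the counter form is the more flexible one, here is a minimal sketch of turning two readings of a cumulative counter into a rate. The function name `rate` and the sample readings are hypothetical; this is not how Metricbeat or Elasticsearch computes derivatives, only the underlying arithmetic. A gauge that is already averaged cannot be converted back into a counter this way.

```go
package main

import "fmt"

// rate converts two readings of a cumulative counter into a per-second rate.
// ok is false when the counter went backwards (rollover or reboot) or when
// the elapsed time is not positive, because no meaningful delta exists then.
func rate(prev, curr uint64, elapsedSeconds float64) (perSecond float64, ok bool) {
	if curr < prev || elapsedSeconds <= 0 {
		return 0, false
	}
	return float64(curr-prev) / elapsedSeconds, true
}

func main() {
	// Hypothetical system.diskio.read.count readings taken 10 s apart.
	r, ok := rate(1000, 1250, 10)
	fmt.Println(r, ok)

	// A smaller second reading signals a rollover or reboot: skip the sample.
	_, ok = rate(1250, 40, 10)
	fmt.Println(ok)
}
```

The sampling interval can be chosen by whoever reads the counter, so short bursts stay visible as long as the counter is read often enough.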

@andrewkroh, I have started to implement diskio_windows to collect these counters with perfmon. See here. But I'm afraid this cannot solve the main problem of counters versus gauges. Performance counters mostly give you only real-time values. Some counters store values accumulated since the system booted, but not for logical disks. So if we want cumulative values, perfmon is not the right tool. I think we should use the DeviceIoControl function together with the IOCTL_DISK_PERFORMANCE control code, which gives us real counts for a disk.
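For orientation, here is a rough sketch of how the data returned by IOCTL_DISK_PERFORMANCE could map onto the fields from this issue. The struct mirrors the field order of the Win32 DISK_PERFORMANCE structure as I understand it from the Windows documentation; the Go type `diskPerformance`, the helper `toFields`, and the sample values are hypothetical. The actual DeviceIoControl call is Windows-only and is left out, and the time fields are omitted from the mapping because their units would need to be checked against the documentation first.

```go
package main

import "fmt"

// diskPerformance mirrors the Win32 DISK_PERFORMANCE structure
// (assumed layout; verify against the Windows SDK headers before use).
type diskPerformance struct {
	BytesRead           int64
	BytesWritten        int64
	ReadTime            int64
	WriteTime           int64
	IdleTime            int64
	ReadCount           uint32
	WriteCount          uint32
	QueueDepth          uint32
	SplitCount          uint32
	QueryTime           int64
	StorageDeviceNumber uint32
	StorageManagerName  [8]uint16
}

// toFields maps the cumulative counts and byte totals onto the diskio field
// names listed in this issue. Time fields are intentionally not mapped here.
func toFields(p diskPerformance) map[string]uint64 {
	return map[string]uint64{
		"system.diskio.read.count":  uint64(p.ReadCount),
		"system.diskio.write.count": uint64(p.WriteCount),
		"system.diskio.read.bytes":  uint64(p.BytesRead),
		"system.diskio.write.bytes": uint64(p.BytesWritten),
	}
}

func main() {
	// Hypothetical values standing in for what the ioctl would fill in.
	p := diskPerformance{BytesRead: 4096, BytesWritten: 8192, ReadCount: 3, WriteCount: 5}
	for name, value := range toFields(p) {
		fmt.Println(name, value)
	}
}
```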

Good find. It does sound like the DISK_PERFORMANCE data would be good for this.

Two questions.

  1. Should I keep the newly created function PdhGetRawCounterArray?

  2. syscall has the functions DeviceIoControl and CreateFile. To get the local drive names we need a function like GetLogicalDriveStrings, which is not implemented in syscall. So the question is whether to implement all the functions we need on our own or to use the existing functions from syscall.

Should I keep the newly created function PdhGetRawCounterArray?

Why would you remove it? Isn't it used by the perfmon metricset?

So the question is whether to implement all the functions we need on our own or to use the existing functions from syscall.

Reuse what's already provided if possible. So only add a way to use GetLogicalDriveStrings.

PdhGetRawCounterArray and PdhGetRawCounterValue are not used by perfmon. We added these functions to offer an API in case you want to calculate raw values. But maybe in the future we can add an option for perfmon to use them?

Any news on this issue?

It would be cool to investigate whether other metrics, like the ones below, have an equivalent on Windows.
system.diskio.iostat.read.request.merges_per_sec
system.diskio.iostat.request.avg_size
