Zfs: Fix the ambiguous units in OpenZFS

Created on 10 Oct 2020  Ā·  6Comments  Ā·  Source: openzfs/zfs

The issue
I just saw a bug report that contained the output of a ā€˜zpool status’ command and realised that the units are actually oddly ambiguous for a filesystem. "2.84T scanned at 586M/s, 852G issued at 172M/s, 9.52T totalā€.

SI defines prefixes such as the T as tera (1012), the G as giga (109) and the M as mega (106) but that should be followed by the actual unit. That unit is missing here.

It’s obvious that OpenZFS is not reporting Megawatts or Terahertz. It is, however, not clear whether it actually means TB or TiB, GB or GiB and MB or MiB. Is ā€œ172M/sā€ 172 Megabyte or Megabit per second? If I have a file that is reported to be 0.98GiB in Nautilus, will that fit on a dataset that ā€˜zfs list’ reports as having ā€œ1Gā€ available? And a file that mac OS Finder reports as being 1GB?

What should the actual unit be here, MB (10002) and equivalents or MiB (10242) and equivalents? With the size of modern day data sets the 7% difference between the two is getting quite big in actual sizes and it’s important for a file system to report this correctly. People can currently come across unclear mismatches between GUIs and the CLI or between various operating systems.

The solution
Microsoft has made quite a mess with this on Windows by pairing 10242 values with 10002 units, showing everyone how not to do it (https://www.nextofwindows.com/gb-gigabyte-and-mb-megabyte-is-not-what-you-think-is-in-windows). Fortunately macOS solved the issue in 2007 (https://support.apple.com/en-us/HT201402), Debian in 2009 (https://wiki.debian.org/ConsistentUnitPrefixes), Ubuntu in 2010 (https://wiki.ubuntu.com/UnitsPolicy). Linux now has a man page dedicated to units (https://man7.org/linux/man-pages/man7/units.7.html)

Some higher level applications such as Nautilus, XigmaNAS, TrueNAS etc. have dealt with this by opting for IEC units and values, Finder in macOS has opted for SI units and values. Personally I don’t care if the displayed unit is MB or MiB as long as it’s the correct unit to go with the actual value.

Is it time to make this less ambiguous and replace T, G and M for the correct units throughout the OpenZFS code? Perhaps as a goal for v3.0?

Defect good first issue

Most helpful comment

This is one of the better request this month!
Yes, this is an actual issue everywhere on the internet, including OpenZFS and we should be more carefull both in the CLI, the docs AND the code itself.

All 6 comments

This is one of the better request this month!
Yes, this is an actual issue everywhere on the internet, including OpenZFS and we should be more carefull both in the CLI, the docs AND the code itself.

Thanks!

I think the ā€˜easiest’ route to a fix will be to see what the code actually does and use that as the ’source of truth’. Then make sure the output uses units that correctly reflect the values that the code produces and lastly change the docs to correctly reflect the code and the output.

I’m not a coder so please take the following analysis with some caution. I assume the code actually mainly deals in bytes and not groupings of bytes until it needs to report something to the user.

When I look in cmd/zpool/zpool_main.c I see the following code responsible for the progress output I referred to earlier:

if (pause == 0) {
        (void) printf(gettext("\t%s scanned at %s/s, "
            "%s issued at %s/s, %s total\n"),
            scanned_buf, srate_buf, issued_buf, irate_buf, total_buf);
    } else {
        (void) printf(gettext("\t%s scanned, %s issued, %s total\n"),
            scanned_buf, issued_buf, total_buf);
    }

Those *_buf values are ultimately formatted in zfs/lib/libzutil/zutil_nicenum.c:

{
    uint64_t n = num;
    int index = 0;
    const char *u;
    const char *units[3][7] = {
        [ZFS_NICENUM_1024] = {"", "K", "M", "G", "T", "P", "E"},
        [ZFS_NICENUM_BYTES] = {"B", "K", "M", "G", "T", "P", "E"},
        [ZFS_NICENUM_TIME] = {"ns", "us", "ms", "s", "?", "?", "?"}
    };

So that is where the (lack of) unit is set.

That also suggests to me that OpenZFS indeed uses Base2 and not Base10 for the values it displays. In that case the correct output units would be KiB, MiB, GiB, TiB, PiB and EiB.

What to do?
To fix the ambiguity OpenZFS would have to opt for adding either *B or *iB units to the prefix it currently displays.

The benefit of using *B is that it’s in line with how HDDs and SSDs are sold and what most novice users are used to. It would, however, require that the 1024 based code should be changed to 1000. That is a lot more involved, and risky, than changing the unit.

The benefit of using *iB is that it signals that this has been thought through. When you see a *B you never know whether it’s legacy (Microsoft) or fixed (Apple). Also, at a very low level a base2 unit such as *iB falls nicely along disk sector lines whereas *B does not. Lastly, using *iB would put OpenZFS in line with how Ubuntu, Debian, XigmaNAS, TrueNAS and many open source tools report it.

NB for any future rollout of this, many people have screen scrapers that parse the output of zpool status Any fix would need to be well communicated to the entire ZFS community. Also, there needs to be an easy way for an administrator to get the previous behaviour in case this change breaks their scrapers.

NB for any future rollout of this, many people have screen scrapers that parse the output of zpool status Any fix would need to be well communicated to the entire ZFS community.

I agree, that's why I think the change should be implemented soon but not go in production earlier than OpenZFS 3.0 so it has a year or so to bake and for people to discover if and how it impacts them.

I can imagine quite a few scrapers just parse until they reach a "T", "G" or "M" so changing those to "TB" or TiB" etc. may not make any difference. For some other scrapers it may well make a difference.

There may also be issues with column spacing in ZFS list and others if we add one or two characters after the prefix.

I have three questions at this stage:

1) What would happen if lib/libzutil/zutil_nicenum.c were to be changed as follows?

[ZFS_NICENUM_1000] = {"", "KBā€, "MBā€, "GBā€, "TBā€, "PBā€, "EBā€},
[ZFS_NICENUM_1024] = {"", "KiBā€, "MiBā€, "GiBā€, "TiBā€, "PiBā€, "EiBā€},

Would the addition of a ā€˜1000’ line just mean that, at a later stage elsewhere in the code, someone could use a base10 value and SI unit if they thought it was more appropriate in the specific use case they are working on?

Potential use cases for a ZFS_NICENUM_1000 would be code that needs to reflect the size that HDDs and SSDs are advertised as, or code that needs to interact with macOS where Finder uses base10 and SI units.

2) What is this line in lib/libzutil/zutil_nicenum.c for?

[ZFS_NICENUM_BYTES] = {"B", "K", "M", "G", "T", "P", "E"},

In libzutil.h it says:

 * ZFS_NICENUM_BYTES:   Print single bytes ("13B"), kilo, mega, tera...

However, either you print single bytes ("13B") or you print groupings of bytes ("13MB", 13 MiB"). Printing "single" bytes as mega does not make sense. Or am I missing something?

3) What does this code in zfs/include/libzutil.h do?

/*
 * Formats for iostat numbers.  Examples: "12K", "30ms", "4B", "2321234", "-".
 *
 * ZFS_NICENUM_1024:    Print kilo, mega, tera, peta, exa..
 * ZFS_NICENUM_BYTES:   Print single bytes ("13B"), kilo, mega, tera...
 * ZFS_NICENUM_TIME:    Print nanosecs, microsecs, millisecs, seconds...
 * ZFS_NICENUM_RAW: Print the raw number without any formatting
 * ZFS_NICENUM_RAWTIME: Same as RAW, but print dashes ('-') for zero.
 */
enum zfs_nicenum_format {
    ZFS_NICENUM_1024 = 0,
    ZFS_NICENUM_BYTES = 1,
    ZFS_NICENUM_TIME = 2,
    ZFS_NICENUM_RAW = 3,
    ZFS_NICENUM_RAWTIME = 4
};

I agree with @richardelling For this very reason, I would consider sufficient to simply clearly document it, without touching the reported values at all. And I really like ZFS to continue using Base2 units (as basically all others linux block device tools and filesystems) rather than Base10 units.

Was this page helpful?
0 / 5 - 0 ratings