ZFS: scrub status: inaccurate percentage output

Created on 11 Apr 2019 · 11 comments · Source: openzfs/zfs

  scan: scrub in progress since Thu Apr 11 14:41:12 2019
    23,5G scanned out of 23,5G at 22,5M/s, (scan is slow, no estimated time)
    0B repaired, 100,09% done

100,09% done

zfs version 0.7.11-1 (Debian packages)

Defect

Most helpful comment

Thanks for the additional examples of this issue. @tcaputi was able to reproduce it and identify the root cause. This appears to be solely a reporting issue and the fix itself should be pretty straight forward.

I've reopened this issue and added it to the 0.8 milestone.

All 11 comments

Slightly exceeding 100% is possible if the pool changes in size during the scrub.


No, the pool did not change in size

I regularly see this when writing data to the pool after the scrub begins. Are you sure you didn't write new data?


I am sure I wrote new data into the pool.

But is there no difference between a change in pool size and writing new data to the pool?

@denizzzka it's the stored data size on the pool, not the pool size.

Used space, not the size of the pool.

One possible answer here would be to artificially cap the result to 99.99% (if we're showing two decimal places) if the scrub is in progress or 100.00% if it has completed. Of course, then you might get the same question with the raw values.

Perhaps it's better to just reformulate the text output?
Something like: "100,09% done of initial size"

When you can accurately predict the future, then you can accurately get to 100%.

When a scan is started, the amount of data to be scanned is stored in the scan stats struct. As the scan progresses, the amount of data already scanned is known. So the math is relatively simple. However, scans can take weeks or months and for busy systems it is perfectly ok that the data changes during the scan.

This becomes more complicated because scans are done per-dataset and temporally, so the scan stats might be very precise while scanning idle datasets, but less accurate for busy datasets.

It gets even more complicated because scans can be suspended and resumed, which further increases the time between capturing the amount of data to scan and the final tally.

In short, don't worry about it, this is how it works.

I have encountered a similar issue on 0.8-rc4, though it's considerably more than 100%.
The scrub was paused, the pool was then exported and imported, and after restarting the scrub
it seemed to restart from the beginning.

~ > zpool status backup
  pool: backup
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: scrub in progress since Thu Apr 25 11:40:36 2019
    2.18T scanned at 112M/s, 525T issued at 26.3G/s, 2.92T total
    0B repaired, 17961.30% done, 7573 days 11:24:53 to go
remove: Removal of vdev 1 copied 384K in 0h0m, completed on Sun Apr 14 14:18:20 2019
    240 memory used for removed device mappings
config:

    NAME          STATE     READ WRITE CKSUM
    backup        ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        backup2   ONLINE       0     0     0
        backup1   ONLINE       0     0     0

errors: No known data errors

According to IRC, this happened to multiple other users as well:

PMT | specifically: 11:47 < woffs> 0B repaired, 22817,39% done, 838 days 05:01:04 to go

and

jasonwc | 74.7T scanned at 1.24G/s, 75.7T issued at 1.25G/s, 74.7T total

and

scan: scrub in progress since Fri Apr 26 09:58:08 2019 ; 159G scanned at 2.78G/s, 218T issued at 3.82T/s, 1.39T total; 0B repaired, 15681.71% done, 50 days 19:50:22 to go

Maybe this comment should have gone to #7720

This was my full zpool status output right before completion. What's interesting is that the total amount issued is 1TiB greater than the amount scanned by the metadata scanner. It's also 1 TiB greater than the pool size as reported by zpool list. This was run on a build from MASTER, approximately a week or two older than rc4 (contains the #8453 fix). I haven't seen anything like this on my system with 0.7.12.

root@backup-server:~# zpool status
  pool: bigbackup
 state: ONLINE
  scan: scrub in progress since Tue Apr 16 00:00:02 2019
        74.7T scanned at 1.24G/s, 75.7T issued at 1.25G/s, 74.7T total
        0B repaired, 101.32% done, 158831 days 07:42:27 to go
config:

        NAME                                                  STATE     READ WRITE CKSUM
        bigbackup                                             ONLINE       0     0     0
          raidz2-0                                            ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_2YGVRXYD                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGRVHMN                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGS0N1N                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGTP5KN                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGTRHYN                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGU38NN                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGU8J6N                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGUDUNN                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGUG0ZN                ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEGUH7PN                ONLINE       0     0     0
        special
          mirror-1                                            ONLINE       0     0     0
            ata-Samsung_SSD_840_Series_S14GNEACB80960K-part3  ONLINE       0     0     0
            ata-Crucial_CT256MX100SSD1_14370D32A086-part3     ONLINE       0     0     0

errors: No known data errors
root@backup-server:~# zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
bigbackup     90.9T  74.7T  16.1T        -         -     0%    82%  1.00x    ONLINE  -

Thanks for the additional examples of this issue. @tcaputi was able to reproduce it and identify the root cause. This appears to be solely a reporting issue and the fix itself should be pretty straight forward.

I've reopened this issue and added it to the 0.8 milestone.
