Based on the comment https://github.com/zfsonlinux/zfs/issues/2526#issuecomment-58229535
It would be useful to have a flag for the zpool scrub command that, when present, would scrub only the blocks written since the previous scrub finished.
So, for example, after I run zpool scrub -i poolname for the first time, ZFS stores the last txg scrubbed (where? volatile in memory, or persistent on disk with a feature flag or as a bookmark?).
Then, when I run it again, it reads the value stored by the previous run and scans from that txg to the latest.
This should in no way be a replacement for regular scrubs, but rather a complement to them on busy pools.
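For concreteness, something like this is what I have in mind; the -i flag and the recorded txg are the proposal itself, not existing behaviour:

```
# Hypothetical usage -- neither the -i flag nor the stored txg exists today
zpool scrub -i tank   # first run: behaves like a full scrub, then records the last txg scrubbed
zpool scrub -i tank   # later runs: scrub only blocks born after the recorded txg, then update it
zpool scrub tank      # regular full scrubs continue on their usual schedule
```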
FYI, a scrub does not stop at the txg at which it was started; it stops at the latest txg for each dataset. Since scrubs are done per dataset, not per pool, each dataset can have a different completion time. Thus a UI for this could easily become unmanageable.
As the option would boil down to 'scrub only data not yet scrubbed', this would be a single checkmark in a UI. No problem on that front.
@kpande If I understand it correctly, the feature request stems from https://github.com/zfsonlinux/zfs/issues/2526 (Verify disk writes), so the ability to 'continue' a scrub so that it only processes new data could do that particular trick.
Certainly normal scrubs should still be performed to catch rot in old data.
@richardelling The UI wouldn't be difficult at all: a single switch (e.g. -i, as the OP mentioned) could let the scrub continue at the txg where it left off last time, tracked in a (possibly hidden) dataset property (defaulting to 0). That avoids any problem with different completion times across datasets, since each dataset would track its own value, and tracking the continue-from txg in a dataset property would also automatically give the user an interface to modify the value should the need arise.
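To illustrate the idea (the property name below is made up for this sketch; no such property exists today):

```
# Hypothetical hidden property tracking the continue-from txg per dataset
zfs get -H -o value lastscrubtxg tank/data   # where the next incremental scrub would start
zfs set lastscrubtxg=0 tank/data             # reset so the next incremental scrub re-reads everything
```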
what you are asking for seems a lot like the existing scrub pause functionality.
Perhaps you can adjust your workflow to pause the scrub when it's almost done, resume it later once more data has been written, pause again when it's almost done, and keep that going in perpetuity.
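With today's commands that workflow would look something like:

```
zpool scrub tank       # start a scrub
zpool scrub -p tank    # pause it shortly before it completes
# ...more data is written over the following days...
zpool scrub tank       # resume the paused scrub where it left off
zpool scrub -p tank    # pause again near completion, and so on
```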
Having said that, and as kpande pointed out, scrubbing only new data doesn't seem like a good idea on its own: you probably want to be able to ensure that all data is scrubbed periodically, not just whatever was most recently added.
When I filed this I meant it as complementary to regular scrubs for large and busy pools, until a read-after-write mechanism becomes available (i.e. delay the write acknowledgement until the data has been read back, and return a write error to the user if what is read differs from what was written), as tracked in the issue I linked.
Currently there is no quick way to check whether the hardware lied to you about data you just wrote, short of scrubbing the entire pool from the beginning (even if you just finished a scrub yesterday) or doing the verification in the application that wrote the data (which is unlikely to happen).
you can simply read the files to verify their contents. the application doesn't have to do anything. ZFS will detect errors on read.
Unless the read is served from ARC or, on a redundant vdev, serviced from a good copy.
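For reference, the manual check being discussed is just this, subject to the caveat above that a read satisfied from ARC, or silently repaired from a redundant copy, does not exercise the suspect copy on disk (the path is only an example):

```
cat /tank/share/just-uploaded-file > /dev/null   # force the new file to be read
zpool status -v tank                             # checksum errors detected during the read show up here
```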
you can simply read the files to verify their contents.
This is impractical; for example, you cannot explain to users why they would need to copy back files they just uploaded to a network share.
FWIW, the SCSI protocol has an operation, VERIFY, that is intended to confirm that the data written to the medium matches the request. AFAIK, nobody uses this, likely because it would be verrryyyyy slllllloooowwww. From a ZFS perspective, in order to verify that data is on the medium, the write() to the vdev would need to be replaced by a verify(). If you cannot do this, then it is not possible to actually verify the data, because you'll be reading it back from the block device's write cache.
@richardelling That feature is tracked separately in issue #2526.
This request is just to quickly scrub new data written since the last scrub, rather than wait until the next scheduled scrub from cron.
The systemic issue is that a quick scrub does not, and cannot, guarantee data is on the medium. Therefore it is not clear what benefit this feature brings.
@mailinglists35 the opening comment for #2526 describes a generic "read after write" implementation, and does not specifically call for using SCSI verify(). Would you be ok with closing this bug in favour of #2526, since it essentially accomplishes what you're asking for here?
@tonyhutter if I understand correctly, Brian talks in #2526 only about labels, not all data written since previous scrub, hence the idea in the original comment to perform a "tail scrub", which is different from #2526 in that it extends the read after write verification to all new data via this type of requested scrub
@mailinglists35 Yes, that's correct. @tonyhutter and I spent some time kicking around ideas for how something like this could be implemented. The new two-phase scrub code could be adapted to perform this kind of tail-scrubbing; in some ways it wouldn't be that different from the resilver code. But there may be more efficient and user-friendly ways to achieve this functionality.
I don't have any objection to leaving this open to discuss possible solutions. As I understand it, the primary functionality being requested here is some (optional) mechanism which verifies that newly written data is in fact correct on disk, with the understanding that this level of additional verification will have some performance impact (which we want to minimize).
If I may say something: Wouldn't it be better to have some sort of always-running background scrub with extremely low priority? It would read data all the time (e.g. at a rate of 1 MB/s or so) and start all over when finished.
That way, new and old data are more or less continuously checked.
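As a rough sketch, something close to this could already be approximated today by throttling the scanner and restarting the scrub whenever it completes; the tunable name is from ZFS on Linux and may differ between releases:

```
# Cap the amount of scan I/O in flight per leaf vdev (value in bytes)
echo 1048576 > /sys/module/zfs/parameters/zfs_scan_vdev_limit

# Restart the scrub whenever the previous one finishes
while true; do
    zpool scrub tank
    while zpool status tank | grep -q "scrub in progress"; do
        sleep 600
    done
done
```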
If I may say something: Wouldn't it be better to have some sort of always-running background scrub with extremely low priority? It would read data all the time (e.g. at a rate of 1 MB/s or so) and start all over when finished.
This is what SMART background scans do already. Having studied disk failures for many years, it is not clear to me that a continuous ZFS scrub is an effective use of resources. Do you have data showing that frequent ZFS scrubs are effective?