Based on the comment https://github.com/zfsonlinux/zfs/issues/2526#issuecomment-58229535
It would be useful to have a flag for the zpool scrub command that, when present, would scrub only the blocks written since the previous scrub finished.
So, for example, after I run zpool scrub -i poolname for the first time, ZFS stores the last txg scrubbed (where? volatile in memory, or persistent on disk with a feature flag or as a bookmark?).
Then, when I run it again, it reads the value stored by the previous run and scans from that txg to the latest.
This should in no way be a replacement for regular scrubs, but rather a complement to them on busy pools.
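For concreteness, something like this is what I have in mind; the -i flag and the recorded txg are the proposal itself, not existing behaviour:

```
# Hypothetical usage -- neither the -i flag nor the stored txg exists today
zpool scrub -i tank   # first run: behaves like a full scrub, then records the last txg scrubbed
zpool scrub -i tank   # later runs: scrub only blocks born after the recorded txg, then update it
zpool scrub tank      # regular full scrubs continue on their usual schedule
```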
FYI, a scrub does not stop at the txg at which it was started; it stops at the latest txg for each dataset. Since scrubs are done per dataset, not per pool, each dataset can have a different completion time. Thus a UI for this could easily become unmanageable.
As the option would boil down to 'scrub only data not yet scrubbed', this would be a single checkmark in a UI. No problem on that front.
@kpande If I understand it correctly, the feature request stems from https://github.com/zfsonlinux/zfs/issues/2526 (Verify disk writes), so the ability to 'continue' a scrub so that it only processes new data could do that particular trick.
Certainly normal scrubs should still be performed to catch rot in old data.
@richardelling The UI wouldn't be difficult at all: a single switch (e.g. -i, as the OP mentioned) could let the scrub continue at the txg where it left off last time, tracked in a (possibly hidden) dataset property (defaulting to 0). That avoids any problem with different completion times across datasets, since each dataset would track its own value, and tracking the continue-from txg in a dataset property would also automatically give the user an interface to modify the value should the need arise.
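To illustrate the idea (the property name below is made up for this sketch; no such property exists today):

```
# Hypothetical hidden property tracking the continue-from txg per dataset
zfs get -H -o value lastscrubtxg tank/data   # where the next incremental scrub would start
zfs set lastscrubtxg=0 tank/data             # reset so the next incremental scrub re-reads everything
```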
what you are asking for seems a lot like the existing scrub pause functionality.
Perhaps you can adjust your workflow to pause the scrub when it's almost done, resume it later once more data has been written, pause again when it's almost done, and keep that going in perpetuity.
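With today's commands that workflow would look something like:

```
zpool scrub tank       # start a scrub
zpool scrub -p tank    # pause it shortly before it completes
# ...more data is written over the following days...
zpool scrub tank       # resume the paused scrub where it left off
zpool scrub -p tank    # pause again near completion, and so on
```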
Having said that, and as kpande pointed out, scrubbing only new data doesn't seem like a good idea on its own: you probably want to be able to ensure that all data is scrubbed periodically, not just whatever was most recently added.
When I filed this I meant it as complementary to regular scrubs for large and busy pools, until a read-after-write mechanism becomes available (i.e. delay the write acknowledgement until the data has been read back, and return a write error to the user if what is read differs from what was written), as tracked in the issue I linked.
Currently there is no quick way to check whether the hardware lied to you about data you just wrote, short of scrubbing the entire pool from the beginning (even if you just finished a scrub yesterday) or doing the verification in the application that wrote the data (which is unlikely to happen).
you can simply read the files to verify their contents. the application doesn't have to do anything. ZFS will detect errors on read.
Unless the read is served from ARC or, on a redundant vdev, serviced from a good copy.
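For reference, the manual check being discussed is just this, subject to the caveat above that a read satisfied from ARC, or silently repaired from a redundant copy, does not exercise the suspect copy on disk (the path is only an example):

```
cat /tank/share/just-uploaded-file > /dev/null   # force the new file to be read
zpool status -v tank                             # checksum errors detected during the read show up here
```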
you can simply read the files to verify their contents.
This is impractical; for example, you cannot explain to users why they would need to copy back files they just uploaded to a network share.
FWIW, the SCSI protocol has an operation, VERIFY, that is intended to confirm that the data written to the medium matches the request. AFAIK, nobody uses this, likely because it would be verrryyyyy slllllloooowwww. From a ZFS perspective, in order to verify that data is on the medium, the write() to the vdev would need to be replaced by a verify(). If you cannot do this, then it is not possible to actually verify the data, because you'll be reading it back from the block device's write cache.
@richardelling That feature is tracked separately in issue #2526.
This request is just to quickly scrub new data written since the last scrub, rather than wait until the next scheduled scrub from cron.
The systemic issue is that a quick scrub does not, and cannot, guarantee data is on the medium. Therefore it is not clear what benefit this feature brings.
@mailinglists35 the opening comment for #2526 describes a generic "read after write" implementation, and does not specifically call for using SCSI verify(). Would you be ok with closing this bug in favour of #2526, since it essentially accomplishes what you're asking for here?
@tonyhutter if I understand correctly, Brian talks in #2526 only about labels, not all data written since previous scrub, hence the idea in the original comment to perform a "tail scrub", which is different from #2526 in that it extends the read after write verification to all new data via this type of requested scrub
@mailinglists35 Yes, that's correct. @tonyhutter and I spent some time kicking around ideas for how something like this could be implemented. The new two-phase scrub code could be adapted to perform this kind of tail-scrubbing; in some ways it wouldn't be that different from the resilver code. But there may be more efficient and user-friendly ways to achieve this functionality.
I don't have any objection to leaving this open to discuss possible solutions. As I understand it, the primary functionality being requested here is some (optional) mechanism which verifies that newly written data is in fact correct on disk, with the understanding that this level of additional verification will have some performance impact (which we want to minimize).
If I may say something: Wouldn't it be better to have some sort of always-running background scrub with extremely low priority? It would read data all the time (e.g. at a rate of 1 MB/s or so) and start all over when finished.
That way, new and old data are more or less continuously checked.
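As a rough sketch, something close to this could already be approximated today by throttling the scanner and restarting the scrub whenever it completes; the tunable name is from ZFS on Linux and may differ between releases:

```
# Cap the amount of scan I/O in flight per leaf vdev (value in bytes)
echo 1048576 > /sys/module/zfs/parameters/zfs_scan_vdev_limit

# Restart the scrub whenever the previous one finishes
while true; do
    zpool scrub tank
    while zpool status tank | grep -q "scrub in progress"; do
        sleep 600
    done
done
```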
If I may say something: Wouldn't it be better to have some sort of always-running background scrub with extremely low priority? It would read data all the time (e.g. at a rate of 1 MB/s or so) and start all over when finished.
This is what SMART background scans do already. Having studied disk failures for many years, it is not clear to me that a continuous ZFS scrub is an effective use of resources. Do you have data showing that frequent ZFS scrubs are effective?