Type | Version/Name
--- | ---
Distribution Name | Fedora
Distribution Version | 31
Linux Kernel | 5.5.17-200.fc31.x86_64
Architecture | x86_64
ZFS Version | 0.8.0-732_ga84c92f93 (master as of 21/4/2020)
SPL Version | 0.8.0-732_ga84c92f93
I disabled `autotrim` and intend to run `zpool trim` at specified times. While testing the feature, I found that the `-w` (wait) flag does not work.
```
# zpool set autotrim off tank
# time zpool trim -w tank

real    0m0.684s
user    0m0.348s
sys     0m0.070s
# echo $?
1
# zpool status -Pt
  pool: tank
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
config:

        NAME                               STATE     READ WRITE CKSUM
        tank                               ONLINE       0     0     0
          mirror-0                         ONLINE       0     0     0
            /dev/disk/by-id/ata-XXX-part1  ONLINE       0     0     0  (trim unsupported)
            /dev/disk/by-id/ata-YYY-part1  ONLINE       0     0     0  (trim unsupported)
        special
          mirror-1                         ONLINE       0     0     0
            /dev/ssd480/special            ONLINE       0     0     0  (0% trimmed, started at Tue 28 Apr 2020 09:09:24 PM -03)
            /dev/nvme/special              ONLINE       0     0     0  (0% trimmed, started at Tue 28 Apr 2020 09:09:24 PM -03)
        logs
          mirror-2                         ONLINE       0     0     0
            /dev/ssd480/zil                ONLINE       0     0     0  (46% trimmed, started at Tue 28 Apr 2020 09:09:24 PM -03)
            /dev/nvme/zil                  ONLINE       0     0     0  (46% trimmed, started at Tue 28 Apr 2020 09:09:24 PM -03)
        cache
          /dev/nvme/cache                  ONLINE       0     0     0  (untrimmed)
          /dev/ssd480/cache                ONLINE       0     0     0  (untrimmed)

errors: No known data errors
```
If I insist and run the command again:
```
# time zpool trim -w tank
cannot trim '/dev/ssd480/special': currently trimming
cannot trim '/dev/nvme/special': currently trimming
cannot trim '/dev/ssd480/zil': currently trimming
cannot trim '/dev/nvme/zil': currently trimming

real    0m1.349s
user    0m0.354s
sys     0m0.070s
```
No error anywhere that I could see.

Tagging @jgallag88, per request.
Does `zpool trim` _have_ a `-w` option?
```
$ zpool --version
zfs-0.8.3-1ubuntu12
zfs-kmod-0.8.3-1ubuntu12
$ zpool trim
missing pool name argument
usage:
        trim [-d] [-r <rate>] [-c | -s] <pool> [<device> ...]
```
Thanks for filing this @dioni21. I suspect the issue you are hitting is that the `-w` flag doesn't work when the pool has a mixture of devices that support trim and ones that don't:
```
delphix@ip-10-110-231-233:~$ sudo zpool create -f testpool xvdb /export/home/delphix/vdev_1
delphix@ip-10-110-231-233:~$ zpool status -t testpool
  pool: testpool
 state: ONLINE
  scan: none requested
config:

        NAME                           STATE     READ WRITE CKSUM
        testpool                       ONLINE       0     0     0
          xvdb                         ONLINE       0     0     0  (trim unsupported)
          /export/home/delphix/vdev_1  ONLINE       0     0     0  (untrimmed)

errors: No known data errors
delphix@ip-10-110-231-233:~$ time sudo zpool trim -w testpool

real    0m0.039s
user    0m0.014s
sys     0m0.005s
delphix@ip-10-110-231-233:~$ echo $?
255
```
The issue is here. After we do the ioctl to start the trim, we check the return value and, if it is non-zero, skip the wait. However, the return value can indicate errors for devices that don't support trim (`EZFS_TRIM_NOTSUP`). Such errors are later skipped over in the loop where the error messages are printed out, so they aren't really errors from the perspective of the overall `zpool trim` command (except that they sort of are, since they still cause a non-zero exit status).
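To make that control flow concrete, here is a self-contained toy model of the behavior being described; all names and values are illustrative stand-ins, not the actual OpenZFS code:

```c
#include <stdio.h>

/* Stand-in for the real libzfs error code; the value is illustrative. */
#define EZFS_TRIM_NOTSUP 1

int
main(void)
{
	/* Per-vdev results of starting the trim: 0 = trim started OK. */
	int vdev_errors[] = { EZFS_TRIM_NOTSUP, EZFS_TRIM_NOTSUP, 0, 0 };
	int n = sizeof (vdev_errors) / sizeof (vdev_errors[0]);
	int err = 0;

	for (int i = 0; i < n; i++) {
		/* "trim unsupported" is skipped when printing errors... */
		if (vdev_errors[i] != 0 && vdev_errors[i] != EZFS_TRIM_NOTSUP)
			fprintf(stderr, "cannot trim vdev %d\n", i);
		/* ...but it still makes the overall result non-zero. */
		if (vdev_errors[i] != 0)
			err = -1;
	}

	/*
	 * Because err != 0, the -w wait is skipped and we exit non-zero
	 * without having printed anything to stderr.
	 */
	if (err != 0)
		return (1);

	/* (waiting for the trim to complete would happen here) */
	return (0);
}
```

Run against a pool with a mix of trim-capable and trim-incapable devices, this is exactly the symptom from the transcripts above: silent non-zero exit, no wait.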
The one difference between the example I've posted and the one you are hitting is the exit status of `zpool trim` (255 vs 1). I'm not sure what accounts for that difference; it looks like `zpool_trim` always returns -1 (reported as 255 by bash) in this scenario.
@dioni21 Do you by any chance have any aliases or other things in your shell environment that would have modified the exit status of zpool trim, accounting for the difference between my testing and yours?
Assuming that the problem I mentioned is the root cause of this issue, I think the best thing to do might be to ignore `EZFS_TRIM_NOTSUP` errors when deciding whether or not to wait.
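In terms of the toy model above, that fix might look like the following hypothetical helper (`should_wait` is a made-up name, not an OpenZFS function): the wait is only skipped when some error other than `EZFS_TRIM_NOTSUP` occurred while starting the trim.

```c
#include <stdbool.h>

#define EZFS_TRIM_NOTSUP 1	/* stand-in error code, as in the model above */

/*
 * Hypothetical helper: decide whether the -w wait should still run.
 * Devices that merely don't support trim should not suppress it.
 */
static bool
should_wait(const int *vdev_errors, int n)
{
	for (int i = 0; i < n; i++) {
		if (vdev_errors[i] != 0 && vdev_errors[i] != EZFS_TRIM_NOTSUP)
			return (false);
	}
	return (true);
}
```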
However, I think it's potentially confusing that `zpool trim` can return a non-zero status without printing anything to stderr when there are devices that don't support trimming (and the whole pool is being trimmed). @behlendorf, was this the intended behavior?
The intended behavior was to return an error when attempting to TRIM a specific vdev that doesn't support TRIM. When attempting to TRIM an entire pool, I agree it would make the most sense not to return an error as long as at least one of the vdevs supports TRIM.
Suggested workaround until this is fixed in code: a bash wrapper script that calls `zpool trim` and then, in pseudocode, loops `while [ "$(zpool status -t | grep -c '% trimmed')" -gt 0 ]; do sleep 10; done` until the trim progress disappears:
```bash
#!/bin/bash
# Wait for an in-progress trim on pool $1 to finish.
sdate=$(date)
# Poll until no vdev reports "% trimmed" any more.
while :; do
    echo "Pool: $1 - trim started: $sdate"
    zpool status -t "$1" | grep '% trimmed' || break
    sleep 10
    date
done
```
Note this is untested, as I don't have any trim-capable ZFS devices at the moment.
You can also use `zpool wait -t trim` after starting the trim. Regardless, I'll try to open a PR for both of these issues shortly.
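A minimal example, assuming a pool named `tank`:

```
# zpool trim tank
# zpool wait -t trim tank
```

`zpool wait` blocks until the named activity (here, trimming) has finished on the pool, which sidesteps the broken `-w` path entirely.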
> @dioni21 Do you by any chance have any aliases or other things in your shell environment that would have modified the exit status of `zpool trim`, accounting for the difference between my testing and yours?

No.

My setup is compiled from git, using `--enable-asan --enable-debuginfo --enable-debug`, but I think this does not affect exit behaviour.
Strange. The only places I can see the return value of 1 potentially coming from are:

- the call to `zpool_trim_wait()` (but I don't think that's ever getting called, because we currently skip the wait if any devices don't support trim)
- `time`. It's possible that different shells have implementations of `time` that work differently, but I'd expect them to generally pass the return value of the program through unmodified.

In any case, I've opened #10372 to address the issues that we do understand.
@jgallag88

> Strange. The only places I can see the return value of 1 potentially coming from are:
>
> - the call to `zpool_trim_wait()` (but I don't think that's ever getting called, because we currently skip the wait if any devices don't support trim)

I browsed the code to see if I could find anything, and indeed most returns of 1 are in case of memory allocation errors. I think I saw something in vdev treatment, but could not be sure. Note that my setup also has special, logs, and cache devices, while your test setup does not.

> - `time`. It's possible that different shells have implementations of `time` that work differently, but I'd expect them to generally pass the return value of the program through unmodified.

I thought this too, but the same thing happens without `time`.
I've merged the fix for the original issue here. If there are other fixes needed, let's tackle them in follow-up PRs as needed.
@dentad https://github.com/openzfs/zfs/pull/10071