ZFS: zpool trim -w does not wait for trim to end

Created on 29 Apr 2020 · 13 comments · Source: openzfs/zfs

System information


Type | Version/Name
--- | ---
Distribution Name | Fedora
Distribution Version | 31
Linux Kernel | 5.5.17-200.fc31.x86_64
Architecture | x86_64
ZFS Version | 0.8.0-732_ga84c92f93 (master as of 21/4/2020)
SPL Version | 0.8.0-732_ga84c92f93

Describe the problem you're observing

I disabled autotrim and intend to run trim manually at scheduled times. While testing the feature, I found that the -w flag does not work.
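For context, the kind of scheduled run intended here might look like this hypothetical cron entry (the schedule, file name, and pool name are illustrative):

# /etc/cron.d/zpool-trim (hypothetical): blocking trim of 'tank' every Sunday at 03:00
0 3 * * 0  root  /usr/sbin/zpool trim -w tank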

Describe how to reproduce the problem

# zpool set autotrim off tank
# time zpool trim -w tank
real    0m0.684s
user    0m0.348s
sys     0m0.070s
# echo $?
1
# zpool status -Pt
  pool: tank
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
config: 

        NAME                               STATE     READ WRITE CKSUM
        tank                               ONLINE       0     0     0
          mirror-0                         ONLINE       0     0     0
            /dev/disk/by-id/ata-XXX-part1  ONLINE       0     0     0  (trim unsupported)
            /dev/disk/by-id/ata-YYY-part1  ONLINE       0     0     0  (trim unsupported)
        special
          mirror-1                         ONLINE       0     0     0
            /dev/ssd480/special            ONLINE       0     0     0  (0% trimmed, started at Tue 28 Apr 2020 09:09:24 PM -03)
            /dev/nvme/special              ONLINE       0     0     0  (0% trimmed, started at Tue 28 Apr 2020 09:09:24 PM -03)
        logs
          mirror-2                         ONLINE       0     0     0
            /dev/ssd480/zil                ONLINE       0     0     0  (46% trimmed, started at Tue 28 Apr 2020 09:09:24 PM -03)
            /dev/nvme/zil                  ONLINE       0     0     0  (46% trimmed, started at Tue 28 Apr 2020 09:09:24 PM -03)
        cache
          /dev/nvme/cache                  ONLINE       0     0     0  (untrimmed)
          /dev/ssd480/cache                ONLINE       0     0     0  (untrimmed)

errors: No known data errors

If I insist and run the command again:

# time zpool trim -w tank
cannot trim '/dev/ssd480/special': currently trimming
cannot trim '/dev/nvme/special': currently trimming
cannot trim '/dev/ssd480/zil': currently trimming
cannot trim '/dev/nvme/zil': currently trimming
real    0m1.349s
user    0m0.354s
sys     0m0.070s

Include any warning/errors/backtraces from the system logs

No errors anywhere that I could see.

All 13 comments

Tagging @jgallag88, per request

Does zpool trim _have_ a -w option?
$ zpool --version
zfs-0.8.3-1ubuntu12
zfs-kmod-0.8.3-1ubuntu12
$ zpool trim
missing pool name argument
usage:
        trim [-d] [-r <rate>] [-c | -s] <pool> [<device> ...]
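For reference, the -w flag was only added to master after the 0.8.3 release, which is why it doesn't appear in that usage output; on a build that includes it, the usage line reads approximately:

trim [-dw] [-r <rate>] [-c | -s] <pool> [<device> ...]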

Thanks for filing this @dioni21. I suspect the issue that you are hitting is that the -w flag doesn't work when the pool has a mixture of devices that support trim and ones that don't:

delphix@ip-10-110-231-233:~$ sudo zpool create -f testpool xvdb /export/home/delphix/vdev_1
delphix@ip-10-110-231-233:~$ zpool status -t testpool
  pool: testpool
 state: ONLINE
  scan: none requested
config:

    NAME                           STATE     READ WRITE CKSUM
    testpool                       ONLINE       0     0     0
      xvdb                         ONLINE       0     0     0  (trim unsupported)
      /export/home/delphix/vdev_1  ONLINE       0     0     0  (untrimmed)

errors: No known data errors
delphix@ip-10-110-231-233:~$ time sudo zpool trim -w testpool

real    0m0.039s
user    0m0.014s
sys 0m0.005s
delphix@ip-10-110-231-233:~$ echo $?
255

The issue is here. After we do the ioctl to start the trim, we check the return value and if it is non-zero, we skip the wait. However, the return value can indicate errors for devices that don't support trim (EZFS_TRIM_NOTSUP). Such errors are later skipped over in the loop where the error messages are printed out, so they aren't really errors from the perspective of the overall zpool trim command (except that they sort of are, since they still cause a non-zero exit code).

The one difference between the example I've posted and the one you are hitting is the exit status of zpool trim (255 vs 1). I'm not sure what accounts for that difference; it looks like zpool_trim always returns -1 (reported as 255 by bash) in this scenario.
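For reference, the kernel keeps only the low 8 bits of a process's exit status, which is why a C return value of -1 surfaces as 255 in the shell. An illustrative one-liner:

$ perl -e 'exit(-1)'; echo $?
255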

@dioni21 Do you by any chance have any aliases or other things in your shell environment that would have modified the exit status of zpool trim, accounting for the difference between my testing and yours?

Assuming that the problem I mentioned is the root cause of this issue, I think the best thing to do might be to ignore EZFS_TRIM_NOTSUP errors when deciding whether or not to wait.

However, I think that it's potentially confusing that zpool trim can return a non-zero status without printing anything to stderr when there are devices which don't support trimming (and the whole pool is being trimmed). @behlendorf, was this the intended behavior?

The intended behavior was to return an error when attempting to TRIM a specific vdev that doesn't support TRIM. When trimming an entire pool, I agree it would make the most sense not to return an error as long as at least one of the vdevs supports TRIM.
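To illustrate the distinction (the device path is copied from the report above purely for illustration):

# zpool trim tank /dev/disk/by-id/ata-XXX-part1

targets a single device, so an error when that device lacks TRIM support is reasonable, whereas

# zpool trim tank

targets the whole pool and should exit 0 as long as at least one vdev supports TRIM.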

Suggested workaround until this is fixed in code: a bash wrapper script that calls zpool trim and then loops, sleeping 10 seconds for as long as zpool status still reports '% trimmed':

sdate=$(date)
# Loop until 'zpool status -t' no longer reports any device as "% trimmed"
# (-t, not -v, is the flag that shows trim progress, per the output above)
while :; do
  echo "Pool: $1 - trim started: $sdate"
  zpool status -t "$1" | grep '% trimmed' || break
  sleep 10
  date
done

Note this is untested, as I don't have any TRIM-capable devices at the moment.
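If saved as, say, trim-wait.sh (the name is hypothetical), it would be run after kicking off the trim:

# zpool trim tank && ./trim-wait.sh tank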

You can also use zpool wait -t trim after starting the trim. Regardless, I'll try to open a PR for both these issues shortly.
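For example, with the pool from the report above, that sequence would be:

# zpool trim tank
# zpool wait -t trim tank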

@dioni21 Do you by any chance have any aliases or other things in your shell environment that would have modified the exit status of zpool trim, accounting for the difference between my testing and yours?

No.

My setup is compiled from git, using --enable-asan --enable-debuginfo --enable-debug, but I think this does not affect exit behaviour.

Strange. The only places I can see the return value of 1 potentially coming from are

  • from the call to zpool_trim_wait() (but I don't think that's ever getting called, because we currently skip the wait if any devices don't support trim)
  • from time. It's possible that different shells have implementations of time that work differently, but I'd expect them to generally pass the return value of the program through unmodified (see the quick check below).
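For instance, bash's time keyword passes the child's exit status straight through (illustrative check):

$ time false

real    0m0.001s
user    0m0.001s
sys     0m0.000s
$ echo $?
1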

In any case, I've opened #10372 to address the issues that we do understand.

@jgallag88

Strange. The only places I can see the return value of 1 potentially coming from are

  • from the call to zpool_trim_wait() (but I don't think that's ever getting called, because we currently skip the wait if any devices don't support trim)

I browsed the code to see if I could find anything, and indeed most returns of 1 are for memory allocation errors. I think I saw something in the vdev handling, but I could not be sure. Note that my setup also has special, logs, and cache devices, while your test setup does not.

  • from time. It's possible that different shells have implementation of time that work differently, but I'd expect them to generally pass the return value of the program through unmodified.

I thought this too, but the same thing happens without time.

I've merged the fix for the original issue here. If there are other fixes needed, let's tackle them in follow-up PRs as needed.
