ZFS: Raw send on encrypted datasets does not work when copying snapshots back

Created on 1 Jul 2020 · 7 comments · Source: openzfs/zfs

Describe the problem you're observing

$ sudo zfs create -o encryption=on -o keyformat=passphrase testpool/source
Enter passphrase:
Re-enter passphrase:
$ sudo zfs snap testpool/source@base
$ sudo zfs snap testpool/source@s1
$ sudo zfs umount testpool/source
$ sudo zfs send -w testpool/source@base | sudo zfs receive testpool/target
$ sudo zfs send -w -i @base testpool/source@s1 | sudo zfs receive testpool/target
$ sudo zfs destroy testpool/source@s1
$ sudo zfs send -w -i @base testpool/target@s1 | sudo zfs receive testpool/source
$ sudo zfs mount testpool/source
filesystem 'testpool/source' can not be mounted: Input/output error
cannot mount 'testpool/source': Invalid argument
$ 

With --enable-debug, the example already crashes at the zfs receive testpool/source step. This produces the following kernel log:

[ 2516.835681] VERIFY3(0 == os->os_flags & OBJSET_FLAG_USERACCOUNTING_COMPLETE) failed (0 == 1)
[ 2516.835819] PANIC at dsl_crypt.c:2019:dsl_crypto_recv_raw_objset_check()
[ 2516.835889] Showing stack for process 305588
[ 2516.835897] CPU: 0 PID: 305588 Comm: zfs Tainted: P           OE     5.4.0-33-generic #37-Ubuntu
[ 2516.835898] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 2516.835899] Call Trace:
[ 2516.835980]  dump_stack+0x6d/0x9a
[ 2516.836043]  spl_dumpstack+0x29/0x2b [spl]
[ 2516.836049]  spl_panic+0xd4/0xfc [spl]
[ 2516.836425]  ? dnode_rele_and_unlock+0x6c/0x150 [zfs]
[ 2516.836468]  ? dmu_buf_get_user+0x17/0x30 [zfs]
[ 2516.836474]  ? tsd_hash_search.isra.0+0x71/0xa0 [spl]
[ 2516.836480]  ? tsd_get+0x49/0x90 [spl]
[ 2516.836535]  ? rrn_find+0x29/0x50 [zfs]
[ 2516.836592]  ? rrw_held+0x79/0xe0 [zfs]
[ 2516.836602]  ? _cond_resched+0x19/0x30
[ 2516.836605]  ? mutex_lock+0x13/0x40
[ 2516.836650]  ? dmu_objset_from_ds+0xbb/0x280 [zfs]
[ 2516.836699]  dsl_crypto_recv_key_check+0x5b4/0x6e0 [zfs]
[ 2516.836755]  ? zfs_refcount_add_many+0xab/0x110 [zfs]
[ 2516.836810]  dsl_sync_task_common+0x13d/0x2a0 [zfs]
[ 2516.836859]  ? dsl_crypto_recv_raw_key_check+0x430/0x430 [zfs]
[ 2516.836907]  ? dsl_crypto_recv_raw_key_sync+0x440/0x440 [zfs]
[ 2516.836956]  ? dsl_crypto_recv_raw_key_check+0x430/0x430 [zfs]
[ 2516.837004]  ? dsl_crypto_recv_raw_key_sync+0x440/0x440 [zfs]
[ 2516.837059]  dsl_sync_task+0x1a/0x20 [zfs]
[ 2516.837108]  dsl_crypto_recv_raw+0x4f/0x70 [zfs]
[ 2516.837154]  dmu_recv_stream+0x22d/0x11c0 [zfs]
[ 2516.837208]  ? dsl_sync_task+0x1a/0x20 [zfs]
[ 2516.837271]  zfs_ioc_recv_impl+0x90e/0x10e0 [zfs]
[ 2516.837337]  zfs_ioc_recv_new+0x2ab/0x330 [zfs]
[ 2516.837342]  ? spl_kvmalloc+0xc2/0xd0 [spl]
[ 2516.837347]  ? spl_kmem_alloc_impl+0x125/0x130 [spl]
[ 2516.837353]  ? spl_vmem_alloc+0x2c/0x60 [spl]
[ 2516.837388]  ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
[ 2516.837393]  ? nv_mem_zalloc.isra.0+0x2f/0x40 [znvpair]
[ 2516.837397]  ? nvlist_xalloc.part.0+0x50/0xb0 [znvpair]
[ 2516.837460]  zfsdev_ioctl_common+0x43e/0x900 [zfs]
[ 2516.837523]  zfsdev_ioctl+0x55/0xe0 [zfs]
[ 2516.837580]  do_vfs_ioctl+0x407/0x670
[ 2516.837607]  ? do_user_addr_fault+0x216/0x450
[ 2516.837610]  ksys_ioctl+0x67/0x90
[ 2516.837611]  __x64_sys_ioctl+0x1a/0x20
[ 2516.837625]  do_syscall_64+0x57/0x190
[ 2516.837628]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2516.837635] RIP: 0033:0x7f5ac4ce737b
[ 2516.837638] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 0d 00 f7 d8 64 89 01 48
[ 2516.837639] RSP: 002b:00007ffe255d0988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 2516.837645] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00007f5ac4ce737b
[ 2516.837645] RDX: 00007ffe255d09b0 RSI: 0000000000005a46 RDI: 0000000000000005
[ 2516.837646] RBP: 00007ffe255d3fa0 R08: 00005564d654b040 R09: 00007f5ac4dbbd00
[ 2516.837647] R10: 00005564d652a010 R11: 0000000000000246 R12: 00007ffe255d4000
[ 2516.837648] R13: 00007ffe255d09b0 R14: 0000000000005a46 R15: 00005564d6537630

Labels: Encryption, Send/Recv, Defect

All 7 comments

To understand where the EIO in release mode comes from, I disabled the ASSERT that caused the panic. I then enabled the ZFS_DEBUG_SET_ERROR bit in /sys/module/zfs/parameters/zfs_flags, reproduced the problem, and got this log excerpt:

1593760513   zap_leaf.c:426:zap_leaf_lookup(): error 2
1593760513   zap_leaf.c:426:zap_leaf_lookup(): error 2
1593760513   dsl_prop.c:149:dsl_prop_get_dd(): error 2
1593760513   zap_micro.c:985:zap_lookup_impl(): error 2
1593760513   zap_micro.c:985:zap_lookup_impl(): error 2
1593760513   zap_micro.c:985:zap_lookup_impl(): error 2
1593760513   dsl_prop.c:149:dsl_prop_get_dd(): error 2
1593760513   zap_leaf.c:510:zap_entry_read_name(): error 75
1593760513   dsl_crypt.c:2711:spa_do_crypt_objset_mac_abd(): error 52
1593760513   arc.c:2148:arc_untransform(): error 5
1593760515   zap_micro.c:985:zap_lookup_impl(): error 2
1593760515   zap.c:770:fzap_checksize(): error 22
1593760515   zap_leaf.c:510:zap_entry_read_name(): error 75
1593760515   zap_micro.c:985:zap_lookup_impl(): error 2
1593760515   metaslab.c:3816:metaslab_flush(): flushing: txg 297, spa testpool, vdev_id 0, ms_id 2, unflushed_allocs 66560, unflushed_frees 23552, appended 208 bytes
1593760515   zap_leaf.c:510:zap_entry_read_name(): error 75
1593760515   dbuf.c:2835:dbuf_findbp(): error 2
1593760520   zap_micro.c:985:zap_lookup_impl(): error 2
1593760520   zap.c:770:fzap_checksize(): error 22

So the EIO is most likely triggered by the ECKSUM here: https://github.com/openzfs/zfs/blob/ae7b167a98b48260554fd88cef022547283e125f/module/zfs/dsl_crypt.c#L2711
which gets translated into an EIO here: https://github.com/openzfs/zfs/blob/3c42c9ed84f1755ee8898b76e0264b8ebab19fd8/module/zfs/arc.c#L2160

Of the two MAC checks it is the "local" one that fails: bcmp(local_mac, osp->os_local_mac, ZIO_OBJSET_MAC_LEN) != 0
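
For reference, here is a simplified sketch of the two spots linked above (abridged, not the verbatim OpenZFS source): spa_do_crypt_objset_mac_abd() recomputes the objset MACs and returns ECKSUM when a stored MAC does not match, and arc_untransform() then collapses that authentication failure into the EIO seen by the mount path:

    /* in spa_do_crypt_objset_mac_abd(): verify the recomputed MACs */
    if (bcmp(portable_mac, osp->os_portable_mac, ZIO_OBJSET_MAC_LEN) != 0 ||
        bcmp(local_mac, osp->os_local_mac, ZIO_OBJSET_MAC_LEN) != 0)
            return (SET_ERROR(ECKSUM));

    /* in arc_untransform(): an authentication/decryption failure is passed
     * upwards as a plain I/O error */
    if (ret != 0)
            ret = SET_ERROR(EIO);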

I have also seen this. The problem has existed in the master branch for some months now, for example in zfs 0.8.0-699 (built 20 March 2020), -796 (28 May 2020), -873 (built 25 June 2020), and 0.8.0-897 (built 8 July 2020). After an attempt to mount the re-constructed filesystem fails, a zpool status -v immediately identifies the re-constructed filesystem as suffering from corruption, even though the disk devices themselves have a zero error count.

errors: Permanent errors have been detected in the following files:

        testpool/source:<0x0>

Unfortunately the defect introduced is NOT detected by a zpool scrub! A scrub after the receive and before the mount attempt will not show a problem.

zfs create -o encryption=on -o keyformat=passphrase test/source
Enter passphrase: 
Re-enter passphrase: 
zfs snap test/source@base
zfs umount test/source
zfs snap test/source@s1
zfs send -w test/source@base | zfs receive test/target 
zfs send -w -i @base test/source@s1 | zfs receive test/target 
zfs destroy test/source@s1
zfs rollback test/source@base
zfs send -w -i @base test/target@s1| zfs receive test/source 
zpool scrub test
zpool wait -t scrub test
zpool status -v test
  pool: test
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:00:00 with 0 errors on Thu Jul  9 10:47:48 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        ONLINE       0     0     0
          /localhome/zfsadmin/tank  ONLINE       0     0     0

errors: No known data errors
zfs mount test/source
filesystem 'test/source' can not be mounted: Input/output error
cannot mount 'test/source': Invalid argument
ERROR: bad status 1 for zfs mount test/source
zpool status -v test
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0 days 00:00:00 with 0 errors on Thu Jul  9 10:47:48 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        test                        ONLINE       0     0     0
          /localhome/zfsadmin/tank  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        test/source:<0x0>

Rolling back the faulty snapshot update (to @base) and re-running the scrub, the corruption disappears, as might be expected.
The only workaround is then to rebuild by sending a full snapshot and then the incrementals. Over a slow network this is impractical.

The observation that zpool scrub will not detect the problem is particularly worrying. While one can quickly test a filesystem dataset via a mount command, what about when you can't use mount, for example when the dataset is a volume?

I believe that the reported "corruption" is not actually a corruption (that's why scrub does not find any problems). The problem is reported because the MAC mismatch produces an EIO, which is probably interpreted by the levels higher up as a data corruption problem. The real cause seems to be that user accounting data is present where it is not expected, and the empty objset is calculated to have a different MAC (https://github.com/openzfs/zfs/blob/10fa254539ec41c6b043785d4e7ab34bce383b9f/module/os/linux/zfs/zio_crypt.c#L1212) than the one actually present.
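
A simplified sketch of the logic around the zio_crypt.c line linked above (abridged, not the verbatim source): when an objset carries no user/group accounting dnodes at all, its local MAC is simply defined to be empty (all zeros), so an objset that unexpectedly does carry accounting data authenticates to a different local MAC than the "empty" one the receiver expects:

    /* zio_crypt_do_objset_hmacs(), roughly: with no accounting objects
     * present the local MAC is zeroed out instead of computed ... */
    if (osp->os_userused_dnode.dn_type == DMU_OT_NONE &&
        osp->os_groupused_dnode.dn_type == DMU_OT_NONE) {
            bzero(local_mac, ZIO_OBJSET_MAC_LEN);
            return (0);
    }
    /* ... otherwise it is an HMAC computed over the accounting dnodes */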

I did a small test using a volume rather than a filesystem. The rebuilt volume appears under /dev/zvol and the contents are indeed identical. zpool status -v never finds a problem and neither does zpool scrub.

@felixdoerre after talking this issue over with @tcaputi we've got a good idea of how to go about fixing this. As you mentioned in an earlier comment, the core issue here is that the user space accounting is present when it's not expected to be. This leads to the subsequent mount failure due to a checksum error when verifying the local MAC. What we can do to handle this specific case is to clear OBJSET_FLAG_USERACCOUNTING_COMPLETE and reset the local MAC. This should then allow the user accounting to be correctly updated on first mount using the normal upgrade process. @tcaputi is going to put together a patch in the next week or so.
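
Based on that description, the handling on the raw-receive side might look roughly like the following (a hypothetical sketch, not the actual patch; it only uses the flag and fields already named in this thread):

    /* hypothetical: the received objset claims completed user accounting
     * that the destination cannot have yet, so drop that claim ... */
    osp->os_flags &= ~OBJSET_FLAG_USERACCOUNTING_COMPLETE;

    /* ... and reset the local MAC to the "empty" value so the objset
     * authenticates again; the normal upgrade path then recomputes the
     * accounting (and a valid local MAC) on first mount */
    bzero(osp->os_local_mac, ZIO_OBJSET_MAC_LEN);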

I recently bumped into this issue. Given the excellent discussion/debugging done here, I decided to give it a try.
