ZFS: PANIC: blkptr at ffff881f18601440 has invalid CHECKSUM 0

Created on 27 Jul 2017 · 25 comments · Source: openzfs/zfs

Type | Version/Name
--- | ---
Distribution Name | CentOS
Distribution Version | 7.3.1611
Linux Kernel | 3.10.0-514.6.1.el7.x86_64
Architecture | intel x64
ZFS Version | 0.7.0-rc5
SPL Version | 0.7.0-rc5

Describe the problem you're observing

This particular ZFS filesystem is the backing store for a Lustre MDS. After a Lustre LBUG:

2017-07-26T07:00:11-05:00 hmet0004 kernel: [36445.672990] LustreError: 28630:0:(osd_object.c:1284:__osd_object_create()) ASSERTION( rc == 0 ) failed: sa_buf_hold 18208548 failed: -2
2017-07-26T07:00:11-05:00 hmet0004 kernel: [36445.688811] LustreError: 28630:0:(osd_object.c:1284:__osd_object_create()) LBUG

...and subsequent reboot, any attempt to import the pool now gives this panic:

PANIC: blkptr at ffff881f18601440 has invalid CHECKSUM 0
Showing stack for process 31721
CPU: 3 PID: 31721 Comm: zpool Tainted: P           OE  ------------   3.10.0-514.6.1.el7.x86_64 #1
Hardware name: GIGABYTE D120-C21/MC20-S10, BIOS F02 05/19/2016
 0000000000000003 000000009d6dbe62 ffff881ec47bb1a0 ffffffff816862ac
 ffff881ec47bb1b0 ffffffffa02be234 ffff881ec47bb2d8 ffffffffa02be39c
 0000000000100000 61207274706b6c62 3838666666662074 3431303638316631
Call Trace:
 [<ffffffff816862ac>] dump_stack+0x19/0x1b
 [<ffffffffa02be234>] spl_dumpstack+0x44/0x50 [spl]
 [<ffffffffa02be39c>] vcmn_err+0x6c/0x110 [spl]
 [<ffffffffa03fefb8>] ? vdev_raidz_io_start+0x128/0x320 [zfs]
 [<ffffffffa03fd6c0>] ? vdev_raidz_asize+0x60/0x60 [zfs]
 [<ffffffff811de0c5>] ? kmem_cache_alloc+0x35/0x1e0
 [<ffffffffa02ba319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
 [<ffffffffa02ba319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
 [<ffffffffa03eb079>] zfs_panic_recover+0x69/0x90 [zfs]
 [<ffffffffa03838c0>] ? arc_read+0xa80/0xa80 [zfs]
 [<ffffffffa0442011>] zfs_blkptr_verify+0x311/0x370 [zfs]
 [<ffffffffa04420a4>] zio_read+0x34/0xe0 [zfs]
 [<ffffffff811dd4c1>] ? __kmalloc_node+0x1d1/0x2b0
 [<ffffffffa02b91a0>] ? spl_kmem_zalloc+0xc0/0x170 [spl]
 [<ffffffffa02b91a0>] ? spl_kmem_zalloc+0xc0/0x170 [spl]
 [<ffffffffa03838c0>] ? arc_read+0xa80/0xa80 [zfs]
 [<ffffffffa038343f>] arc_read+0x5ff/0xa80 [zfs]
 [<ffffffffa03a5873>] traverse_prefetch_metadata+0xd3/0xe0 [zfs]
 [<ffffffffa03a58f6>] prefetch_dnode_metadata+0x76/0xf0 [zfs]
 [<ffffffffa03a6355>] traverse_visitbp+0x665/0x840 [zfs]
 [<ffffffffa03a5fe5>] traverse_visitbp+0x2f5/0x840 [zfs]
 [<ffffffffa02b9037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
 [<ffffffffa03a5fe5>] traverse_visitbp+0x2f5/0x840 [zfs]
 [<ffffffffa02b9037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
 [<ffffffffa03a5fe5>] traverse_visitbp+0x2f5/0x840 [zfs]
 [<ffffffffa02b9037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
 [<ffffffffa03a5fe5>] traverse_visitbp+0x2f5/0x840 [zfs]
 [<ffffffffa02b9037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
 [<ffffffffa03a5fe5>] traverse_visitbp+0x2f5/0x840 [zfs]
 [<ffffffffa02b9037>] ? spl_kmem_alloc+0xc7/0x170 [spl]
 [<ffffffffa03a5fe5>] traverse_visitbp+0x2f5/0x840 [zfs]
 [<ffffffffa03a6bb1>] traverse_dnode+0xb1/0x1f0 [zfs]
 [<ffffffffa03a6475>] traverse_visitbp+0x785/0x840 [zfs]
 [<ffffffffa03a6701>] traverse_impl+0x1d1/0x450 [zfs]
 [<ffffffffa03d9c50>] ? spa_async_suspend+0x80/0x80 [zfs]
 [<ffffffffa03a6d45>] traverse_dataset_resume+0x55/0x60 [zfs]
 [<ffffffffa03d9c50>] ? spa_async_suspend+0x80/0x80 [zfs]
 [<ffffffffa03a6f0b>] traverse_pool+0x19b/0x1d0 [zfs]
 [<ffffffffa03d9c50>] ? spa_async_suspend+0x80/0x80 [zfs]
 [<ffffffffa03d9c50>] ? spa_async_suspend+0x80/0x80 [zfs]
 [<ffffffffa03e0643>] spa_load+0x1983/0x1f60 [zfs]
 [<ffffffffa03e0c7e>] spa_load_best+0x5e/0x290 [zfs]
 [<ffffffffa03e285a>] spa_import+0x1ca/0x6d0 [zfs]
 [<ffffffffa041b404>] zfs_ioc_pool_import+0xf4/0x130 [zfs]
 [<ffffffffa041f346>] zfsdev_ioctl+0x506/0x550 [zfs]
 [<ffffffff81212025>] do_vfs_ioctl+0x2d5/0x4b0
 [<ffffffff81691db1>] ? __do_page_fault+0x171/0x450
 [<ffffffff812122a1>] SyS_ioctl+0xa1/0xc0
 [<ffffffff816968c9>] system_call_fastpath+0x16/0x1b

We've tried every suggested workaround we could find in other issues and mailing list threads, to no avail. Including:

  • swapping the disks into a different chassis (because hope springs eternal in the human breast, I suppose) -- panic

  • importing the pool with just a single disk -- panic

  • import -FXn and just import -FX -- panic

  • the same but with zfs_recover=1 -- panic

  • listing the uberblocks with zdb -ul, getting the txg for the last uberblock (by date), and trying a zpool import -T txg -o readonly. This at least did something, which was to read the disks at 2 MB/s. Unfortunately at that rate we were looking at 23 days to read the whole thing, so we gave up after 4 hours. (A rough sketch of this sequence follows below.)
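
For anyone retracing these attempts, a rough sketch of the commands involved is below. The pool name ('lustre') comes from the zdb configuration later in this report; the -d search path and the device used for zdb -ul are assumptions based on the /dev/mapper snapshot devices shown there.

    # Extreme-rewind import: dry run first, then for real (both panicked here)
    zpool import -d /dev/mapper -FXn lustre
    zpool import -d /dev/mapper -FX lustre

    # Same again with recovery mode enabled, which demotes some fatal
    # blkptr errors to warnings
    echo 1 > /sys/module/zfs/parameters/zfs_recover
    zpool import -d /dev/mapper -FX lustre

    # List the uberblocks on one member device and note the txg of the most
    # recent one (by timestamp)
    zdb -ul /dev/mapper/snap_ata-HGST_HUH721010ALE600_7PH35SUC1

    # Read-only import rewound to that txg (<txg> is a placeholder)
    zpool import -d /dev/mapper -o readonly=on -T <txg> lustre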

At one point -- I'm not sure if it was import -FX or zfs_recover=1 -- something changed the nature of the failure from an instant panic at the first "invalid CHECKSUM" error into a series of warnings followed by an assertion:

WARNING: blkptr at ffff881ebe6d5440 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d5440 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d5440 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d5440 DVA 1 has invalid VDEV 959721836
WARNING: blkptr at ffff881ebe6d5440 DVA 2 has invalid VDEV 775303544
WARNING: blkptr at ffff881ebe6d5740 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d5740 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d5840 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d5840 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d5840 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d5840 DVA 1 has invalid VDEV 926298476
WARNING: blkptr at ffff881ebe6d5840 DVA 2 has invalid VDEV 775303544
WARNING: blkptr at ffff881ebe6d5b40 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d5b40 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6140 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d6140 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6240 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d6240 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6240 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d6240 DVA 1 has invalid VDEV 825635180
WARNING: blkptr at ffff881ebe6d6240 DVA 2 has invalid VDEV 775303544
WARNING: blkptr at ffff881ebe6d6540 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d6540 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6640 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d6640 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6640 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d6640 DVA 1 has invalid VDEV 809054572
WARNING: blkptr at ffff881ebe6d6640 DVA 2 has invalid VDEV 775303544
WARNING: blkptr at ffff881ebe6d6940 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d6940 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6a40 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d6a40 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6a40 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d6a40 DVA 1 has invalid VDEV 825375084
WARNING: blkptr at ffff881ebe6d6a40 DVA 2 has invalid VDEV 1680750129
WARNING: blkptr at ffff881ebe6d6d40 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d6d40 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6e40 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d6e40 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d6e40 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d6e40 DVA 1 has invalid VDEV 925971052
WARNING: blkptr at ffff881ebe6d6e40 DVA 2 has invalid VDEV 1680750129
WARNING: blkptr at ffff881ebe6d7140 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d7140 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d7240 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d7240 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d7240 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d7240 DVA 1 has invalid VDEV 875573612
WARNING: blkptr at ffff881ebe6d7240 DVA 2 has invalid VDEV 775303544
WARNING: blkptr at ffff881ebe6d7540 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d7540 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d7640 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d7640 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d7640 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d7640 DVA 1 has invalid VDEV 876163436
WARNING: blkptr at ffff881ebe6d7640 DVA 2 has invalid VDEV 775303544
WARNING: blkptr at ffff881ebe6d7940 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d7940 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d7a40 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d7a40 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d7a40 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d7a40 DVA 1 has invalid VDEV 875837548
WARNING: blkptr at ffff881ebe6d7a40 DVA 2 has invalid VDEV 1680750129
WARNING: blkptr at ffff881ebe6d7d40 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d7d40 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d7e40 has invalid CHECKSUM 0
WARNING: blkptr at ffff881ebe6d7e40 has invalid COMPRESS 0
WARNING: blkptr at ffff881ebe6d7e40 DVA 0 has invalid VDEV 1969487872
WARNING: blkptr at ffff881ebe6d7e40 DVA 1 has invalid VDEV 925905516
WARNING: blkptr at ffff881ebe6d7e40 DVA 2 has invalid VDEV 775303544
VERIFY3(c < (1ULL << 24) >> 9) failed (36028797018963967 < 32768)
PANIC at zio.c:266:zio_buf_alloc()

We get the same set of warnings from zdb -AAA -F -e:

Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 15267167644730928330
        name: 'lustre'
        state: 0
        vdev_tree:
            type: 'root'
            id: 0
            guid: 15267167644730928330
            children[0]:
                type: 'raidz'
                id: 0
                guid: 12487255120635568856
                nparity: 3
                metaslab_array: 71
                metaslab_shift: 38
                ashift: 12
                asize: 40003264577536
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 11152527491519449036
                    devid: 'ata-HGST_HUH721010ALE600_7PH35SUC-part1'
                    phys_path: 'pci-0000:05:00.0-sas-0x4433221100000000-lun-0'
                    whole_disk: 1
                    DTL: 204
                    create_txg: 4
                    path: '/dev/mapper/snap_ata-HGST_HUH721010ALE600_7PH35SUC1'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 12217561655882834813
                    devid: 'ata-HGST_HUH721010ALE600_7PH36B6C-part1'
                    phys_path: 'pci-0000:05:00.0-sas-0x4433221103000000-lun-0'
                    whole_disk: 1
                    DTL: 203
                    create_txg: 4
                    path: '/dev/mapper/snap_ata-HGST_HUH721010ALE600_7PH36B6C1'
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 14383882701081049619
                    devid: 'ata-HGST_HUH721010ALE600_7PH377MC-part1'
                    phys_path: 'pci-0000:05:00.0-sas-0x4433221101000000-lun-0'
                    whole_disk: 1
                    DTL: 202
                    create_txg: 4
                    path: '/dev/mapper/snap_ata-HGST_HUH721010ALE600_7PH377MC1'
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 11902102953401244559
                    devid: 'ata-HGST_HUH721010ALE600_7PH395GC-part1'
                    phys_path: 'pci-0000:05:00.0-sas-0x4433221102000000-lun-0'
                    whole_disk: 1
                    DTL: 201
                    create_txg: 4
                    offline: 1
                    path: '/dev/mapper/snap_ata-HGST_HUH721010ALE600_7PH395GC1'
        rewind-policy:
            rewind-request-txg: 18446744073709551615
            rewind-request: 8
(all of the above warnings)
c < (1ULL << 24) >> 9 (0x7fffffffffffff < 0x8000)
ASSERT at ../../module/zfs/zio.c:266:zio_buf_alloc()Aborted
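
For reference, the zdb invocation that produces the output above is roughly the following; the -p search path and pool name are assumptions based on the configuration shown:

    # -AAA ignores assertion failures and enables panic recovery, -F attempts
    # a rewind to the last sane txg, -e operates on an exported/unimported
    # pool, and -p tells zdb where to find the member devices
    zdb -AAA -F -e -p /dev/mapper lustre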

I guess there are two paths forward from here:

  1. A fairly benign kernel panic in a largely-unrelated component somehow leads to an unusable file system? This is terrifying!

  2. Having reached this stage, is there anything at all that can be done to recover from it? It seems like the data must all be there -- after all, the system went from fully operational to unusable in an instant -- we just can't get to it.

I'd be eternally grateful for any advice!

All 25 comments

We've used the (truly terrifying) zfs_revert script (from https://gist.github.com/jshoward/5685757). Removing 1 transaction was not sufficient -- still got the panic -- but after removing _five_ transactions, we can mount again!

The filesystem seems to be functional at that point -- we're able to create 100,000 files and write a GB of data -- and we've brought Lustre back up.
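
A quick sanity check along those lines might look like the following; the mount point is hypothetical and the counts are only illustrative:

    # Create 100,000 empty files and write ~1 GB to confirm the filesystem
    # accepts new allocations (mount point is hypothetical)
    mkdir -p /mnt/lustre-mdt/smoke
    for i in $(seq 1 100000); do touch /mnt/lustre-mdt/smoke/f$i; done
    dd if=/dev/zero of=/mnt/lustre-mdt/smoke/blob bs=1M count=1024
    sync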

We still have the original dd copies of the raw corrupted disks, so we can still try to learn what's happened here and how to "properly" recover.

That is a scary script! I'm glad you were able to use it to roll back to a point where you could import the pool and mount the filesystems. This is exactly the sort of thing zpool import -FXn _should_ have been able to do, but clearly couldn't because of the panic you're seeing.

As for the issue itself, we saw something very similar to this on a test system while testing out Lustre 2.10 _prior_ to it being officially released. Based on the pool damage we observed, our best guess as to what must have happened is that Lustre somehow overwrote some memory containing block pointers; valid checksums were then generated for the garbage block pointers, and they were written to disk.

This is why the zpool import -FXn fails with a panic. ZFS trusts the contents of the blocks because the checksum they were written with is valid. One of the planned development items is to add some additional sanity checks to the ZFS code to verify certain data structures read from disk before using them. The damaged pool you saved would be a great test case for this.

To my knowledge we haven't had any reports of this sort of thing happening when using ZFS without Lustre. Perhaps @adilger knows if any issues like this were recently resolved in Lustre?

When I went looking for this error in the code:

WARNING: blkptr at ffff881ebe6d5440 has invalid CHECKSUM 0

...what I found suggested that the checksum "type" is an invalid value, and that it therefore didn't even know which algorithm to use to validate the checksum?

Or have I misinterpreted?

You've got it exactly right. The other warnings you reported show variations on similar forms of damage (unknown compression types, impossible DVAs, etc.), all of which suggest the block pointer somehow got stomped on.

Then I don't know how to square that with your previous statement:

This is why the zpool import -FXn fails with a panic. ZFS trusts the contents of the blocks because the checksum they were written with is valid.

How can the block checksum be valid, if it doesn't know which algorithm to use to validate?

Exactly! ZFS inherently trusts that if the checksum for a block is good, which it was, the contents of that block will be sane. Somehow blocks were written to the pool with valid checksums and nonsense values. That should be impossible; the only plausible way I'm aware of for that to happen is memory corruption: the block pointers get damaged in memory after being created but before they're checksummed and written.

I'm still confused about how it can know that the checksum is valid if it doesn't even know which checksum algorithm was used -- but I trust you. :)

The checksum for the damaged block and its type are stored in the parent. The block pointer it's complaining about describes one of its children. :)

Ah ha! You've drawn the veil from my eyes. I'm now very slightly less ignorant. Thank you!
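
For readers who want to see this relationship on disk: zdb can dump an object's block pointers, and each printed blkptr shows the checksum type, compression type, DVAs, and the checksum value the parent stores for that child block. The dataset and object number here are placeholders:

    # High verbosity (-dddddd) prints the dnode plus its indirect blocks and
    # the block pointers they contain, including the checksum type and cksum=
    zdb -dddddd lustre/<dataset> <object-number>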

Thanks for the insight. How do you want the pool? It's currently 3 x 10T raw disk images (dd of the raw device). I suspect only ~2.1T of each disk is really in use.

I've just read this again and wondered: when this corruption is identified, why isn't the previous txg used, continually dropping back until a sane txg is found? Surely almost any issue should either cause the FS not to mount (allowing the user to take alternative action) or drop back, as requested by the user, to a sane block?

My guess would be that no one anticipated that this could happen, and the code to handle it isn't present.

@phik we think we've identified the root cause here and opened #6439 with a proposed fix for review. My suggestion for the moment is to set dnodesize=legacy on your MDS as a workaround until we finalize the fix and get it merged.
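
A minimal sketch of that workaround; the dataset name is a placeholder for whichever dataset backs the MDT:

    # New dnodes revert to the legacy 512-byte size; existing dnodes are
    # not rewritten
    zfs set dnodesize=legacy lustre/<mdt-dataset>
    zfs get dnodesize lustre/<mdt-dataset>   # confirm the property took effect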

Continually dropping back until a sane txg is found

Yes, this is exactly how the zpool import -Fx logic is implemented. The trouble here is that the block has a valid checksum, which is what ZFS uses to decide whether it should trust something read from disk. As mentioned above, we're looking into adding additional layers of sanity checking to more gracefully handle situations like this one, which shouldn't be possible unless there's a bug. And we think we found that bug.

Innnteresting. I definitely don't understand the internals well enough to comment on the patch, but your description would certainly seem to fit the symptom.

Thanks very much for the prompt followup! I'll discuss your suggestion with the team ASAP.

FWIW we just got this panic again on a 0.7.0-rc3 system

[375528.898326] PANIC: blkptr at ffff881785cfc440 has invalid CHECKSUM 0
[375528.906484] Showing stack for process 221
[375528.912237] CPU: 3 PID: 221 Comm: spl_system_task Tainted: P OE ------------ 3.10.0-514.6.1.el7.x86_64 #1
[375528.924617] Hardware name: GIGABYTE D120-C21/MC20-S10, BIOS F02 05/19/2016
[375528.933127] 0000000000000003 000000003bbe38b8 ffff881fe100f3a0 ffffffff816862ac
[375528.942221] ffff881fe100f3b0 ffffffffa02be234 ffff881fe100f4d8 ffffffffa02be39c
[375528.951631] 0000000000100000 61207274706b6c62 3838666666662074 3463666335383731
[375528.960947] Call Trace:
[375528.965138] [] dump_stack+0x19/0x1b
[375528.971914] [] spl_dumpstack+0x44/0x50 [spl]
[375528.979440] [] vcmn_err+0x6c/0x110 [spl]
[375528.986601] [] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[375528.995011] [] ? kmem_cache_alloc+0x193/0x1e0
[375529.002551] [] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[375529.010875] [] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[375529.019175] [] zfs_panic_recover+0x69/0x90 [zfs]
[375529.026943] [] ? arc_read+0xa30/0xa30 [zfs]
[375529.034226] [] zfs_blkptr_verify+0x311/0x370 [zfs]
[375529.042129] [] zio_read+0x34/0xe0 [zfs]
[375529.049037] [] ? __kmalloc_node+0x1d1/0x2b0
[375529.056289] [] ? spl_kmem_zalloc+0xc0/0x170 [spl]
[375529.064032] [] ? spl_kmem_zalloc+0xc0/0x170 [spl]
[375529.072051] [] ? arc_read+0xa30/0xa30 [zfs]
[375529.079230] [] arc_read+0x5ff/0xa30 [zfs]
[375529.086195] [] traverse_prefetcher+0x14c/0x180 [zfs]
[375529.094123] [] traverse_visitbp+0x4e9/0x840 [zfs]
[375529.101733] [] traverse_dnode+0xb1/0x1f0 [zfs]
[375529.109096] [] traverse_visitbp+0x6bb/0x840 [zfs]
[375529.116641] [] traverse_visitbp+0x2f5/0x840 [zfs]
[375529.124188] [] traverse_visitbp+0x2f5/0x840 [zfs]
[375529.131715] [] traverse_visitbp+0x2f5/0x840 [zfs]
[375529.139194] [] traverse_visitbp+0x2f5/0x840 [zfs]
[375529.146601] [] traverse_visitbp+0x2f5/0x840 [zfs]
[375529.154036] [] traverse_visitbp+0x2f5/0x840 [zfs]
[375529.161398] [] traverse_dnode+0xb1/0x1f0 [zfs]
[375529.168496] [] traverse_visitbp+0x785/0x840 [zfs]
[375529.175806] [] traverse_prefetch_thread+0x108/0x180 [zfs]
[375529.183814] [] ? prefetch_needed.isra.5+0x40/0x40 [zfs]
[375529.191620] [] taskq_thread+0x246/0x470 [spl]
[375529.198513] [] ? wake_up_state+0x20/0x20
[375529.204969] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[375529.212370] [] kthread+0xcf/0xe0
[375529.218115] [] ? kthread_create_on_node+0x140/0x140
[375529.225444] [] ret_from_fork+0x58/0x90
[375529.231689] [] ? kthread_create_on_node+0x140/0x140

the machine was still up and running... hopefully comes back painlessly after reboot...

Rebooted just fine.

@sdm900 the exact issue reported here was introduced in 0.7.0-rc5 and will be fixed in 0.7.1.

That concerns me a little.

This issue was resolved in the v0.7.1 release tagged on August 8th.

I have a few lingering questions, if I may...

  • Out of curiosity, how reliable is that test that you committed? Did that reproduce the problem 100% of the time on rc5?

  • Stu says that we had this issue again on an rc3 system (in his comment of Aug 6). How can that be, if the problem was introduced in rc5?

  • We still have the fact that zpool import -Fx couldn't import this filesystem -- should this be filed separately?

  • There's also the (separate?) issue that zpool import -T couldn't do in finite time what the horrifying python script did very quickly. import -T seemed to be reading the entire filesystem at 2 MB/s. Should this be filed separately?

Out of curiosity, how reliable is that test that you committed?

It reproduced the issue 100% of the time in about 5 seconds. After applying the fix I left it running in a loop for about 8 hours without issue. We were able to reproduce the issue on one of our Lustre test systems although it took longer to reproduce, after applying the fix a week ago we haven't seen it since.

Stu says that we had this issue again on an rc3 system (in his comment of Aug 6). How can that be, if the problem was introduced in rc5?

I agree, that is very troubling. The PANIC only indicates that a damaged block pointer was encountered; it doesn't reveal how it ended up damaged. The issue fixed here was caused by two dnodes stomping on each other and is understood; unfortunately, the root cause for @sdm900's PANIC must be different.
@sdm900 if possible it would be great if you could scrub the pool and see if anything is reported.
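
For reference, starting a scrub and checking on it looks like this, assuming the pool uses the same name as the one in this report:

    zpool scrub lustre        # start the scrub
    zpool status -v lustre    # watch progress and see any errors it finds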

We still have the fact that zpool import -Fx couldn't import this filesystem -- should this be filed separately?

Yes. We've talked about doing this development work for quite a while but I don't think we've ever opened an issue for it. It's a fairly large development effort.

There's also the (separate?) issue that zpool import -T couldn't do in finite time what the horrifying python script did very quickly. import -T seemed to be reading the entire filesystem at 2 MB/s. Should this be filed separately?

Yes, it would be great to open an issue for that too. We're going to need to investigate where exactly zpool import -T ends up spending all that time.

I'd love to scrub the file system (and I'll set it going)... but I don't think it will ever finish. It is on a production file system that is heavily used, and we will no doubt lose power before it finishes :(

As far as I can tell, zfs scrub is useless...

  scan: scrub in progress since Sat Aug 12 20:01:57 2017  
        540G scanned out of 6.32T at 6.26M/s, 269h20m to go
        0 repaired, 8.35% done

and this file system has gone quiet over the weekend. I expect the rate to drop to almost 0 when people are in the office.

@sdm900 for filesystems with lots of small files or heavy fragmentation I agree it can be unreasonably slow. There are patches in #6256 under development to improve the situation. But it sounds like you may want to just cancel, or pause, the scrub for now if it's not going to complete in a reasonable amount of time.
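
The pause and cancel operations mentioned above, for reference; the pool name is assumed, and pausing requires a build that includes scrub pause/resume support:

    zpool scrub -p lustre    # pause the in-progress scrub (run 'zpool scrub lustre' again to resume)
    zpool scrub -s lustre    # stop (cancel) the scrub entirely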

@behlendorf Sorry to ask here, but...

Is there currently any tool to fix the invalid CHECKSUM panic?

I'm not sure, but the error could have been caused by something related to the kernel bug reported at #7723 (https://bugzilla.redhat.com/show_bug.cgi?id=1598462)

Sadly, the host panicked a few seconds after loading the zfs driver, so I could never ever try to run zpool import -F. Since I had two disks, I isolated one for later post-mortem and restarted send/recv from zero to a single disk. That's when I found the kernel bug mentioned above. :sob:
