ZFS: vdev_disk_io_start NULL pointer dereference

Created on 2 Nov 2019 · 5 comments · Source: openzfs/zfs

System information


Type | Version/Name
--- | ---
Distribution Name | RHEL
Distribution Version | 8
Linux Kernel | 4.18.0-80.11.2.el8_0.x86_64
Architecture | x86-64
ZFS Version | 0.8.2
SPL Version | 0.8.2

Describe the problem you're observing

I'm hitting a NULL pointer dereference while testing the CentOS 8 0.8.2 packages on RHEL 8 (for #9287). I noticed my build slave unexpectedly lost contact, so I re-ran the test and it lost contact again. I examined the system logs in AWS and saw the NULL pointer dereference. Since I hit this twice, I'm guessing it's reproducible.

Describe how to reproduce the problem

Run the test suite using the experimental CentOS 8 RPMs on RHEL 8; a sketch of the invocation follows below.
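
For reference, a minimal sketch of the invocation, assuming the experimental packages are already installed and the test-suite runner sits at its usual packaged path (the path and flag below are assumptions, not taken from this report):

```sh
# Hedged sketch: with the experimental CentOS 8 zfs RPMs installed on the
# RHEL 8 host, run the ZFS Test Suite in verbose mode.  The backtrace in
# this report was hit while the functional/fault tests were running.
/usr/share/zfs/zfs-tests.sh -v
```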

Include any warning/errors/backtraces from the system logs

[ 7358.980661] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 7362.954494] scsi 2:0:0:0: Direct-Access     Linux    scsi_debug       0188 PQ: 0 ANSI: 7
[ 7362.961477] sd 2:0:0:0: Power-on or device reset occurred
[ 7362.967940] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 7362.970196] sd 2:0:0:0: [sda] 196608 512-byte logical blocks: (101 MB/96.0 MiB)
[ 7362.984747] sd 2:0:0:0: [sda] Write Protect is off
[ 7362.993955] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 7363.027492]  sda: sda1 sda9
[ 7363.053953] sd 2:0:0:0: [sda] Attached SCSI disk
[ 7377.950301]  sda:
[ 7378.171287]  sda: sda1 sda9
[ 7382.765523] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 7383.255367] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 7383.263660] PGD 0 P4D 0 
[ 7383.267867] Oops: 0000 [#1] SMP PTI
[ 7383.274420] CPU: 0 PID: 15212 Comm: z_wr_iss Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-80.11.2.el8_0.x86_64 #1
[ 7383.288669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 7383.295487] RIP: 0010:vdev_disk_io_start+0x2cc/0xad0 [zfs]
[ 7383.301742] Code: 74 24 48 48 89 50 08 49 8b 44 24 18 0f b6 56 6c 88 50 1b 49 8b 4c 24 18 48 8b 41 08 48 8b b0 20 05 00 00 48 8b be 78 04 00 00 <48> 8b 47 48 a8 03 0f 85 99 04 00 00 65 48 ff 00 48 8b 86 78 04 00
[ 7383.319804] RSP: 0000:ffffaa3784f6b9f0 EFLAGS: 00010246
[ 7383.325769] RAX: ffff997b244c4000 RBX: 0000000000000600 RCX: ffff997b21e95900
[ 7383.333116] RDX: 0000000000000001 RSI: ffff997af9115580 RDI: 0000000000000000
[ 7383.340480] RBP: ffffaa3784f6bab8 R08: ffff997b21e95900 R09: ffff9979fd9df180
[ 7383.347551] R10: ffff997a47c07100 R11: 0000000001700000 R12: ffff9979fd9df180
[ 7383.355097] R13: 0000000000000600 R14: ffff9979fd9df180 R15: ffff997a944c36a8
[ 7383.362516] FS:  0000000000000000(0000) GS:ffff997b28000000(0000) knlGS:0000000000000000
[ 7383.370545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7383.377107] CR2: 0000000000000048 CR3: 00000000b580a002 CR4: 00000000001606f0
[ 7383.384677] Call Trace:
[ 7383.388727]  ? vdev_queue_io+0x18f/0x230 [zfs]
[ 7383.394184]  zio_vdev_io_start+0xf4/0x2f0 [zfs]
[ 7383.399852]  zio_nowait+0xc3/0x160 [zfs]
[ 7383.405170]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.410584]  vdev_mirror_io_start+0x97/0x180 [zfs]
[ 7383.415514]  zio_vdev_io_start+0xf4/0x2f0 [zfs]
[ 7383.420306]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.425318]  zio_nowait+0xc3/0x160 [zfs]
[ 7383.429817]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.434840]  vdev_mirror_io_start+0x97/0x180 [zfs]
[ 7383.439783]  zio_vdev_io_start+0xf4/0x2f0 [zfs]
[ 7383.444451]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.449324]  zio_nowait+0xc3/0x160 [zfs]
[ 7383.453697]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.458556]  vdev_mirror_io_start+0x97/0x180 [zfs]
[ 7383.463369]  ? spa_config_enter+0xba/0x100 [zfs]
[ 7383.468166]  zio_vdev_io_start+0x269/0x2f0 [zfs]
[ 7383.472960]  ? tsd_get_by_thread+0x2a/0x40 [spl]
[ 7383.478109]  zio_execute+0x90/0xf0 [zfs]
[ 7383.482700]  taskq_thread+0x2e5/0x530 [spl]
[ 7383.487192]  ? wake_up_q+0x70/0x70
[ 7383.491081]  ? zio_taskq_member.isra.8.constprop.13+0x70/0x70 [zfs]
[ 7383.496832]  ? taskq_destroy+0x180/0x180 [spl]
[ 7383.501566]  kthread+0x112/0x130
[ 7383.505558]  ? kthread_bind+0x30/0x30
[ 7383.509556]  ret_from_fork+0x35/0x40
[ 7383.513529] Modules linked in: scsi_debug sd_mod sg loop zfs(POE) icp(POE) zlua(POE) zcommon(POE) zunicode(POE) znvpair(POE) zavl(POE) spl(OE) ext4 mbcache jbd2 cirrus ttm drm_kms_helper intel_rapl syscopyarea sysfillrect sb_edac sysimgblt fb_sys_fops drm intel_rapl_perf pcspkr i2c_piix4 xfs libcrc32c ata_generic ata_piix crct10dif_pclmul crc32_pclmul libata crc32c_intel xen_netfront xen_blkfront ghash_clmulni_intel serio_raw dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
[ 7383.554663] CR2: 0000000000000048

Here are the last few lines of the test log:

Test: /usr/share/zfs/zfs-tests/tests/functional/exec/exec_002_neg (run as root) [00:00] [PASS]
23:11:42.35 ASSERTION: Setting exec=off on a filesystem, processes can not be executed  from this file system.
23:11:42.35 SUCCESS: cp /var/tmp/constrained_path.PFXF/ls /mnt/testdir/myls
23:11:42.38 SUCCESS: zfs set exec=off testpool/testfs
23:11:42.39 SUCCESS: exec_n_check 126 /mnt/testdir/myls
23:11:42.39 SUCCESS: exec_n_check 1 mmap_exec /mnt/testdir/myls
23:11:42.39 Setting exec=off on filesystem testing passed.
23:11:42.39 NOTE: Performing local cleanup via log_onexit (cleanup)
23:11:42.40 SUCCESS: rm /mnt/testdir/myls
Test: /usr/share/zfs/zfs-tests/tests/functional/exec/cleanup (run as root) [00:00] [PASS]
23:11:42.55 SUCCESS: zpool destroy -f testpool
23:11:42.57 SUCCESS: rm -rf /mnt/testdir
Test: /usr/share/zfs/zfs-tests/tests/functional/fault/setup (run as root) [00:00] [PASS]
23:11:42.69 SUCCESS: mkdir /var/tmp/zed
23:11:42.70 SUCCESS: touch /var/tmp/zed/vdev_id.conf
23:11:42.70 SUCCESS: ln -s /var/tmp/zed/vdev_id.conf /etc/zfs/vdev_id.conf
23:11:42.71 SUCCESS: cp /etc/zfs/zed.d/zed.rc /var/tmp/zed
23:11:42.71 SUCCESS: cp /etc/zfs/zed.d/zed-functions.sh /var/tmp/zed
23:11:42.72 SUCCESS: umask 0022
23:11:42.72 SUCCESS: cp /usr/libexec/zfs/zed.d/resilver_finish-start-scrub.sh /var/tmp/zed
23:11:42.73 SUCCESS: umask 0022
23:11:42.73 SUCCESS: sed -i /\#ZED_DEBUG_LOG=.*/d /var/tmp/zed/zed.rc
23:11:42.74 NOTE: Starting ZED
23:11:42.74 SUCCESS: truncate -s 0 /var/tmp/zed/zed.debug.log
23:11:42.75 SUCCESS: eval zed -vF -d /var/tmp/zed -p /var/tmp/zed/zed.pid -P /var/tmp/constrained_path.PFXF -s /var/tmp/zed/state 2>/var/tmp/zed/zed.log &
Labels: Defect

All 5 comments

This commit suggests where to find the root cause: https://github.com/zfsonlinux/zfs/commit/c12ea778654778f2039369323e8875e71d3a5609

Not all block devices, notably scsi_debug, set a root_blkg on the request queue.

Running "fault/auto_offline_001_pos" under linux-4.18.0-80.11.2.el8_0.x86_64 / zfs-0.8.2:

Thread 171 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 15054]
blkg_tryget (blkg=0x0 <irq_stack_union>) at ./include/linux/blk-cgroup.h:511
511 ./include/linux/blk-cgroup.h: No such file or directory.
(gdb) bt
#0  blkg_tryget (blkg=0x0 <irq_stack_union>) at ./include/linux/blk-cgroup.h:511
#1  0xffffffffc0936dff in vdev_bio_associate_blkg (bio=0xffff909452123c00) at /usr/src/zfs/module/zfs/vdev_disk.c:548
#2  0xffffffffc0937171 in __vdev_disk_physio (bdev=0xffff90936877a1c0, zio=0xffff909359ce31b0, io_size=1536, io_offset=4271616, rw=1, flags=1792) at /usr/src/zfs/module/zfs/vdev_disk.c:652
#3  0xffffffffc0937b8f in vdev_disk_io_start (zio=0xffff909359ce31b0) at /usr/src/zfs/module/zfs/vdev_disk.c:846
#4  0xffffffffc09b4e21 in zio_vdev_io_start (zio=0xffff909359ce31b0) at /usr/src/zfs/module/zfs/zio.c:3755
#5  0xffffffffc09af579 in __zio_execute (zio=0xffff909359ce31b0) at /usr/src/zfs/module/zfs/zio.c:2087
#6  zio_nowait (zio=0xffff909359ce31b0) at /usr/src/zfs/module/zfs/zio.c:2161
#7  0xffffffffc0944772 in vdev_mirror_io_start (zio=0xffff909359ce5970) at /usr/src/zfs/module/zfs/vdev_mirror.c:618
#8  0xffffffffc09b4e21 in zio_vdev_io_start (zio=0xffff909359ce5970) at /usr/src/zfs/module/zfs/zio.c:3755
#9  0xffffffffc09af579 in __zio_execute (zio=0xffff909359ce5970) at /usr/src/zfs/module/zfs/zio.c:2087
#10 zio_nowait (zio=0xffff909359ce5970) at /usr/src/zfs/module/zfs/zio.c:2161
#11 0xffffffffc0944772 in vdev_mirror_io_start (zio=0xffff909359ce7740) at /usr/src/zfs/module/zfs/vdev_mirror.c:618
#12 0xffffffffc09b4e21 in zio_vdev_io_start (zio=0xffff909359ce7740) at /usr/src/zfs/module/zfs/zio.c:3755
#13 0xffffffffc09af579 in __zio_execute (zio=0xffff909359ce7740) at /usr/src/zfs/module/zfs/zio.c:2087
#14 zio_nowait (zio=0xffff909359ce7740) at /usr/src/zfs/module/zfs/zio.c:2161
#15 0xffffffffc0944772 in vdev_mirror_io_start (zio=0xffff909359ce09f0) at /usr/src/zfs/module/zfs/vdev_mirror.c:618
#16 0xffffffffc09b50a2 in zio_vdev_io_start (zio=0xffff909359ce09f0) at /usr/src/zfs/module/zfs/zio.c:3645
#17 0xffffffffc09aedd4 in __zio_execute (zio=0xffff909359ce09f0) at /usr/src/zfs/module/zfs/zio.c:2087
#18 zio_execute (zio=0x0 <irq_stack_union>) at /usr/src/zfs/module/zfs/zio.c:2000
#19 0xffffffffc04e7492 in taskq_thread (args=0xffff909446255780) at /usr/src/zfs/module/spl/spl-taskq.c:927
#20 0xffffffffa0ece683 in ?? ()
#21 0xffff90935b519c98 in ?? ()
#22 0xffffffffa0ece570 in ?? ()
#23 0x0000000000000000 in ?? ()
(gdb) up
#1  0xffffffffc0936dff in vdev_bio_associate_blkg (bio=0xffff909452123c00) at /usr/src/zfs/module/zfs/vdev_disk.c:548
548     if (blkg_tryget(q->root_blkg))
(gdb) p q->root_blkg
$1 = (struct blkcg_gq *) 0x0 <irq_stack_union>

blkg_tryget() source according to the el8 kernel src.rpm:

/**
 * blkg_tryget - try and get a blkg reference
 * @blkg: blkg to get
 *
 * This is for use when doing an RCU lookup of the blkg.  We may be in the midst
 * of freeing this blkg, so we can only use it if the refcnt is not zero.
 */
static inline bool blkg_tryget(struct blkcg_gq *blkg)
{
        return percpu_ref_tryget(&blkg->refcnt);
}

This differs from the mainline version of blkg_tryget(), which first checks that blkg is non-NULL:

/**
 * blkg_tryget - try and get a blkg reference
 * @blkg: blkg to get
 *
 * This is for use when doing an RCU lookup of the blkg.  We may be in the midst
 * of freeing this blkg, so we can only use it if the refcnt is not zero.
 */
static inline bool blkg_tryget(struct blkcg_gq *blkg)
{
    return blkg && percpu_ref_tryget(&blkg->refcnt);
}
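
In other words, the el8 variant takes &blkg->refcnt without first checking blkg, so when q->root_blkg is NULL the faulting address is the offset of the refcount inside struct blkcg_gq rather than address 0, which matches the small CR2 value (0x48) in the oops above. A minimal userspace sketch of that arithmetic, using an assumed (not actual) struct layout:

```c
/*
 * Illustrative sketch only -- not the kernel's struct blkcg_gq.  With a
 * NULL blkg, percpu_ref_tryget(&blkg->refcnt) computes an address equal
 * to the member offset of the refcount, so the fault lands at a small
 * address instead of at 0.  The 0x48 layout below is an assumption made
 * purely for illustration.
 */
#include <stddef.h>
#include <stdio.h>

struct fake_blkcg_gq {
	char earlier_fields[0x48];	/* assumed fields preceding the refcount */
	unsigned long refcnt;		/* stand-in for struct percpu_ref */
};

int main(void)
{
	/* &((struct fake_blkcg_gq *)NULL)->refcnt would be this offset. */
	printf("refcount offset = %#zx\n",
	    offsetof(struct fake_blkcg_gq, refcnt));
	return 0;
}
```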

The following change seems to avoid the issue:

root@linux:/usr/src/zfs# git diff
diff --git a/module/zfs/vdev_disk.c b/module/zfs/vdev_disk.c
index 46437f21f..0877db1b2 100644
--- a/module/zfs/vdev_disk.c
+++ b/module/zfs/vdev_disk.c
@@ -545,7 +545,7 @@ vdev_bio_associate_blkg(struct bio *bio)
        ASSERT3P(q, !=, NULL);
        ASSERT3P(bio->bi_blkg, ==, NULL);

-       if (blkg_tryget(q->root_blkg))
+       if (q->root_blkg && blkg_tryget(q->root_blkg))
                bio->bi_blkg = q->root_blkg;
 }
 #define        bio_associate_blkg vdev_bio_associate_blkg

@loli10K very nice work! Hopefully the CentOS 8 folks will backport the kernel change so 0.8.2 will work on it. No matter what though, we should include your fix in 0.8.3.

Absolutely. @loli10K, would you mind opening a PR with the proposed change so we can get it merged to master?

Will open a PR later today.

FYI - I can confirm that blkg_tryget() is still NOT fixed (no NULL check) in the RHEL 8.1 kernel (4.18.0-147.0.3.el8_1).
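
As an aside, a quick (hedged) way to check whether a given kernel source tree already carries the NULL-safe variant is to grep the header where the helper lives:

```sh
# Print blkg_tryget() from an unpacked kernel source tree; if the body has
# no "blkg &&" check, that kernel still ships the unguarded version.
grep -A 4 'static inline bool blkg_tryget(' include/linux/blk-cgroup.h
```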
