Zfs: unable to handle kernel paging request ... in zio_compress_data

Created on 8 Apr 2019 · 7 comments · Source: openzfs/zfs

System information

Type | Version/Name
--- | ---
Distribution Name | RHEL
Distribution Version | 7.6
Linux Kernel | 3.10.0-957.5.1.el7.x86_64
Architecture | x86_64
ZFS Version | 0.7.12-1
SPL Version | 0.7.12-1

Describe the problem you're observing

After updating from zfs 0.6.5.9 to 0.7.12, our system became unstable and crashed about once per day. See example kernel logs below.
The crashes happened during high write IO. We tried updating to the current zfs 0.7.13, which did not solve the problem.
After reverting to zfs 0.6.5.11 (and to kernel 3.10.0-693.21.1.el7.x86_64, because zfs 0.6.5.x is known to be incompatible with recent RHEL kernels), the system became stable again. We can rule out the different kernel version as the cause, because we also tried an older kernel together with zfs 0.7.13 and saw the same crashes.

In all cases, zio_compress_data appears as one of the last function calls in the kernel trace. The "unable to handle kernel paging request" fault is raised either from

abd_iterate_func(src, 0, s_len, zio_compress_zeroed_cb, NULL) == 0

or

c_len = ci->ci_compress(tmp, dst, s_len, d_len, ci->ci_level);

so we suspect that the crashes are caused by dereferencing invalid pointers in connection with scattered ABDs. (Scattered) ABDs did not exist in 0.6.5.x.
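
For reference, zio_compress_zeroed_cb is the small callback that abd_iterate_func runs over each mapped chunk to decide whether the block is all zeroes. A minimal sketch (paraphrased from memory of the 0.7.x source, so details may differ) looks roughly like this; the faulting instruction in the traces below corresponds to the dereference inside the loop:

/* Paraphrased sketch, not verbatim ZFS source. */
static int
zio_compress_zeroed_cb(void *data, size_t len, void *private)
{
    uint64_t *word, *end = (uint64_t *)((char *)data + len);

    /* Walk the mapped chunk in 8-byte words; any non-zero word means
     * the block is not all zeroes. If the address handed over by the
     * ABD iterator is not backed by a valid mapping, this dereference
     * is where the kernel paging fault fires. */
    for (word = (uint64_t *)data; word < end; word++)
        if (*word != 0)
            return (1);

    return (0);
}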

Interestingly, we operate a second server with identical hardware and zpool configuration. On this server, we upgraded to zfs 0.7.12 without problems (uptime is 32 days, and more than 300 TiB have been written to that zpool since). The two servers differ in the data they store (backups/snapshots of Oracle and/or SAP HANA databases, written via NFS in both cases); the problematic server has a slightly higher degree of fragmentation and more zfs sub-volumes.

Some hardware info:

Single socket Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
128 GB of ECC RAM
zpool consists of storage LUNs from a storage appliance, 128TiB per LUN, attached via multipath FC
4 SAS SSDs as ZIL and L2ARC

Some zpool info:

the "crashing" one:

NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
nfsshare  1.36P   989T   408T         -    33%    70%  1.00x  ONLINE  -

the "good" one:

NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
nfsshare  1.36P  1.03P   344T         -    28%    75%  1.00x  ONLINE  -

We use lz4 compression on both pools; the compression ratio is 2.5x on average.

Describe how to reproduce the problem

Heavy write IO on a large zpool using zfs 0.7.x

Include any warning/errors/backtraces from the system logs

example 1 (crash in zio_compress_zeroed_cb)

[ 7406.356182] BUG: unable to handle kernel paging request at ffffa850c396b000
[ 7406.356247] IP: [<ffffffffc0a79a11>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[ 7406.356355] PGD 17fd96067 PUD 1b600e1067 PMD 1b5afc9067 PTE 0
[ 7406.356408] Oops: 0000 [#1] SMP 
[ 7406.356437] Modules linked in: nfsd nfs_acl lockd grace bonding zfs(POE) zunicode(POE) zavl(POE) icp(POE) iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr lpc_ich mei_me mei dm_service_time zcommon(POE) znvpair(POE) ipmi_ssif spl(OE) sg ipmi_si ipmi_devintf ipmi_msghandler wmi tpm_crb acpi_power_meter dm_multipath auth_rpcgss sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qla2xxx crct10dif_pclmul crct10dif_common drm crc32c_intel bnx2x drm_panel_orientation_quirks ahci nvme_fc libahci nvme_fabrics megaraid_sas nvme_core libata scsi_transport_fc
[ 7406.357138]  mdio ptp pps_core scsi_tgt libcrc32c dm_mirror dm_region_hash dm_log dm_mod
[ 7406.357213] CPU: 5 PID: 79370 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.5.1.el7.x86_64 #1
[ 7406.357296] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
[ 7406.357351] task: ffff9043c0665140 ti: ffff9042a5bdc000 task.ti: ffff9042a5bdc000
[ 7406.357405] RIP: 0010:[<ffffffffc0a79a11>]  [<ffffffffc0a79a11>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[ 7406.357519] RSP: 0018:ffff9042a5bdfc18  EFLAGS: 00010287
[ 7406.357559] RAX: ffffa850c396b000 RBX: 0000000000100000 RCX: 0000000000100000
[ 7406.357611] RDX: 0000000000000000 RSI: ffffa850c3a6b000 RDI: ffffa850c396b000
[ 7406.357664] RBP: ffff9042a5bdfc18 R08: 0000000000000000 R09: ffff902553388000
[ 7406.357717] R10: 0000000000000002 R11: 0000001e96b0c000 R12: ffffffffc0a799f0
[ 7406.357769] R13: 0000000000000000 R14: 0000000000100000 R15: ffffa850df694000
[ 7406.357822] FS:  0000000000000000(0000) GS:ffff9043fc740000(0000) knlGS:0000000000000000
[ 7406.357881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7406.357923] CR2: ffffa850c396b000 CR3: 0000001b7a210000 CR4: 00000000003607e0
[ 7406.357976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7406.358028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7406.358079] Call Trace:
[ 7406.358126]  [<ffffffffc09a6257>] abd_iterate_func+0x97/0x120 [zfs]
[ 7406.358213]  [<ffffffffc0a79aab>] zio_compress_data+0x3b/0xc0 [zfs]
[ 7406.358300]  [<ffffffffc0a72fbf>] zio_write_compress+0x3bf/0x6a0 [zfs]
[ 7406.358388]  [<ffffffffc0a7280f>] zio_execute+0x9f/0x100 [zfs]
[ 7406.358440]  [<ffffffffc0475d7c>] taskq_thread+0x2ac/0x4f0 [spl]
[ 7406.358489]  [<ffffffff8fed67f0>] ? wake_up_state+0x20/0x20
[ 7406.358562]  [<ffffffffc0a72770>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
[ 7406.358627]  [<ffffffffc0475ad0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[ 7406.358678]  [<ffffffff8fec1c71>] kthread+0xd1/0xe0
[ 7406.358716]  [<ffffffff8fec1ba0>] ? insert_kthread_work+0x40/0x40
[ 7406.358765]  [<ffffffff90574c37>] ret_from_fork_nospec_begin+0x21/0x21
[ 7406.358814]  [<ffffffff8fec1ba0>] ? insert_kthread_work+0x40/0x40
[ 7406.358858] Code: 00 00 00 00 00 0f 1f 44 00 00 0f 1f 44 00 00 55 48 01 fe 48 39 f7 48 89 e5 72 10 eb 1d 0f 1f 44 00 00 48 83 c7 08 48 39 fe 76 0f <48> 83 3f 00 74 f1 b8 01 00 00 00 5d c3 66 90 31 c0 5d c3 66 90 
[ 7406.359147] RIP  [<ffffffffc0a79a11>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[ 7406.359240]  RSP <ffff9042a5bdfc18>
[ 7406.361319] CR2: ffffa850c396b000

example 2 (crash in lz4_compress_zfs):

[54195.432822] BUG: unable to handle kernel paging request at ffffbb8a45a70000
[54195.432862] IP: [<ffffffffc1117887>] lz4_compress_zfs+0xf7/0x840 [zfs]
[54195.432927] PGD 17fd96067 PUD 1772780067 PMD 1768165067 PTE 0
[54195.432955] Oops: 0000 [#1] SMP 
[54195.432971] Modules linked in: nfsd nfs_acl lockd grace bonding zfs(POE) zunicode(POE) zavl(POE) icp(POE) iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) spl(OE) dm_service_time joydev ipmi_ssif sg pcspkr mei_me mei lpc_ich ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter tpm_crb dm_multipath auth_rpcgss sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qla2xxx crct10dif_pclmul crct10dif_common crc32c_intel drm ahci bnx2x drm_panel_orientation_quirks libahci nvme_fc libata nvme_fabrics megaraid_sas nvme_core mdio
[54195.433360]  scsi_transport_fc ptp pps_core libcrc32c scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[54195.433409] CPU: 2 PID: 24937 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.5.1.el7.x86_64 #1
[54195.433453] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.9.1 12/04/2018
[54195.433483] task: ffff9b56ecb56180 ti: ffff9b56b07c8000 task.ti: ffff9b56b07c8000
[54195.433513] RIP: 0010:[<ffffffffc1117887>]  [<ffffffffc1117887>] lz4_compress_zfs+0xf7/0x840 [zfs]
[54195.433587] RSP: 0018:ffff9b56b07cbc40  EFLAGS: 00010212
[54195.433608] RAX: ffffbb8a45acbff4 RBX: 0000000000100000 RCX: ffffbb8a45a6fffc
[54195.433635] RDX: ffffbb8a45a6fffd RSI: 00000000000003bb RDI: 00000000000003bb
[54195.433664] RBP: ffff9b56b07cbc98 R08: 0000000000000001 R09: 000000000000005f
[54195.433697] R10: 0000000093060115 R11: ffffbb8a36dc9a12 R12: ffffbb8a459cc000
[54195.433724] R13: ffff9b56a08ec000 R14: ffffbb8a45a6ffe0 R15: ffffbb8a36e8c000
[54195.433755] FS:  0000000000000000(0000) GS:ffff9b56fc680000(0000) knlGS:0000000000000000
[54195.433789] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54195.433816] CR2: ffffbb8a45a70000 CR3: 0000001fd4fa0000 CR4: 00000000003607e0
[54195.433844] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[54195.433871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[54195.433907] Call Trace:
[54195.433950]  [<ffffffffc1199af7>] zio_compress_data+0x87/0xc0 [zfs]
[54195.434002]  [<ffffffffc1192fbf>] zio_write_compress+0x3bf/0x6a0 [zfs]
[54195.434053]  [<ffffffffc119280f>] zio_execute+0x9f/0x100 [zfs]
[54195.434083]  [<ffffffffc0645d7c>] taskq_thread+0x2ac/0x4f0 [spl]
[54195.434109]  [<ffffffffa86d67f0>] ? wake_up_state+0x20/0x20
[54195.434156]  [<ffffffffc1192770>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
[54195.434192]  [<ffffffffc0645ad0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[54195.434219]  [<ffffffffa86c1c71>] kthread+0xd1/0xe0
[54195.434239]  [<ffffffffa86c1ba0>] ? insert_kthread_work+0x40/0x40
[54195.434265]  [<ffffffffa8d74c37>] ret_from_fork_nospec_begin+0x21/0x21
[54195.434292]  [<ffffffffa86c1ba0>] ? insert_kthread_work+0x40/0x40
[54195.434315] Code: 00 00 00 eb 0f 0f 1f 80 00 00 00 00 45 89 c8 48 89 d1 89 fe 45 8d 48 01 41 c1 f8 06 49 63 d0 48 01 ca 48 39 d0 0f 82 df 06 00 00 <44> 8b 02 49 89 ca 4d 29 e2 41 69 f8 b1 79 37 9e 4d 8d 44 b5 00 
[54195.434473] RIP  [<ffffffffc1117887>] lz4_compress_zfs+0xf7/0x840 [zfs]
[54195.435708]  RSP <ffff9b56b07cbc40>
[54195.436930] CR2: ffffbb8a45a70000

All 7 comments

As a workaround, does disabling ABD result in stable operation?
https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_abd_scatter_enabled

@richardelling thank you for your feedback.
Setting zfs_abd_scatter_enabled = 0 is worth a try. Unfortunately, I do not have a test system where I could reproduce the problem, and I will not get downtime on the production system (which is currently running 0.6.5.11 again) before next week. I will let you know once I have been able to test this option; maybe it helps track down the problem.

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

We have seen the same problem on a few hosts over the past couple of weeks on zfs 0.8.4:

[723473.157220] BUG: unable to handle kernel paging request at ffffb4288aa24000
[723473.157955] IP: [<ffffffffc0c94431>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[723473.158661] PGD 17fe09067 PUD c23e91067 PMD 201e977067 PTE 0
[723473.159294] Oops: 0000 [#1] SMP
[723473.159915] Modules linked in: joydev mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase nfsv3 nfs_acl nfs lockd grace fscache dell_rbu dm_mirror dm_region_hash dm_log iTCO_wdt iTCO_vendor_support dell_smbios dell_wmi_descriptor dcdbas zfs(POE) zunicode(POE) zlua(POE) skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm zcommon(POE) mgag200 znvpair(POE) zavl(POE) irqbypass icp(POE) ttm pcspkr drm_kms_helper spl(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops drm sg drm_panel_orientation_quirks lpc_ich i2c_i801 mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter acpi_pad auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ahci
[723473.163831]  ixgbe glue_helper lrw gf128mul ablk_helper cryptd libahci igb libata megaraid_sas mdio ptp pps_core i2c_algo_bit dca nfit libnvdimm dm_mod
[723473.165187] CPU: 13 PID: 113950 Comm: z_wr_iss_h Kdump: loaded Tainted: P           OE  ------------   3.10.0-1127.18.2.el7.x86_64 #1
[723473.166529] Hardware name: Dell Inc. PowerEdge R640, BIOS 2.6.4 04/09/2020
[723473.167196] task: ffff959fbb4ab150 ti: ffff959f24ba8000 task.ti: ffff959f24ba8000
[723473.167855] RIP: 0010:[<ffffffffc0c94431>]  [<ffffffffc0c94431>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[723473.168544] RSP: 0018:ffff959f24babc18  EFLAGS: 00010287
[723473.169204] RAX: ffffb4288aa24000 RBX: 0000000000020000 RCX: 0000000000020000
[723473.169873] RDX: 0000000000000000 RSI: ffffb4288aa44000 RDI: ffffb4288aa24000
[723473.170534] RBP: ffff959f24babc18 R08: 0000000000000000 R09: 0000000000000001
[723473.171185] R10: 000000001bc6d001 R11: ffff95aa1bc6e4a0 R12: ffffffffc0c94410
[723473.171829] R13: 0000000000000000 R14: 0000000000020000 R15: ffffb4284b15c000
[723473.172471] FS:  0000000000000000(0000) GS:ffff95afbad80000(0000) knlGS:0000000000000000
[723473.173105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[723473.173730] CR2: ffffb4288aa24000 CR3: 000000200efc8000 CR4: 00000000007607e0
[723473.174353] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[723473.174973] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[723473.175593] PKRU: 00000000
[723473.176209] Call Trace:
[723473.176838]  [<ffffffffc0b852e7>] abd_iterate_func+0x97/0x120 [zfs]
[723473.177474]  [<ffffffffc0c944cb>] zio_compress_data+0x3b/0xc0 [zfs]
[723473.178097]  [<ffffffffc0c8c98d>] zio_write_compress+0x3dd/0x750 [zfs]
[723473.178704]  [<ffffffffc0c8bb5f>] zio_execute+0x9f/0x100 [zfs]
[723473.179283]  [<ffffffffc086da9c>] taskq_thread+0x2ac/0x4f0 [spl]
[723473.179850]  [<ffffffff8c4db990>] ? wake_up_state+0x20/0x20
[723473.180435]  [<ffffffffc0c8bac0>] ? zio_taskq_member.isra.8.constprop.11+0x80/0x80 [zfs]
[723473.181015]  [<ffffffffc086d7f0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[723473.181585]  [<ffffffff8c4c6691>] kthread+0xd1/0xe0
[723473.182138]  [<ffffffff8c4c65c0>] ? insert_kthread_work+0x40/0x40
[723473.182682]  [<ffffffff8cb92d1d>] ret_from_fork_nospec_begin+0x7/0x21
[723473.183216]  [<ffffffff8c4c65c0>] ? insert_kthread_work+0x40/0x40
[723473.183738] Code: 00 00 00 00 00 0f 1f 44 00 00 0f 1f 44 00 00 55 48 01 fe 48 39 f7 48 89 e5 72 10 eb 1d 0f 1f 44 00 00 48 83 c7 08 48 39 fe 76 0f <48> 83 3f 00 74 f1 b8 01 00 00 00 5d c3 66 90 31 c0 5d c3 66 90
[723473.184840] RIP  [<ffffffffc0c94431>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[723473.185378]  RSP <ffff959f24babc18>
[723473.185881] CR2: ffffb4288aa24000

Fwiw, we have crash dumps, and we did some trawling around the stack to find the abd_t that induced the crash, so we can give more details about what things looked like just before the crash, if that is helpful.

Re setting zfs_abd_scatter_enabled = 0, the abd_t is marked with ABD_FLAG_LINEAR, so I'm not sure if you still think that would be helpful.

In the case below, dereferencing 0xffffbf23650d5000 is what caused the crash.

crash> struct abd_iter 0xffff9dc912d63c38
struct abd_iter {
  iter_mapaddr = 0xffffbf23650d5000, 
  iter_mapsize = 131072, 
  iter_abd = 0xffff9dd820ab39b0, 
  iter_pos = 0, 
  iter_offset = 0, 
  iter_sg = 0x0
}
crash> struct abd 0xffff9dd820ab39b0
struct abd {
  abd_flags = ABD_FLAG_LINEAR, 
  abd_size = 131072, 
  abd_parent = 0x0, 
  abd_children = {
    rc_count = 0
  }, 
  abd_u = {
    abd_scatter = {
      abd_offset = 1695371264, 
      abd_nents = 4294950691, 
      abd_sgl = 0xffff9dc64da72080
    }, 
    abd_linear = {
      abd_buf = 0xffffbf23650d5000, 
      abd_sgl = 0xffff9dc64da72080
    }
  }
}
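
For context on why this rules out the scatter path: for an abd_t flagged ABD_FLAG_LINEAR, the ABD iterator does not kmap scatterlist pages at all; it simply points iter_mapaddr into the linear buffer. A minimal sketch of that linear path (paraphrased; the helper name here is illustrative rather than the exact source) is:

/* Paraphrased sketch of the linear mapping step, not verbatim ZFS source. */
static void
abd_iter_map_linear(struct abd_iter *aiter)
{
    size_t offset = aiter->iter_offset;

    /* No page mapping for linear ABDs: iter_mapaddr is abd_buf plus the
     * current offset, which is exactly what the crash dump above shows
     * (iter_mapaddr == abd_linear.abd_buf). */
    aiter->iter_mapsize = aiter->iter_abd->abd_size - offset;
    aiter->iter_mapaddr =
        (char *)aiter->iter_abd->abd_u.abd_linear.abd_buf + offset;
}

So the fault on 0xffffbf23650d5000 suggests the linear buffer itself was no longer mapped by the time the compression callback walked it.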

@cperl82 Do you still have the crash dump? Can you please do crash> vtop ffffbf23650d5000?

Yes, I have 4 different crash dumps saved; all seem to be the same issue (and I expect I'll have some more before too long).

crash> vtop 0xffffbf23650d5000
VIRTUAL           PHYSICAL        
ffffbf23650d5000  (not mapped)

PGD DIRECTORY: ffffffff88010000
PAGE DIRECTORY: 17fe09067
   PUD: 17fe09468 => ffe9c1067
   PMD: ffe9c1940 => 1012164067
   PTE: 10121646a8 => 0