Type | Version/Name
--- | ---
Distribution Name | Debian
Distribution Version | Buster
Linux Kernel | 4.19.0-5-amd64
Architecture | amd64
ZFS Version | 0.8.0
SPL Version |
Upgraded a 0.7.13 pool to 0.8.0. Started a trim on the pool. Within minutes a disk was marked as faulted. SMART data is clean, regular scrubs are occurring, and the disk has never once thrown an error. It has less than 2000 hours on it.
I'm not saying the disk isn't bad, but it would be a heck of a coincidence.
Cleared the error on the disk; now the disk seems to be stuck in a permanent "trimming" state.
The zpool layout is eight S3610 SSDs in four mirror vdevs.
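Roughly the sequence that led here, as a minimal sketch (pool name ssdtank as it appears in the logs below; the exact invocations are assumed):
zpool upgrade ssdtank      # enable the new 0.8.0 feature flags after the 0.7.13 upgrade
zpool trim ssdtank         # kick off a manual TRIM of every vdev in the pool
zpool status -t ssdtank    # show per-vdev trim state alongside the usual status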
Permanent trimming (now 5 hours later):
mirror-2 ONLINE 0 0 0
ata-LK1600GEYMV_xxxxx ONLINE 0 0 0 (trimming)
ata-LK1600GEYMV_xxxxx ONLINE 0 0 0
Disk errors:
May 29 14:48:57 tank kernel: [ 3437.636049] sd 0:0:4:0: attempting task abort! scmd(00000000f0a3b4b6)
May 29 14:48:57 tank kernel: [ 3437.636059] sd 0:0:4:0: [sde] tag#3060 CDB: Write(10) 2a 00 8d 64 b7 98 00 00 08 00
May 29 14:48:57 tank kernel: [ 3437.636064] scsi target0:0:4: handle(0x000d), sas_address(0x4433221103000000), phy(3)
May 29 14:48:57 tank kernel: [ 3437.636068] scsi target0:0:4: enclosure logical id(0x590b11c00214b300), slot(0)
May 29 14:48:57 tank kernel: [ 3437.660033] sd 0:0:4:0: task abort: SUCCESS scmd(00000000f0a3b4b6)
May 29 14:48:57 tank kernel: [ 3437.660046] sd 0:0:4:0: [sde] tag#3060 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
May 29 14:48:57 tank kernel: [ 3437.660050] sd 0:0:4:0: [sde] tag#3060 CDB: Write(10) 2a 00 8d 64 b7 98 00 00 08 00
May 29 14:48:57 tank kernel: [ 3437.660113] zio pool=ssdtank vdev=/dev/disk/by-id/ata-LK1600GEYMV_xxxxxx-part1 error=5 type=2 offset=1214559236096 size=4096 flags=180880
May 29 14:48:57 tank kernel: [ 3437.660136] sd 0:0:4:0: attempting task abort! scmd(000000005093a137)
May 29 14:48:57 tank kernel: [ 3437.660139] sd 0:0:4:0: [sde] tag#3057 CDB: Write(10) 2a 00 8d 64 b6 f8 00 00 78 00
May 29 14:48:57 tank kernel: [ 3437.660143] scsi target0:0:4: handle(0x000d), sas_address(0x4433221103000000), phy(3)
May 29 14:48:57 tank kernel: [ 3437.660145] scsi target0:0:4: enclosure logical id(0x590b11c00214b300), slot(0)
May 29 14:48:57 tank kernel: [ 3437.660148] sd 0:0:4:0: task abort: SUCCESS scmd(000000005093a137)
May 29 14:48:57 tank kernel: [ 3437.660155] sd 0:0:4:0: [sde] tag#3057 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
May 29 14:48:57 tank kernel: [ 3437.660157] sd 0:0:4:0: [sde] tag#3057 CDB: Write(10) 2a 00 8d 64 b6 f8 00 00 78 00
May 29 14:48:57 tank kernel: [ 3437.660207] zio pool=ssdtank vdev=/dev/disk/by-id/ata-LK1600GEYMV_xxxxxx-part1 error=5 type=2 offset=1214559154176 size=61440 flags=180880
May 29 14:48:57 tank kernel: [ 3437.660219] sd 0:0:4:0: attempting task abort! scmd(000000004cb30aad)
May 29 14:48:57 tank kernel: [ 3437.660223] sd 0:0:4:0: [sde] tag#3056 CDB: Write(10) 2a 00 04 a2 ef 58 00 00 e8 00
May 29 14:48:57 tank kernel: [ 3437.660227] scsi target0:0:4: handle(0x000d), sas_address(0x4433221103000000), phy(3)
May 29 14:48:57 tank kernel: [ 3437.660230] scsi target0:0:4: enclosure logical id(0x590b11c00214b300), slot(0)
May 29 14:48:57 tank kernel: [ 3437.660234] sd 0:0:4:0: task abort: SUCCESS scmd(000000004cb30aad)
May 29 14:48:57 tank kernel: [ 3437.660243] sd 0:0:4:0: [sde] tag#3056 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
May 29 14:48:57 tank kernel: [ 3437.660247] sd 0:0:4:0: [sde] tag#3056 CDB: Write(10) 2a 00 04 a2 ef 58 00 00 e8 00
May 29 14:48:57 tank kernel: [ 3437.660318] zio pool=ssdtank vdev=/dev/disk/by-id/ata-LK1600GEYMV_xxxxxx-part1 error=5 type=2 offset=39825879040 size=118784 flags=180880
May 29 14:48:57 tank kernel: [ 3437.660329] sd 0:0:4:0: attempting task abort! scmd(000000009a38a976)
May 29 14:48:57 tank kernel: [ 3437.660333] sd 0:0:4:0: [sde] tag#3026 CDB: Write(10) 2a 00 9a cb 24 b8 00 00 08 00
May 29 14:48:57 tank kernel: [ 3437.660335] scsi target0:0:4: handle(0x000d), sas_address(0x4433221103000000), phy(3)
May 29 14:48:57 tank kernel: [ 3437.660338] scsi target0:0:4: enclosure logical id(0x590b11c00214b300), slot(0)
May 29 14:48:57 tank kernel: [ 3437.660340] sd 0:0:4:0: task abort: SUCCESS scmd(000000009a38a976)
May 29 14:48:57 tank kernel: [ 3437.660345] sd 0:0:4:0: [sde] tag#3026 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
May 29 14:48:57 tank kernel: [ 3437.660348] sd 0:0:4:0: [sde] tag#3026 CDB: Write(10) 2a 00 9a cb 24 b8 00 00 08 00
May 29 14:48:57 tank kernel: [ 3437.660395] zio pool=ssdtank vdev=/dev/disk/by-id/ata-LK1600GEYMV_xxxxxx-part1 error=5 type=2 offset=1329665241088 size=4096 flags=180880
After clearing the faulted device:
ZFS has finished a resilver:
eid: 74
class: resilver_finish
host: tank
time: 2019-05-29 14:52:51-0500
pool: ssdtank
state: ONLINE
scan: resilvered 146M in 0 days 00:00:02 with 0 errors on Wed May 29 14:52:51 2019
config:
NAME STATE READ WRITE CKSUM
ssdtank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-LK1600GEYMV_xxxxxx ONLINE 0 0 0
ata-LK1600GEYMV_xxxxxx ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ata-LK1600GEYMV_xxxxxx ONLINE 0 0 0
ata-LK1600GEYMV_xxxxxx ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
ata-LK1600GEYMV_xxxxxx ONLINE 0 0 0 (trimming)
ata-LK1600GEYMV_xxxxxx ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
ata-LK1600GEYMV_xxxxxx ONLINE 0 0 0
ata-LK1600GEYMV_xxxxxx ONLINE 0 0 0
To add to this: some 9 hours later, the trim is still reported as running.
Manually running:
zpool trim -s ssdtank /dev/disk/by-id/ata-device-still-trimming
does nothing, while the other devices report that there is no active trim.
After running the above:
mirror-2 ONLINE 0 0 0
ata-LK1600GEYMV_xxxxxxx ONLINE 0 0 0 (trimming)
@kroy-the-rabbit that would indeed be quite a coincidence. According to the console messages posted, an unexpected timeout was encountered (hostbyte=DID_TIME_OUT). After the command was aborted, it was converted to an I/O error by the SCSI layer and then reported to ZFS. This doesn't necessarily mean there's anything wrong with the device, but something in the path resulted in the timeout.
A couple quick questions.
Did you TRIM the entire pool? If so, were all of the other vdevs in the pool trimmed successfully? You can use zpool status -t to output the detailed per-vdev trim status.
The zpool trim -s command will suspend the TRIM, not cancel it, so its hanging around may be expected. Use zpool trim -c to cancel it.
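For clarity, a short sketch of the operations mentioned above (the device path is a placeholder):
zpool status -t ssdtank            # per-vdev trim state and percentage
zpool trim -s ssdtank <device>     # suspend: the trim can be resumed later
zpool trim -c ssdtank <device>     # cancel: discards the trim state entirely
zpool trim ssdtank <device>        # restart the trim on just that device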
Did you TRIM the entire pool?
Yep.
The rest of the pool other than the failed device says:
(100% trimmed, completed at Wed 29 May 2019 02:29:10 PM CDT)
The other mirror device in that vdev says:
(100% trimmed, completed at Wed 29 May 2019 03:09:23 PM CDT)
The "failed" device says (and doesn't seem to be moving):
(33% trimmed, started at Wed 29 May 2019 02:27:35 PM CDT)
Use zpool trim -c to cancel it.
That did bring the status to "untrimmed".
Restarting it on the specific device produced a positive result in less than 3 minutes:
(100% trimmed, completed at Wed 29 May 2019 11:34:25 PM CDT)
It sounds as if ZFS may have overwhelmed the controller with outstanding TRIM commands, resulting in the unexpected timeout and the subsequently faulted device. You could try setting the module option zfs_vdev_trim_max_active=1 to reduce how many concurrent TRIM commands can be outstanding per vdev (it defaults to 2). If this prevents the issue, we may want to consider reducing the default value.
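A sketch of setting that option, both at runtime and persistently (the modprobe.d filename is arbitrary):
echo 1 > /sys/module/zfs/parameters/zfs_vdev_trim_max_active                      # takes effect immediately
echo "options zfs zfs_vdev_trim_max_active=1" > /etc/modprobe.d/zfs-trim.conf     # persists across module reloads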
So I didn't want to open another ticket, but I'm not sure this is the same thing.
Yesterday I upgraded ZFS from the Ubuntu-provided 0.7.5 to a manually built 0.8.2 so I could use TRIM.
I have a simple mirror of two 1TB SSDs (both product: SanDisk SSD PLUS).
I installed 0.8.2, did the pool upgrade, and the first thing I did was run zpool trim on the pool.
I came back later and there were over 1000 unrecoverable errors.
I blew away the pool, recreated it, turned on autotrim, and copied 200+ GB back to it; I got a handful of unrecoverable errors while the data was being copied to the pool.
I blew away the pool again, left autotrim off, and copied 300+ GB to it: zero errors. A scrub: zero errors.
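A sketch of the test sequence described above (device names are illustrative, not my exact ones; pool name z as used below):
zpool create z mirror /dev/disk/by-id/ata-SanDisk_SSD_PLUS_1000GB_AAA /dev/disk/by-id/ata-SanDisk_SSD_PLUS_1000GB_BBB
zpool set autotrim=on z     # first run: autotrim on, errors appeared while copying data
zpool status -v z           # listed the unrecoverable errors
zpool set autotrim=off z    # second run (pool recreated): autotrim off, zero errors
zpool scrub z               # scrub also came back clean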
root@io:/z/vm_backup# lsb_release -a; uname -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
Linux io 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
filename: /lib/modules/4.15.0-70-generic/updates/dkms/zfs.ko
version: 0.8.2-1
Maybe it's my disks not dealing well with TRIM? Maybe not.
This is my backup system, so I don't really want to use it as a testbed to run more tests, but if there's something non-destructive you want me to do, I can try it.
Oh, and there were no disk errors of any kind in kern.log.
I have issues with TRIM too; trimming the pool ALWAYS results in zvol corruption.
I can restore the pool from backup, a scrub says it's OK, and right after a trim it becomes degraded.
Dec 13 15:41:38 pve zed: eid=3005 class=trim_start pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1 vdev_state=ONLINE
Dec 13 15:41:38 pve zed: eid=3006 class=history_event pool_guid=0x9649B29D3CFCA360
Dec 13 15:42:04 pve zed: eid=3007 class=trim_finish pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1 vdev_state=ONLINE
Dec 13 15:42:04 pve zed: eid=3008 class=history_event pool_guid=0x9649B29D3CFCA360
Dec 13 15:42:29 pve zed: eid=3009 class=scrub_start pool_guid=0x9649B29D3CFCA360
Dec 13 15:42:29 pve zed: eid=3010 class=history_event pool_guid=0x9649B29D3CFCA360
Dec 13 15:42:40 pve zed: eid=3011 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3012 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3013 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3014 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3015 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3016 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3017 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3018 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3019 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3020 class=checksum pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1
Dec 13 15:42:40 pve zed: eid=3021 class=statechange pool_guid=0x9649B29D3CFCA360 vdev_path=/dev/disk/by-id/ata-WDC_WDS480G2G0B-00EPW0_19144B800540-part1 vdev_state=DEGRADED
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
ssd-storage 3.46G 427G 24K /ssd-storage
ssd-storage/data 3.46G 427G 24K /ssd-storage/data
ssd-storage/data/vm-130-disk-0 3.46G 427G 3.46G -
ssd-storage/data/vm-130-disk-1 15K 427G 15K -
# zpool scrub ssd-storage
# zpool status
pool: ssd-storage
state: ONLINE
scan: scrub repaired 0B in 0 days 00:00:12 with 0 errors on Fri Dec 13 04:22:48 2019
config:
NAME STATE READ WRITE CKSUM
ssd-storage ONLINE 0 0 0
ata-WDC_WDS480G2G0B-00EPW0_19144B800540 ONLINE 0 0 0
errors: No known data errors
# zpool trim ssd-storage
# zpool status
pool: ssd-storage
state: ONLINE
scan: scrub repaired 0B in 0 days 00:00:13 with 0 errors on Fri Dec 13 04:24:01 2019
config:
NAME STATE READ WRITE CKSUM
ssd-storage ONLINE 0 0 0
ata-WDC_WDS480G2G0B-00EPW0_19144B800540 ONLINE 0 0 0 (trimming)
errors: No known data errors
# zpool scrub ssd-storage
# zpool status -v
pool: ssd-storage
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 7K in 0 days 00:00:13 with 18 errors on Fri Dec 13 04:25:00 2019
config:
NAME STATE READ WRITE CKSUM
ssd-storage DEGRADED 0 0 0
ata-WDC_WDS480G2G0B-00EPW0_19144B800540 DEGRADED 0 0 40 too many errors
errors: Permanent errors have been detected in the following files:
ssd-storage/data/vm-130-disk-0:<0x1>
Status says it was trimmed OK: (100% trimmed, completed at Fri 13 Dec 2019 03:42:04 PM MSK)
echo 1 > /sys/module/zfs/parameters/zfs_vdev_trim_max_active didn't help
@Temtaime if it's not too much to ask, would it be possible for you to rerun your test case with the ZFS_DEBUG_TRIM bit set in zfs_flags? This will enable some additional internal checks to re-verify that only free space is ever trimmed. If any of these checks fail, it will result in a system panic with some additional debugging. You can enable these checks by running:
sudo sh -c "echo 2048 >/sys/module/zfs/parameters/zfs_flags"
This should help us determine if it's the software or hardware which is causing the problem.
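If other debug bits are already in use, a variant that ORs the bit in instead of overwriting the whole mask (assuming 2048 is the TRIM debug bit, per the command above):
cur=$(cat /sys/module/zfs/parameters/zfs_flags)
sudo sh -c "echo $((cur | 2048)) > /sys/module/zfs/parameters/zfs_flags"
cat /sys/module/zfs/parameters/zfs_flags    # verify the bit is set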
@behlendorf Hello. Thanks for the reply.
Sorry, this is a remote production machine and I can't risk causing a kernel panic on it right now.
I'll try to find some free time to get to it and we'll see.
@behlendorf Long story short, I replaced the SSD and the issue is gone.
Any news from @nixomose?
Sorry I didn't get back to you before.
So I grabbed a couple of SSDs at work, a different make and model, set up the mirror just like I have at home, did all the same things, ran trim, and everything was fine.
I'm afraid to run it at home again because it's a pain restoring all the data. :-)
So it likely has to do with some firmware problem specific to these drives I have.
That's just a guess though. I can get you the model (they're SanDisk drives) if you want. I'm not near the machine right now so... oh wait, I have a network, one moment...
*-disk
description: ATA Disk
product: SanDisk SSD PLUS
physical id: 0.0.0
bus info: scsi@3:0.0.0
logical name: /dev/sdc
version: 00RL
serial: 190919447101
size: 931GiB (1TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=e9911e60-0f76-c847-b894-61de7ca43f2b logicalsectorsize=512 sectorsize=512
and
*-disk
description: ATA Disk
product: SanDisk SSD PLUS
physical id: 0.0.0
bus info: scsi@5:0.0.0
logical name: /dev/sde
version: 00RL
serial: 191177464112
size: 931GiB (1TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=55d0fdde-0f3e-db4e-8cc2-7f474864f590 logicalsectorsize=512 sectorsize=512
I have a single pool with them mirrored together, and if I run trim, it all goes boom.
@behlendorf Hi from stu. Happy Friday.
So those drives I had a problem with are going to be freed up in the next few days, I hope, for a little while anyway. I asked Tom and he said to tag you and ask whether you'd want me to do any testing with these drives.
I'm inclined to agree with you that the firmware on these drives is not capable of dealing with whatever trims ZFS is throwing at it.
But if you think this is a problem worth chasing and there's something I can test on these drives, let me know; I can do whatever.
A short version of what happened, so you don't have to read through everything again:
I installed 0.8.2. I have a pool with two 1TB SanDisk SSDs mirrored together (the pool was created with 0.7-something, I think).
When I run zpool trim on that pool, I get a bazillion errors and all the data is unrecoverable.
Thanks for the update. If you're able to do a little testing with the drives, it would be helpful to create a new pool and see if you can reproduce the issue. If so, are there any I/O errors logged to dmesg? What does the /proc/spl/kstat/zfs/pool/iostats file look like? Are only checksum errors reported? Going on the theory that this is a firmware issue, if we could somehow recognize that the drive was behaving badly we could stop issuing trims.
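A sketch of gathering the information requested above (substitute the actual pool name; the kstat path follows the pattern given in the comment):
dmesg | grep -iE 'ata[0-9]|sd[a-z]|blk_update_request|zio'   # kernel-side I/O errors, if any
cat /proc/spl/kstat/zfs/<pool>/iostats                       # per-pool I/O and trim kstat counters
zpool status -v <pool>                                       # READ/WRITE/CKSUM columns and affected files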
I didn't have a lot of time, and I'll get you more information, but the first set of results is interesting...
I ran zpool trim z, went away, and came back after a while...
These two SSDs are mirrored together...
lrwxrwxrwx 1 root root 10 May 2 20:30 ata-SanDisk_SSD_PLUS_1000GB_191177464112-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 May 2 20:30 ata-SanDisk_SSD_PLUS_1000GB_190919447101-part1 -> ../../sdd1
root@io:/dev/disk/by-id# zpool status
pool: z
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 0B in 0 days 01:10:35 with 0 errors on Tue Mar 31 21:16:40 2020
config:
NAME STATE READ WRITE CKSUM
z ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-SanDisk_SSD_PLUS_1000GB_190919447101 ONLINE 7 0 181
ata-SanDisk_SSD_PLUS_1000GB_191177464112 ONLINE 0 0 177
So kern.log has I/O errors on sdd, which is 190919447101.
...
May 3 07:45:06 io kernel: [207209.921245] ata4.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x0
May 3 07:45:06 io kernel: [207209.921249] ata4.00: irq_stat 0x40000008
May 3 07:45:06 io kernel: [207209.921251] ata4.00: failed command: READ FPDMA QUEUED
May 3 07:45:06 io kernel: [207209.921256] ata4.00: cmd 60/00:48:34:92:75/01:00:4a:00:00/40 tag 9 ncq dma 131072 in
May 3 07:45:06 io kernel: [207209.921256] res 41/40:00:34:92:75/00:00:4a:00:00/00 Emask 0x409 (media error)
May 3 07:45:06 io kernel: [207209.921258] ata4.00: status: { DRDY ERR }
May 3 07:45:06 io kernel: [207209.921259] ata4.00: error: { UNC }
May 3 07:45:06 io kernel: [207209.927018] ata4.00: configured for UDMA/133
May 3 07:45:06 io kernel: [207209.927028] sd 3:0:0:0: [sdd] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 3 07:45:06 io kernel: [207209.927030] sd 3:0:0:0: [sdd] tag#9 Sense Key : Medium Error [current]
May 3 07:45:06 io kernel: [207209.927032] sd 3:0:0:0: [sdd] tag#9 Add. Sense: Unrecovered read error - auto reallocate failed
May 3 07:45:06 io kernel: [207209.927034] sd 3:0:0:0: [sdd] tag#9 CDB: Read(10) 28 00 4a 75 92 34 00 01 00 00
May 3 07:45:06 io kernel: [207209.927036] print_req_error: I/O error, dev sdd, sector 1249219124
May 3 07:45:06 io kernel: [207209.927041] zio pool=z vdev=/dev/disk/by-id/ata-SanDisk_SSD_PLUS_1000GB_190919447101-part1 error=5 type=1 offset=639599142912 size=131072 flags=180880
May 3 07:45:06 io kernel: [207209.927052] ata4: EH complete
But the other disk seems fine (from a kern.log point of view), yet the data is corrupt; the mirror couldn't get the data from either drive?
There are zero errors in kern.log for the sdf drive.
Reading the files that zpool status -v says are bad, I get I/O errors...
root@io:/dev/disk/by-id# file /z/backup/sp/spford/ford/git/o/objects/e8/6955a1e45bf3d4c3e3d7836d8c10a14389711b
/z/backup/sp/spford/ford/git/o/objects/e8/6955a1e45bf3d4c3e3d7836d8c10a14389711b: ERROR: cannot read `/z/backup/sp/spford/ford/git/o/objects/e8/6955a1e45bf3d4c3e3d7836d8c10a14389711b' (Input/output error)
A few more notes:
I checked and there's no firmware update for these drives, so it is what it is.
Given that only one drive was giving errors, I swapped SATA cables and plugged them into different ports; since then... no more hardware errors. I've been copying and deleting data, and creating and deleting snapshots; the corrupt blocks are still corrupt, but nothing new shows up. Again, no hardware errors.
So maybe it was just a bad SATA cable. Odd that it's only a problem when I trim, though. I'll play more.
Also, I guess this makes sense, but I just wanted to check: when doing a zpool status, I notice that every time I did a du on my zpool, the CKSUM errors went up. I guess that's because ZFS was unable to repair the blocks, so it kept finding them bad over and over every time I tried to read that data, and the CKSUM value is just a running tally of the errors it has come across?
Anyway, I'll let you know if I find anything else worth mentioning, but it's starting to look like either a bad cable or something not seated right. I blew away the pool, made a new one, and am generating more data and deletes; it seems to be holding up for now.
Also, I'm realizing these appear to be pretty low-grade consumer-level drives, so one can only ask for so much.
Thanks for the feedback anyway.
Oh well. It was looking good for a bit there...
I blew away the old pool, made a new one, copied a bunch of stuff to it, scripted taking a snapshot every 5 seconds, started copying more data to it, started deleting old data, did a zpool trim, and started getting checksum errors. No dmesg errors at all this time.
I guess I'm just out of luck with these drives.
In case I didn't mention it, I had tried this type of stuff on 2 different SSDs at work a few months ago; it worked flawlessly, no errors. I'm sure it's just these drives.
I have a similar issue occurring with autotrim=on on an IBM M1015 HBA (LSI 9211-8i). I can isolate it to the HBA, as the raidz1 consists of disks on the HBA and on the motherboard SATA controller. With autotrim=on I receive error messages like this on all 4 HBA-connected devices (none on the motherboard-connected disks):
zio pool vdev=xxxx error=5 type=1 offset=xxxx flags=180880
With autotrim=off I have so far not seen the issue appear. A manual trim also did not cause any issues.
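For reference, a sketch of how autotrim is toggled for these tests, plus the manual trim (pool name tank-ssd as it appears in the later logs):
zpool get autotrim tank-ssd       # check the current setting
zpool set autotrim=on tank-ssd    # automatic TRIM on frees: triggers the errors above
zpool set autotrim=off tank-ssd   # disable it again
zpool trim tank-ssd               # one-off manual TRIM: did not cause any issues here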
Update: I ran a scrub without autotrim and did receive I/O errors (though not zio pool errors, but typical blk_update_request I/O errors). I ordered a new 8087 cable to check whether the errors persist with a new cable.
If I do not report back, consider my issue solved ;-)
@nixomose what HBA are you using?
All my drives are plugged directly into the motherboard, so it's whatever controller is on the board, which lshw says is...
product: 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] [8086:1E02]
vendor: Intel Corporation [8086]
I can't imagine my problem is the controller, though, because other SSDs attached the same way are fine with trim; I'm sure it's the drives in my case. Never buying those again. :-)
Hello, it seems I've faced a similar issue. I hope some more info can help figure out the root cause.
I'm running NixOS 19.09 with ZFS 0.8.3 (the same issue occurred with 0.8.2).
My SSDs are Seagate Nytro 1351 3.84TB (XA3840LE10063). I have 6 of them; the issue is reproducible on all of them and looks the same. All SSDs are directly connected to the motherboard. This is my root pool and I can't destroy it, but I'm OK with carefully running some experiments on one of the devices, since I'm running raidz2.
So, before enabling the autotrim feature, I decided to test on one device first using the sudo zpool trim zmain <DEVICE> command, and almost immediately I got one read error:
➜ sudo zpool status -t zmain
pool: zmain
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub repaired 0B in 0 days 02:16:07 with 0 errors on Mon Jun 1 15:55:00 2020
config:
NAME STATE READ WRITE CKSUM
zmain ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
ata-XA3840LE10063_HKT01DNK-part2 ONLINE 0 0 0 (100% trimmed, completed at Mon 01 Jun 2020 01:30:30 PM PDT)
ata-XA3840LE10063_HKT01DNN-part2 ONLINE 1 0 0 (43% trimmed, started at Mon 01 Jun 2020 04:09:43 PM PDT)
ata-XA3840LE10063_HKT01E92-part2 ONLINE 0 0 0 (untrimmed)
ata-XA3840LE10063_HKT01EBH-part2 ONLINE 0 0 0 (untrimmed)
ata-XA3840LE10063_HKT01EGB-part2 ONLINE 0 0 0 (untrimmed)
ata-XA3840LE10063_HKT01ETE-part2 ONLINE 0 0 0 (untrimmed)
logs
nvme-INTEL_SSDPE21D480GA_PHM2809000AZ480BGN-part2 ONLINE 0 0 0 (untrimmed)
cache
nvme-Samsung_SSD_970_PRO_512GB_S463NF0K910544A ONLINE 0 0 0 (untrimmed)
errors: No known data errors
The trim would eventually finish successfully. A scrub passes successfully after that as well, and the pool seems to continue running just fine. I first faced this issue more than 3 months ago; after a successful scrub I cleared the errors and they never appeared again. So that's quite interesting, and I'm wondering whether it's even a real error or some issue that leads ZFS to report one...
Here is what I see in the logs at the time of running the trim command:
Jun 01 16:09:43 nas sudo[123474]: frostman : TTY=pts/4 ; PWD=/home/frostman ; USER=root ; COMMAND=/run/current-system/sw/bin/zpoo>
Jun 01 16:09:43 nas sudo[123474]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 01 16:09:43 nas sudo[123474]: pam_unix(sudo:session): session closed for user root
Jun 01 16:09:43 nas kernel: ata2.00: exception Emask 0x0 SAct 0xa000000 SErr 0x0 action 0x0
Jun 01 16:09:43 nas kernel: ata2.00: irq_stat 0x40000001
Jun 01 16:09:43 nas kernel: ata2.00: failed command: SEND FPDMA QUEUED
Jun 01 16:09:43 nas kernel: ata2.00: cmd 64/01:c8:00:00:00/00:00:00:00:00/a0 tag 25 ncq dma 512 out
res 41/04:c8:00:00:00/00:00:00:00:00/00 Emask 0x401 (device error) <F>
Jun 01 16:09:43 nas kernel: ata2.00: status: { DRDY ERR }
Jun 01 16:09:43 nas kernel: ata2.00: error: { ABRT }
Jun 01 16:09:43 nas kernel: ata2.00: failed command: SEND FPDMA QUEUED
Jun 01 16:09:43 nas kernel: ata2.00: cmd 64/01:d8:00:00:00/00:00:00:00:00/a0 tag 27 ncq dma 512 out
res 51/04:c8:00:00:00/00:00:00:00:00/00 Emask 0x1 (device error)
Jun 01 16:09:43 nas kernel: ata2.00: status: { DRDY ERR }
Jun 01 16:09:43 nas kernel: ata2.00: error: { ABRT }
Jun 01 16:09:43 nas kernel: ata2.00: configured for UDMA/133
Jun 01 16:09:43 nas kernel: ata2.00: device reported invalid CHS sector 0
Jun 01 16:09:43 nas kernel: sd 1:0:0:0: [sdb] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 01 16:09:43 nas kernel: sd 1:0:0:0: [sdb] tag#27 Sense Key : Illegal Request [current]
Jun 01 16:09:43 nas kernel: sd 1:0:0:0: [sdb] tag#27 Add. Sense: Unaligned write command
Jun 01 16:09:43 nas kernel: sd 1:0:0:0: [sdb] tag#27 CDB: Write same(16) 93 08 00 00 00 00 00 20 21 f0 00 00 00 48 00 00
Jun 01 16:09:43 nas kernel: blk_update_request: I/O error, dev sdb, sector 2105840 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio c>
Jun 01 16:09:43 nas kernel: ata2: EH complete
Jun 01 16:09:43 nas kernel: ata2.00: exception Emask 0x0 SAct 0x820 SErr 0x0 action 0x0
Jun 01 16:09:43 nas kernel: ata2.00: irq_stat 0x40000001
Jun 01 16:09:43 nas kernel: ata2.00: failed command: SEND FPDMA QUEUED
Jun 01 16:09:43 nas kernel: ata2.00: cmd 64/01:28:00:00:00/00:00:00:00:00/a0 tag 5 ncq dma 512 out
res 41/04:28:00:00:00/00:00:00:00:00/00 Emask 0x401 (device error) <F>
Jun 01 16:09:44 nas kernel: ata2.00: status: { DRDY ERR }
Jun 01 16:09:44 nas kernel: ata2.00: error: { ABRT }
I have a similar issue occurring with autotrim=on on an IBM M1015 HBA (LSI 9211-8i). I can isolate it to the HBA, as the raidz1 consists of disks on the HBA and on the motherboard SATA controller. With autotrim=on I receive error messages like this on all 4 HBA-connected devices (none on the motherboard-connected disks):
I ran a scrub without autotrim and did receive I/O errors (though not zio pool errors, but typical blk_update_request I/O errors). I ordered a new 8087 cable to check whether the errors persist with a new cable.
If I do not report back, consider my issue solved ;-)
So I installed the new cable without any significant change; I still get errors like these regularly:
Jun 3 19:41:47 xxx kernel: [ 7368.995554] sd 0:0:6:0: [sdg] tag#205 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Jun 3 19:41:47 xxx kernel: [ 7368.995556] sd 0:0:6:0: [sdg] tag#205 CDB: Read(10) 28 00 04 cc f6 d0 00 00 08 00
Jun 3 19:41:47 xxx kernel: [ 7368.995557] blk_update_request: I/O error, dev sdg, sector 80541392 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
Jun 3 19:41:47 xxx kernel: [ 7368.995584] zio pool=tank-ssd vdev=/dev/disk/by-id/wwn-0x5002538844584d30-part1 error=5 type=1 offset=41236144128 size=4096 flags=180880
Jun 3 19:41:47 xxx zed: eid=21 class=io pool_guid=0xAF344193AF7F244C vdev_path=/dev/disk/by-id/wwn-0x5002538844584d30-part1
Jun 3 19:41:48 xxx zed: eid=22 class=io pool_guid=0xAF344193AF7F244C vdev_path=/dev/disk/by-id/wwn-0x5001b444a44a4f38-part1
Jun 3 19:41:48 xxx zed: eid=23 class=io pool_guid=0xAF344193AF7F244C
Jun 3 19:41:48 xxx zed: eid=24 class=io pool_guid=0xAF344193AF7F244C vdev_path=/dev/disk/by-id/wwn-0x5001b444a4b66962-part1
Jun 3 19:41:48 xxx zed: eid=25 class=io pool_guid=0xAF344193AF7F244C vdev_path=/dev/disk/by-id/wwn-0x0000000000000000-part1
Jun 3 20:45:22 xxx kernel: [11184.199033] blk_update_request: I/O error, dev sdf, sector 67551064 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
Jun 3 20:45:22 xxx kernel: [11184.218613] sd 0:0:4:0: [sde] tag#200 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Jun 3 20:45:22 xxx kernel: [11184.218616] blk_update_request: I/O error, dev sde, sector 67551064 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
Jun 3 20:45:23 xxx zed: eid=26 class=io pool_guid=0xAF344193AF7F244C vdev_path=/dev/disk/by-id/wwn-0x5001b444a4b66962-part1
Jun 3 20:45:23 xxx zed: eid=27 class=io pool_guid=0xAF344193AF7F244C vdev_path=/dev/disk/by-id/wwn-0x5001b444a44a4f38-part1
Jun 3 20:45:23 xxx zed: eid=28 class=io pool_guid=0xAF344193AF7F244C
Jun 3 20:45:23 xxx zed: eid=29 class=io pool_guid=0xAF344193AF7F244C vdev_path=/dev/disk/by-id/wwn-0x0000000000000000-part1
Jun 3 20:45:23 xxx zed: eid=30 class=io pool_guid=0xAF344193AF7F244C vdev_path=/dev/disk/by-id/wwn-0x5002538844584d30-part1
Jun 3 21:06:46 xxx kernel: [12468.149728] sd 0:0:5:0: [sdf] tag#194 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Jun 3 21:06:46 xxx kernel: [12468.149736] sd 0:0:5:0: [sdf] tag#194 CDB: Read(10) 28 00 1d cf 2f 80 00 00 08 00
Jun 3 21:06:46 xxx kernel: [12468.224035] sd 0:0:6:0: [sdg] tag#238 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Jun 3 21:06:46 xxx kernel: [12468.224044] sd 0:0:6:0: [sdg] tag#238 CDB: Read(10) 28 00 1d cf 2f 80 00 00 08 00
Jun 3 21:06:46 xxx kernel: [12468.224050] blk_update_request: I/O error, dev sdg, sector 500117376 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
This does not seem to be a trim issue but some other ZFS-related issue. Before converting to ZFS, I ran these disks as an mdadm RAID5 without any issues. Nevertheless, it seems to be isolated to the controller: 4 out of 5 disks are connected via the HBA (LSI 9211-8i) and all of them are throwing these errors. It is very unlikely that all 4 SSDs suddenly have hardware issues.
I hope someone has an idea how to fix this, although it is not related to trim...
Someone from Reddit mentioned this bug in response to a post I submitted asking whether the Linux mpt3sas driver may have been responsible for a reproducible bug with zpool trim, where using zpool trim on a single-vdev mirror pool comprised of Samsung 860 EVO SSDs was causing controller resets with an LSI 9305-16i SAS HBA.
Running Gentoo Linux, with 16+ years of experience using it. Here's the server kernel config running Linux 5.4.38 and the kernel log result of attempting to use zpool trim with zfs_vdev_trim_max_active=2.
After recovering from the first round of disk resets and having set zfs_vdev_trim_max_active=1, zpool trim degraded the mirror pool due to more disk resets and faulted one disk in the pool.
Currently running ZFS 0.8.4, but this was reproducible on previous ZFS versions when I was willing to attempt testing it.
I'm not certain what to attempt from here, but I know I'm not running zpool trim on my root disk pool until further notice, which this mirror pool certainly is not.
I'm likely going to try a pool rebuild, attempt to trim the fresh pool, and report the results.
Here are the kernel logs after setting zfs_vdev_trim_max_active=1:
[ 89.440215] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[ 89.440218] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[ 89.440225] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[ 89.440229] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[ 89.440232] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[ 89.440262] sd 6:0:3:0: [sdd] tag#3457 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[ 89.440275] sd 6:0:3:0: [sdd] tag#3457 CDB: Write(10) 2a 00 1d 1c 57 80 00 00 08 00
[ 89.440284] blk_update_request: I/O error, dev sdd, sector 488396672 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[ 89.440292] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_xxxxxxxxxxxxxxx-part4 error=5 type=2 offset=222058971136 size=4096 flags=180ac0
[ 89.440311] sd 6:0:3:0: [sdd] tag#3456 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[ 89.440316] sd 6:0:3:0: [sdd] tag#3456 CDB: Write(10) 2a 00 1d 1c 55 80 00 00 08 00
[ 89.440321] blk_update_request: I/O error, dev sdd, sector 488396160 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[ 89.440325] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_xxxxxxxxxxxxxxx-part4 error=5 type=2 offset=222058708992 size=4096 flags=180ac0
[ 89.440333] sd 6:0:3:0: [sdd] tag#3519 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[ 89.440342] sd 6:0:3:0: [sdd] tag#3519 CDB: Write(10) 2a 00 03 42 7b 80 00 00 08 00
[ 89.440346] blk_update_request: I/O error, dev sdd, sector 54688640 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[ 89.440351] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_xxxxxxxxxxxxxxx-part4 error=5 type=2 offset=458752 size=4096 flags=180ac0
[ 89.440358] sd 6:0:3:0: [sdd] tag#3517 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[ 89.440362] sd 6:0:3:0: [sdd] tag#3517 CDB: Write(10) 2a 00 03 42 79 80 00 00 08 00
[ 89.440365] blk_update_request: I/O error, dev sdd, sector 54688128 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[ 89.440369] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_xxxxxxxxxxxxxxx-part4 error=5 type=2 offset=196608 size=4096 flags=180ac0
[ 89.992550] mpt3sas_cm0: fault_state(0x6004)!
[ 89.992554] mpt3sas_cm0: sending diag reset !!
[ 90.957467] mpt3sas_cm0: diag reset: SUCCESS
[ 91.021702] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 91.176534] mpt3sas_cm0: _base_display_fwpkg_version: complete
[ 91.176855] mpt3sas_cm0: LSISAS3224: FWVersion(16.00.01.00), ChipRevision(0x01), BiosVersion(18.00.00.00)
[ 91.176856] mpt3sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[ 91.176915] mpt3sas_cm0: sending port enable !!
[ 100.003767] mpt3sas_cm0: port enable: SUCCESS
[ 100.003867] mpt3sas_cm0: search for end-devices: start
[ 100.004889] scsi target6:0:0: handle(0x0019), sas_addr(0x300062b20394b740)
[ 100.004892] scsi target6:0:0: enclosure logical id(0x500062b20394b740), slot(3)
[ 100.004945] scsi target6:0:1: handle(0x001a), sas_addr(0x300062b20394b741)
[ 100.004947] scsi target6:0:1: enclosure logical id(0x500062b20394b740), slot(2)
[ 100.004990] scsi target6:0:2: handle(0x001b), sas_addr(0x300062b20394b742)
[ 100.004992] scsi target6:0:2: enclosure logical id(0x500062b20394b740), slot(0)
[ 100.005035] scsi target6:0:3: handle(0x001c), sas_addr(0x300062b20394b743)
[ 100.005037] scsi target6:0:3: enclosure logical id(0x500062b20394b740), slot(1)
[ 100.005080] scsi target6:0:4: handle(0x001d), sas_addr(0x300062b20394b744)
[ 100.005082] scsi target6:0:4: enclosure logical id(0x500062b20394b740), slot(7)
[ 100.005129] scsi target6:0:5: handle(0x001e), sas_addr(0x300062b20394b745)
[ 100.005131] scsi target6:0:5: enclosure logical id(0x500062b20394b740), slot(6)
[ 100.005176] scsi target6:0:6: handle(0x001f), sas_addr(0x300062b20394b746)
[ 100.005178] scsi target6:0:6: enclosure logical id(0x500062b20394b740), slot(4)
[ 100.005220] scsi target6:0:7: handle(0x0020), sas_addr(0x300062b20394b747)
[ 100.005222] scsi target6:0:7: enclosure logical id(0x500062b20394b740), slot(5)
[ 100.005266] scsi target6:0:8: handle(0x0021), sas_addr(0x300062b20394b750)
[ 100.005268] scsi target6:0:8: enclosure logical id(0x500062b20394b740), slot(11)
[ 100.005312] scsi target6:0:10: handle(0x0022), sas_addr(0x300062b20394b752)
[ 100.005314] scsi target6:0:10: enclosure logical id(0x500062b20394b740), slot(8)
[ 100.005315] handle changed from(0x0023)!!!
[ 100.005357] scsi target6:0:9: handle(0x0023), sas_addr(0x300062b20394b751)
[ 100.005359] scsi target6:0:9: enclosure logical id(0x500062b20394b740), slot(10)
[ 100.005360] handle changed from(0x0022)!!!
[ 100.005403] scsi target6:0:11: handle(0x0024), sas_addr(0x300062b20394b753)
[ 100.005405] scsi target6:0:11: enclosure logical id(0x500062b20394b740), slot(9)
[ 100.005473] mpt3sas_cm0: search for end-devices: complete
[ 100.005474] mpt3sas_cm0: search for end-devices: start
[ 100.005475] mpt3sas_cm0: search for PCIe end-devices: complete
[ 100.005476] mpt3sas_cm0: search for expanders: start
[ 100.005477] mpt3sas_cm0: search for expanders: complete
[ 100.005483] mpt3sas_cm0: _base_fault_reset_work: hard reset: success
[ 100.005488] mpt3sas_cm0: removing unresponding devices: start
[ 100.005489] mpt3sas_cm0: removing unresponding devices: end-devices
[ 100.005490] mpt3sas_cm0: Removing unresponding devices: pcie end-devices
[ 100.005491] mpt3sas_cm0: removing unresponding devices: expanders
[ 100.005492] mpt3sas_cm0: removing unresponding devices: complete
[ 100.005497] mpt3sas_cm0: scan devices: start
[ 100.006118] mpt3sas_cm0: scan devices: expanders start
[ 100.006180] mpt3sas_cm0: break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
[ 100.006181] mpt3sas_cm0: scan devices: expanders complete
[ 100.006183] mpt3sas_cm0: scan devices: end devices start
[ 100.007760] mpt3sas_cm0: break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
[ 100.007761] mpt3sas_cm0: scan devices: end devices complete
[ 100.007762] mpt3sas_cm0: scan devices: pcie end devices start
[ 100.007780] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[ 100.007799] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[ 100.007803] mpt3sas_cm0: break from pcie end device scan: ioc_status(0x0021), loginfo(0x3003011d)
[ 100.007804] mpt3sas_cm0: pcie devices: pcie end devices complete
[ 100.007804] mpt3sas_cm0: scan devices: complete
[ 100.129679] sd 6:0:3:0: Power-on or device reset occurred
[ 100.131188] sd 6:0:0:0: Power-on or device reset occurred
[ 100.131196] sd 6:0:2:0: Power-on or device reset occurred
[ 100.132498] sd 6:0:1:0: Power-on or device reset occurred
[ 100.133699] sd 6:0:5:0: Power-on or device reset occurred
[ 100.137979] sd 6:0:10:0: Power-on or device reset occurred
[ 100.138342] sd 6:0:11:0: Power-on or device reset occurred
[ 100.138391] sd 6:0:4:0: Power-on or device reset occurred
[ 100.138713] sd 6:0:9:0: Power-on or device reset occurred
[ 100.141598] sd 6:0:6:0: Power-on or device reset occurred
[ 100.143354] sd 6:0:8:0: Power-on or device reset occurred
[ 100.168447] sd 6:0:7:0: Power-on or device reset occurred
[ 130.579449] sd 6:0:3:0: attempting task abort! scmd(0000000089c83e88)
[ 130.579461] sd 6:0:3:0: [sdd] tag#4988 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[ 130.579466] scsi target6:0:3: handle(0x001c), sas_address(0x300062b20394b743), phy(3)
[ 130.579470] scsi target6:0:3: enclosure logical id(0x500062b20394b740), slot(1)
[ 130.579473] scsi target6:0:3: enclosure level(0x0000), connector name( )
[ 131.784364] mpt3sas_cm0: fault_state(0x6004)!
[ 131.784367] mpt3sas_cm0: sending diag reset !!
[ 132.749519] mpt3sas_cm0: diag reset: SUCCESS
[ 132.749573] mpt3sas_cm0: Command terminated due to Host Reset
[ 132.749576] mf:
[ 132.749578] 0100001c
[ 132.749579] 00000100
[ 132.749580] 00000000
[ 132.749581] 00000000
[ 132.749582] 00000000
[ 132.749582] 00000000
[ 132.749583] 00000000
[ 132.749584] 00000000
[ 132.749585]
[ 132.749586] 00000000
[ 132.749586] 00000000
[ 132.749587] 00000000
[ 132.749588] 00000000
[ 132.749589] 0000137d
[ 132.749603] sd 6:0:3:0: task abort: SUCCESS scmd(0000000089c83e88)
[ 132.813788] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[ 132.967821] mpt3sas_cm0: _base_display_fwpkg_version: complete
[ 132.968130] mpt3sas_cm0: LSISAS3224: FWVersion(16.00.01.00), ChipRevision(0x01), BiosVersion(18.00.00.00)
[ 132.968132] mpt3sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[ 132.968188] mpt3sas_cm0: sending port enable !!
[ 141.795517] mpt3sas_cm0: port enable: SUCCESS
[ 141.795614] mpt3sas_cm0: search for end-devices: start
[ 141.796664] scsi target6:0:0: handle(0x0019), sas_addr(0x300062b20394b740)
[ 141.796668] scsi target6:0:0: enclosure logical id(0x500062b20394b740), slot(3)
[ 141.796710] scsi target6:0:1: handle(0x001a), sas_addr(0x300062b20394b741)
[ 141.796713] scsi target6:0:1: enclosure logical id(0x500062b20394b740), slot(2)
[ 141.796754] scsi target6:0:2: handle(0x001b), sas_addr(0x300062b20394b742)
[ 141.796758] scsi target6:0:2: enclosure logical id(0x500062b20394b740), slot(0)
[ 141.796800] scsi target6:0:3: handle(0x001c), sas_addr(0x300062b20394b743)
[ 141.796803] scsi target6:0:3: enclosure logical id(0x500062b20394b740), slot(1)
[ 141.796844] scsi target6:0:4: handle(0x001d), sas_addr(0x300062b20394b744)
[ 141.796846] scsi target6:0:4: enclosure logical id(0x500062b20394b740), slot(7)
[ 141.796887] scsi target6:0:5: handle(0x001e), sas_addr(0x300062b20394b745)
[ 141.796889] scsi target6:0:5: enclosure logical id(0x500062b20394b740), slot(6)
[ 141.796930] scsi target6:0:6: handle(0x001f), sas_addr(0x300062b20394b746)
[ 141.796932] scsi target6:0:6: enclosure logical id(0x500062b20394b740), slot(4)
[ 141.796973] scsi target6:0:7: handle(0x0020), sas_addr(0x300062b20394b747)
[ 141.796975] scsi target6:0:7: enclosure logical id(0x500062b20394b740), slot(5)
[ 141.797016] scsi target6:0:8: handle(0x0021), sas_addr(0x300062b20394b750)
[ 141.797018] scsi target6:0:8: enclosure logical id(0x500062b20394b740), slot(11)
[ 141.797064] scsi target6:0:10: handle(0x0022), sas_addr(0x300062b20394b752)
[ 141.797066] scsi target6:0:10: enclosure logical id(0x500062b20394b740), slot(8)
[ 141.797107] scsi target6:0:9: handle(0x0023), sas_addr(0x300062b20394b751)
[ 141.797109] scsi target6:0:9: enclosure logical id(0x500062b20394b740), slot(10)
[ 141.797150] scsi target6:0:11: handle(0x0024), sas_addr(0x300062b20394b753)
[ 141.797153] scsi target6:0:11: enclosure logical id(0x500062b20394b740), slot(9)
[ 141.797219] mpt3sas_cm0: search for end-devices: complete
[ 141.797220] mpt3sas_cm0: search for end-devices: start
[ 141.797222] mpt3sas_cm0: search for PCIe end-devices: complete
[ 141.797224] mpt3sas_cm0: search for expanders: start
[ 141.797225] mpt3sas_cm0: search for expanders: complete
[ 141.797233] mpt3sas_cm0: _base_fault_reset_work: hard reset: success
[ 141.797238] mpt3sas_cm0: removing unresponding devices: start
[ 141.797239] mpt3sas_cm0: removing unresponding devices: end-devices
[ 141.797241] mpt3sas_cm0: Removing unresponding devices: pcie end-devices
[ 141.797242] mpt3sas_cm0: removing unresponding devices: expanders
[ 141.797243] mpt3sas_cm0: removing unresponding devices: complete
[ 141.797250] mpt3sas_cm0: scan devices: start
[ 141.797804] mpt3sas_cm0: scan devices: expanders start
[ 141.797863] mpt3sas_cm0: break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
[ 141.797864] mpt3sas_cm0: scan devices: expanders complete
[ 141.797865] mpt3sas_cm0: scan devices: end devices start
[ 141.799307] mpt3sas_cm0: break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
[ 141.799308] mpt3sas_cm0: scan devices: end devices complete
[ 141.799309] mpt3sas_cm0: scan devices: pcie end devices start
[ 141.799326] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[ 141.799345] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[ 141.799348] mpt3sas_cm0: break from pcie end device scan: ioc_status(0x0021), loginfo(0x3003011d)
[ 141.799349] mpt3sas_cm0: pcie devices: pcie end devices complete
[ 141.799350] mpt3sas_cm0: scan devices: complete
[ 141.923008] sd 6:0:2:0: Power-on or device reset occurred
[ 141.923075] sd 6:0:3:0: Power-on or device reset occurred
[ 141.923116] sd 6:0:0:0: Power-on or device reset occurred
[ 141.923201] sd 6:0:1:0: Power-on or device reset occurred
[ 141.928761] sd 6:0:9:0: Power-on or device reset occurred
[ 141.928790] sd 6:0:11:0: Power-on or device reset occurred
[ 141.956396] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[ 141.958947] sd 6:0:7:0: Power-on or device reset occurred
[ 141.959101] sd 6:0:8:0: Power-on or device reset occurred
[ 141.959145] sd 6:0:10:0: Power-on or device reset occurred
[ 141.959247] sd 6:0:4:0: Power-on or device reset occurred
[ 141.959296] sd 6:0:5:0: Power-on or device reset occurred
[ 141.959347] sd 6:0:6:0: Power-on or device reset occurred
[ 142.547967] sd 6:0:3:0: Power-on or device reset occurred
[ 8492.279205] logitech-hidpp-device 0003:046D:4051.0006: HID++ 4.5 device connected.
[51703.443968] logitech-hidpp-device 0003:046D:4076.0007: HID++ 4.1 device connected.
[69105.246529] br0: port 3(vnet1) entered blocking state
[69105.246532] br0: port 3(vnet1) entered disabled state
[69105.246596] device vnet1 entered promiscuous mode
[69105.246786] br0: port 3(vnet1) entered blocking state
[69105.246788] br0: port 3(vnet1) entered forwarding state
[83882.067828] br0: port 3(vnet1) entered disabled state
[83882.072255] device vnet1 left promiscuous mode
[83882.072260] br0: port 3(vnet1) entered disabled state
[89017.080568] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[89017.080578] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[89017.080610] sd 6:0:3:0: [sdd] tag#637 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89017.080615] sd 6:0:3:0: [sdd] tag#637 CDB: Write(10) 2a 00 0c 46 06 00 00 00 10 00
[89017.080619] blk_update_request: I/O error, dev sdd, sector 205915648 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
[89017.080622] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_xxxxxxxxxxxxxxx-part4 error=5 type=2 offset=77428686848 size=8192 flags=180880
[89017.510016] mpt3sas_cm0: fault_state(0x6004)!
[89017.510033] mpt3sas_cm0: sending diag reset !!
[89018.474335] mpt3sas_cm0: diag reset: SUCCESS
[89018.538873] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[89018.694001] mpt3sas_cm0: _base_display_fwpkg_version: complete
[89018.694298] mpt3sas_cm0: LSISAS3224: FWVersion(16.00.01.00), ChipRevision(0x01), BiosVersion(18.00.00.00)
[89018.694299] mpt3sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[89018.694354] mpt3sas_cm0: sending port enable !!
[89027.520224] mpt3sas_cm0: port enable: SUCCESS
[89027.520315] mpt3sas_cm0: search for end-devices: start
[89027.521216] scsi target6:0:0: handle(0x0019), sas_addr(0x300062b20394b740)
[89027.521218] scsi target6:0:0: enclosure logical id(0x500062b20394b740), slot(3)
[89027.521256] scsi target6:0:1: handle(0x001a), sas_addr(0x300062b20394b741)
[89027.521257] scsi target6:0:1: enclosure logical id(0x500062b20394b740), slot(2)
[89027.521294] scsi target6:0:2: handle(0x001b), sas_addr(0x300062b20394b742)
[89027.521296] scsi target6:0:2: enclosure logical id(0x500062b20394b740), slot(0)
[89027.521333] scsi target6:0:3: handle(0x001c), sas_addr(0x300062b20394b743)
[89027.521335] scsi target6:0:3: enclosure logical id(0x500062b20394b740), slot(1)
[89027.521372] scsi target6:0:4: handle(0x001d), sas_addr(0x300062b20394b744)
[89027.521373] scsi target6:0:4: enclosure logical id(0x500062b20394b740), slot(7)
[89027.521412] scsi target6:0:5: handle(0x001e), sas_addr(0x300062b20394b745)
[89027.521414] scsi target6:0:5: enclosure logical id(0x500062b20394b740), slot(6)
[89027.521451] scsi target6:0:6: handle(0x001f), sas_addr(0x300062b20394b746)
[89027.521452] scsi target6:0:6: enclosure logical id(0x500062b20394b740), slot(4)
[89027.521489] scsi target6:0:7: handle(0x0020), sas_addr(0x300062b20394b747)
[89027.521490] scsi target6:0:7: enclosure logical id(0x500062b20394b740), slot(5)
[89027.521528] scsi target6:0:8: handle(0x0021), sas_addr(0x300062b20394b750)
[89027.521530] scsi target6:0:8: enclosure logical id(0x500062b20394b740), slot(11)
[89027.521568] scsi target6:0:10: handle(0x0022), sas_addr(0x300062b20394b752)
[89027.521569] scsi target6:0:10: enclosure logical id(0x500062b20394b740), slot(8)
[89027.521606] scsi target6:0:9: handle(0x0023), sas_addr(0x300062b20394b751)
[89027.521608] scsi target6:0:9: enclosure logical id(0x500062b20394b740), slot(10)
[89027.521650] scsi target6:0:11: handle(0x0024), sas_addr(0x300062b20394b753)
[89027.521651] scsi target6:0:11: enclosure logical id(0x500062b20394b740), slot(9)
[89027.521711] mpt3sas_cm0: search for end-devices: complete
[89027.521711] mpt3sas_cm0: search for end-devices: start
[89027.521712] mpt3sas_cm0: search for PCIe end-devices: complete
[89027.521713] mpt3sas_cm0: search for expanders: start
[89027.521714] mpt3sas_cm0: search for expanders: complete
[89027.521720] mpt3sas_cm0: _base_fault_reset_work: hard reset: success
[89027.521724] mpt3sas_cm0: removing unresponding devices: start
[89027.521725] mpt3sas_cm0: removing unresponding devices: end-devices
[89027.521725] mpt3sas_cm0: Removing unresponding devices: pcie end-devices
[89027.521726] mpt3sas_cm0: removing unresponding devices: expanders
[89027.521727] mpt3sas_cm0: removing unresponding devices: complete
[89027.521729] mpt3sas_cm0: scan devices: start
[89027.522200] mpt3sas_cm0: scan devices: expanders start
[89027.522252] mpt3sas_cm0: break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
[89027.522252] mpt3sas_cm0: scan devices: expanders complete
[89027.522253] mpt3sas_cm0: scan devices: end devices start
[89027.523490] mpt3sas_cm0: break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
[89027.523491] mpt3sas_cm0: scan devices: end devices complete
[89027.523492] mpt3sas_cm0: scan devices: pcie end devices start
[89027.523505] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[89027.523519] mpt3sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[89027.523522] mpt3sas_cm0: break from pcie end device scan: ioc_status(0x0021), loginfo(0x3003011d)
[89027.523522] mpt3sas_cm0: pcie devices: pcie end devices complete
[89027.523523] mpt3sas_cm0: scan devices: complete
[89027.647728] sd 6:0:3:0: Power-on or device reset occurred
[89027.647808] sd 6:0:0:0: Power-on or device reset occurred
[89027.647892] sd 6:0:1:0: Power-on or device reset occurred
[89027.655001] sd 6:0:8:0: Power-on or device reset occurred
[89027.656901] sd 6:0:7:0: Power-on or device reset occurred
[89027.659283] sd 6:0:11:0: Power-on or device reset occurred
[89027.660610] sd 6:0:9:0: Power-on or device reset occurred
[89027.660802] sd 6:0:10:0: Power-on or device reset occurred
[89027.661496] sd 6:0:6:0: Power-on or device reset occurred
[89027.663245] sd 6:0:5:0: Power-on or device reset occurred
[89027.665317] sd 6:0:4:0: Power-on or device reset occurred
[89027.772630] sd 6:0:2:0: Power-on or device reset occurred
[89027.827594] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827600] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827610] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827613] sd 6:0:2:0: [sdc] tag#2055 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827615] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827619] sd 6:0:2:0: [sdc] tag#602 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827621] sd 6:0:2:0: [sdc] tag#605 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827623] sd 6:0:2:0: [sdc] tag#2055 CDB: Write(10) 2a 00 03 ad 0f 50 00 00 08 00
[89027.827625] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827627] sd 6:0:2:0: [sdc] tag#605 CDB: Write(10) 2a 00 14 90 18 00 00 00 08 00
[89027.827631] blk_update_request: I/O error, dev sdc, sector 61673296 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89027.827634] sd 6:0:2:0: [sdc] tag#602 CDB: Write(10) 2a 00 0b 95 09 70 00 00 08 00
[89027.827640] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=3576602624 size=4096 flags=180880
[89027.827643] blk_update_request: I/O error, dev sdc, sector 344987648 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89027.827645] sd 6:0:2:0: [sdc] tag#604 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827648] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827652] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=148633550848 size=4096 flags=180880
[89027.827654] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827658] blk_update_request: I/O error, dev sdc, sector 194316656 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89027.827660] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827661] sd 6:0:2:0: [sdc] tag#692 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827666] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=71490002944 size=4096 flags=180880
[89027.827667] mpt3sas_cm0: log_info(0x31120440): originator(PL), code(0x12), sub_code(0x0440)
[89027.827669] sd 6:0:2:0: [sdc] tag#692 CDB: Write(10) 2a 00 0c 46 07 00 00 00 10 00
[89027.827673] blk_update_request: I/O error, dev sdc, sector 205915904 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
[89027.827674] sd 6:0:2:0: [sdc] tag#604 CDB: Write(10) 2a 00 11 86 04 d8 00 00 08 00
[89027.827675] sd 6:0:2:0: [sdc] tag#691 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827678] blk_update_request: I/O error, dev sdc, sector 293995736 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89027.827681] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=77428817920 size=4096 flags=180880
[89027.827684] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=122525691904 size=4096 flags=180880
[89027.827684] sd 6:0:2:0: [sdc] tag#691 CDB: Write(10) 2a 00 09 1e 36 70 00 00 08 00
[89027.827687] blk_update_request: I/O error, dev sdc, sector 152974960 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89027.827691] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=50323054592 size=4096 flags=180880
[89027.827693] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=77428822016 size=4096 flags=180880
[89027.827695] sd 6:0:2:0: [sdc] tag#601 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827699] sd 6:0:2:0: [sdc] tag#601 CDB: Write(10) 2a 00 11 86 04 d0 00 00 08 00
[89027.827700] sd 6:0:2:0: [sdc] tag#688 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827704] blk_update_request: I/O error, dev sdc, sector 293995728 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89027.827705] sd 6:0:2:0: [sdc] tag#688 CDB: Write(10) 2a 00 13 57 db c8 00 00 08 00
[89027.827707] blk_update_request: I/O error, dev sdc, sector 324525000 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89027.827710] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=122525687808 size=4096 flags=180880
[89027.827713] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=138156675072 size=4096 flags=180880
[89027.827718] sd 6:0:2:0: [sdc] tag#685 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89027.827720] sd 6:0:2:0: [sdc] tag#685 CDB: Write(10) 2a 00 0b 95 09 68 00 00 08 00
[89027.827721] blk_update_request: I/O error, dev sdc, sector 194316648 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89027.827723] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=71489998848 size=4096 flags=180880
[89028.522746] sd 6:0:2:0: Power-on or device reset occurred
[89028.560512] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[89028.560516] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[89028.560520] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[89028.560521] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[89028.560523] mpt3sas_cm0: log_info(0x31120b41): originator(PL), code(0x12), sub_code(0x0b41)
[89028.560544] sd 6:0:2:0: [sdc] tag#637 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[89028.560547] sd 6:0:2:0: [sdc] tag#637 CDB: Write(10) 2a 00 1d 1c 57 c8 00 00 08 00
[89028.560550] blk_update_request: I/O error, dev sdc, sector 488396744 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[89028.560553] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=222059008000 size=4096 flags=180ac0
[89028.560565] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=222058745856 size=4096 flags=180ac0
[89028.560569] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=495616 size=4096 flags=180ac0
[89028.560571] zio pool=deadpool vdev=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 error=5 type=2 offset=233472 size=4096 flags=180ac0
[89029.604982] mpt3sas_cm0: fault_state(0x6004)!
[89029.604985] mpt3sas_cm0: sending diag reset !!
[89030.569407] mpt3sas_cm0: diag reset: SUCCESS
[89030.634010] mpt3sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[89030.787981] mpt3sas_cm0: _base_display_fwpkg_version: complete
[89030.788253] mpt3sas_cm0: LSISAS3224: FWVersion(16.00.01.00), ChipRevision(0x01), BiosVersion(18.00.00.00)
[89030.788254] mpt3sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[89030.788296] mpt3sas_cm0: sending port enable !!
Follow-up zpool commands and general system info:
fenrir ~ # zpool status -t deadpool
pool: deadpool
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 0B in 0 days 00:00:48 with 0 errors on Fri Jun 26 04:00:49 2020
config:
NAME STATE READ WRITE CKSUM
deadpool DEGRADED 0 0 0
mirror-0 DEGRADED 1 2 0
ata-Samsung_SSD_860_EVO_250GB_xxxxxxxxxxxxxxx-part4 ONLINE 1 3 0 (untrimmed)
ata-Samsung_SSD_860_EVO_250GB_yyyyyyyyyyyyyyy-part4 FAULTED 0 14 0 too many errors (0% trimmed, started at Fri Jun 26 20:32:17 2020)
errors: No known data errors
fenrir ~ # cat /sys/module/zfs/parameters/zfs_vdev_trim_max_active
1
fenrir ~ # uname -a
Linux fenrir 5.4.38-gentoo #3 SMP Wed Jun 17 19:31:19 CDT 2020 x86_64 Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz GenuineIntel GNU/Linux
Update to my previous comment.
I migrated datasets off of the 250GB mirror pool to prepare for further testing.
Pool creation command string used:
zpool create -f -o ashift=12 -O compression=lz4 -O xattr=sa -O relatime=on -O dedup=off deadpool2 mirror /dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_XXXXXXXXXXXXXXX /dev/disk/by-id/ata-Samsung_SSD_860_EVO_250GB_YYYYYYYYYYYYYYY
Results: zpool trim succeeded with no device reset errors, same SSDs and same server. Bizarre...
I also tested rate-limiting the trim with zpool trim -r to further reduce the trim intensity, expecting dmesg to fill with mpt3sas device reset errors; however, there were none, whether I forced several different trim rate limits or just allowed the trim to proceed at full I/O speed.
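A sketch of the rate-limited invocations described above (the rate value is illustrative; -r takes a per-device rate):
zpool trim -r 64M deadpool2     # throttle the trim to roughly 64 MiB/s per device
zpool status -t deadpool2       # confirm progress and that the rate was applied
zpool trim deadpool2            # or let the trim run at full I/O speed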
# zpool status -t deadpool2
pool: deadpool2
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
deadpool2 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-Samsung_SSD_860_EVO_250GB_XXXXXXXXXXXXXXX ONLINE 0 0 0 (100% trimmed, completed at Sat Jun 27 18:49:17 2020)
ata-Samsung_SSD_860_EVO_250GB_YYYYYYYYYYYYYYY ONLINE 0 0 0 (100% trimmed, completed at Sat Jun 27 18:49:17 2020)
errors: No known data errors