Type | Version/Name
--- | ---
Distribution Name | Debian
Distribution Version | zfs-0.8.1-pve2
Linux Kernel | 5.0.18-1-pve
Architecture | x86
ZFS Version | 0.8.1-pve1
SPL Version | 0.8.1-pve1
I accidentally added a disk to a pool while trying to replace a failed one, because the replace command had failed for some reason (disk name unresolvable or not found, I can't remember). Now I cannot replace the failed device with the disk I already physically swapped into the same slot.
Even though I have since offlined the new disk, I cannot remove it; the commands throw errors (see below).
What is the recommended way to replace the disk and clean up this apparently unfixable mess? There seem to be no force flags...
zpool add tank <new-disk>
zpool offline tank <new-disk>
Before:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
  raidz2-0 DEGRADED 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ FAULTED 33 0 0 too many errors
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032 ONLINE 0 0 0
After:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
  raidz2-0 DEGRADED 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ FAULTED 33 0 0 too many errors
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032 ONLINE 0 0 0
  ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD DEGRADED 0 0 0 external device fault
Removing or replacing the disk fails:
$ zpool remove tank ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD
cannot remove ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD: operation not supported on this type of pool
$ zpool replace tank ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD
/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD is in use and contains a unknown filesystem.
You created a single-drive vdev that is striped with your raidz2 vdev. Normally you should not be able to do that (add a new top-level vdev that has less redundancy than the existing vdevs) without the -f option to force it.
Hm, apparently it really is only a "should not". I went through my bash history and this is what I entered:
zpool add tank /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD, no -f flag.
@ccremer we're always looking to improve documentation. Can you share what led you to use the zpool add command rather than the zpool replace command?
I tried to replace the faulty disk with this command:
zpool replace tank ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD
For some reason (I don't have the output anymore) that failed; I think it was a not-found error or similar. When googling for "zfs replace disk" I of course land on the Oracle docs: https://docs.oracle.com/cd/E19253-01/819-5461/gazgd/index.html, where the device name is entered directly. Since that didn't work, I thought "ah, maybe the pool needs to know/import the disk before replacing", and there you have it...
OK, I think this is where we can do better. The replace command is what should have succeeded, but it is possible that zpool was looking in the wrong device directory, /dev, rather than /dev/disk/by-id. So the "not found" error sent you off on a wild goose chase, and that should not happen.
It is still unclear to me why -f wasn't required, because that code has been there for 10+ years. Later this week I'll try to reproduce.
-- richard
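For reference, passing the full /dev/disk/by-id path of the new device to zpool replace is one way to sidestep ambiguity about which device directory is searched (a sketch only; it is not confirmed that this would have avoided the original "not found" error):
# hedged sketch: spell out the full by-id path of the replacement device
$ zpool replace tank ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD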
That sums it up pretty much.
Looking over the manpages and the output of zpool status, I think there is now a mirror between the raidz2-0 vdev and the vdev that is now inconveniently called ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD, but the disk is not actually used yet as part of the same-named vdev. Am I understanding this correctly? If so, could I still replace the old faulted disk?
You created a single drive vdev that is mirrored with your raidz2 vdev.
Isn't the single drive in a stripe with the raidz vdev?
@richardelling I think it might at least be worthwhile to document a big "Do not use add instead of replace" in the replace section of the documentation... just to prevent mistakes...
@drescherjm Interesting note: I just created some single-disk pools and I can at least confirm that single-disk pools (and thus vdevs) require -f to be created on MASTER. So that part of the code is right.
So if the -f requirement isn't enforced on add, then the problem would be limited to add.
@ccremer It might be very important to note NOT to use the Oracle docs; this is not Oracle ZFS (anymore), and while most things might still work, there are no guarantees.
@HiFiPhile According to these readouts it's indeed in a stripe with the raidz2 vdev.
Edit
Looking into why force isn't working:
zpool add passes the force bool to make_root_vdev. But make_root_vdev wants an int; are B_TRUE and B_FALSE compatible with that?
Even if so, make_root_vdev doesn't seem to do much with the force argument....
It only passes it to is_device_in_use, and that's about it.
I don't see any checks that require -f to prevent single-disk vdev adds in any case.
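For context, when the check works as intended, adding a plain disk to a pool whose existing top-level vdev is raidz2 is expected to be refused without -f, roughly along these lines (hedged sketch; the exact error wording may differ between versions):
$ zpool add tank /dev/disk/by-id/ata-<new-disk>
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses raidz and new vdev is disk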
@Ornias1993 I learned that the hard way now. However, I'm not sure which docs are the ones for ZFS on Linux. On the wiki page https://github.com/zfsonlinux/zfs/wiki/Admin-Documentation I'm again presented with 3 different admin guides, including the Oracle one. And when using search engines, only the Oracle ones show up... So in a way it's hard to find the right online documentation. Coming from the Kubernetes/Docker world I did not think of manpages; I always look for online documentation first.
Regardless of the -f flag and the docs, how could I fix the mess and replace the disk in the raidz2 vdev with the wrongly-added new disk? What's the recommended approach?
Google-foo showed me a guide:
http://blog.moellenkamp.org/archives/50-How-to-remove-a-top-level-vdev-from-a-ZFS-pool.html
Google-foo showed me a guide:
http://blog.moellenkamp.org/archives/50-How-to-remove-a-top-level-vdev-from-a-ZFS-pool.html
It's for Oracle ZFS; OpenZFS currently doesn't support remove for pools with RAIDZ in them: https://github.com/zfsonlinux/zfs/blob/master/man/man8/zpool-remove.8#L56
Yes, I also found this one, except the first command doesn't work:
$ zpool remove tank ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD
cannot remove ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD: operation not supported on this type of pool
@gmelikov The second vdev isn't raidz
@gmelikov The second vdev isn't raidz
When the primary pool storage includes a top-level raidz vdev only hot spare,
cache, and log devices can be removed.
@gmelikov SHOOT, my bad!
You created a single drive vdev that is mirrored with your raidz2 vdev.
Isn't the single drive in stripe with the raidz vdev ?
Yes you are correct, I guess I was a little sleepy replying last night..
And when using search engines, only the Oracle ones show up...
I find that annoying myself when searching for the documentation.
Coming from Kubernetes/Docker worlds I did not think of manpages, I always look for online documentation first.
This one made me laugh out loud :D
Keep it that way! :+1:
You don't need your bash history to find out what happened; zpool history <pool> should show you what happened to a pool at any time, e.g.:
[root@taski ~]# zpool history zpool1 | head -n 10
History for 'zpool1':
2019-12-26.00:48:35 zpool create -o ashift=12 zpool2 /dev/sdb /dev/sdh
2019-12-26.00:51:53 zfs create zpool2/data
2019-12-26.00:52:51 zfs set relatime=on zpool2
2019-12-26.00:53:04 zfs set compression=on zpool2
2019-12-26.00:53:12 zfs set xattr=sa dnodesize=auto zpool2
2019-12-26.00:54:08 zfs destroy zpool2/data
2019-12-26.00:54:17 zfs create zpool2/data
2019-12-26.01:00:09 zfs snapshot -o com.sun:auto-snapshot-desc=- -r zpool2@zfs-auto-snap_frequent-2019-12-26-0000
2019-12-26.01:01:09 zfs snapshot -o com.sun:auto-snapshot-desc=- -r zpool2@zfs-auto-snap_hourly-2019-12-26-0001
Nice! Unfortunately I couldn't find the documentation in man zpool, or online. Apparently it supports some more flags; I wondered if they can filter for dates or events. Right now the output takes a minute to generate, as I'm creating/deleting lots of snapshots automatically, so the full output isn't helpful at the moment.
I also still haven't got an answer on how to fix my mess here. Is destroying and recreating the zpool the best option I have? How would I keep the snapshots then? ZFS send/receive?
You could send the output to a file and /or use grep to filter.
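For example (assuming the pool is called tank, as above):
# dump the history once, then filter it however you like
$ zpool history tank > /tmp/tank-history.txt
$ grep 'zpool add' /tmp/tank-history.txt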
I also still haven't got an answer on how to fix my mess here.
Depends on how much extra storage you have. This is probably also a mailing list topic instead of a bug report.
@ccremer this seems a 2 part issue: how it happened, how can you fix it.
The _how it happened_ can be answered by having a look at the pool history; just grep for add and have a look, but that's not so relevant in the end.
The _how can you fix it_ will, most likely and unfortunately, be answered with: you cannot. Just make a new pool with a proper config. Pools with raidz vdevs do not support top-level device removal unless it's cache/log devices. I, for example, use stripes of mirrors, and that allows me to do operations like the one you are trying. Once you have a new pool, move the data from this one to the new one, and destroy this one.
Snapshots can be transferred with the dataset to the new pool, zfs send -R will do that for you.
How much data are we talking about here? Is it something you can recreate, eg: movies, tvshows, etc?
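As a rough sketch of that migration (the pool name newpool and the snapshot name are placeholders):
# take a recursive snapshot, then replicate the dataset with all children and snapshots
$ zfs snapshot -r tank/data@migrate
$ zfs send -R tank/data@migrate | zfs receive newpool/data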
found it with the grep approach: 2019-12-24.13:52:40 zpool add tank /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD
I think I have some older 2TB drives lying around. Looking with zfs list, it seems I have
Thank you guys for your comments. So unless you want to treat the "add without force" that @Ornias1993 mentioned as a bug, I think we can come to an end of this discussion here.
@ccremer
The -f flag not being required on add is definitely a bug, but I think it's cleaner to create a new, specific issue for it (you can use my quote if you like) and close this one :)
I don't see any checks that require -f to prevent single disk vdev adds in any case.
No, zpool_do_add passes !force to make_root_vdev in the check_rep argument.
Are we sure history preserves the exact command?
edit: !false was a typo, it is !force
If it does not then that would be the actual bug, I would say...
What good is an untrustworthy history?
I do not see an explicit ZTS test for this in zfs/tests/zfs-tests/functional/cli_root/zpool_add. Am I blind?
@scineram Yes, but as I wrote above, make_root_vdev doesn't do much with the !force:
zpool add passes the force bool to make_root_vdev. But make_root_vdev wants an int; are B_TRUE and B_FALSE compatible with that?
Even if so, make_root_vdev doesn't seem to do much with the force argument....
It only passes it to is_device_in_use, and that's about it. I don't see any checks that require -f to prevent single-disk vdev adds in any case.
This is the second report suggesting zpool add does not work as intended with mismatched replication levels (#9038): I am going to mark this as a duplicate so we don't forget to close both issues once this is fixed.
Duplicate of #9038
I have now inserted 3 drives into the same machine and made a new pool out of them. They are solely used as a transfer pool where I put the data temporarily; after that, I'll get rid of the disks again. Now I'm trying to send the data with zfs send, but with encryption enabled on the transfer pool at the same time (the source is not encrypted yet; I would prefer to keep the data encrypted once I recreate the raidz2).
So, my commands look like this, but I cannot seem to get it started. Before I mess things up again, any advice is appreciated :)
$ zpool history transfer
2020-01-20.19:07:26 zpool create transfer /dev/disk/by-id/ata-WDC... /dev/disk/by-id/ata-WDC... /dev/disk/by-id/ata-WDC...
2020-01-20.19:12:56 zfs set compression=lz4 transfer
2020-01-20.19:20:31 zfs set aclinherit=passthrough transfer
2020-01-20.19:47:27 zfs create -o encryption=on -o keylocation=file:///path/to/zfs/transfer.key -o keyformat=passphrase transfer/data
$ zfs snap -r tank/data@backup-2020-01-20
$ zfs send -v -R tank/data@backup-2020-01-20 | zfs receive -o encryption=on -o keyformat=passphrase -o keylocation=file:///path/to/zfs/transfer.key transfer/data
cannot receive new filesystem stream: destination 'transfer/data' exists
must specify -F to overwrite it
# with -F
$ zfs send -v -R tank/data@backup-2020-01-20 | zfs receive -o encryption=on -o keyformat=passphrase -o keylocation=file:///path/to/zfs/transfer.key -F transfer/data
cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one
Should I destroy the transfer/data first?
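One approach that could work (a sketch only, assuming the pre-created transfer/data is still empty and can be destroyed) is to let zfs receive create the encrypted dataset itself instead of overwriting an existing one:
# destroy the empty, pre-created encrypted dataset so receive can create it fresh
$ zfs destroy -r transfer/data
$ zfs send -v -R tank/data@backup-2020-01-20 | zfs receive -o encryption=on -o keyformat=passphrase -o keylocation=file:///path/to/zfs/transfer.key transfer/data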
What could possibly go wrong... from bad to worse
When I tried to copy the data to the new transfer pool, the host crashed with the following error:
Feb 1 21:19:08 vmm-1 kernel: [12098918.482173] audit: type=1400 audit(1580588348.450:33): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=15848 comm="apparmor_parser"
Feb 1 21:19:08 vmm-1 kernel: [12098918.701348] audit: type=1400 audit(1580588348.670:35): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="lxc-container-default-cgns" pid=15851 comm="apparmor_parser"
Feb 1 21:33:40 vmm-1 kernel: [12099790.507007] ata7.00: exception Emask 0x0 SAct 0x20000000 SErr 0x0 action 0x6 frozen
Feb 1 21:33:40 vmm-1 kernel: [12099790.507304] ata7.00: failed command: WRITE FPDMA QUEUED
Feb 1 21:33:40 vmm-1 kernel: [12099790.507601] ata7.00: cmd 61/08:e8:10:1e:8c/00:00:04:00:00/40 tag 29 ncq dma 4096 out
Feb 1 21:33:40 vmm-1 kernel: [12099790.507601] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 1 21:33:40 vmm-1 kernel: [12099790.508172] ata7.00: status: { DRDY }
Feb 1 21:33:40 vmm-1 kernel: [12099790.508468] ata7: hard resetting link
Feb 1 21:33:40 vmm-1 kernel: [12099790.849101] ata7.00: supports DRM functions and may not be fully accessible
Feb 1 21:33:40 vmm-1 kernel: [12099790.849482] ata7.00: NCQ Send/Recv Log not supported
Feb 1 21:33:40 vmm-1 kernel: [12099790.850187] ata7.00: supports DRM functions and may not be fully accessible
Feb 1 21:33:40 vmm-1 kernel: [12099790.850567] ata7.00: NCQ Send/Recv Log not supported
Feb 1 21:33:40 vmm-1 kernel: [12099790.851002] ata7.00: configured for UDMA/133
Feb 1 21:33:40 vmm-1 kernel: [12099790.851009] ata7: EH complete
After rebooting, importing the pool is not possible anymore:
root@vmm-1:/var/log# zpool import -f
pool: tank
id: 10161650679385460837
state: UNAVAIL
status: One or more devices are faulted.
action: The pool cannot be imported due to damaged devices or data.
config:
tank UNAVAIL insufficient replicas
  raidz2-0 DEGRADED
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ UNAVAIL
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658 ONLINE
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953 ONLINE
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669 ONLINE
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2 ONLINE
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032 ONLINE
  ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD FAULTED corrupted data
I still had the old disk; my hope was that once the raidz leg was working again, the pool would have enough replicas. The old disk actually comes online (until any scrub runs, then it should become "too many write errors" or so again):
root@vmm-1:~# zpool import -f
pool: tank
id: 10161650679385460837
state: UNAVAIL
status: One or more devices are faulted.
action: The pool cannot be imported due to damaged devices or data.
config:
tank UNAVAIL insufficient replicas
  raidz2-0 ONLINE
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ ONLINE
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658 ONLINE
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953 ONLINE
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669 ONLINE
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2 ONLINE
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032 ONLINE
  ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD FAULTED corrupted data
The disk is inserted and should work; I assume this is now the result of me setting it offline at Christmas. I feel like ZFS does a good enough job of documenting what each command does, but a poor one about possible implications and gotchas :/
Is there ANY way I can retrieve some data out of it? Even a read-only no-snapshot corrupted view of the files would be sufficient for me, because I know that before setting it offline, it only holds about ~150 MB of possibly-corrupted data...
Commands like zpool import -fFX also failed.
Found this: https://serverfault.com/questions/562998/zfs-bringing-a-disk-online-in-an-unavailable-pool
Researching further, there is a zdb command, for which I'm trying to find a txg_id, so I can try importing with something like zpool import -o readonly=on -f -T [txg_id] tank.
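One way to look for candidate txg values is to dump the uberblocks from the label of a healthy member disk (hedged sketch; any healthy member should do):
# -l prints the vdev label, -u additionally prints the uberblocks and their txg numbers
$ zdb -lu /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658-part2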
root@vmm-1:~# zdb -e tank -v
Configuration for import:
vdev_children: 2
version: 5000
pool_guid: 10161650679385460837
name: 'tank'
state: 0
hostid: 2831157250
hostname: 'vmm-1'
vdev_tree:
type: 'root'
id: 0
guid: 10161650679385460837
children[0]:
type: 'raidz'
id: 0
guid: 17074537606798264386
nparity: 2
metaslab_array: 35
metaslab_shift: 37
ashift: 12
asize: 23991808425984
is_log: 0
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 17647865786871571347
whole_disk: 1
DTL: 222
create_txg: 4
faulted: 1
path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ-part2'
devid: 'ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ-part2'
phys_path: 'pci-0000:00:1f.2-ata-6'
children[1]:
type: 'disk'
id: 1
guid: 17282544705469633954
whole_disk: 1
DTL: 228
create_txg: 4
path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658-part2'
devid: 'ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658-part2'
phys_path: 'pci-0000:00:1f.2-ata-1'
children[2]:
type: 'disk'
id: 2
guid: 8666471612038363978
whole_disk: 1
DTL: 227
create_txg: 4
path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953-part2'
devid: 'ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953-part2'
phys_path: 'pci-0000:00:1f.2-ata-2'
children[3]:
type: 'disk'
id: 3
guid: 3867782762648460105
whole_disk: 1
DTL: 226
create_txg: 4
path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669-part2'
devid: 'ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669-part2'
phys_path: 'pci-0000:00:1f.2-ata-3'
children[4]:
type: 'disk'
id: 4
guid: 15287609802282741202
whole_disk: 1
DTL: 225
create_txg: 4
path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2-part2'
devid: 'ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2-part2'
phys_path: 'pci-0000:00:1f.2-ata-4'
children[5]:
type: 'disk'
id: 5
guid: 4786417980543102686
whole_disk: 1
DTL: 267
create_txg: 4
path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032-part1'
devid: 'ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032-part1'
phys_path: 'pci-0000:00:1f.2-ata-5'
children[1]:
type: 'disk'
id: 1
guid: 14041696495835651738
whole_disk: 1
metaslab_array: 73622
metaslab_shift: 34
ashift: 12
asize: 4000771997696
is_log: 0
create_txg: 23976332
degraded: 1
aux_state: 'external'
path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD-part1'
devid: 'ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD-part1'
phys_path: 'pci-0000:04:00.0-ata-2'
load-policy:
load-request-txg: 18446744073709551615
load-rewind-policy: 2
zdb: can't open 'tank': No such device or address
ZFS_DBGMSG(zdb) START:
spa.c:5490:spa_import(): spa_import: importing tank
spa_misc.c:408:spa_load_note(): spa_load(tank, config trusted): LOADING
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): vdev tree has 1 missing top-level vdevs.
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): current settings allow for maximum 0 missing top-level vdevs at this stage.
spa_misc.c:393:spa_load_failed(): spa_load(tank, config untrusted): FAILED: unable to open vdev tree [error=6]
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: root, guid: 10161650679385460837, path: N/A, can't open
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: raidz, guid: 17074537606798264386, path: N/A, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 17647865786871571347, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 17282544705469633954, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 2: disk, guid: 8666471612038363978, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 3: disk, guid: 3867782762648460105, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 4: disk, guid: 15287609802282741202, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 5: disk, guid: 4786417980543102686, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032-part1, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 14041696495835651738, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD-part1, faulted
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): UNLOADING
ZFS_DBGMSG(zdb) END
Is the create_txg: 23976332 on the last disk something of interest? Could that be used to import the pool at a state before the disk was added?
So you moved it to the transfer pool, that went fine, and while transferring to your new pool everything died?
In that case I don't get your attached screenshot, because it shows the old pool with the separate (single) drive attached.
Not quite. I started copying to the transfer pool, but it didn't complete. It ran for ca. 30min, then the host crashed.
Well, in that case this is a good thread to point people towards who added a single disk or a raid 0 :)
Found this issue #9313
Basically it mentions this blog: https://www.delphix.com/blog/openzfs-pool-import-recovery
So I tried the procedure with zdb
zdb -e tank -G -X tank
zdb: can't open 'tank': No such device or address
ZFS_DBGMSG(zdb) START:
spa.c:5493:spa_import(): spa_import: importing tank, max_txg=-1 (RECOVERY MODE)
spa_misc.c:408:spa_load_note(): spa_load(tank, config trusted): LOADING
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): vdev tree has 1 missing top-level vdevs.
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): current settings allow for maximum 0 missing top-level vdevs at this stage.
spa_misc.c:393:spa_load_failed(): spa_load(tank, config untrusted): FAILED: unable to open vdev tree [error=6]
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: root, guid: 10161650679385460837, path: N/A, can't open
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: raidz, guid: 17074537606798264386, path: N/A, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 17647865786871571347, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 17282544705469633954, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 2: disk, guid: 8666471612038363978, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 3: disk, guid: 3867782762648460105, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 4: disk, guid: 15287609802282741202, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 5: disk, guid: 4786417980543102686, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032-part1, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 14041696495835651738, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD-part1, faulted
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): UNLOADING
ZFS_DBGMSG(zdb) END
cd /lib
ln -s libzpool.so.2 libzpool.so
With zfs_max_missing_tvds it looks like it could be possible to import:
root@vmm-1:~# zdb -e tank -G -o zfs_max_missing_tvds=1 -X tank
Dataset mos [META], ID 0, cr_txg 4, 2.19G, 9413 objects
Object lvl iblk dblk dsize dnsize lsize %full type
0 4 16K 16K 65.0M 512 820M 0.56 DMU dnode
ZFS_DBGMSG(zdb) START:
spa.c:5493:spa_import(): spa_import: importing tank, max_txg=-1 (RECOVERY MODE)
spa_misc.c:408:spa_load_note(): spa_load(tank, config trusted): LOADING
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): vdev tree has 1 missing top-level vdevs.
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): current settings allow for maximum 1 missing top-level vdevs at this stage.
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: root, guid: 10161650679385460837, path: N/A, degraded
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: raidz, guid: 17074537606798264386, path: N/A, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 17647865786871571347, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 17282544705469633954, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 2: disk, guid: 8666471612038363978, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 3: disk, guid: 3867782762648460105, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 4: disk, guid: 15287609802282741202, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 5: disk, guid: 4786417980543102686, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032-part1, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 14041696495835651738, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD-part1, faulted
vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658-part2': best uberblock found for spa tank. txg 24675614
spa_misc.c:408:spa_load_note(): spa_load(tank, config untrusted): using uberblock with txg=24675614
spa_misc.c:408:spa_load_note(): spa_load(tank, config trusted): vdev tree has 1 missing top-level vdevs.
spa_misc.c:408:spa_load_note(): spa_load(tank, config trusted): current settings allow for maximum 1 missing top-level vdevs at this stage.
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: root, guid: 10161650679385460837, path: N/A, degraded
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: raidz, guid: 17074537606798264386, path: N/A, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 0: disk, guid: 17647865786871571347, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 17282544705469633954, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 2: disk, guid: 8666471612038363978, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 3: disk, guid: 3867782762648460105, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 4: disk, guid: 15287609802282741202, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2-part2, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 5: disk, guid: 4786417980543102686, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032-part1, healthy
vdev.c:179:vdev_dbgmsg_print_tree(): vdev 1: disk, guid: 14041696495835651738, path: /dev/disk/by-id/ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD-part1, faulted
spa_misc.c:408:spa_load_note(): spa_load(tank, config trusted): spa_load_verify found 0 metadata errors and 1 data errors
spa_misc.c:408:spa_load_note(): spa_load(tank, config trusted): LOADED
spa.c:7592:spa_async_request(): spa=tank async request task=32
ZFS_DBGMSG(zdb) END
but zdb doesn't actually import the pool. Is there a way to import with zfs_max_missing_tvds=1? According to the blog the author uses mdb to modify that, but mdb is not available on my Debian, and https://github.com/max123/mdbzfs was built for Solaris and cannot be executed on my machine.
How/where can I get a version of mdb for Linux, or is there another way to import with that restriction lifted?
For me it seems illogical that you can take disks offline and ZFS continues to work even with data corruption, but add a node crash/reboot and ZFS is suddenly like "Nah, I don't feel like working at all". This is where ZFS disappoints me now. There should be an easier way to import faulty pools even with possible corruption; I mean, it was running like that before the reboot...
To anyone who is googling for this issue, has a similar problem and tries to import an otherwise healthy pool with a missing top-level device:
This is a summary of how I managed to rescue my data after accidentally adding a replacement disk to a pool instead of replacing a faulty one.
- When I tried zpool replace I got thrown errors about the disk not being found. In an attempt to "fix" this I made a mistake and made everything worse: I executed zpool add <new-disk>. That resulted in a stripe between the degraded raidz2 and the new disk, without replacing the old one. Once done, this process is irreversible.
- I then used the zpool offline command. ZFS stopped striping, but apparently that left some data corrupted. The pool was otherwise still readable and writable.
- While copying the data off with zfs send | zfs receive, the node crashed during the copy for an unknown reason and the pool could not be imported anymore.
- The add should have required zpool add -f. Right now, apparently this flag wouldn't do anything. This is considered a (duplicate) bug and handled in #9038.
- Obviously the rescue described below only works when the damage done is not too much, i.e. most of the data is on one leg. In my case I had years of data on the raidz2 pool before adding a vdev for stripe.
All the flags documented in zpool import are useless in my case. Every combination (-F, -X, -T, etc) failed with the error The pool cannot be imported due to damaged devices or data. or similar.
Reading upon this excellent blog (https://www.delphix.com/blog/openzfs-pool-import-recovery) I discovered the zdb tool, which confirmed the cause of the failing import: ZFS would simply not import a pool when a top-level vdev is missing and has insufficient replicas.
I know that I ran the zpool offline command almost immediately after adding the disk, which is why I always believed the data should be accessible and intact, since the raidz2 leg was otherwise healthy.
Luckily, there are module parameters that allowed me to import the pool. They are documented here: https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters. It just happens that the required parameter is not listed on that page 🤦♂, but it is in man zfs-module-parameters.
The workaround is actually easy, but I first had to research it and get to know about it.
# Allow a pool import even with missing top level vdevs
$ echo 1 >> /sys/module/zfs/parameters/zfs_max_missing_tvds
# to make that persistent, add an entry in `/etc/modprobe.d/zfs.conf`
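# (a hedged example of such an entry, using the usual modprobe options syntax)
# echo "options zfs zfs_max_missing_tvds=1" > /etc/modprobe.d/zfs.conf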
# Then, let's import the tank read-only, to minimize risk of further data corruption.
$ zpool import -o readonly=on tank
$ zpool status
pool: tank
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 1M in 0 days 06:39:23 with 0 errors on Sun Nov 10 07:03:25 2019
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
  raidz2-0 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2NN9JNJ ONLINE 0 0 129 # This is the old disk that was actually marked faulty by a scrub, it's bound to happen again.
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0555658 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0535953 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0527669 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5JEC5V2 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68N32N0_WD-WCC7K4AR0032 ONLINE 0 0 0
  ata-WDC_WD40EFRX-68N32N0_WD-WCC7K5VH43CD FAULTED 0 0 0 external device fault
errors: No known data errors
Now I can rescue my data with rsync etc. At this point I don't even care about snapshots or encryption anymore; I'm just happy the data is not lost completely. It was a hard time researching and figuring out a way to access the data again, even if the workaround ends up being about 3 lines of commands 🙄
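As a sketch of that rescue copy (the mountpoints /tank/data and /transfer/data are assumptions; adjust to the actual mountpoints):
# copy the data off the read-only pool onto the transfer pool, preserving attributes
$ rsync -aHAX --progress /tank/data/ /transfer/data/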
- Have a recent backup before doing pool operations (like zpool add); recreate it if needed. This would ensure that you have at least one recent data backup.
- Use zpool replace when managing disk failures.
This might seem like common sense, but we are all human and make mistakes here and there ;)
Thanks for the help to everyone involved here. I think I can go on now and maybe you also learned a thing or two.
@ccremer please file an issue against the zfs_module_parameters wiki page for the missing module parameter and assign it to me.