Type | Version/Name
--- | ---
Distribution Name | Ubuntu
Distribution Version | 18.04.1 LTS
Linux Kernel | 4.15.0
Architecture | x86_64
ZFS Version | 0.7.9 (and others)
SPL Version | 0.7.9 (and others)
Three days ago, I accidentally ran "zfs destroy ..." on a large filesystem on one of my pools, thereby destroying the only important filesystem in that pool.
I quickly exported the pool to prevent further writes to it, so probably very little (if any) data has been overwritten. I have since booted from an Ubuntu live USB and compiled several different versions of ZFS, but since there is nothing wrong with the pool _per se_, it will not let me import it at a previous txg number (via the -T option, even though I also specify -F).
Example:
root@ubuntu:~/source/zfs# cmd/zpool/zpool import -T 245332 -F -o readonly=on S_pool
cannot import 'S_pool': one or more devices is currently unavailable
(Obviously the pool imports with the following command:
cmd/zpool/zpool import -o readonly=on S_pool
, but since the data were destroyed prior to the current txg, this is, alas, worthless to me...)
I also tried the patch described as #2452 (and elsewhere), but still got the same results, and think that this issue might be different since in my case the pool is OK, but not at the txg that I need...
Steps to reproduce: create a pool, create two ZFS sub-filesystems, destroy one of them, then destroy or export the pool. Import it again and find a txg prior to the "zfs destroy". Export the pool once more and try to import it at that previous txg... No luck :-(
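The steps above might be scripted roughly as follows. This is only a sketch: the function name, pool name, backing file, and txg number (1234) are all hypothetical examples.

```shell
# Hypothetical reproduction of the -T rewind failure described above.
reproduce_rewind_failure() {
    truncate -s 1G /tmp/zfs-test.img
    zpool create testpool /tmp/zfs-test.img
    zfs create testpool/keep
    zfs create testpool/oops
    # (note the current txg here, e.g. via: zdb -e testpool | grep txg)
    zfs destroy testpool/oops
    zpool export testpool
    # Suppose the pre-destroy txg was 1234; this rewind import then fails with
    # "cannot import 'testpool': one or more devices is currently unavailable":
    zpool import -T 1234 -F -o readonly=on testpool
}
```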
I have been struggling with this for a while now, but I still think the data (or at least the bulk of it) should be retrievable if only I could import the pool using that previous txg as the current one. I would greatly appreciate any suggestions or updates that enable this option.
Best regards,
Jon Ivar
I now also tried with the current master branch:
root@ubuntu:~/source/zfs# cat /sys/module/zfs/version
0.7.0-1512_g802715b74
Results were the same as before:
root@ubuntu:~/source/zfs# bin/zpool import -T 245332 -F -o readonly=on S_pool
cannot import 'S_pool': one or more devices is currently unavailable
(And as before, if I omit -T 245332 -F it imports just fine.)
I have now tried 0.7.5, 0.7.6, 0.7.9, and the current master (0.7.0-1512_g802715b74), but no luck...
By accident (and probably through all of my unsuccessful attempts), it seems I have, perhaps on a couple of occasions, imported the pool without the readonly option set, and now the oldest uberblock I can find (through zdb -ul -e S_pool) appears to be from a couple of minutes after I had already destroyed the filesystem... 👎
I assume that means "all hope is lost" (at least for a simple layman like myself)? I guess that even if I _could_ make the -T option work, I would never be able to roll back to, or access, anything destroyed prior to the date of the oldest uberblock I can find through zdb -ul -e S_pool?
If anyone "senior" could confirm this assumption (that I definitely need an older uberblock, and that zdb -ul -e S_pool reveals all "available" uberblocks), I would appreciate that feedback as well. (Given such a confirmation, I would move on and start trying to reassemble whatever I can from scratch, rather than futilely spending more time trying to salvage data from this pool.)
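For reference, the check described above can be expressed as a small pipeline. This is only a sketch: list_uberblock_txgs is a hypothetical helper name, and S_pool is the pool from this thread.

```shell
# Print every uberblock txg zdb can still find for an exported pool,
# oldest first. If the lowest txg is newer than the destroy, no rollback
# target remains.
list_uberblock_txgs() {
    zdb -ul -e "$1" | awk '/txg = /{print $3}' | sort -nu
}
```

Usage: `list_uberblock_txgs S_pool`.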
Anyway, keep up the good work; I love your great effort! (Although I obviously wish the -T option had worked, and I think my experience illustrates a case where it would have been very useful!)
@jonryk oh no! Yes, I'm sorry to say that unless you made a copy of the uberblocks somewhere, recovering this pool will be very challenging. The uberblocks contain all the possible root block pointers for the pool, and without them there's no reasonable way to roll back.
As for -T, I'm going to leave this issue open so we can investigate the behavior you observed. This option should work as a possible last resort, and we should add some basic test coverage to verify that.
@behlendorf - Thanks a lot for your reply! Although you confirmed my suspicion that it is now too late to save my data, I appreciate that you'll keep this issue open and try to make this work for others who might end up in a similar situation. Thanks for your great work!
By the way / out of curiosity: Is there a command (or a fairly straight forward way) to export/import an uberblock, in order to quickly make/restore a copy of the uberblocks, such as you mentioned?
(That would perhaps have been a useful alternative for a case when a mistake such as mine was made?)
@jonryk while not widely known, you can use the -x dumpdir option with zdb(8) to request that it make a copy of every block read when importing the pool. This will include the uberblocks as well as any other critical pool data. It is similar in intent to the e2image(8) utility you may be familiar with, and is mainly intended for analysis purposes.
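A sketch of that idea, with hedged assumptions: backup_pool_metadata is a hypothetical helper name, the dump directory is an example, and the pool is assumed to be exported (hence -e).

```shell
# Copy every block zdb reads while opening the pool (uberblocks and other
# critical pool metadata included) into a dump directory for safekeeping.
backup_pool_metadata() {
    pool="$1"
    dumpdir="$2"
    mkdir -p "$dumpdir"
    zdb -e -x "$dumpdir" "$pool"
}
```

Usage: `backup_pool_metadata S_pool /tmp/S_pool-dump`.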
I too could really use this functionality - not working for me atm, see here:
add mdbzfs (explore and undelete files from offline pool) - needed feature for brown paper bag "rm" moments #9313
I'm running into this issue as well... My pool is fine; I just want to use -T to roll back and try to save a file which was deleted by accident by Proxmox when restoring a VM. After I realized what Proxmox did, I immediately stopped everything and exported the pool, but I'm unable to import it with -T for an earlier txg... I keep getting "one or more devices is currently unavailable".
If I do "zpool import" it imports just fine!
I'm going to try the hack from #2452 (editing the DKMS zfs module and reinstalling it on a clean Debian 9) to see if I can roll back.
But it was a scary surprise to realize I can't roll back, especially considering #2452 is from 2014!
Having the same problem. Accidentally deleted all of my data by reverting to a very early snapshot.
Did you manage to make -T work?
Edit: It works, but only for a few transactions back (in my case ~100), depending on whether the uberblock is still there or not.
So, I had to restore from an older backup instead.
I'm facing a similar situation, having accidentally overwritten recursive datasets using send/receive with an erroneous receive name. I immediately exported the pool and identified a txg preceding this unfortunate action (a pool scrub), for which an uberblock still remained, but I was unable to rewind to it using the zpool import -T option, as follows:
# zdb -hhe backpool
...
2020-06-14.00:24:08 zpool scrub backpool
history command: 'zpool scrub backpool'
history zone: 'linux'
history who: 0
history time: 1592087048
history hostname: 'bckupsys'
unrecognized record:
history internal str: 'errors=0'
internal_name: 'scan done'
history txg: 2620580
history time: 1592115323
history hostname: 'bckupsys'
...
# zdb -l -u -e backpool
...
Uberblock[4]
magic = 0000000000bab10c
version = 5000
txg = 2620580
guid_sum = 11392655302192565697
timestamp = 1592115323 UTC = Sun Jun 14 08:15:23 2020
mmp_magic = 00000000a11cea11
mmp_delay = 0
labels = 0 1 2 3
# zpool import -N -o readonly=on -T 2620580 backpool
cannot import 'backpool': one or more devices is currently unavailable
...
@sotiris-bos
It works, but only for a few transactions back
Could you please detail the command that you used to make it work? Did you use any special trick to achieve it?
I think this would be of great help to many!
Thanks a lot.
FYI, this worked for me in a simple test. I'm not sure what's different about other configurations that makes it sometimes not work.
$sudo zdb -lu /dev/sdc1 | grep "txg"
...
txg = 18128
...
$ sudo zpool import -N -o readonly=on -T 18128 test
Note that in general there's no guarantee that txgs more than 4 back will work, because some of the blocks may have been overwritten. But I'm not sure if that would cause the "one or more devices is currently unavailable" error, or if that's due to a bug.
@tacticz
I believe the "one or more devices is currently unavailable" error appears because the uberblock is invalid, overwritten, or unavailable.
Your zpool import command is correct; at least, that is what I used as far as I remember, but just to be sure you may want to add "-d /dev/disk/by-id".
Basically, you need to find the last good uberblock/txg (if there is one from before your problem occurred). Maybe try zpool history -il to help with the timestamps, but mainly you need to try all available txgs to possibly find a usable one.
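The trial-and-error search described above can be scripted. This is only a sketch: try_rewind is a hypothetical helper name, and every attempt uses -N and readonly=on so nothing is written to the pool.

```shell
# Try each available uberblock txg, newest first, until a read-only
# rewind import succeeds.
try_rewind() {
    pool="$1"
    for txg in $(zdb -ul -e "$pool" | awk '/txg = /{print $3}' | sort -rnu); do
        echo "trying txg $txg"
        if zpool import -N -o readonly=on -T "$txg" "$pool"; then
            echo "imported $pool at txg $txg"
            return 0
        fi
    done
    echo "no importable txg found for $pool"
    return 1
}
```

Usage: `try_rewind backpool`. If an attempt partially succeeds, export the pool again before the next one.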
I wish you the best!
Thank you @harens and @sotiris-bos for replying!
Unfortunately for me, trying all identified txgs (starting from the oldest "interesting" one and progressing to the latest listed one) didn't succeed. The system keeps outputting the dreaded cannot import 'backpool': one or more devices is currently unavailable message whatever value is given to the -T option (even the latest one).
As suggested, I tried adding the -d /dev/disk/by-id option, and even tried adding the -FX options, to no avail :-(
I think I'll have to give up all hope of recovering the lost data, since I cannot hold off using this backup pool much longer. Thanks anyway, guys!