"zfs receive" fails when receiving a deduplicated stream with snapshots with "invalid backup stream"
When the stream isn't deduplicated, it runs smoothly.
To be sure, I patched another pool (part 2 of this issue) with the newest pull request from @dweeezil (Recent illumos/Delphix features #2149), but the problem persists.
Linux Suse122 3.7.7-2-default #1 SMP Wed Feb 13 09:33:05 UTC 2013 (d55bd47) x86_64 x86_64 x86_64 GNU/Linux
ZFS: Loaded module v0.6.2-219_ga15dac4, ZFS pool version 5000, ZFS filesystem version 5
SPL: Loaded module v0.6.2-23_g4c99541
NAME USED AVAIL REFER MOUNTPOINT
stor2/zfstest 4.67G 102G 4.67G /stor2/zfstest
stor2/zfstest3 49.9G 102G 49.8G /stor2/zfstest3
stor2/zfstest3@394806312 736K - 768K -
stor2/zfstest3@421188256 976K - 1.94G -
stor2/zfstest3@484463391 992K - 4.26G -
stor2/zfstest3@618833865 1.28M - 7.45G -
stor2/zfstest3@579117221 1.41M - 9.58G -
stor2/zfstest3@516414280 1.54M - 12.8G -
stor2/zfstest3@967496177 1.68M - 16.0G -
stor2/zfstest3@636523150 1.77M - 18.2G -
stor2/zfstest3@672405261 1.87M - 20.5G -
stor2/zfstest3@949169779 1.99M - 23.1G -
stor2/zfstest3@688080912 2.33M - 25.8G -
stor2/zfstest3@522831668 2.91M - 28.5G -
stor2/zfstest3@493854542 3.20M - 31.1G -
stor2/zfstest3@151707443 3.32M - 33.8G -
stor2/zfstest3@194586695 3.46M - 36.4G -
stor2/zfstest3@493432740 3.58M - 39.1G -
stor2/zfstest3@144417120 3.68M - 41.7G -
stor2/zfstest3@138155230 3.80M - 44.4G -
stor2/zfstest3@122410522 3.98M - 47.0G -
stor2/zfstest3@967710983 1.86M - 49.7G -
stor2/zfstest3@now 0 - 49.8G -
Suse122:/stor2/zfstest # zfs send -R -D stor2/zfstest3@now > /stor2/zfstest/test.zfs
Suse122:/stor2/zfstest # zfs destroy -r stor2/zfstest3
Suse122:/stor2/zfstest # zfs recv -v stor2/zfstest3 < /stor2/zfstest/test.zfs
receiving full stream of stor2/zfstest3@394806312 into stor2/zfstest3@394806312
received 2.11MB stream in 5 seconds (432KB/sec)
receiving incremental stream of stor2/zfstest3@421188256 into stor2/zfstest3@421188256
received 249MB stream in 14 seconds (17.8MB/sec)
receiving incremental stream of stor2/zfstest3@484463391 into stor2/zfstest3@484463391
cannot receive incremental stream: invalid backup stream
Linux linux-ts3r 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC 2014 (750023e) x86_64 x86_64 x86_64 GNU/Linux
ZFS: Loaded module v0.6.2-201_g8eac9fb, ZFS pool version 5000, ZFS filesystem version 5
SPL: Loaded module v0.6.2-23_g4c99541
linux-ts3r:/etc # zpool get all
NAME PROPERTY VALUE SOURCE
tank size 496G -
tank capacity 0% -
tank altroot - default
tank health ONLINE -
tank guid 8224401125096643018 default
tank version - default
tank bootfs - default
tank delegation on default
tank autoreplace off default
tank cachefile - default
tank failmode wait default
tank listsnapshots off default
tank autoexpand off default
tank dedupditto 0 default
tank dedupratio 1.00x -
tank free 496G -
tank allocated 126K -
tank readonly off -
tank ashift 0 default
tank comment - default
tank expandsize 0 -
tank freeing 0 default
tank feature@async_destroy enabled local
tank feature@empty_bpobj enabled local
tank feature@lz4_compress enabled local
tank feature@spacemap_histogram active local
tank feature@enabled_txg active local
tank feature@hole_birth active local
tank feature@extensible_dataset enabled local
tank feature@bookmarks enabled local
NAME USED AVAIL REFER MOUNTPOINT
tank 49.8G 438G 32K /tank
tank/zfstest 47K 438G 47K /tank/zfstest
tank/zfstest2 49.8G 438G 49.8G /tank/zfstest2
tank/zfstest2@300726135 242K - 2.24G -
tank/zfstest2@296730124 470K - 4.81G -
tank/zfstest2@224726639 771K - 7.39G -
tank/zfstest2@902427907 908K - 9.98G -
tank/zfstest2@540759649 1.07M - 12.6G -
tank/zfstest2@255536517 1.49M - 15.1G -
tank/zfstest2@199845779 1.62M - 17.7G -
tank/zfstest2@566124845 1.74M - 20.3G -
tank/zfstest2@956409770 1.87M - 22.8G -
tank/zfstest2@037434686 2.14M - 25.5G -
tank/zfstest2@301863766 2.63M - 28.1G -
tank/zfstest2@725218939 2.92M - 30.7G -
tank/zfstest2@694801830 3.06M - 33.2G -
tank/zfstest2@643148554 3.20M - 35.9G -
tank/zfstest2@492741743 3.33M - 38.5G -
tank/zfstest2@243681899 3.46M - 41.1G -
tank/zfstest2@659482439 3.59M - 43.7G -
tank/zfstest2@999733757 3.75M - 46.2G -
tank/zfstest2@921130344 3.69M - 48.9G -
linux-ts3r:/tank/zfstest # zfs snap tank/zfstest2@now
linux-ts3r:/tank/zfstest # zfs send -R -D tank/zfstest2@now > /tank/zfstest/testD.zfs
linux-ts3r:/tank/zfstest # zfs destroy -r tank/zfstest
zfs recv -v tank/zfstest2 < /tank/zfstest/testD.zfs
receiving full stream of tank/zfstest2@300726135 into tank/zfstest2@300726135
linux-ts3r:/tank/zfstest # zfs recv -v tank/zfstest2 < /tank/zfstest/testD.zfs
receiving full stream of tank/zfstest2@300726135 into tank/zfstest2@300726135
received 283MB stream in 19 seconds (14.9MB/sec)
receiving incremental stream of tank/zfstest2@296730124 into tank/zfstest2@296730124
received 319MB stream in 13 seconds (24.6MB/sec)
receiving incremental stream of tank/zfstest2@224726639 into tank/zfstest2@224726639
cannot receive incremental stream: invalid backup stream
There is an old issue (https://www.illumos.org/issues/755) describing an "invalid backup stream" failure, but that fix should already be included in the ZoL sources, shouldn't it?
@pyavdr We did pick up the fix for the issue you referenced a long time ago. We'll need to dig into this and see what the cause is.
I needed to rearrange my partitions, so I did a zfs send to copy off my ZFS data, repartitioned, and then tried to zfs recv that data back. Unfortunately I seem to have independently found this bug. Now that the original ZFS data has been overwritten, is my only choice to restore from backups, or is there some sort of workaround? I tried using zfs-fuse, but it said "cannot receive: stream has unsupported feature". Would compiling ZFS myself perhaps make the receive work for me?
It may be possible to recv with another ZFS system. It isn't clear whether the bug happens on the send or the recv side. If it happens on the recv side, your chances of a good recv are better with another system that supports deduplicated streams. If you have a valid backup, try again without deduplicating the stream.
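For example (a sketch only; the dataset names are taken from the original report, so substitute your own pool and output file), the non-deduplicated equivalent of the failing commands would be:

zfs send -R stor2/zfstest3@now > /stor2/zfstest/test-nodedup.zfs
zfs recv -v stor2/zfstest3 < /stor2/zfstest/test-nodedup.zfs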
@behlendorf
Brian, since this bug is severe, how can I help? An strace is at #2453. Would a quick reproducer help? There is a stack trace at #2453; does it help?
If it helps at all, the problem appears to be in ZFS send. I set up a VM with nas4free (based on FreeBSD) and tried the ZFS receive from there. It failed with the same error at about the same point (~17 GB in). I will begin restoring from my bacula backups now.
I suggest that a warning be printed when a user tries to use the ZFS send -D option. I have decent backups but the next person might not be so lucky.
Also I would like to help if I can. I will try this weekend to create a smaller test case that doesn't contain all my personal data. I am not a programmer but might at least be able to help test potential fixes.
I am experiencing the same problem on the official 0.6.3 release for Debian and as well on the nightly build 0.6.3-10~672692~wheezy.
The issue can be simply reproduced (suppose backup_pool is an empty pool and the copied file is larger than pool block size; I am using a small deb archive here):
cp python-gitzilla_2.0-1_all.deb /backup_pool/p.deb
zfs create backup_pool/test
zfs snapshot -r backup_pool@snap1
cp python-gitzilla_2.0-1_all.deb /backup_pool/test/p1.deb
zfs snapshot -r backup_pool@snap2
cp python-gitzilla_2.0-1_all.deb /backup_pool/test/p2.deb
zfs snapshot -r backup_pool@snap3
cp python-gitzilla_2.0-1_all.deb /backup_pool/test/p3.deb
zfs snapshot -r backup_pool@snap4
zfs send -RDv backup_pool@snap4 > dedup_snapshot
zfs destroy -r backup_pool
zfs receive -vF backup_pool < dedup_snapshot
The output from the last command is as follows:
receiving full stream of backup_pool@snap1 into backup_pool@snap1
received 22,0KB stream in 2 seconds (11,0KB/sec)
receiving incremental stream of backup_pool@snap2 into backup_pool@snap2
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of backup_pool@snap3 into backup_pool@snap3
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of backup_pool@snap4 into backup_pool@snap4
received 312B stream in 1 seconds (312B/sec)
receiving full stream of backup_pool/test@snap1 into backup_pool/test@snap1
received 39,9KB stream in 1 seconds (39,9KB/sec)
receiving incremental stream of backup_pool/test@snap2 into backup_pool/test@snap2
cannot receive incremental stream: invalid backup stream
The problem is most probably on the receiving side because the same error is reported when trying to receive an older stream created by a 0.6.1 (?) version which I was able to successfully receive earlier.
For those stuck like me with no help from the ZFS for Linux team:
If you have a deduped full stream of the entire pool, this regression can be worked around by downgrading to version 0.6.2. The procedure is as follows (for a Debian system with DKMS SPL/ZFS version 0.6.3 installed):
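A rough sketch of the idea (the package names and version strings below are assumptions; check your repository for the exact 0.6.2 packages):

apt-get remove --purge zfs-dkms spl-dkms zfsutils         # remove the 0.6.3 DKMS packages (assumed names)
apt-get install spl-dkms=0.6.2-<rev> zfs-dkms=0.6.2-<rev> zfsutils=0.6.2-<rev>
apt-mark hold spl-dkms zfs-dkms zfsutils                  # pin the downgraded packages
zfs receive -vF backup_pool < dedup_snapshot              # receive the deduped stream under 0.6.2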
Recommendation: if you are not sure that your backup stream is OK, back up your existing pool without the -D option, if possible, before destroying it.
Hope it will help.
Sorry guys, I've been working on other issues. I've bumped this to the 0.6.4 tag and we'll definitely try to get it resolved since it's a regression.
Brian, thank you very much for your effort in ZFS for Linux development. In order to help at least a little with the resolution of this issue (I am not a C programmer), I ran a bisect search over the ZFS Git repository using a Debian virtual machine.
The first commit which failed to receive the deduped stream (see my reproducing procedure above) is this one:
https://github.com/zfsonlinux/zfs/commit/13fe019870c8779bf2f5b3ff731b512cf89133ef
With relatively high probability, this is the source of the regression. Unfortunately, the commit introduces a complex code change.
The previous commit
https://github.com/zfsonlinux/zfs/commit/6f1ffb06655008c9b519108ed29fbf03acd6e5de
received the deduped stream without problems.
Hope this helps.
The problem is triggered here: https://github.com/zfsonlinux/zfs/blob/0426c16804e572f088dd293029ab39fa75e1e203/module/zfs/dmu_send.c#L1506. I'm looking into why.
BTW, if you receive the stream into a new dataset, it seems to work (zfs receive
I have figured out the cause of this problem: in some circumstances, the guid of a WRITE_BYREF record's referenced dataset (refobject) is not recorded in restorearg.guid_to_ds_map. At the very least, this occurs when the refguid refers to the top-level dataset (the one named after the pool). It also happens when doing a force-receive.
The restructured sync task work refactored many parts of the receive process, including, among other things, the contents of dmu_recv_cookie_t and the way it is used.
I'm working on a fix for this problem.
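To illustrate why a missing map entry surfaces as "invalid backup stream", here is a small self-contained sketch (stand-in types and a linear search instead of the real AVL tree; not the actual ZFS code) of the lookup that a WRITE_BYREF record triggers on the receive side:

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct guid_map_entry {
        uint64_t guid;  /* in the real code, also a handle to the dataset */
};

/* Stand-in for the AVL lookup done against restorearg.guid_to_ds_map. */
static struct guid_map_entry *
guid_map_lookup(struct guid_map_entry *map, size_t n, uint64_t refguid)
{
        for (size_t i = 0; i < n; i++) {
                if (map[i].guid == refguid)
                        return (&map[i]);
        }
        return (NULL);
}

/*
 * Sketch of handling a WRITE_BYREF record: the refguid names the dataset
 * holding the referenced block; if that guid was never recorded in the map,
 * the receive fails with EINVAL, which zfs(8) reports as
 * "invalid backup stream".
 */
static int
restore_write_byref_sketch(struct guid_map_entry *map, size_t n,
    uint64_t refguid)
{
        if (guid_map_lookup(map, n, refguid) == NULL)
                return (EINVAL);
        /* ...otherwise read the block from the referenced dataset and write it... */
        return (0);
}

If the snapshot that owns the referenced blocks was received without its guid being added to the map, the lookup fails and the whole receive aborts at that record.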
@dweeezil Any chance this is related to #2602?
@FransUrbo I don't think so, at least not from my reading of the issue.
Update on this issue: There are actually two somewhat related problems. First, the referred-to object is not recorded in guid_to_ds_map when writing to an existing object (such as doing a "-F" receive to a top-level dataset, i.e. one named after the pool), and second, the object recorded in dmu_recv_cookie.drc_newsnapobj does not always represent the object ID of the just-created snapshot.
I did manage to come up with an awful hack which fixes receiving into a top-level dataset (named after the pool) but is not a general solution. I'll continue to work on a proper solution.
The cause is pretty much the same as was the case for #372, however, the reason for the incomplete guid_to_ds_map is different.
@dweeezil Were you able to determine why ZoL was impacted and not the other ports?
@behlendorf I've been working on the assumption that other ports will be affected, too. More information as I get it. I'll be testing my saved send stream file under Illumos and FreeBSD.
@behlendorf The problem occurs under OpenIndiana 151a8. I see no reason this problem wouldn't occur under any other implementation as well. I will still try FreeBSD when I get a chance.
EDIT: any other implementation "with the restructured sync task code".
@behlendorf The problem occurs under FreeBSD 10.
I've not had much time to work on a fix since the weekend. The old receive cookie had a couple of fields to help track it:
struct dmu_recv_cookie {
        ...
        /*
         * If logical and real are different, we are recving the stream
         * into the "real" temporary clone, and then switching it with
         * the "logical" target.
         */
        struct dsl_dataset *drc_logical_ds;
        struct dsl_dataset *drc_real_ds;
        ...
but in the new code, these have been replaced by uint64_t drc_newsnapobj with new logic to set it. Unfortunately, the new logic doesn't always grab the correct value (object id) of the newly-created snapshot. I've come up with a hack to calculate it correctly in the case where the receive is on top of an existing dataset but that only fixes one class of the problem and is, as I mentioned, a hack.
I suspect that @ahrens could fix this in short order because I'm trying to guess the intent of the new code whereas he actually wrote it. Regardless, I'll see if I can't come up with something that works properly.
@dweeezil Thanks for the clarification, for some reason I was under the impression this was Linux specific.
@dweeezil do you have a simple set of commands to hit this issue? I'd like to try and reproduce it locally.
@prakashsurya I think I can come up with a (short) reliable reproducer pretty easily. I've been preoccupied with other issues and haven't been able to look at this one all week... making a bit of time now.
It was a bit simpler than I thought: https://gist.github.com/dweeezil/78320cd167b41bc13cd3
Here's an abbreviated dump of the stream:
root@ubu14-zfstest:~# zstreamdump -v < send.tank-new | egrep 'refguid|fromguid|toguid|toname'
toguid = 0
fromguid = 0
toname = tank/fs@snap3
toguid = 286f24bc71a20861
fromguid = 0
toname = tank/fs@base
toguid = fb59a768878336da
fromguid = 286f24bc71a20861
toname = tank/fs@snap2
toguid = fb59a768878336da refguid = 286f24bc71a20861
toguid = d9b3f8ac5f2ab144
fromguid = fb59a768878336da
toname = tank/fs@snap3
toguid = d9b3f8ac5f2ab144 refguid = fb59a768878336da <<<<- the fb59 guid was not recorded
toguid = d9b3f8ac5f2ab144 refguid = 286f24bc71a20861
I added a comment where the problem occurs.
Here are a few interesting pointers into dmu_send.c: One issue is in dmu_recv_end_sync(), where drc->drc_newsnapobj is set. The same value is used for both the "force" case and the non-force case, which has been shown to me to be wrong. If the stream is being received into the top-level dataset, the proper new snapshot object is actually the "previous snapshot's next snapshot". In the non-force case, however, I've not figured out a reliable way to get the proper snapshot object. I also think that dmu_recv_existing_end() ultimately needs to do an avl_add() similar to the one done in dmu_recv_new_end(); without that, I don't see how the top-level dataset could ever be referenced. Both these functions did the add in the old version of the code but, of course, the whole works was structured differently.
@dweeezil Awesome! Thanks for the simple script. I can confirm it's not Linux specific, I see it in an illumos based VM.
Can somebody give this a try? It seems to work in my illumos based environment...
diff --git a/usr/src/uts/common/fs/zfs/dmu_send.c b/usr/src/uts/common/fs/zfs/dmu_send.c
index 8c91fc5..021d7e2 100644
--- a/usr/src/uts/common/fs/zfs/dmu_send.c
+++ b/usr/src/uts/common/fs/zfs/dmu_send.c
@@ -2790,6 +2790,7 @@ dmu_recv_end_sync(void *arg, dmu_tx_t *tx)
                 dsl_dataset_rele(origin_head, FTAG);
                 dsl_destroy_head_sync_impl(drc->drc_ds, tx);
+                drc->drc_newsnapobj = origin_head->ds_phys->ds_prev_snap_obj;
 
                 if (drc->drc_owner != NULL)
                         VERIFY3P(origin_head->ds_owner, ==, drc->drc_owner);
@@ -2807,8 +2808,8 @@ dmu_recv_end_sync(void *arg, dmu_tx_t *tx)
                 dmu_buf_will_dirty(ds->ds_dbuf, tx);
                 ds->ds_phys->ds_flags &= ~DS_FLAG_INCONSISTENT;
+                drc->drc_newsnapobj = drc->drc_ds->ds_phys->ds_prev_snap_obj;
         }
-        drc->drc_newsnapobj = drc->drc_ds->ds_phys->ds_prev_snap_obj;
         /*
          * Release the hold from dmu_recv_begin. This must be done before
          * we return to open context, so that when we free the dataset's dnode,
@@ -2867,8 +2868,13 @@ dmu_recv_existing_end(dmu_recv_cookie_t *drc)
             dmu_recv_end_check, dmu_recv_end_sync, drc,
             dmu_recv_end_modified_blocks, ZFS_SPACE_CHECK_NORMAL);
 
-        if (error != 0)
+        if (error != 0) {
                 dmu_recv_cleanup_ds(drc);
+        } else if (drc->drc_guid_to_ds_map != NULL) {
+                (void) add_ds_to_guidmap(drc->drc_tofs,
+                    drc->drc_guid_to_ds_map,
+                    drc->drc_newsnapobj);
+        }
 
         return (error);
 }
@prakashsurya I'll check it out, but I'm pretty sure this is the equivalent of the "awful hack" I mentioned above. IIRC, origin_head->ds_phys->ds_prev_snap_obj is not always the proper value for `drc_newsnapobj`. Have you tried restoring the deduped stream into a deeper part of a pool (with/without recv -F)?
@prakashsurya I applied and checked your patch, but it doesn't work: "invalid backup stream".
@dweeezil When posting that patch, I had only tested it with the simple test script you provided earlier. Since then I've run it through some more testing and hit a panic, which I believe to be a NULL pointer dereference. So that's _almost_ correct, but as you say, it needs some more work. I'll try and devote some more time to this, and see what I can come up with.
Just ran into this myself. No data loss thanks to backups, but this is a really scary bug. Would it at least be possible to add a warning to zfs send -R until this can be fixed as crowtrobot suggests above?
The problem is that the user learns this when receiving, not when sending. And while yes, everyone should test their backups (indeed, that's how I found this myself), it's a really nasty surprise if you don't.
"Just skip using -D" is a fine workaround, but it's far from obvious to users that they need to be employing it :-)
The issue is that receive is potentially happening (much) later than send.
In my case the issue is that send goes to a file, which is then backed up offsite, rather than being directly restored to a zpool, so the issue wouldn't be noticed until restore time. This is why I test them before I'm "done", since at least then I know there exists a version of ZFS that can receive them without error.
I understand there are always risks inherent in this approach that make testing much more crucial, but I'd argue that at the moment a flag that claims to do one thing while secretly corrupting your backup without any warning goes beyond the "always test" threshold and into the "we should warn the user" territory, ESPECIALLY since the -D flag is much more attractive if you're backing up to tape/etc than if you're immediately sending to another zpool.
No, I don't have easy access to any of those systems. I'm just deleting it and am recreating the backup from the original source, without -D. I'm not looking at actual data loss because I'm pretty cautious these days, but I'm remembering several youthful errors many years ago where a bug like this could have cost me everything:-)
If you need a full receive, you can install ZFS 0.6.2, which is not affected by the problem, and receive under that version. This rescued me a while ago. The downside is that you need to start completely from scratch.
I just hit this as well. Would be ideal to get this working properly because deduplication could have a lot of wins when replicating an entire pool.
Just a heads-up for anyone stumbling upon this issue by searching for "cannot receive incremental stream: invalid backup stream": This error message _can also_ occur if you have something running (like a cron job) that automatically takes snapshots on the pool you are zfs-receiving against.
In my case, I use puppet to automatically configure a bunch of Linux boxes that all have a standard config. I have a pool named 'tank' on all of them, and during normal production use zfs-auto-snapshot runs every 15 minutes to ensure customer data is protected. I was restoring data to a newly created 'tank' with much larger drives. The zfs receive would run for a while, then zfs-auto-snapshot would snapshot, and the next snapshot in the received stream would cause this error.
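If you hit this variant, one option is to pause automatic snapshots on the target pool for the duration of the receive. A sketch (the property below is the one zfs-auto-snapshot honours; the stream file and target names are placeholders):

zfs set com.sun:auto-snapshot=false tank    # pause zfs-auto-snapshot on the target pool
zfs receive -vF tank/data < backup.zfs      # example receive of the saved stream
zfs set com.sun:auto-snapshot=true tank     # re-enable automatic snapshots afterwards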
I believe the PR posted here (https://github.com/openzfs/openzfs/pull/159) should fix this issue, and it also needs a reviewer to get pushed upstream. If anyone here wants to take a look I'd be very appreciative!
@prakashsurya @dweeezil @behlendorf @dankimmel I tried to patch the current head of 7.0.-rc1_39* with the changes from (https://github.com/openzfs/openzfs/commit/b09697c8c18be68abfe538de9809938239402ae8) and it works successfully. Receiving the deduplicated stream with snapshots works. But I am unsure about it, because there are some ambiguous differences in the code and I can't check all of them. Find my patch file in the attachment. There should be an additional reviewer.
@pyavdr thanks for porting this. It'll be much easier to review and comment on if you open a new pull request against master with the patch. And make sure to include the test case from the original OpenZFS patch. You just need to add it to the tests directory, update the Makefile.am, and enable it in the runfile.
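For reference, a hypothetical sketch of the last step (the section and test names are assumptions; use whatever the ported test is actually called, and keep the existing entries):

# tests/runfiles/linux.run (sketch; existing entries elided)
[tests/functional/rsend]
tests = ['rsend_001_pos', 'rsend_dedup']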
Closed by #5689