"zfs receive" fails when receiving a deduplicated stream with snapshots with "invalid backup stream"
When the stream isn't deduplicated, it runs smoothly.
To be sure, I patched another pool (part 2 of this issue) with the newest pull request from @dweeezil (Recent illumos/Delphix features #2149), but the problem persists.
Linux Suse122 3.7.7-2-default #1 SMP Wed Feb 13 09:33:05 UTC 2013 (d55bd47) x86_64 x86_64 x86_64 GNU/Linux
ZFS: Loaded module v0.6.2-219_ga15dac4, ZFS pool version 5000, ZFS filesystem version 5
SPL: Loaded module v0.6.2-23_g4c99541
NAME USED AVAIL REFER MOUNTPOINT
stor2/zfstest 4.67G 102G 4.67G /stor2/zfstest
stor2/zfstest3 49.9G 102G 49.8G /stor2/zfstest3
stor2/zfstest3@394806312 736K - 768K -
stor2/zfstest3@421188256 976K - 1.94G -
stor2/zfstest3@484463391 992K - 4.26G -
stor2/zfstest3@618833865 1.28M - 7.45G -
stor2/zfstest3@579117221 1.41M - 9.58G -
stor2/zfstest3@516414280 1.54M - 12.8G -
stor2/zfstest3@967496177 1.68M - 16.0G -
stor2/zfstest3@636523150 1.77M - 18.2G -
stor2/zfstest3@672405261 1.87M - 20.5G -
stor2/zfstest3@949169779 1.99M - 23.1G -
stor2/zfstest3@688080912 2.33M - 25.8G -
stor2/zfstest3@522831668 2.91M - 28.5G -
stor2/zfstest3@493854542 3.20M - 31.1G -
stor2/zfstest3@151707443 3.32M - 33.8G -
stor2/zfstest3@194586695 3.46M - 36.4G -
stor2/zfstest3@493432740 3.58M - 39.1G -
stor2/zfstest3@144417120 3.68M - 41.7G -
stor2/zfstest3@138155230 3.80M - 44.4G -
stor2/zfstest3@122410522 3.98M - 47.0G -
stor2/zfstest3@967710983 1.86M - 49.7G -
stor2/zfstest3@now 0 - 49.8G -
Suse122:/stor2/zfstest # zfs send -R -D stor2/zfstest3@now > /stor2/zfstest/test.zfs
Suse122:/stor2/zfstest # zfs destroy -r stor2/zfstest3
Suse122:/stor2/zfstest # zfs recv -v stor2/zfstest3 < /stor2/zfstest/test.zfs
receiving full stream of stor2/zfstest3@394806312 into stor2/zfstest3@394806312
received 2.11MB stream in 5 seconds (432KB/sec)
receiving incremental stream of stor2/zfstest3@421188256 into stor2/zfstest3@421188256
received 249MB stream in 14 seconds (17.8MB/sec)
receiving incremental stream of stor2/zfstest3@484463391 into stor2/zfstest3@484463391
cannot receive incremental stream: invalid backup stream
Linux linux-ts3r 3.11.10-7-desktop #1 SMP PREEMPT Mon Feb 3 09:41:24 UTC 2014 (750023e) x86_64 x86_64 x86_64 GNU/Linux
ZFS: Loaded module v0.6.2-201_g8eac9fb, ZFS pool version 5000, ZFS filesystem version 5
SPL: Loaded module v0.6.2-23_g4c99541
linux-ts3r:/etc # zpool get all
NAME PROPERTY VALUE SOURCE
tank size 496G -
tank capacity 0% -
tank altroot - default
tank health ONLINE -
tank guid 8224401125096643018 default
tank version - default
tank bootfs - default
tank delegation on default
tank autoreplace off default
tank cachefile - default
tank failmode wait default
tank listsnapshots off default
tank autoexpand off default
tank dedupditto 0 default
tank dedupratio 1.00x -
tank free 496G -
tank allocated 126K -
tank readonly off -
tank ashift 0 default
tank comment - default
tank expandsize 0 -
tank freeing 0 default
tank feature@async_destroy enabled local
tank feature@empty_bpobj enabled local
tank feature@lz4_compress enabled local
tank feature@spacemap_histogram active local
tank feature@enabled_txg active local
tank feature@hole_birth active local
tank feature@extensible_dataset enabled local
tank feature@bookmarks enabled local
NAME USED AVAIL REFER MOUNTPOINT
tank 49.8G 438G 32K /tank
tank/zfstest 47K 438G 47K /tank/zfstest
tank/zfstest2 49.8G 438G 49.8G /tank/zfstest2
tank/zfstest2@300726135 242K - 2.24G -
tank/zfstest2@296730124 470K - 4.81G -
tank/zfstest2@224726639 771K - 7.39G -
tank/zfstest2@902427907 908K - 9.98G -
tank/zfstest2@540759649 1.07M - 12.6G -
tank/zfstest2@255536517 1.49M - 15.1G -
tank/zfstest2@199845779 1.62M - 17.7G -
tank/zfstest2@566124845 1.74M - 20.3G -
tank/zfstest2@956409770 1.87M - 22.8G -
tank/zfstest2@037434686 2.14M - 25.5G -
tank/zfstest2@301863766 2.63M - 28.1G -
tank/zfstest2@725218939 2.92M - 30.7G -
tank/zfstest2@694801830 3.06M - 33.2G -
tank/zfstest2@643148554 3.20M - 35.9G -
tank/zfstest2@492741743 3.33M - 38.5G -
tank/zfstest2@243681899 3.46M - 41.1G -
tank/zfstest2@659482439 3.59M - 43.7G -
tank/zfstest2@999733757 3.75M - 46.2G -
tank/zfstest2@921130344 3.69M - 48.9G -
linux-ts3r:/tank/zfstest # zfs snap tank/zfstest2@now
linux-ts3r:/tank/zfstest # zfs send -R -D tank/zfstest2@now > /tank/zfstest/testD.zfs
linux-ts3r:/tank/zfstest # zfs destroy -r tank/zfstest
zfs recv -v tank/zfstest2 < /tank/zfstest/testD.zfs
receiving full stream of tank/zfstest2@300726135 into tank/zfstest2@300726135
linux-ts3r:/tank/zfstest # zfs recv -v tank/zfstest2 < /tank/zfstest/testD.zfs
receiving full stream of tank/zfstest2@300726135 into tank/zfstest2@300726135
received 283MB stream in 19 seconds (14.9MB/sec)
receiving incremental stream of tank/zfstest2@296730124 into tank/zfstest2@296730124
received 319MB stream in 13 seconds (24.6MB/sec)
receiving incremental stream of tank/zfstest2@224726639 into tank/zfstest2@224726639
cannot receive incremental stream: invalid backup stream
There is an old issue (https://www.illumos.org/issues/755) describing an "invalid backup stream" failure, but that fix should already be included in the ZoL sources, shouldn't it?
@pyavdr We did pick up the fix for the issue you referenced a long time ago. We'll need to dig into this and see what the cause is.
I needed to rearrange my partitions, so I did a zfs send to copy off my ZFS data, repartitioned, and then tried to zfs recv that data back. Unfortunately I seem to have independently found this bug. Now that the original ZFS data has been overwritten, is my only choice to restore from backups, or is there some sort of workaround? I tried using zfs-fuse, but it said "cannot receive: stream has unsupported feature". Would compiling ZFS myself perhaps make the receive work for me?
It may be possible to recv with another ZFS system. It isn't clear whether the bug happens on the send or the recv side. If it happens on the recv side, your chances of a good recv are better with another system that supports deduplicated streams. If you have a valid backup, try again without deduplicating the stream.
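For example (a sketch only; the dataset names are taken from the original report, so substitute your own pool and output file), the non-deduplicated equivalent of the failing commands would be:

zfs send -R stor2/zfstest3@now > /stor2/zfstest/test-nodedup.zfs
zfs recv -v stor2/zfstest3 < /stor2/zfstest/test-nodedup.zfs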
@behlendorf
Brian, since this bug is severe, how can I help? An strace is at #2453. Would a quick reproducer help? There is a stack trace at #2453; does it help?
If it helps at all, the problem appears to be in ZFS send. I set up a VM with nas4free (based on FreeBSD) and tried the ZFS receive from there. It failed with the same error at about the same point (~17 GB in). I will begin restoring from my bacula backups now.
I suggest that a warning be printed when a user tries to use the ZFS send -D option. I have decent backups but the next person might not be so lucky.
Also I would like to help if I can. I will try this weekend to create a smaller test case that doesn't contain all my personal data. I am not a programmer but might at least be able to help test potential fixes.
I am experiencing the same problem on the official 0.6.3 release for Debian and as well on the nightly build 0.6.3-10~672692~wheezy.
The issue can be simply reproduced (suppose backup_pool is an empty pool and the copied file is larger than pool block size; I am using a small deb archive here):
cp python-gitzilla_2.0-1_all.deb /backup_pool/p.deb
zfs create backup_pool/test
zfs snapshot -r backup_pool@snap1
cp python-gitzilla_2.0-1_all.deb /backup_pool/test/p1.deb
zfs snapshot -r backup_pool@snap2
cp python-gitzilla_2.0-1_all.deb /backup_pool/test/p2.deb
zfs snapshot -r backup_pool@snap3
cp python-gitzilla_2.0-1_all.deb /backup_pool/test/p3.deb
zfs snapshot -r backup_pool@snap4
zfs send -RDv backup_pool@snap4 > dedup_snapshot
zfs destroy -r backup_pool
zfs receive -vF backup_pool < dedup_snapshot
The output from the last command is as follows:
receiving full stream of backup_pool@snap1 into backup_pool@snap1
received 22,0KB stream in 2 seconds (11,0KB/sec)
receiving incremental stream of backup_pool@snap2 into backup_pool@snap2
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of backup_pool@snap3 into backup_pool@snap3
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of backup_pool@snap4 into backup_pool@snap4
received 312B stream in 1 seconds (312B/sec)
receiving full stream of backup_pool/test@snap1 into backup_pool/test@snap1
received 39,9KB stream in 1 seconds (39,9KB/sec)
receiving incremental stream of backup_pool/test@snap2 into backup_pool/test@snap2
cannot receive incremental stream: invalid backup stream
The problem is most probably on the receiving side because the same error is reported when trying to receive an older stream created by a 0.6.1 (?) version which I was able to successfully receive earlier.
For those stuck like me with no help from the ZFS for Linux team:
If you have a deduped full stream of the entire pool, this regression can be worked around by downgrading to version 0.6.2. The procedure is as follows (for a Debian system with DKMS SPL/ZFS version 0.6.3 installed):
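A rough sketch of the idea (the package names and version strings below are assumptions; check your repository for the exact 0.6.2 packages):

apt-get remove --purge zfs-dkms spl-dkms zfsutils         # remove the 0.6.3 DKMS packages (assumed names)
apt-get install spl-dkms=0.6.2-<rev> zfs-dkms=0.6.2-<rev> zfsutils=0.6.2-<rev>
apt-mark hold spl-dkms zfs-dkms zfsutils                  # pin the downgraded packages
zfs receive -vF backup_pool < dedup_snapshot              # receive the deduped stream under 0.6.2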
Recommendation: if you are not sure that your backup stream is OK, back up your existing pool without the -D option, if possible, before destroying it.
Hope it will help.
Sorry guys, I've been working on other issues. I've bumped this to the 0.6.4 tag and we'll definitely try to get it resolved since it's a regression.
Brian, thank you very much for your effort in ZFS for Linux development. In order to help at least a little with the resolution of this issue (I am not a C programmer), I ran a bisect search over the ZFS Git repository using a Debian virtual machine.
The first commit which failed to receive the deduped stream (see my reproducing procedure above) is this one:
https://github.com/zfsonlinux/zfs/commit/13fe019870c8779bf2f5b3ff731b512cf89133ef
With relatively high probability, this is the source of the regression. Unfortunately, the commit introduces a complex code change.
The previous commit
https://github.com/zfsonlinux/zfs/commit/6f1ffb06655008c9b519108ed29fbf03acd6e5de
received the deduped stream without problems.
Hope this helps.
The problem is triggered here: https://github.com/zfsonlinux/zfs/blob/0426c16804e572f088dd293029ab39fa75e1e203/module/zfs/dmu_send.c#L1506. I'm looking into why.
BTW, if you receive the stream into a new dataset, it seems to work (zfs receive
I have figured out the cause of this problem: in some circumstances, the guid of a WRITE_BYREF record's referenced dataset (refobject) is not recorded in restorearg.guid_to_ds_map. At the very least, this occurs when the refguid refers to the top-level dataset (the one named after the pool). It also happens when doing a force-receive.
The restructured sync task work refactored many parts of the receive process, including, among other things, the contents of dmu_recv_cookie_t and the way it is used.
I'm working on a fix for this problem.
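To illustrate why a missing map entry surfaces as "invalid backup stream", here is a small self-contained sketch (stand-in types and a linear search instead of the real AVL tree; not the actual ZFS code) of the lookup that a WRITE_BYREF record triggers on the receive side:

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct guid_map_entry {
        uint64_t guid;  /* in the real code, also a handle to the dataset */
};

/* Stand-in for the AVL lookup done against restorearg.guid_to_ds_map. */
static struct guid_map_entry *
guid_map_lookup(struct guid_map_entry *map, size_t n, uint64_t refguid)
{
        for (size_t i = 0; i < n; i++) {
                if (map[i].guid == refguid)
                        return (&map[i]);
        }
        return (NULL);
}

/*
 * Sketch of handling a WRITE_BYREF record: the refguid names the dataset
 * holding the referenced block; if that guid was never recorded in the map,
 * the receive fails with EINVAL, which zfs(8) reports as
 * "invalid backup stream".
 */
static int
restore_write_byref_sketch(struct guid_map_entry *map, size_t n,
    uint64_t refguid)
{
        if (guid_map_lookup(map, n, refguid) == NULL)
                return (EINVAL);
        /* ...otherwise read the block from the referenced dataset and write it... */
        return (0);
}

If the snapshot that owns the referenced blocks was received without its guid being added to the map, the lookup fails and the whole receive aborts at that record.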
@dweeezil Any chance this is related to #2602?
@FransUrbo I don't think so, at least not from my reading of the issue.
Update on this issue: There are actually two somewhat related problems. First, the referred-to object is not recorded in guid_to_ds_map when writing to an existing object (such as doing a "-F" receive to a top-level dataset, i.e. one named after the pool), and second, the object recorded in dmu_recv_cookie.drc_newsnapobj does not always represent the object ID of the just-created snapshot.
I did manage to come up with an awful hack which fixes receiving into a top-level dataset (named after the pool) but is not a general solution. I'll continue to work on a proper solution.
The cause is pretty much the same as was the case for #372, however, the reason for the incomplete guid_to_ds_map is different.
@dweeezil Were you able to determine why ZoL was impacted and not the other ports?
@behlendorf I've been working on the assumption that other ports will be affected, too. More information as I get it. I'll be testing my saved send stream file under Illumos and FreeBSD.
@behlendorf The problem occurs under OpenIndiana 151a8. I see no reason this problem wouldn't occur under any other implementation as well. I will still try FreeBSD when I get a chance.
EDIT: any other implementation "with the restructured sync task code".
@behlendorf The problem occurs under FreeBSD 10.
I've not had much time to work on a fix since the weekend. The old receive cookie had a couple of fields to help track it:
struct dmu_recv_cookie {
        ...
        /*
         * If logical and real are different, we are recving the stream
         * into the "real" temporary clone, and then switching it with
         * the "logical" target.
         */
        struct dsl_dataset *drc_logical_ds;
        struct dsl_dataset *drc_real_ds;
        ...
but in the new code, these have been replaced by uint64_t drc_newsnapobj with new logic to set it. Unfortunately, the new logic doesn't always grab the correct value (object id) of the newly-created snapshot. I've come up with a hack to calculate it correctly in the case where the receive is on top of an existing dataset but that only fixes one class of the problem and is, as I mentioned, a hack.
I suspect that @ahrens could fix this in short order because I'm trying to guess the intent of the new code whereas he actually wrote it. Regardless, I'll see if I can't come up with something that works properly.
@dweeezil Thanks for the clarification, for some reason I was under the impression this was Linux specific.
@dweeezil do you have a simple set of commands to hit this issue? I'd like to try and reproduce it locally.
@prakashsurya I think I can come up with a (short) reliable reproducer pretty easily. I've been preoccupied with other issues and haven't been able to look at this one all week... making a bit of time now.
It was a bit simpler than I thought: https://gist.github.com/dweeezil/78320cd167b41bc13cd3
Here's an abbreviated dump of the stream:
root@ubu14-zfstest:~# zstreamdump -v < send.tank-new | egrep 'refguid|fromguid|toguid|toname'
toguid = 0
fromguid = 0
toname = tank/fs@snap3
toguid = 286f24bc71a20861
fromguid = 0
toname = tank/fs@base
toguid = fb59a768878336da
fromguid = 286f24bc71a20861
toname = tank/fs@snap2
toguid = fb59a768878336da refguid = 286f24bc71a20861
toguid = d9b3f8ac5f2ab144
fromguid = fb59a768878336da
toname = tank/fs@snap3
toguid = d9b3f8ac5f2ab144 refguid = fb59a768878336da <<<<- the fb59 guid was not recorded
toguid = d9b3f8ac5f2ab144 refguid = 286f24bc71a20861
I added a comment where the problem occurs.
Here are a few interesting pointers into dmu_send.c: One issue is in dmu_recv_end_sync(), where drc->drc_newsnapobj is set. The same value is used for both the "force" case and the non-force case, which has been shown to me to be wrong. If the stream is being received into the top-level dataset, the proper new snapshot object is actually the "previous snapshot's next snapshot". In the non-force case, however, I've not figured out a reliable way to get the proper snapshot object. I also think that dmu_recv_existing_end() ultimately needs to do an avl_add() similar to the one done in dmu_recv_new_end(); without that, I don't see how the top-level dataset could ever be referenced. Both these functions did the add in the old version of the code but, of course, the whole works was structured differently.
@dweeezil Awesome! Thanks for the simple script. I can confirm it's not Linux specific, I see it in an illumos based VM.
Can somebody give this a try? It seems to work in my illumos based environment...
diff --git a/usr/src/uts/common/fs/zfs/dmu_send.c b/usr/src/uts/common/fs/zfs/dmu_send.c
index 8c91fc5..021d7e2 100644
--- a/usr/src/uts/common/fs/zfs/dmu_send.c
+++ b/usr/src/uts/common/fs/zfs/dmu_send.c
@@ -2790,6 +2790,7 @@ dmu_recv_end_sync(void *arg, dmu_tx_t *tx)
                 dsl_dataset_rele(origin_head, FTAG);
                 dsl_destroy_head_sync_impl(drc->drc_ds, tx);
+                drc->drc_newsnapobj = origin_head->ds_phys->ds_prev_snap_obj;
 
                 if (drc->drc_owner != NULL)
                         VERIFY3P(origin_head->ds_owner, ==, drc->drc_owner);
@@ -2807,8 +2808,8 @@ dmu_recv_end_sync(void *arg, dmu_tx_t *tx)
                 dmu_buf_will_dirty(ds->ds_dbuf, tx);
                 ds->ds_phys->ds_flags &= ~DS_FLAG_INCONSISTENT;
+                drc->drc_newsnapobj = drc->drc_ds->ds_phys->ds_prev_snap_obj;
         }
-        drc->drc_newsnapobj = drc->drc_ds->ds_phys->ds_prev_snap_obj;
         /*
          * Release the hold from dmu_recv_begin. This must be done before
          * we return to open context, so that when we free the dataset's dnode,
@@ -2867,8 +2868,13 @@ dmu_recv_existing_end(dmu_recv_cookie_t *drc)
             dmu_recv_end_check, dmu_recv_end_sync, drc,
             dmu_recv_end_modified_blocks, ZFS_SPACE_CHECK_NORMAL);
 
-        if (error != 0)
+        if (error != 0) {
                 dmu_recv_cleanup_ds(drc);
+        } else if (drc->drc_guid_to_ds_map != NULL) {
+                (void) add_ds_to_guidmap(drc->drc_tofs,
+                    drc->drc_guid_to_ds_map,
+                    drc->drc_newsnapobj);
+        }
 
         return (error);
 }
@prakashsurya I'll check it out, but I'm pretty sure this is the equivalent of the "awful hack" I mentioned above. IIRC, origin_head->ds_phys->ds_prev_snap_obj is not always the proper value for `drc_newsnapobj`. Have you tried restoring the deduped stream into a deeper part of a pool (with/without recv -F)?
@prakashsurya I applied and checked your patch, but it doesn't work: "invalid backup stream".
@dweeezil When posting that patch, I had only tested it with the simple test script you provided earlier. Since then I've run it through some more testing and hit a panic, which I believe to be a NULL pointer dereference. So that's _almost_ correct, but as you say, it needs some more work. I'll try and devote some more time to this, and see what I can come up with.
Just ran into this myself. No data loss thanks to backups, but this is a really scary bug. Would it at least be possible to add a warning to zfs send -R until this can be fixed as crowtrobot suggests above?
The problem is that the user learns this when receiving, not when sending. And while yes, everyone should test their backups (indeed, that's how I found this myself), it's a really nasty surprise if you don't.
"Just skip using -D" is a fine workaround, but it's far from obvious to users that they need to be employing it :-)
The issue is that receive is potentially happening (much) later than send.
In my case the issue is that send goes to a file, which is then backed up offsite, rather than being directly restored to a zpool, so the issue wouldn't be noticed until restore time. This is why I test them before I'm "done", since at least then I know there exists a version of ZFS that can receive them without error.
I understand there are always risks inherent in this approach that make testing much more crucial, but I'd argue that at the moment a flag that claims to do one thing while secretly corrupting your backup without any warning goes beyond the "always test" threshold and into the "we should warn the user" territory, ESPECIALLY since the -D flag is much more attractive if you're backing up to tape/etc than if you're immediately sending to another zpool.
No, I don't have easy access to any of those systems. I'm just deleting it and am recreating the backup from the original source, without -D. I'm not looking at actual data loss because I'm pretty cautious these days, but I'm remembering several youthful errors many years ago where a bug like this could have cost me everything:-)
If you need a full receive, you can install ZFS 0.6.2, which is not affected by the problem, and receive under that version. This rescued me a while ago. The downside is that you need to start completely from scratch.
I just hit this as well. Would be ideal to get this working properly because deduplication could have a lot of wins when replicating an entire pool.
Just a heads-up for anyone stumbling upon this issue by searching for "cannot receive incremental stream: invalid backup stream": This error message _can also_ occur if you have something running (like a cron job) that automatically takes snapshots on the pool you are zfs-receiving against.
In my case, I use puppet to automatically configure a bunch of Linux boxes that all have a standard config. I have a pool named 'tank' on all of them, and during normal production use zfs-auto-snapshot runs every 15 minutes to ensure customer data is protected. I was restoring data to a newly created 'tank' with much larger drives. The zfs receive would run for a while, then zfs-auto-snapshot would snapshot, and the next snapshot in the received stream would cause this error.
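If you hit this variant, one option is to pause automatic snapshots on the target pool for the duration of the receive. A sketch (the property below is the one zfs-auto-snapshot honours; the stream file and target names are placeholders):

zfs set com.sun:auto-snapshot=false tank    # pause zfs-auto-snapshot on the target pool
zfs receive -vF tank/data < backup.zfs      # example receive of the saved stream
zfs set com.sun:auto-snapshot=true tank     # re-enable automatic snapshots afterwards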
I believe the PR posted here (https://github.com/openzfs/openzfs/pull/159) should fix this issue, and it also needs a reviewer to get pushed upstream. If anyone here wants to take a look I'd be very appreciative!
@prakashsurya @dweeezil @behlendorf @dankimmel I tried to patch the current head of 7.0.-rc1_39* with the changes from (https://github.com/openzfs/openzfs/commit/b09697c8c18be68abfe538de9809938239402ae8) and it works successfully. Receiving the deduplicated stream with snapshots works. But I am unsure about it, because there are some ambiguous differences in the code and I can't check all of them. Find my patch file in the attachment. There should be an additional reviewer.
@pyavdr thanks for porting this. It'll be much easier to review and comment on if you open a new pull request against master with the patch. And make sure to include the test case from the original OpenZFS patch. You just need to add it to the tests directory, update the Makefile.am, and enable it in the runfile.
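For reference, a hypothetical sketch of the last step (the section and test names are assumptions; use whatever the ported test is actually called, and keep the existing entries):

# tests/runfiles/linux.run (sketch; existing entries elided)
[tests/functional/rsend]
tests = ['rsend_001_pos', 'rsend_dedup']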
Closed by #5689