Restic: Backblaze B2 unexpected end of JSON input

Created on 13 Oct 2017  ·  42 comments  ·  Source: restic/restic

Output of restic version

restic 0.7.3
compiled with go1.9 on linux/amd64

How did you run restic exactly?

I tried to back up some fairly large (~ 200GB in total) VM images using restic using
restic backup --force /mnt/backup/dump/

I used these environment variables:

RESTIC_REPOSITORY="b2:<bucket_name>:<path_to_backup>"
B2_ACCOUNT_ID="<some_account_id>"
B2_ACCOUNT_KEY="<some_account_key>"
RESTIC_PASSWORD="<very_secret_password>"

I tried this multiple times, but restic always fails with an error after < 30 minutes.

error while saving data to the repo: unexpected end of JSON input
Close
github.com/restic/restic/internal/backend/b2.(*b2Backend).Save
        src/github.com/restic/restic/internal/backend/b2/b2.go:250
github.com/restic/restic/internal/repository.(*Repository).savePacker
        src/github.com/restic/restic/internal/repository/packer_manager.go:122
github.com/restic/restic/internal/repository.(*Repository).SaveAndEncrypt
        src/github.com/restic/restic/internal/repository/repository.go:203
github.com/restic/restic/internal/repository.(*Repository).SaveBlob
        src/github.com/restic/restic/internal/repository/repository.go:529
github.com/restic/restic/internal/archiver.(*Archiver).Save
        src/github.com/restic/restic/internal/archiver/archiver.go:104
github.com/restic/restic/internal/archiver.(*Archiver).saveChunk
        src/github.com/restic/restic/internal/archiver/archiver.go:161
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:2337
panic: Close: unexpected end of JSON input

goroutine 40839 [running]:
github.com/restic/restic/internal/archiver.(*Archiver).saveChunk(0xc4202ca1e0, 0xdff020, 0xc4200140c8, 0x619ae385, 0x12516a, 0x1b98a15d300000, 0xc4333fa000, 0x12516a, 0x5e4000, 0xc420080840, ...)
        src/github.com/restic/restic/internal/archiver/archiver.go:166 +0x41b
created by github.com/restic/restic/internal/archiver.(*Archiver).SaveFile
        src/github.com/restic/restic/internal/archiver/archiver.go:240 +0x3d0

What backend/server/service did you use?

Backblaze B2

Expected behavior

I'd like restic to finish the backup without an error or aborting.

Actual behavior

The backup fails with an error every time.

Steps to reproduce the behavior

  • Set up a restic repository on Backblaze B2
  • Backup a folder with some large files (~ 200GB across 20 files in my case)
  • Observe error message

Do you have any idea what may have caused this?

I found some existing issues with a similar error. It seems that this is caused by an invalid API response from B2. In the linked post the solution was to retry the backup, but that does not work in my case: the backup fails every time.

bug

All 42 comments

I have the same error. It's not as bad during a backup, because you can simply restart the process, but during a prune it's fatal, since prune does not seem to store its progress.

Hey, thanks for the report. If the backup fails every time, it's a great candidate to do some digging and find out what's going on here.

Please do the following:

  • Install Go (>=1.8), check out the repository
  • Compile restic with debug: go run build.go -tags debug
  • Run the backup again, with this command line:
$ DEBUG_FILES=debug/round_tripper_debug.go,b2/b2.go ./restic backup --force /mnt/backup/dump/ 2>/tmp/restic-debug.log

Then remove the authentication tokens from the debug log with:

$ sed -i '/Authorization/d' /tmp/restic-debug.log
$ sed -i '/Key:/d' /tmp/restic-debug.log

Then please have a look at the last few HTTP requests/responses in that log, maybe paste it somewhere.

ping @kurin

I have the same issue. I am backing up 17 TB, but the bucket size never gets over 1 GB. It takes almost two hours to crash.

So, then let's see what the debug log says :)

I get this error also. I built restic from HEAD and made a debug version as asked above. I just noticed that some code was merged to HEAD; the version I just built includes it. Should I build 0.7.3 instead of HEAD and provide a log file from that?

debug/round_tripper_debug.go:90 debug.loggingRoundTripper.RoundTrip 41226 ------------ HTTP RESPONSE ----------
HTTP/1.1 200 OK
Content-Length: 485
Cache-Control: max-age=0, no-cache, no-store
Content-Type: application/json;charset=UTF-8
Date: Sat, 14 Oct 2017 11:30:13 GMT
Server: Apache-Coyote/1.1

b2/b2.go:233 b2.(*b2Backend).Save 41185 Save , name eastwood2/data/47/47edf5c82227bcdc8e85c7174cdb4f1f61ec1f5845f4da0c4ee106fcf15b2de9
debug/round_tripper_debug.go:77 debug.loggingRoundTripper.RoundTrip 40832 ------------ HTTP REQUEST -----------
GET /file/FarmBrown/eastwood2/data/47/47edf5c82227bcdc8e85c7174cdb4f1f61ec1f5845f4da0c4ee106fcf15b2de9 HTTP/1.1
Host: f001.backblazeb2.com
User-Agent: Go-http-client/1.1
Range: bytes=0-0
X-Blazer-Method: b2_download_file_by_name
X-Blazer-Request-Id: 3829

debug/round_tripper_debug.go:90 debug.loggingRoundTripper.RoundTrip 41273 ------------ HTTP RESPONSE ----------
HTTP/1.1 200 OK
Content-Length: 485
Cache-Control: max-age=0, no-cache, no-store
Content-Type: application/json;charset=UTF-8
Date: Sat, 14 Oct 2017 11:30:13 GMT
Server: Apache-Coyote/1.1

b2/b2.go:233 b2.(*b2Backend).Save 41072 Save , name eastwood2/data/5c/5ca314a68150097c5d45f15104a8b680a2a39ac1025bbb276b8ae953cb93786d
debug/round_tripper_debug.go:77 debug.loggingRoundTripper.RoundTrip 41433 ------------ HTTP REQUEST -----------
GET /file/FarmBrown/eastwood2/data/5c/5ca314a68150097c5d45f15104a8b680a2a39ac1025bbb276b8ae953cb93786d HTTP/1.1
Host: f001.backblazeb2.com
User-Agent: Go-http-client/1.1
Range: bytes=0-0
X-Blazer-Method: b2_download_file_by_name
X-Blazer-Request-Id: 3830

debug/round_tripper_debug.go:90 debug.loggingRoundTripper.RoundTrip 41357 ------------ HTTP RESPONSE ----------
HTTP/1.1 404 Not Found
Content-Length: 178
Cache-Control: max-age=0, no-cache, no-store
Content-Type: application/json;charset=UTF-8
Date: Sat, 14 Oct 2017 11:30:13 GMT
Server: Apache-Coyote/1.1

b2/b2.go:244 b2.(*b2Backend).Save 41124 saved 4636805 bytes, err
debug/round_tripper_debug.go:77 debug.loggingRoundTripper.RoundTrip 41438 ------------ HTTP REQUEST -----------
POST /b2api/v1/b2_get_upload_url HTTP/1.1
Host: api001.backblazeb2.com
User-Agent: blazer/0.1.1
Content-Length: 39
X-Blazer-Method: b2_get_upload_url
X-Blazer-Request-Id: 3831
Accept-Encoding: gzip

debug/round_tripper_debug.go:90 debug.loggingRoundTripper.RoundTrip 41438 ------------ HTTP RESPONSE ----------
HTTP/1.1 503 Service Unavailable
Connection: close
Transfer-Encoding: chunked
Date: Sat, 14 Oct 2017 11:30:14 GMT
Server: Apache-Coyote/1.1

b2/b2.go:233 b2.(*b2Backend).Save 40826 Save , name eastwood2/data/41/412995df985cd194619695277f956069d50eadaf61d86d7780499314209f1c1c
panic: Close: unexpected end of JSON input

goroutine 41124 [running]:
github.com/restic/restic/internal/archiver.(*Archiver).saveChunk(0xc4203854a0, 0xec3400, 0xc420014418, 0x633cd9, 0xbf39a, 0x0, 0xc4226fc000, 0xbf39a, 0x13a000, 0xc4204fa000, ...)
        src/github.com/restic/restic/internal/archiver/archiver.go:166 +0x43a
created by github.com/restic/restic/internal/archiver.(*Archiver).SaveFile
        src/github.com/restic/restic/internal/archiver/archiver.go:240 +0x3bf

How is that?

Ha, thank you very much.

@kurin can you maybe have a look? I think it'd be best to return an error about the 503 status code, instead of the JSON error. What's your take on that?

Could the code retry a certain number of times before giving up?

https://www.backblaze.com/b2/docs/integration_checklist.html

This page says the upload should be attempted again if there is a 503.

The current master branch will do that, at a different level. I think it'd be a good idea to add this to the library (https://github.com/kurin/blazer) we're using for accessing B2.

We had a similar problem which is now solved by #1353. 👍

@jalainis Ah, that's great to hear! Does it work well? Do you like the messages restic prints?

Since I don't experience these errors so often I had to build a backend that randomly inserts errors to test it, but I've never seen it live.

I cannot see new messages concerning retries (similar for check and prune):

restic-master -r $B2_REPO rebuild-index
password is correct
counting files in repo
[19:48] 100.00% 2683 / 2683 packs
finding old index files
saved new index as 3e556787
remove 11 old index files

The random errors "error while saving data to the repo: unexpected end of JSON input" are just gone, so it seems to work well. In general I like the messages restic prints. 😉

According to the Integration Checklist mentioned above there is a header _X-Bz-Test-Mode: fail_some_uploads_ which can be used to test the backend. Blazer seems to support this as a client option.

Just trying this out. You asked about error messages, and I'm getting:

Stat(<data/0acebc06f0>) returned error, retrying after 590.650684ms: Stat: unexpected end of JSON input

I don't like that "unexpected end of JSON" is still there, as it gives the impression something is "corrupt", when in fact the retry seems to succeed. I assume you are merely reporting the underlying (not entirely accurate?) error from blazer not handling the 503 yet?

Also I noticed recently restic started printing "password is correct"... is that intentional?

Thanks for the messages, that's what I expected. We're getting the error from the underlying library (blazer), I've suggested above to improve the message (or even just retry the request).

The "password is correct" message is printed when stdout is a terminal, this was added recently in #1324, requested long ago in #438 to provide some feedback to the user.

A little more feedback. During rebuild-index I have the following so far (it's still running):

counting files in repo
Stat(<data/0acebc06f0>) returned error, retrying after 590.650684ms: Stat: unexpected end of JSON input
Stat(<data/123b5d685e>) returned error, retrying after 628.005782ms: Stat: unexpected end of JSON input
Stat(<data/470010daf5>) returned error, retrying after 712.343092ms: Stat: unexpected end of JSON input
Stat(<data/470718d2cf>) returned error, retrying after 291.116455ms: Stat: unexpected end of JSON input
pack file cannot be listed 4db5bcc9: ReadAt: ReadFull(<data/4db5bcc958>): unexpected end of JSON input
Stat(<data/6205fb65bd>) returned error, retrying after 343.112469ms: Stat: unexpected end of JSON input
pack file cannot be listed 8a331fe4: ReadAt: ReadFull(<data/8a331fe4c0>): unexpected end of JSON input

I can't tell from the messages whether those "pack file cannot be listed" messages mean that the retries were exhausted and the call ultimately failed, or whether it succeeded at some point.

Anecdotally, I have noticed many more B2 JSON errors since I started using --limit-upload 128. Not sure there is really any cause-and-effect here, just an observation.

blazer should automatically retry on 503s, instead of surfacing that to clients. Let me chase this down.

Can some of you please test the restic branch update-blazer? I've pushed the master branch of github.com/kurin/blazer that @kurin prepared. Thanks!

I don't know for sure that this will actually fix it, because I can't reproduce this on my end at all, but I suspect that this is the source of the errors.

I still have the issue when running restic check on my B2 bucket, but I don't have time to build restic. Could someone post a precompiled build? I need amd64. The precompiled betas have been a godsend.

Thanks
Dean


I'm testing the update-blazer branch now. I'm ~35% into a rebuild-index and no errors or restic-level retry attempts so far.

If you can reliably reproduce this, I'd like to get the request and especially response from a failing run. I'm running rebuild-index in a loop with code prior to the "fix" but not having a ton of luck generating an error.

Hi Toby, if you can get me a compiled version I can test it in the morning.


Reporting back. rebuild-index completed fine (no JSON errors, no retries), but a subsequent prune failed with a timeout. Not really sure whether it's the same issue or not:

restic -r b2:mybucket:restic prune --limit-upload 150
password is correct
counting files in repo
building new index for repo
[3:40:03] 100.00%  33418 / 33418 packs
repository contains 33418 packs (882153 blobs) with 157.582 GiB
processed 882153 blobs: 3 duplicate blobs, 4.572 MiB duplicate
load all snapshots
find data that is still in use for 7 snapshots
[2:00] 100.00%  7 / 7 snapshots
found 819017 of 882153 data blobs still in use, removing 63136 blobs
will remove 0 invalid files
will delete 156 packs and rewrite 162 packs, this frees 1.282 GiB
net/http: TLS handshake timeout
Copy
github.com/restic/restic/internal/repository.Repack
    src/github.com/restic/restic/internal/repository/repack.go:43
main.pruneRepository
    src/github.com/restic/restic/cmd/restic/cmd_prune.go:274
main.runPrune
    src/github.com/restic/restic/cmd/restic/cmd_prune.go:85
main.glob..func14
    src/github.com/restic/restic/cmd/restic/cmd_prune.go:25
github.com/restic/restic/vendor/github.com/spf13/cobra.(*Command).execute
    src/github.com/restic/restic/vendor/github.com/spf13/cobra/command.go:650
github.com/restic/restic/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    src/github.com/restic/restic/vendor/github.com/spf13/cobra/command.go:729
github.com/restic/restic/vendor/github.com/spf13/cobra.(*Command).Execute
    src/github.com/restic/restic/vendor/github.com/spf13/cobra/command.go:688
main.main
    src/github.com/restic/restic/cmd/restic/main.go:69
runtime.main
    /usr/local/go/src/runtime/proc.go:185
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:2337
net/http: TLS handshake timeout

Uh yeah I think blazer is supposed to handle those. Even if it's not in the checklist it sounds retryable.

FYI to everyone using the latest master with the retry logic: I've found a bug. When an error occurs after a partial file was already sent to the backend, incomplete data may be saved; it will be corrected in https://github.com/restic/restic/pull/1362. restic check --read-data will detect this, as will running sha256sum on the repo data.

FYI incomplete writes to B2 don't go anywhere unless you use the large file API, and restic's writes are never big enough to trigger that.

The bug is that a subsequent retry for Save() will only read the remaining data from the reader in our backend API, which may be incomplete. The PR adds code to rewind the reader every time before Save() on the backend is called.

I have uploaded 300+GB to B2, would it be safer for me to start again once
this bug is fixed? re-reading all the data will probably cost me a lot in
B2 charges compared with re-uploading?


@deanroker123 when did you upload all that data? The code in question was only active between 14 October and today, so you'd only need to check files uploaded since then. And it was never part of any released version of restic. Maybe you can use rclone or a fuse file system to only download and check files with sha256sum that have been created since 14 October. Please report back when you do that, I'm highly interested if this problem occurred in practice at all.

If you don't want to download all that data again, starting over surely is a way to avoid this.

I'll think about how we can detect incompletely uploaded files that miss a bunch of data at the beginning (but have a valid header at the end of the file) without downloading them.

So after getting a clean bill of health from restic check --read-data overnight, I ran a prune. It completed successfully:

$ restic version
restic 0.7.3 (v0.7.3-180-g7507a658)
compiled with go1.9.1 on darwin/amd64

$ restic -r b2:mybucket:restic prune -o b2.connections=10 --limit-upload 175
password is correct
counting files in repo
building new index for repo
[1:33:38] 100.00%  33267 / 33267 packs
repository contains 33267 packs (846905 blobs) with 156.808 GiB
processed 846905 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 8 snapshots
[1:52] 100.00%  8 / 8 snapshots
found 846905 of 846905 data blobs still in use, removing 0 blobs
will remove 0 invalid files
will delete 0 packs and rewrite 0 packs, this frees 0B
counting files in repo
[1:27:38] 100.00%  33267 / 33267 packs
finding old index files
saved new index as 1e32b669
remove 10 old index files
done

Just for kicks, I tried another check, but without --read-data and I got the JSON errors again:

$ restic -r b2:mybucket:restic check -o b2.connections=10 --limit-upload 175
password is correct
Create exclusive lock for repository
Load indexes
Check all packs
Check snapshots, trees and blobs
error for tree 59d18a1f:
  ReadFull(<data/dd4f64c172>): unexpected end of JSON input
error for tree e5d5162a:
  ReadFull(<data/dd4f64c172>): unexpected end of JSON input
error for tree bbf98641:
  ReadFull(<data/dd4f64c172>): unexpected end of JSON input
  Interrupt received, cleaning up

Note that this is v0.7.3-180-g7507a658, which I believe has retry logic at least on the restic level, possibly also in Blazer.

Is this another transient 503 or similar problem that we're failing to catch, or did prune silently corrupt something?

I tried to examine that problematic file using restic cat blob dd4f64c172 but kept getting unable to parse ID: invalid length for hash... clearly I'm doing something wrong. The remote file does exist, and is ~5.7MB.

It's probably still Blazer. Do you have the debug log for that run?

No I don't have a debug log, sorry. I will capture one this afternoon. Interestingly, I ran it again overnight (this time without the bandwidth restrictions) and it succeeded without error.

restic -r b2:mybucket:restic check 
password is correct
Create exclusive lock for repository
Load indexes
Check all packs
Check snapshots, trees and blobs
No errors were found

So it does seem to indicate another transient bug.

Also, to confirm, this happens with blazer v0.2.1 patched in to restic v0.7.3?

This is with restic 0.7.3, built from master 7507a658ac529a49560b4f0e5570ae900de2ce61, using the version of Blazer that's included in vendor.

Running it again now with debug enabled.

Oooh, okay, that still has the old version of blazer.

I've updated the branch and will merge blazer 0.2.1 as soon as the integration tests pass.

Please leave a comment should you experience this issue again with restic version v0.7.3-187-g6a52bb6f or later, thanks!

FYI, the beta build will appear here in a few minutes: https://beta.restic.net/v0.7.3-187-g6a52bb6f/

Just following up: I have been unable to produce any more B2 issues since upgrading to v0.7.3-193-ga3f8e9df. I ran check several times on a large repo (basically just ran it all day long) and could not get a single failure. 👍

Thanks for the feedback!

@fd0 I have used rclone to mount the repo files and search for ones modified since the 14th. Because of the check problems I don't think I had run a full backup, so the only thing modified was an index file. I will download the version mentioned by @armhold and give it another go.
