Likely supersedes #526
restic version
# restic.debug version
debug enabled
restic 0.6.0-rc.1 (v0.6.0-rc.1-1-g37b107b)
compiled with go1.8.3 on linux/arm
# /usr/local/sbin/restic.debug -r rest:http://qnap.ch.polyware.nl:8000 backup -x --tag CH / /boot --exclude "/var/log/*,/var/tmp/*,/tmp/*" --mem-profile=/tmp
Backend: rest
Expected: restic makes a backup.
Actual: the backup fails with out of memory:
# /usr/local/sbin/restic.debug -r rest:http://qnap.ch.polyware.nl:8000 backup -x --tag CH / /boot --exclude "/var/log/*,/var/tmp/*,/tmp/*" --mem-profile=/tmp
debug enabled
enter password for repository:
runtime: out of memory: cannot allocate 536870912-byte block (556793856 in use)
fatal error: out of memory
runtime stack:
runtime.throw(0x4c85ae, 0xd)
/usr/local/go/src/runtime/panic.go:596 +0x70
runtime.largeAlloc(0x1ffffe00, 0x6ae01, 0x711f50)
/usr/local/go/src/runtime/malloc.go:809 +0xe0
runtime.mallocgc.func1()
/usr/local/go/src/runtime/malloc.go:702 +0x30
runtime.systemstack(0x10a1ca80)
/usr/local/go/src/runtime/asm_arm.s:264 +0x80
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1132
goroutine 117 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_arm.s:209 +0x4 fp=0x10ba7a7c sp=0x10ba7a78
runtime.mallocgc(0x1ffffe00, 0x44ea60, 0x10b66e01, 0x10adab18)
/usr/local/go/src/runtime/malloc.go:703 +0x9e0 fp=0x10ba7ad8 sp=0x10ba7a7c
runtime.makeslice(0x44ea60, 0x1ffffe00, 0x1ffffe00, 0x960, 0x0, 0x0)
/usr/local/go/src/runtime/slice.go:54 +0x68 fp=0x10ba7afc sp=0x10ba7ad8
bytes.makeSlice(0x1ffffe00, 0x0, 0x0, 0x0)
/usr/local/go/src/bytes/buffer.go:201 +0x58 fp=0x10ba7b18 sp=0x10ba7afc
bytes.(*Buffer).ReadFrom(0x10ba7b74, 0x6e16f0, 0x10aa3fc0, 0x200, 0x0, 0x10a80400, 0x0)
/usr/local/go/src/bytes/buffer.go:173 +0x270 fp=0x10ba7b50 sp=0x10ba7b18
io/ioutil.readAll(0x6e16f0, 0x10aa3fc0, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/io/ioutil/ioutil.go:33 +0xd0 fp=0x10ba7bc8 sp=0x10ba7b50
io/ioutil.ReadAll(0x6e16f0, 0x10aa3fc0, 0x10aa3fc0, 0x6e16f0, 0x10aa3fc0, 0x0, 0x0)
/usr/local/go/src/io/ioutil/ioutil.go:42 +0x34 fp=0x10ba7bf0 sp=0x10ba7bc8
restic/backend.LoadAll(0x6e5fe0, 0x10ae0150, 0x4c1338, 0x5, 0x10b2dcc0, 0x40, 0x0, 0x0, 0x0, 0x0, ...)
/tmp/restic-build-847077805/src/restic/backend/utils.go:28 +0x104 fp=0x10ba7c2c sp=0x10ba7bf0
restic/repository.(*Repository).LoadAndDecrypt(0x10adc180, 0x4c1338, 0x5, 0x1daf0f27, 0x39e92cc6, 0x9c59383f, 0x97399bea, 0x21ed0b22, 0xb2d24563, 0xa65b3a67, ...)
/tmp/restic-build-847077805/src/restic/repository/repository.go:57 +0x120 fp=0x10ba7d50 sp=0x10ba7c2c
restic/repository.LoadIndexWithDecoder(0x6e6e20, 0x10adc180, 0x1daf0f27, 0x39e92cc6, 0x9c59383f, 0x97399bea, 0x21ed0b22, 0xb2d24563, 0xa65b3a67, 0x8d242524, ...)
/tmp/restic-build-847077805/src/restic/repository/index.go:525 +0xc8 fp=0x10ba7de0 sp=0x10ba7d50
restic/repository.LoadIndex(0x6e6e20, 0x10adc180, 0x1daf0f27, 0x39e92cc6, 0x9c59383f, 0x97399bea, 0x21ed0b22, 0xb2d24563, 0xa65b3a67, 0x8d242524, ...)
/tmp/restic-build-847077805/src/restic/repository/repository.go:352 +0x48 fp=0x10ba7e34 sp=0x10ba7de0
restic/repository.(*Repository).LoadIndex.func1(0x1daf0f27, 0x39e92cc6, 0x9c59383f, 0x97399bea, 0x21ed0b22, 0xb2d24563, 0xa65b3a67, 0x8d242524, 0x10b62080, 0x8d242524, ...)
/tmp/restic-build-847077805/src/restic/repository/repository.go:320 +0x4c fp=0x10ba7ec0 sp=0x10ba7e34
restic/repository.ParallelWorkFuncParseID.func1(0x10b622c0, 0x40, 0x10b62080, 0x10b4ff00, 0x1)
/tmp/restic-build-847077805/src/restic/repository/parallel.go:89 +0x148 fp=0x10ba7f54 sp=0x10ba7ec0
restic/repository.FilesInParallel.func1(0x10b689c0, 0x10b620c0, 0x10b34820, 0x10b62080, 0x10b62380)
/tmp/restic-build-847077805/src/restic/repository/parallel.go:53 +0xd4 fp=0x10ba7fd4 sp=0x10ba7f54
runtime.goexit()
/usr/local/go/src/runtime/asm_arm.s:1017 +0x4 fp=0x10ba7fd4 sp=0x10ba7fd4
created by restic/repository.FilesInParallel
/tmp/restic-build-847077805/src/restic/repository/parallel.go:63 +0x130
goroutine 1 [chan receive]:
restic/repository.(*Repository).LoadIndex(0x10adc180, 0x4e9798, 0x10b2c3c0)
/tmp/restic-build-847077805/src/restic/repository/repository.go:339 +0x124
main.runBackup(0x0, 0x0, 0x0, 0x10ad8a80, 0x3, 0x3, 0x0, 0x0, 0x0, 0x1, ...)
/tmp/restic-build-847077805/src/cmds/restic/cmd_backup.go:375 +0x330
main.glob..func2(0x70d1b8, 0x10b397c0, 0x2, 0xa, 0x0, 0x0)
/tmp/restic-build-847077805/src/cmds/restic/cmd_backup.go:38 +0xcc
github.com/spf13/cobra.(*Command).execute(0x70d1b8, 0x10b396d0, 0xa, 0xa, 0x70d1b8, 0x10b396d0)
/tmp/restic-build-847077805/src/github.com/spf13/cobra/command.go:631 +0x308
github.com/spf13/cobra.(*Command).ExecuteC(0x70cf88, 0x10ade2b8, 0x44e820, 0x10b34180)
/tmp/restic-build-847077805/src/github.com/spf13/cobra/command.go:710 +0x220
github.com/spf13/cobra.(*Command).Execute(0x70cf88, 0x9, 0x10b1bf74)
/tmp/restic-build-847077805/src/github.com/spf13/cobra/command.go:669 +0x1c
main.main()
/tmp/restic-build-847077805/src/cmds/restic/main.go:60 +0x88
goroutine 5 [syscall]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:116 +0x154
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:22 +0x14
created by os/signal.init.1
/usr/local/go/src/os/signal/signal_unix.go:28 +0x30
goroutine 6 [chan receive]:
restic.init.1.func1.1()
/tmp/restic-build-847077805/src/restic/lock.go:265 +0x9c
created by restic.init.1.func1
/tmp/restic-build-847077805/src/restic/lock.go:268 +0x24
goroutine 7 [select, locked to thread]:
runtime.gopark(0x4e9cdc, 0x0, 0x4c27d7, 0x6, 0x18, 0x2)
/usr/local/go/src/runtime/proc.go:271 +0xfc
runtime.selectgoImpl(0x10a26fa0, 0x0, 0xc)
/usr/local/go/src/runtime/select.go:423 +0x116c
runtime.selectgo(0x10a26fa0)
/usr/local/go/src/runtime/select.go:238 +0x10
runtime.ensureSigM.func1()
/usr/local/go/src/runtime/signal_unix.go:434 +0x2bc
runtime.goexit()
/usr/local/go/src/runtime/asm_arm.s:1017 +0x4
goroutine 8 [chan receive]:
restic.init.2.func1(0x10a12440)
/tmp/restic-build-847077805/src/restic/progress_unix.go:17 +0x38
created by restic.init.2
/tmp/restic-build-847077805/src/restic/progress_unix.go:21 +0x90
goroutine 17 [chan receive]:
main.CleanupHandler(0x10ad6040)
/tmp/restic-build-847077805/src/cmds/restic/cleanup.go:62 +0x38
created by main.init.1
/tmp/restic-build-847077805/src/cmds/restic/cleanup.go:25 +0x90
goroutine 18 [chan receive]:
github.com/pkg/profile.Start.func7(0x10ad8ae0)
/tmp/restic-build-847077805/src/github.com/pkg/profile/profile.go:206 +0x90
created by github.com/pkg/profile.Start
/tmp/restic-build-847077805/src/github.com/pkg/profile/profile.go:212 +0x18c
goroutine 33 [select]:
net/http.(*persistConn).readLoop(0x10a7e280)
/usr/local/go/src/net/http/transport.go:1599 +0x7b4
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1117 +0x814
goroutine 34 [select]:
net/http.(*persistConn).writeLoop(0x10a7e280)
/usr/local/go/src/net/http/transport.go:1704 +0x394
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1118 +0x830
goroutine 105 [IO wait]:
net.runtime_pollWait(0x76ee4088, 0x72, 0x10b57000)
/usr/local/go/src/runtime/netpoll.go:164 +0x44
net.(*pollDesc).wait(0x10b984fc, 0x72, 0x6e20e0, 0x6e0138)
/usr/local/go/src/net/fd_poll_runtime.go:75 +0x28
net.(*pollDesc).waitRead(0x10b984fc, 0x10b57000, 0x1000)
/usr/local/go/src/net/fd_poll_runtime.go:80 +0x24
net.(*netFD).Read(0x10b984c0, 0x10b57000, 0x1000, 0x1000, 0x0, 0x6e20e0, 0x6e0138)
/usr/local/go/src/net/fd_unix.go:250 +0x148
net.(*conn).Read(0x10b648d0, 0x10b57000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:181 +0x58
net/http.(*persistConn).Read(0x10b5c460, 0x10b57000, 0x1000, 0x1000, 0x4, 0x67d2c, 0x10b63140)
/usr/local/go/src/net/http/transport.go:1316 +0x174
bufio.(*Reader).fill(0x10ae0cf0)
/usr/local/go/src/bufio/bufio.go:97 +0xf4
bufio.(*Reader).Peek(0x10ae0cf0, 0x1, 0x10bcada8, 0x1, 0x0, 0x10b98680, 0x0)
/usr/local/go/src/bufio/bufio.go:129 +0x58
net/http.(*persistConn).readLoop(0x10b5c460)
/usr/local/go/src/net/http/transport.go:1474 +0x160
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1117 +0x814
goroutine 106 [select]:
net/http.(*persistConn).writeLoop(0x10b5c460)
/usr/local/go/src/net/http/transport.go:1704 +0x394
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1118 +0x830
goroutine 44 [select]:
main.refreshLocks(0x711a70, 0x10b0bf00)
/tmp/restic-build-847077805/src/cmds/restic/lock.go:70 +0x218
created by main.lockRepository
/tmp/restic-build-847077805/src/cmds/restic/lock.go:47 +0x294
goroutine 45 [semacquire]:
sync.runtime_Semacquire(0x10b689cc)
/usr/local/go/src/runtime/sema.go:47 +0x24
sync.(*WaitGroup).Wait(0x10b689c0)
/usr/local/go/src/sync/waitgroup.go:131 +0x94
restic/repository.FilesInParallel(0x76ee6d88, 0x10ae0150, 0x4c1338, 0x5, 0x14, 0x10b34820, 0x0, 0x0)
/tmp/restic-build-847077805/src/restic/repository/parallel.go:66 +0x14c
restic/repository.(*Repository).LoadIndex.func2(0x10b62040, 0x10b0bfc0, 0x10adc180, 0x10b689b0)
/tmp/restic-build-847077805/src/restic/repository/repository.go:336 +0x9c
created by restic/repository.(*Repository).LoadIndex
/tmp/restic-build-847077805/src/restic/repository/repository.go:337 +0x100
goroutine 138 [IO wait]:
net.runtime_pollWait(0x76ee4100, 0x72, 0x12c06000)
/usr/local/go/src/runtime/netpoll.go:164 +0x44
net.(*pollDesc).wait(0x10a13cfc, 0x72, 0x6e20e0, 0x6e0138)
/usr/local/go/src/net/fd_poll_runtime.go:75 +0x28
net.(*pollDesc).waitRead(0x10a13cfc, 0x12c06000, 0x1000)
/usr/local/go/src/net/fd_poll_runtime.go:80 +0x24
net.(*netFD).Read(0x10a13cc0, 0x12c06000, 0x1000, 0x1000, 0x0, 0x6e20e0, 0x6e0138)
/usr/local/go/src/net/fd_unix.go:250 +0x148
net.(*conn).Read(0x10b34ae8, 0x12c06000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:181 +0x58
net/http.(*persistConn).Read(0x10b3e6e0, 0x12c06000, 0x1000, 0x1000, 0x4, 0x67d2c, 0x10b632c0)
/usr/local/go/src/net/http/transport.go:1316 +0x174
bufio.(*Reader).fill(0x10a16e10)
/usr/local/go/src/bufio/bufio.go:97 +0xf4
bufio.(*Reader).Peek(0x10a16e10, 0x1, 0x12c57da8, 0x1, 0x0, 0x10b63200, 0x0)
/usr/local/go/src/bufio/bufio.go:129 +0x58
net/http.(*persistConn).readLoop(0x10b3e6e0)
/usr/local/go/src/net/http/transport.go:1474 +0x160
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1117 +0x814
goroutine 139 [select]:
net/http.(*persistConn).writeLoop(0x10b3e6e0)
/usr/local/go/src/net/http/transport.go:1704 +0x394
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1118 +0x830
And the dmesg output:
May 27 01:32:44 rp1 kernel: [1779339.099132] apache2 invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
May 27 01:32:47 rp1 kernel: [1779339.099149] apache2 cpuset=/ mems_allowed=0
May 27 01:32:47 rp1 kernel: [1779339.099183] CPU: 2 PID: 776 Comm: apache2 Not tainted 4.9.24-v7+ #993
May 27 01:32:47 rp1 kernel: [1779339.099189] Hardware name: BCM2835
May 27 01:32:47 rp1 kernel: [1779339.099236] [<8010fb3c>] (unwind_backtrace) from [<8010c058>] (show_stack+0x20/0x24)
May 27 01:32:47 rp1 rsyslogd-2007: action 'action 17' suspended, next retry is Sat May 27 01:34:17 2017 [try http://www.rsyslog.com/e/2007 ]
May 27 01:32:50 rp1 kernel: [1779339.099259] [<8010c058>] (show_stack) from [<80455200>] (dump_stack+0xd4/0x118)
May 27 01:32:50 rp1 kernel: [1779339.099281] [<80455200>] (dump_stack) from [<8026c998>] (dump_header+0x9c/0x1f4)
May 27 01:32:50 rp1 kernel: [1779339.099306] [<8026c998>] (dump_header) from [<80210364>] (oom_kill_process+0x3e0/0x4e4)
May 27 01:32:50 rp1 kernel: [1779339.099327] [<80210364>] (oom_kill_process) from [<802107cc>] (out_of_memory+0x124/0x334)
May 27 01:32:50 rp1 kernel: [1779339.099348] [<802107cc>] (out_of_memory) from [<802158f0>] (__alloc_pages_nodemask+0xcf4/0xdd0)
May 27 01:32:50 rp1 kernel: [1779339.099370] [<802158f0>] (__alloc_pages_nodemask) from [<8020e694>] (filemap_fault+0x474/0x674)
May 27 01:32:50 rp1 kernel: [1779339.099390] [<8020e694>] (filemap_fault) from [<8030c74c>] (ext4_filemap_fault+0x3c/0x50)
May 27 01:32:50 rp1 kernel: [1779339.099412] [<8030c74c>] (ext4_filemap_fault) from [<8023d844>] (__do_fault+0x7c/0x100)
May 27 01:32:50 rp1 kernel: [1779339.099430] [<8023d844>] (__do_fault) from [<80241b70>] (handle_mm_fault+0x5fc/0xde4)
May 27 01:32:50 rp1 kernel: [1779339.099452] [<80241b70>] (handle_mm_fault) from [<807193f4>] (do_page_fault+0x33c/0x3b0)
May 27 01:32:50 rp1 kernel: [1779339.099471] [<807193f4>] (do_page_fault) from [<801012a8>] (do_PrefetchAbort+0x44/0xa8)
May 27 01:32:50 rp1 kernel: [1779339.099488] [<801012a8>] (do_PrefetchAbort) from [<80718ea4>] (ret_from_exception+0x0/0x1c)
May 27 01:32:50 rp1 kernel: [1779339.099497] Exception stack(0xb2a89fb0 to 0xb2a89ff8)
May 27 01:32:50 rp1 kernel: [1779339.099509] 9fa0: 000f4240 00000000 00000003 00011175
May 27 01:32:50 rp1 kernel: [1779339.099524] 9fc0: 7ea3aad4 7ea3aacc 7ea3aad0 00000000 00000004 76912080 76c44cb0 553e0f60
May 27 01:32:50 rp1 kernel: [1779339.099536] 9fe0: 76ec9274 7ea3aa30 54bcd908 54bc3f68 20000010 ffffffff
May 27 01:32:50 rp1 kernel: [1779339.099543] Mem-Info:
May 27 01:32:51 rp1 kernel: [1779339.099573] active_anon:103001 inactive_anon:102938 isolated_anon:32
May 27 01:32:51 rp1 kernel: [1779339.099573] active_file:344 inactive_file:665 isolated_file:0
May 27 01:32:51 rp1 kernel: [1779339.099573] unevictable:0 dirty:0 writeback:0 unstable:0
May 27 01:32:51 rp1 kernel: [1779339.099573] slab_reclaimable:1659 slab_unreclaimable:4074
May 27 01:32:51 rp1 kernel: [1779339.099573] mapped:1022 shmem:686 pagetables:772 bounce:0
May 27 01:32:51 rp1 kernel: [1779339.099573] free:4085 free_pcp:357 free_cma:0
May 27 01:32:51 rp1 kernel: [1779339.099599] Node 0 active_anon:412004kB inactive_anon:411852kB active_file:1376kB inactive_file:2660kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4088kB dirty:0kB writeback:0kB shmem:2744kB writeback_tmp:0kB unstable:0kB pages_scanned:37542 all_unreclaimable? yes
May 27 01:32:51 rp1 kernel: [1779339.099630] Normal free:16340kB min:16384kB low:20480kB high:24576kB active_anon:412004kB inactive_anon:411752kB active_file:1376kB inactive_file:2660kB unevictable:0kB writepending:0kB present:901120kB managed:880564kB mlocked:0kB slab_reclaimable:6636kB slab_unreclaimable:16296kB kernel_stack:1840kB pagetables:3088kB bounce:0kB free_pcp:1428kB local_pcp:316kB free_cma:0kB
May 27 01:32:51 rp1 kernel: lowmem_reserve[]: 0 0
May 27 01:32:51 rp1 kernel: [1779339.099651] Normal: 129*4kB (UME) 86*8kB (E) 50*16kB (UME) 50*32kB (ME) 26*64kB (UME) 51*128kB (UME) 8*256kB (UME) 3*512kB (ME) 1*1024kB (M) 0*2048kB 0*4096kB = 16404kB
May 27 01:32:51 rp1 kernel: 1747 total pagecache pages
May 27 01:32:51 rp1 kernel: [1779339.099758] 62 pages in swap cache
May 27 01:32:51 rp1 kernel: [1779339.099767] Swap cache stats: add 95904, delete 95842, find 35132/41154
May 27 01:32:51 rp1 kernel: [1779339.099772] Free swap = 0kB
May 27 01:32:51 rp1 kernel: [1779339.099776] Total swap = 102396kB
May 27 01:32:51 rp1 kernel: [1779339.099782] 225280 pages RAM
May 27 01:32:51 rp1 kernel: [1779339.099787] 0 pages HighMem/MovableOnly
May 27 01:32:51 rp1 kernel: [1779339.099792] 5139 pages reserved
...
Last few msgs in the debug log:
2017/05/29 03:12:50 debug/round_tripper_debug.go:87 debug.loggingRoundTripper.RoundTrip 103 ------------ HTTP RESPONSE ----------
HTTP/1.1 206 Partial Content
Content-Length: 449555642
Accept-Ranges: bytes
Content-Range: bytes 0-449555641/449555642
Content-Type: application/octet-stream
Date: Mon, 29 May 2017 01:12:50 GMT
2017/05/29 03:12:50 repository/index.go:454 repository.DecodeIndex 111 Start decoding index
2017/05/29 03:12:50 repository/index.go:486 repository.DecodeIndex 111 done
2017/05/29 03:12:54 repository/index.go:454 repository.DecodeIndex 108 Start decoding index
2017/05/29 03:13:01 repository/index.go:486 repository.DecodeIndex 108 done
Pick an RP3, run Apache and domotics software (both not small memory users), try to make a backup, and watch it OOM.
Excessively sized mallocs? A 512MB allocation is rather steep... and this was after a prune earlier today. So... does this mean restic is impossible to use on systems with <= 1GB of memory, like the RP3, or on many of the VMs out there? That seems like a show-stopper for wider deployment.
runtime: out of memory: cannot allocate 536870912-byte block (556793856 in use)
I mean... it did not even start scanning, so it was likely still reading the indexes. Which confuses me, because there should be no index of that size.
-rw------- 1 admin administ 428.7M May 28 15:19 270faf1dc62ce9393f38599cea9b3997220bed216345d2b2673a5ba62425248d
-rw------- 1 admin administ 6.1M May 29 00:33 bef6d99358953f642115c290b1dd312ced2e2f7b34650524ac14156ce2a036f7
-rw------- 1 admin administ 48.7K May 29 00:34 e91b8b4186b086b8def47025cb88efc12a0ef04ef91f885e72d0c5af801b4724
That's the Go stdlib JSON decoder for the index file. Hm. We need to find a solution for the index...
Not sure if helpful, but json.NewDecoder(r).Decode(&thing) will decode the JSON by streaming it, rather than having to buffer the whole string in memory. I think.
That's unfortunately not applicable: It loads the whole (large) object into a buffer and then parses it. I've benchmarked that.
Dang, okay. I wasn't sure, but it seemed promising since it took a Reader instead of a []byte.
As far as I know, the stdlib JSON parser (and all other parsers I looked at) checks that the JSON document is well formed before starting to parse it. So it first loads the whole document into a buffer, checks it, and only then starts parsing. This way, the parser can return an error early if the document is not well formed, without modifying the target structure.
I've thought about writing a small parser myself that splits the large document (which is just a list of smaller documents) into a stream of smaller documents. That may help. Haven't had the time to do that though.
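For reference, such a splitting parser can be sketched on top of encoding/json's token API, which hands out one array element at a time instead of buffering the whole document. A minimal, hypothetical illustration (the real index file wraps such a list in an object, so actual code would first have to navigate to the array):

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "strings"
)

// decodeArrayStream reads a JSON array from r and passes each element to
// fn one at a time, so only a single element is ever buffered in memory.
func decodeArrayStream(r io.Reader, fn func(json.RawMessage) error) error {
    dec := json.NewDecoder(r)
    if _, err := dec.Token(); err != nil { // consume the opening '['
        return err
    }
    for dec.More() {
        var elem json.RawMessage
        if err := dec.Decode(&elem); err != nil {
            return err
        }
        if err := fn(elem); err != nil {
            return err
        }
    }
    _, err := dec.Token() // consume the closing ']'
    return err
}

func main() {
    input := `[{"id":"aa"},{"id":"bb"}]`
    err := decodeArrayStream(strings.NewReader(input), func(m json.RawMessage) error {
        fmt.Printf("element: %s\n", m)
        return nil
    })
    if err != nil {
        panic(err)
    }
}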
Does this issue mean that if the index for a restic backup repository gets over 512MB, we can no longer take backups to it?
Can we at least restore from repos with large indices?
Currently, restic won't work in this situation on a small, memory-constrained system like an RP3, neither for backup nor for restore. We're aware of the issue and there are plans to mitigate this, but no work has been done so far.
Maybe as a first step, split the large JSON into a series of small ones? As I understand it, restic's internal data structures take much less memory than the JSON representation.
I've got a plan to improve this: Do not create one large index, but many smaller ones < 100MiB, and process them sequentially. That's completely backwards compatible and does not break the repo with older versions of restic. But I haven't had the time to implement this yet.
I suddenly thought of a different way. How about bloom filters? There is a working Go implementation - under the same license as restic, so that should not pose a problem.
So the idea is to generate a bloom filter, add newly generated hashes to it during backup, and write the filter out. rebuild-index and prune would be in charge of deleting old versions and creating a new combined one, just like they handle indexes right now. It needs to be separate from the index file though, otherwise it defeats the whole point of avoiding the memory blowup. Also, we'd need to change the index file so that packs/blocks can easily be appended to it, I guess.
I wrote a small test program in which I tested bloom filters with a 0.01 (76 Mbit), 0.001 (115 Mbit) and 0.0001 (153 Mbit) probability of a false positive. A false positive means that restic would write out a duplicate block; however, as prune is well equipped to handle those (overlapping backup windows will potentially create duplicate blocks as well), I do not see a problem.
In effect, when the backup wants to check whether a freshly generated hash already exists, it does a bloom filter lookup instead of going through the index (see the sketch below).
It would allow smaller boxes like my RP to back themselves up. Granted, it will not allow listing or finding inside the repo, or mounting for that matter, but hey...
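For illustration, the lookup described above could look roughly like this; the sketch assumes the willf/bloom package (the comment doesn't say which Go implementation was tested, and the sizing depends on the expected blob count):

package main

import (
    "fmt"

    "github.com/willf/bloom"
)

func main() {
    // Size the filter for an expected number of blobs and a target
    // false-positive rate; with p=0.001, roughly one lookup in a
    // thousand wrongly reports "already stored".
    filter := bloom.NewWithEstimates(10000000, 0.001)

    hash := []byte("...the 32-byte blob ID...") // placeholder

    // During backup: consult the filter instead of the full index.
    if !filter.Test(hash) {
        // Definitely new: store the blob and remember it.
        filter.Add(hash)
    }
    // If Test returned true, the blob is *probably* stored already; a
    // false positive just means a duplicate blob that prune cleans up.
    fmt.Println("known now:", filter.Test(hash))
}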
@mholt :
Not sure if helpful, but json.NewDecoder(r).Decode(&thing) will decode the JSON by streaming it,
rather than having to buffer the whole string in memory. I think.
@fd0 :
That's unfortunately not applicable: It loads the whole (large) object into a buffer and then
parses it. I've benchmarked that.
Hey, I did some experimentation and @mholt's comment above seems to be correct. You do need to do a tiny bit of manual parsing (to navigate brackets and whatnot). But once you do that, you can call Decode(&thing) on just a small part of the JSON stream and not need to keep the entire JSON in memory all at once.
I wrote a stream-friendly version of LoadIndex and it performs way better on RAM usage than the current code which uses json.Unmarshal. Check the memory logging in TestLoadGiantIndexUnmarshal vs TestLoadGiantIndexStreaming.
Unfortunately this is not a drop-in replacement, because many of restic's other Load* funcs deal in []byte rather than Readers. And I know you're looking into a more general solution (multi-level indices, or whatever). But maybe something to think about for the future.
Uhm. I'm confused. The index is encrypted and all decoding happens in place, which means the entire index is loaded into memory as a single []byte and decoded. How does parsing JSON piecemeal help here? In fact, json.Unmarshal() takes the []byte without further copying, as far as I can tell. So unless you convert the JSON unpacker to a streaming interface... Oh wait, you wrote that :P
(Actually, I think it would be kinda awesome to have Reader/Writer versions of the encrypted loaders/writers, way more Go-ish. The writer might need some thinking, as you would have to write to a (local?) temp file, since only at the end do we know the hash and therefore the filename.)
The index is encrypted and all decoding happens in place, which means the entire index is loaded into memory as a single []byte and decoded
Well like I said, it's not a drop-in solution for the current design. Much would have to be converted to Reader/Writer instead of []byte to make this work.
I just wanted to clarify that it is indeed possible to unmarshal a JSON stream piecemeal with lower memory constraints, because @fd0 indicated above that he was considering writing such a parser.
Personally I think the drive to keep the internal data in JSON format is a mistake. The data is encrypted, so not directly readable anyway, and restic cat xxx can marshal to JSON and show that instead. Doing that would allow restic to use the more performant gob encoders, or whatever we come up with.
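As a point of comparison, encoding/gob does stream multiple values natively in both directions. A small sketch with a made-up index entry type (not restic's actual on-disk format, which stays JSON per the reply below):

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
    "io"
)

// packEntry is a hypothetical stand-in for an index entry.
type packEntry struct {
    ID     [32]byte
    Offset uint32
    Length uint32
}

func main() {
    var buf bytes.Buffer

    // gob streams naturally: encode entries one by one...
    enc := gob.NewEncoder(&buf)
    for i := 0; i < 3; i++ {
        if err := enc.Encode(packEntry{Offset: uint32(i * 100), Length: 100}); err != nil {
            panic(err)
        }
    }

    // ...and decode them one by one, never holding the whole set.
    dec := gob.NewDecoder(&buf)
    for {
        var e packEntry
        if err := dec.Decode(&e); err == io.EOF {
            break
        } else if err != nil {
            panic(err)
        }
        fmt.Printf("entry at offset %d, length %d\n", e.Offset, e.Length)
    }
}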
(Actually, I think it would be kinda awesome to have Reader/Writer versions of the encrypted loaders/writers, way more Go-ish.
You are right, and I tried that in the past (look for DecryptReader in the restic repo history). The underlying dilemma is that for encrypted data, we need to make sure the data is unmodified before we try to decrypt, read and process it. This implies that the whole encrypted object (e.g. the index) needs to be loaded and checked before. And for that we use the Go crypto functions, which all take a []byte, e.g. https://godoc.org/golang.org/x/crypto/poly1305#Verify.
So the only solution that I see here is making the objects we need to verify, decrypt and decode as small as possible. For the index, this may mean having several smaller index files instead of one large index.
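To make the constraint concrete: poly1305.Verify takes the complete message as a []byte, so the pattern is necessarily load everything, verify, and only then decrypt. A sketch (decrypt is a hypothetical stand-in for restic's actual decryption):

package main

import (
    "errors"
    "fmt"

    "golang.org/x/crypto/poly1305"
)

// decrypt is a hypothetical stand-in for restic's actual decryption.
func decrypt(ciphertext []byte, key *[32]byte) ([]byte, error) {
    return ciphertext, nil
}

func verifyThenDecrypt(ciphertext []byte, mac *[16]byte, key *[32]byte) ([]byte, error) {
    // Verify needs the complete message in memory; only after it
    // succeeds may the buffer be decrypted and parsed.
    if !poly1305.Verify(mac, ciphertext, key) {
        return nil, errors.New("MAC verification failed")
    }
    return decrypt(ciphertext, key)
}

func main() {
    var mac [16]byte
    var key [32]byte
    _, err := verifyThenDecrypt([]byte("data"), &mac, &key)
    fmt.Println(err) // fails here, since mac/key are just zero values
}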
This also applies to encrypted blobs in a pack file. For data blobs we limit the max size to about 16MiB with the chunker, for tree blobs there's no limit. For blobs I can see several improvements:
@middelink In retrospect I think we could've chosen something other than JSON. I disagree with your opinion that it is "a mistake" to hang onto JSON, since having a stable and dependable repository format is very important to me. I'm open to switching to something else, but that must be done with a new revision of the repository format. Until then, we'll stick with JSON for backwards compatibility. In addition, I'd like to support the old repository format for a while.
@armhold Writing a streaming decoder for the index files has been on my todo list for a long time, I'll have a look!
@armhold I've converted the tests in your streaming benchmark example to standard Benchmark functions and I don't see a huge difference in memory allocation (patch here):
$ go test -v -run xxxx -bench LoadGiant -benchmem
BenchmarkLoadGiantIndexUnmarshal-4 1 25576216163 ns/op 4184774680 B/op 50000170 allocs/op
BenchmarkLoadGiantIndexStreaming-4 1 26812757112 ns/op 4216545816 B/op 50000096 allocs/op
I've then tried to find a more realistic example, loading an index (~64MiB) from a file and decoding that (patch here):
$ go test -v -run xxxx -bench LoadGiant -benchmem
BenchmarkLoadGiantIndexUnmarshal-4 1 1870846300 ns/op 375686904 B/op 3503414 allocs/op
BenchmarkLoadGiantIndexStreaming-4 1 1950738010 ns/op 376708168 B/op 3503343 allocs/op
In both cases, the allocations happen in json.Unmarshal() (shown here for the streaming decoder):
$ go tool pprof --alloc_space mem.pprof
File: index.test
Type: alloc_space
Time: Nov 11, 2017 at 10:43am (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 418.10MB, 99.81% of 418.89MB total
Dropped 2 nodes (cum <= 2.09MB)
Showing top 10 nodes out of 18
flat flat% sum% cum cum%
145.04MB 34.63% 34.63% 181.04MB 43.22% encoding/json.Unmarshal /usr/lib/go/src/encoding/json/decode.go
115.42MB 27.55% 62.18% 115.42MB 27.55% reflect.unsafe_NewArray /usr/lib/go/src/runtime/malloc.go
64.09MB 15.30% 77.48% 64.09MB 15.30% io/ioutil.readAll /usr/lib/go/src/io/ioutil/ioutil.go
49.50MB 11.82% 89.30% 230.54MB 55.04% github.com/restic/restic/internal/restic.(*ID).UnmarshalJSON /home/fd0/shared/work/go/src/github.com/restic/restic/internal/restic/id.go
42MB 10.03% 99.33% 236.54MB 56.47% encoding/json.(*decodeState).literalStore /usr/lib/go/src/encoding/json/decode.go
1.50MB 0.36% 99.68% 116.92MB 27.91% reflect.MakeSlice /usr/lib/go/src/reflect/value.go
0.54MB 0.13% 99.81% 354.79MB 84.70% github.com/restic/restic/internal/index.(*indexJsonStreamer).LoadIndex /home/fd0/shared/work/go/src/github.com/restic/restic/internal/index/index_stream.go
0 0% 99.81% 354.26MB 84.57% encoding/json.(*Decoder).Decode /usr/lib/go/src/encoding/json/stream.go
0 0% 99.81% 351.97MB 84.02% encoding/json.(*decodeState).array /usr/lib/go/src/encoding/json/decode.go
0 0% 99.81% 236.54MB 56.47% encoding/json.(*decodeState).literal /usr/lib/go/src/encoding/json/decode.go
(pprof) top10 -cum
Showing nodes accounting for 0.54MB, 0.13% of 418.89MB total
Dropped 2 nodes (cum <= 2.09MB)
Showing top 10 nodes out of 18
flat flat% sum% cum cum%
0 0% 0% 418.89MB 100% github.com/restic/restic/internal/index.BenchmarkLoadGiantIndexStreaming /home/fd0/shared/work/go/src/github.com/restic/restic/internal/index/index_test.go
0 0% 0% 418.89MB 100% testing.(*B).run1.func1 /usr/lib/go/src/testing/benchmark.go
0 0% 0% 418.89MB 100% testing.(*B).runN /usr/lib/go/src/testing/benchmark.go
0.54MB 0.13% 0.13% 354.79MB 84.70% github.com/restic/restic/internal/index.(*indexJsonStreamer).LoadIndex /home/fd0/shared/work/go/src/github.com/restic/restic/internal/index/index_stream.go
0 0% 0.13% 354.26MB 84.57% encoding/json.(*Decoder).Decode /usr/lib/go/src/encoding/json/stream.go
0 0% 0.13% 354.26MB 84.57% github.com/restic/restic/internal/index.(*indexJsonStreamer).decodeNextValue /home/fd0/shared/work/go/src/github.com/restic/restic/internal/index/index_stream.go
0 0% 0.13% 353.47MB 84.38% encoding/json.(*decodeState).object /usr/lib/go/src/encoding/json/decode.go
0 0% 0.13% 353.47MB 84.38% encoding/json.(*decodeState).unmarshal /usr/lib/go/src/encoding/json/decode.go
0 0% 0.13% 353.47MB 84.38% encoding/json.(*decodeState).value /usr/lib/go/src/encoding/json/decode.go
0 0% 0.13% 351.97MB 84.02% encoding/json.(*decodeState).array /usr/lib/go/src/encoding/json/decode.go
(pprof) list decodeNextValue
Total: 418.89MB
ROUTINE ======================== github.com/restic/restic/internal/index.(*indexJsonStreamer).decodeNextValue in /home/fd0/shared/work/go/src/github.com/restic/restic/internal/index/index_stream.go
0 354.26MB (flat, cum) 84.57% of Total
. . 108:func (j *indexJsonStreamer) decodeNextValue(d interface{}) {
. . 109: if j.err != nil {
. . 110: return
. . 111: }
. . 112:
. 354.26MB 113: err := j.dec.Decode(d)
. . 114: if err != nil {
. . 115: j.err = err
. . 116: }
. . 117:}
. . 118:
(pprof) list json.*Decoder.*Decode
Total: 418.89MB
ROUTINE ======================== encoding/json.(*Decoder).Decode in /usr/lib/go/src/encoding/json/stream.go
0 354.26MB (flat, cum) 84.57% of Total
. . 52: if !dec.tokenValueAllowed() {
. . 53: return &SyntaxError{msg: "not at beginning of value"}
. . 54: }
. . 55:
. . 56: // Read whole value into buffer.
. 809.97kB 57: n, err := dec.readValue()
. . 58: if err != nil {
. . 59: return err
. . 60: }
. . 61: dec.d.init(dec.buf[dec.scanp : dec.scanp+n])
. . 62: dec.scanp += n
. . 63:
. . 64: // Don't save err from unmarshal into dec.err:
. . 65: // the connection is still usable since we read a complete JSON
. . 66: // object from it before the error happened.
. 353.47MB 67: err = dec.d.unmarshal(v)
. . 68:
. . 69: // fixup token streaming state
. . 70: dec.tokenValueEnd()
. . 71:
. . 72: return err
And there we are again at json.Unmarshal(). Hm.
@fd0 thanks for looking at my code. I'm somewhat new to profiling Go, so I apologize if the analysis that follows is incorrect. But your benchmarks are showing the number of allocs/op, which does not really answer the question "how much RAM does this thing need to run?"
I actually started with a benchmark test and found that it seemed to be geared toward overall throughput/efficiency, and not peak memory requirements (which I think is causing failures for some users).
Regarding the pprof output and ~418MB usage: are you confident that this accurately represents the memory usage? The reason I ask is that I too tried this, but was stymied by the fact that it produced a totally empty memory profile! After lots of googling, I found that I had to run it with GODEBUG="memprofilerate=1" in order to get any kind of data.
But even then, "top10" shows only 300-400 kB for me. Yet in htop/Activity Monitor I was regularly seeing RSS usage in the GBs for my tests. In fact top10 seemed to show the same amount no matter how big I made the JSON! I know it's a statistical sample of what the program is doing at various points in time, but pprof (or perhaps my usage of it) is just not capturing the memory usage accurately.
So then I took a look at runtime.ReadMemStats(), and found that mem.HeapSys seemed to more closely track the RSS usage I was observing. If you look at the output of runMemoryLogger() and compare it to "htop" (or similar), I find that on my system, TestLoadGiantIndexUnmarshal uses > 6GB of RAM, whereas TestLoadGiantIndexStreaming uses only ~900MB. Do you get similar results on your system?
Clarification: I get ~300-400kB usage from pprof for inuse_space (which, I think, is what we want to measure?). For alloc_space I get data in the GB range, but if my understanding is correct, that only tells us how much has been allocated (and potentially freed) overall, not memory still in use.
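For reference, a memory logger along the lines of the runMemoryLogger() mentioned above can be as simple as this sketch, sampling runtime.ReadMemStats() periodically:

package main

import (
    "log"
    "runtime"
    "time"
)

// logMemoryUsage samples runtime.ReadMemStats at a fixed interval.
// HeapSys is the heap memory obtained from the OS and tracks the RSS
// shown by htop much more closely than pprof's inuse_space samples.
func logMemoryUsage(interval time.Duration, done <-chan struct{}) {
    t := time.NewTicker(interval)
    defer t.Stop()
    var m runtime.MemStats
    for {
        select {
        case <-done:
            return
        case <-t.C:
            runtime.ReadMemStats(&m)
            log.Printf("HeapSys=%d MiB HeapAlloc=%d MiB",
                m.HeapSys/1024/1024, m.HeapAlloc/1024/1024)
        }
    }
}

func main() {
    done := make(chan struct{})
    go logMemoryUsage(500*time.Millisecond, done)
    time.Sleep(2 * time.Second) // stand-in for the workload under test
    close(done)
}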
Disclaimer/TL;DR: I'm not entirely sure I use the pprof tools the right way. As far as I understand it, looking at the size of the allocations of a function gives me an upper bound for the memory requirement.
I'll try to explain it the way I understood it with an example: Let's assume we have a function that allocates ten separate slices of []byte on the heap with size 1MiB each. The slices are used one by one and then discarded. When we run this function and look at the allocations via go test -bench . -benchmem [...], then we should get an allocation size of about 10MiB (not counting extra bytes that are allocated when the slices are made).
When the function runs within a program, the first slice is allocated and used, then the second slice is allocated and used, and so on. Let's consider both extremes that may happen for the Garbage Collector: When the GC does not run at all, the memory required is about 10MiB, so it may happen that the program uses 20MiB (because the allocator will request more from the OS than it currently needs). On the other hand, when the GC runs after each slice isn't in use any more, it may happen that the same chunk of memory is used for all the slices, so the overall memory requirement is just 1MiB and the program may use just 2MiB or so.
Using inuse_space for benchmark functions is futile (I've experienced that myself), and I think the reason is that a benchmark function is usually self-contained and doesn't reference any allocated memory after the function has returned, so there's nothing "in use" any more after the function ended. Can anybody confirm or falsify that? Or explain it better?
I've built a quick test program here: https://github.com/fd0/alloctest, it allocates ten slices of 1MiB each, fills them with random data and then dumps that to stdout. Running the program with a small shell script (xtime) shows that it uses about 6MiB of memory:
$ for i in {1..10}; do xtime ./alloctest > /dev/null; done
0.00u 0.05s 0.05r 5892kB ./alloctest
0.00u 0.05s 0.05r 5868kB ./alloctest
0.00u 0.05s 0.05r 5864kB ./alloctest
0.00u 0.05s 0.05r 5928kB ./alloctest
0.00u 0.05s 0.05r 5864kB ./alloctest
0.00u 0.05s 0.05r 5912kB ./alloctest
0.00u 0.05s 0.05r 5896kB ./alloctest
0.00u 0.05s 0.05r 5956kB ./alloctest
0.00u 0.05s 0.05r 5872kB ./alloctest
0.00u 0.05s 0.05r 5900kB ./alloctest
Forcing a GC run after each individual dump via FORCEGC=1 (built into the program) shows that at least some memory is reused; the overall memory usage drops to about 3MiB:
$ for i in {1..10}; do FORCEGC=1 xtime ./alloctest > /dev/null; done
0.00u 0.06s 0.05r 2816kB ./alloctest
0.00u 0.05s 0.05r 2808kB ./alloctest
0.00u 0.05s 0.06r 2832kB ./alloctest
0.00u 0.06s 0.06r 2800kB ./alloctest
0.00u 0.06s 0.06r 2820kB ./alloctest
0.00u 0.06s 0.06r 2812kB ./alloctest
0.01u 0.05s 0.06r 2872kB ./alloctest
0.00u 0.07s 0.06r 2816kB ./alloctest
0.00u 0.07s 0.07r 2824kB ./alloctest
0.00u 0.07s 0.07r 2868kB ./alloctest
When we now run go test with the benchmark function and write a memory profile, we get this:
$ go test -v -bench . -benchmem -memprofile mem.pprof
goos: linux
goarch: amd64
pkg: github.com/fd0/alloctest
BenchmarkSpewRandom-4 20 52532117 ns/op 10485763 B/op 10 allocs/op
PASS
ok github.com/fd0/alloctest 1.109s
The size is right (10MiB) and the number of allocations (10) also matches nicely.
As I thought, the inuse_space profile shows nothing:
$ go tool pprof mem.pprof
File: alloctest.test
Type: inuse_space
Time: Nov 12, 2017 at 5:21pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 0, 0% of 0 total
flat flat% sum% cum cum%
Looking at alloc_space is much more interesting:
$ go tool pprof --alloc_space mem.pprof
File: alloctest.test
Type: alloc_space
Time: Nov 12, 2017 at 5:21pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 242.87MB, 100% of 242.87MB total
flat flat% sum% cum cum%
242.87MB 100% 100% 242.87MB 100% github.com/fd0/alloctest.spewRandom /home/fd0/shared/work/go/src/github.com/fd0/alloctest/spewrnd.go
0 0% 100% 242.87MB 100% github.com/fd0/alloctest.BenchmarkSpewRandom /home/fd0/shared/work/go/src/github.com/fd0/alloctest/spewrnd_test.go
0 0% 100% 231.30MB 95.24% testing.(*B).launch /usr/lib/go/src/testing/benchmark.go
0 0% 100% 11.57MB 4.76% testing.(*B).run1.func1 /usr/lib/go/src/testing/benchmark.go
0 0% 100% 242.87MB 100% testing.(*B).runN /usr/lib/go/src/testing/benchmark.go
(pprof) list spewRandom
Total: 242.87MB
ROUTINE ======================== github.com/fd0/alloctest.spewRandom in /home/fd0/shared/work/go/src/github.com/fd0/alloctest/spewrnd.go
242.87MB 242.87MB (flat, cum) 100% of Total
. . 7: "runtime"
. . 8:)
. . 9:
. . 10:func spewRandom(out io.Writer) error {
. . 11: for i := 0; i < 10; i++ {
242.87MB 242.87MB 12: buf := make([]byte, 1*1024*1024)
. . 13: _, err := io.ReadFull(rand.Reader, buf)
. . 14: if err != nil {
. . 15: return err
. . 16: }
. . 17:
It found the main source of the memory usage easily: the line allocated 242MiB because the function was run 20 times (as done by the benchmarking code).
When we re-run this with a very limited time so that the function can only be run once, the number above becomes much more realistic:
$ go test -v -bench . -benchmem -benchtime 50ms -memprofile mem.pprof
goos: linux
goarch: amd64
pkg: github.com/fd0/alloctest
BenchmarkSpewRandom-4 1 57192511 ns/op 10495520 B/op 25 allocs/op
PASS
ok github.com/fd0/alloctest 0.060s
$ go tool pprof --alloc_space mem.pprof
File: alloctest.test
Type: alloc_space
Time: Nov 12, 2017 at 5:27pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 12358.75kB, 100% of 12358.75kB total
Showing top 10 nodes out of 11
flat flat% sum% cum cum%
11842.74kB 95.82% 95.82% 11842.74kB 95.82% github.com/fd0/alloctest.spewRandom /home/fd0/shared/work/go/src/github.com/fd0/alloctest/spewrnd.go
516.01kB 4.18% 100% 516.01kB 4.18% unicode.init /usr/lib/go/src/unicode/tables.go
0 0% 100% 516.01kB 4.18% flag.init <autogenerated>
0 0% 100% 516.01kB 4.18% fmt.init <autogenerated>
0 0% 100% 11842.74kB 95.82% github.com/fd0/alloctest.BenchmarkSpewRandom /home/fd0/shared/work/go/src/github.com/fd0/alloctest/spewrnd_test.go
0 0% 100% 516.01kB 4.18% main.init <autogenerated>
0 0% 100% 516.01kB 4.18% reflect.init <autogenerated>
0 0% 100% 516.01kB 4.18% runtime.main /usr/lib/go/src/runtime/proc.go
0 0% 100% 11842.74kB 95.82% testing.(*B).run1.func1 /usr/lib/go/src/testing/benchmark.go
0 0% 100% 11842.74kB 95.82% testing.(*B).runN /usr/lib/go/src/testing/benchmark.go
(pprof) list spewRandom
Total: 12.07MB
ROUTINE ======================== github.com/fd0/alloctest.spewRandom in /home/fd0/shared/work/go/src/github.com/fd0/alloctest/spewrnd.go
11.57MB 11.57MB (flat, cum) 95.82% of Total
. . 7: "runtime"
. . 8:)
. . 9:
. . 10:func spewRandom(out io.Writer) error {
. . 11: for i := 0; i < 10; i++ {
11.57MB 11.57MB 12: buf := make([]byte, 1*1024*1024)
. . 13: _, err := io.ReadFull(rand.Reader, buf)
. . 14: if err != nil {
. . 15: return err
. . 16: }
. . 17:
It looks like the pprof tool may use MB instead of MiB.
Hey @fd0, I really appreciate your patience and detailed example/code on this issue. I think maybe I'm doing a poor job explaining the "optimization" here.
My point in writing indexJsonStreamer was to respond to your statement in this comment
I.e. to show that a streaming parser does not need to buffer the entire contents of the Reader before processing. While it's true that dec.Decode() seems to retain the entire JSON stream during processing, it's possible to write a simple parser that avoids this problem.
So I am claiming that the streaming implementation allows the index to be constructed with less RAM (potentially avoiding the OOM killer), because it does not need to retain the entire JSON byte array while processing it. That's it. :) I do understand (and others have mentioned) that the rest of the restic pipeline (download, decryption) deals in []byte, so this is far from a complete solution to the problem.
That said, I applied your benchmark patch and pushed it here. A couple of things. I notice that in BenchmarkLoadGiantIndexUnmarshal, you're calling ResetTimer() after reading in the byte array. So that completely disregards the memory used by the array.
And then in BenchmarkLoadGiantIndexStreaming, you're re-using "rd" in the benchmark loop, which results in an EOF if the benchmark runs more than once (that had me scratching my head for a while... I thought it was a bug in my code.) You didn't get any test failures here?
I removed the calls to ResetTimer, and moved the reading of the JSON into the benchmark loop for both tests. I set the giantPackCount = 125000, which results in a JSON doc of ~64MB (you can print out the len(buf) to verify). I also created a third benchmark, BenchmarkLoadGiantIndexNaiveDecode, which naively uses dec.Decode(&indexJSON) on the stream.
You can see this in my branch, as of c54d2b7bf26bbd49a0b93c049e09bf1f6ca9d17a.
With this, I get the following results:
go test -v -run xxxx -bench LoadGiant -benchmem
goos: darwin
goarch: amd64
pkg: github.com/restic/restic/internal/index
BenchmarkLoadGiantIndexUnmarshal-4 1 1348193933 ns/op 529813576 B/op 3125180 allocs/op
BenchmarkLoadGiantIndexNaiveDecode-4 1 1330803104 ns/op 529808144 B/op 3125096 allocs/op
BenchmarkLoadGiantIndexStreaming-4 1 1344295237 ns/op 263871944 B/op 3125080 allocs/op
PASS
ok github.com/restic/restic/internal/index 4.060s
Also with xtime:
$ xtime go test -v -run xxxx -bench BenchmarkLoadGiantIndexUnmarshal -benchmem
goos: darwin
goarch: amd64
pkg: github.com/restic/restic/internal/index
BenchmarkLoadGiantIndexUnmarshal-4 1 1359775293 ns/op 529813656 B/op 3125182 allocs/op
PASS
ok github.com/restic/restic/internal/index 1.382s
2.62u 0.32s 2.45r 214128kB go test -v -run xxxx -bench BenchmarkLoadGiantIndexUnmarshal -benchmem
$ xtime go test -v -run xxxx -bench BenchmarkLoadGiantIndexNaiveDecode -benchmem
goos: darwin
goarch: amd64
pkg: github.com/restic/restic/internal/index
BenchmarkLoadGiantIndexNaiveDecode-4 1 1437833130 ns/op 529813896 B/op 3125184 allocs/op
PASS
ok github.com/restic/restic/internal/index 1.464s
2.67u 0.33s 2.54r 279312kB go test -v -run xxxx -bench BenchmarkLoadGiantIndexNaiveDecode -benchmem
$ xtime go test -v -run xxxx -bench BenchmarkLoadGiantIndexStreaming -benchmem
goos: darwin
goarch: amd64
pkg: github.com/restic/restic/internal/index
BenchmarkLoadGiantIndexStreaming-4 1 1378258345 ns/op 263876024 B/op 3125144 allocs/op
PASS
ok github.com/restic/restic/internal/index 1.390s
2.67u 0.28s 2.47r 93512kB go test -v -run xxxx -bench BenchmarkLoadGiantIndexStreaming -benchmem
Hi George,
My point in writing indexJsonStreamer was to respond to your statement in this comment, i.e. to show that a streaming parser does not need to buffer the entire contents of the Reader before processing. While it's true that dec.Decode() seems to retain the entire JSON stream during processing, it's possible to write a simple parser that avoids this problem.
Ok, thanks for the clarification. I think it's valuable to investigate in this direction and make the index parser more memory efficient, so thanks again for tackling this issue.
A couple of things. I notice that in BenchmarkLoadGiantIndexUnmarshal, you're calling ResetTimer() after reading in the byte array. So that completely disregards the memory used by the array.
Indeed, and that was on purpose: When testing a function which parses an index, I'd like to only benchmark that function, not the setup code needed to generate the data to be tested. That's why the ResetTimer() is there. When you remove it, you get the memory allocations for generating the testdata AND parsing it.
After all, for a streaming parser it shouldn't matter where the data comes from: From memory via a bytes.Reader or from a file or from an on-the-fly generator. So the source shouldn't be in the benchmark. This way, the benchmark functions can also be used with a very large in-memory buffer, as long as the buffer is allocated before ResetTimer() is called. I think maybe pprof ignores that, but then you could move the allocation to a separate function and ignore that in go tool pprof.
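A benchmark skeleton following this advice (setup before ResetTimer, and, per the point quoted next, a fresh reader per iteration) might look like the following sketch; makeTestJSON and decodeIndex are hypothetical stand-ins:

package index

import (
    "bytes"
    "encoding/json"
    "io"
    "testing"
)

// makeTestJSON and decodeIndex are hypothetical stand-ins for the
// generator and decoder discussed in this thread.
func makeTestJSON() []byte { return []byte(`[{"id":"aa"},{"id":"bb"}]`) }

func decodeIndex(rd io.Reader) error {
    var v []map[string]string
    return json.NewDecoder(rd).Decode(&v)
}

func BenchmarkDecodeOnly(b *testing.B) {
    buf := makeTestJSON() // setup, excluded from the numbers below

    b.ResetTimer() // resets both the timer and the -benchmem counters
    for i := 0; i < b.N; i++ {
        // a fresh reader per iteration avoids the EOF-on-rerun bug
        rd := bytes.NewReader(buf)
        if err := decodeIndex(rd); err != nil {
            b.Fatal(err)
        }
    }
}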
And then in BenchmarkLoadGiantIndexStreaming, you're re-using "rd" in the benchmark loop, which results in an EOF if the benchmark runs more than once (that had me scratching my head for a while... I thought it was a bug in my code.) You didn't get any test failures here?
Whoops, that's probably a bug. I can't remember any test failures.
I removed the calls to ResetTimer, and moved the reading of the JSON into the benchmark loop for both tests. I set the giantPackCount = 125000, which results in a JSON doc of ~64MB (you can print out the len(buf) to verify).
Hm. But now you're not testing the stream decoder. You're testing generate+decode, which is quite different. Is that on purpose? Why is the generation of the testdata relevant to the decoder? Did I maybe misunderstand something?
Your benchmark results look great, so I'll try to find some time to play with the code again. Please be aware that when using xtime with go test, the reported numbers (time, memory usage) include compiling the test binary. Use go test -c and then only run the test binary with xtime to get accurate numbers.
Keep up the good work! I'm still learning how Go manages memory, and I appreciate the fruitful discussion!
Indeed, and that was on purpose: When testing a function which parses an index, I'd like to
only benchmark that function, not the setup code needed to generate the data to be tested. That's
why the ResetTimer() is there. When you remove it, you get the memory allocations for
generating the testdata AND parsing it.
I understand about not wanting to test setup code, but in this case the setup code has an important side effect- it creates a []byte on the heap that is required for json.Unmarshal. The size of that array is (arguably?) what's causing restic to crash with OOM for some people.
The streaming method does not create any such []byte, but rather processes the doc in small chunks (packs). So don't you think it makes sense to take the size of that array into account for the benchmark? I mean, that's entirely the point of this exercise: eliminate the huge byte array.
Hm. But now you're not testing the stream decoder. You're testing generate+decode, which
is quite different. Is that on purpose? Why is the generation of the testdata relevant to
the decoder? Did I maybe misunderstand something?
It's not strictly intentional to test the generation of test data, but rather the overall process of converting a JSON string -> JSON struct. In the case of using json.Unmarshal, part of that process is reading the bytes of the JSON string and having it in memory when Unmarshal is called. If you want to measure memory requirements, I can't see how you can leave that out. I guess you can leave the ResetTimer() call in place but then somehow manually add back len(buf) to the stats, but that seems like an unnecessary hoop to jump through.
If you think the jsonIndexProducer is somehow tainting the results, you can always replace it with a file on disk and ReadFile()/Open. I just found it handy to be able to alter the JSON size by updating a constant. (And also prevent adding a 64MB+ file to the repo.)
Use go test -c and then only run the test binary with xtime to get accurate numbers
Thanks, I didn't know about "go test -c"! Here are the new results:
$ go test -c
$ xtime ./index.test -test.benchmem -test.run xxxx -test.bench BenchmarkLoadGiantIndexUnmarshal
goos: darwin
goarch: amd64
pkg: github.com/restic/restic/internal/index
BenchmarkLoadGiantIndexUnmarshal-4 1 1376994447 ns/op 529813560 B/op 3125180 allocs/op
PASS
1.34u 0.07s 1.40r 279324kB ./index.test -test.benchmem -test.run xxxx -test.bench BenchmarkLoadGiantIndexUnmarshal
$ xtime ./index.test -test.benchmem -test.run xxxx -test.bench BenchmarkLoadGiantIndexNaiveDecode
goos: darwin
goarch: amd64
pkg: github.com/restic/restic/internal/index
BenchmarkLoadGiantIndexNaiveDecode-4 1 1452151516 ns/op 529813800 B/op 3125183 allocs/op
PASS
1.40u 0.08s 1.47r 279364kB ./index.test -test.benchmem -test.run xxxx -test.bench BenchmarkLoadGiantIndexNaiveDecode
$ xtime ./index.test -test.benchmem -test.run xxxx -test.bench BenchmarkLoadGiantIndexStreaming
goos: darwin
goarch: amd64
pkg: github.com/restic/restic/internal/index
BenchmarkLoadGiantIndexStreaming-4 1 1373019670 ns/op 263875880 B/op 3125142 allocs/op
PASS
1.41u 0.02s 1.38r 69684kB ./index.test -test.benchmem -test.run xxxx -test.bench BenchmarkLoadGiantIndexStreaming
Dave Cheney has published slides from a new talk recently, this may be interesting: http://talks.godoc.org/github.com/davecheney/qconsf-2017/high-performance-go.slide#20 (I did not have the time to look at them in detail).
I've plugged the index decoder into restic and ran a few tests with one of my repos.
$ du -sh $RESTIC_REPOSITORY/index
100M /srv/data/Backup/fd0/restic-repo/index
$ find $RESTIC_REPOSITORY/index -type f | wc -l
14
First, with the default decoder (using json.Unmarshal):
$ for i in {1..10}; do xtime ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d > /dev/null ; done
4.34u 0.06s 6.16r 475472kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.23u 0.05s 4.10r 478024kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.25u 0.06s 4.08r 468204kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.20u 0.07s 4.05r 471504kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.38u 0.06s 4.19r 411056kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.26u 0.09s 4.14r 478412kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.22u 0.06s 4.04r 468856kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.22u 0.07s 4.07r 377944kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.26u 0.06s 4.07r 465476kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.18u 0.08s 4.03r 474536kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
Then with the streaming code:
4.29u 0.04s 5.97r 458628kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.27u 0.06s 4.12r 458740kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.29u 0.06s 4.12r 461960kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.24u 0.06s 4.08r 474260kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.24u 0.07s 4.10r 452684kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.38u 0.07s 4.21r 493912kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.31u 0.04s 4.17r 469864kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.27u 0.07s 4.13r 452656kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.31u 0.07s 4.14r 457768kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.26u 0.06s 4.11r 458692kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
I've then rebuilt the index (so it's just one large index), and re-ran the test:
Default decoder:
4.32u 0.08s 6.41r 539184kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.32u 0.07s 5.02r 537380kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.29u 0.08s 5.07r 537068kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.49u 0.07s 4.69r 539496kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.65u 0.10s 5.23r 514568kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.52u 0.09s 5.06r 539272kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.50u 0.07s 5.06r 539124kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.27u 0.07s 4.82r 506424kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.30u 0.07s 5.10r 541400kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.26u 0.07s 5.07r 537172kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
Streaming decoder:
4.39u 0.10s 4.88r 612212kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.35u 0.07s 5.01r 605396kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.39u 0.07s 5.09r 605860kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.59u 0.10s 5.27r 605316kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.49u 0.07s 4.98r 604272kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.36u 0.07s 4.56r 605296kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.35u 0.06s 5.03r 607400kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.33u 0.08s 5.07r 580840kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.36u 0.07s 5.08r 605456kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.32u 0.07s 5.05r 576320kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
So it looks like at least for the current restic code (and the way it uses the index decoder), the streaming decoder uses more memory. Hm.
How was it plugged in? Is LoadAndDecrypt() still returning a []byte, and you're just feeding that to a reader? I'm completely unsurprised if the memory footprint does not change (or gets a little worse) if you just stream from a []byte.
Me too. I've pushed the code to the improve-index-decoder branch in case you want to play with it. There's a second commit in it (which removes the decoder for the old index format, which we haven't used in ages); don't worry about that. Here's the relevant part: https://github.com/restic/restic/blob/5925fa0d1c79d5155de41ef92d13f9cb50c1ffeb/internal/repository/index.go#L465-L472
Removing the intermediate structs just used for the JSON encoding, the memory consumption of the streaming decoder is roughly the same:
4.51u 0.18s 6.89r 542656kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.39u 0.16s 5.04r 573636kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.46u 0.17s 5.21r 570744kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.41u 0.15s 5.03r 571840kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.81u 0.16s 5.48r 571724kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.67u 0.16s 5.30r 573168kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.71u 0.14s 5.29r 543688kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.66u 0.13s 5.23r 572708kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.62u 0.16s 5.19r 544160kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
4.75u 0.15s 5.31r 543296kB ./restic cat blob f14462ad116184ebcbf6e9830459ebfdabea0b9a940d0dc62c857ce227e04c9d
I've also pushed that commit.
Yeah, that code still generates a big []byte and then streams from it, so no surprises there.
Using the indexJsonStreamer is not going to help unless the entire workflow from backend -> json can be streamed. To that end, I experimented a bit with a LoadAndDecryptStream() method on repository. You can grab it here.
I had trouble with two things:
The MAC bytes are stored at the end of the file, and these bytes should not be fed to the decryption func. So we need a stream that "reserves" the last n bytes from the end of a stream. I wrote a "ReservedReadCloser" (see the sketch after this list). I'm not entirely confident it covers every edge case, but I wrote a quick unit test to cover the obvious ones.
I don't see a stream-compatible way to do poly1305.Verify(). Maybe I didn't look hard enough? For the moment, my code just punts on this with a TODO.
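A rough sketch of the first point, a reader that withholds the trailing n bytes (here: the MAC) from its consumer; the edge cases around short reads and errors are exactly where such a wrapper needs careful testing:

package main

import (
    "bytes"
    "fmt"
    "io"
    "io/ioutil"
)

// reservedReader delivers everything from r except the final n bytes,
// which end up in Reserved once the stream is exhausted.
type reservedReader struct {
    r        io.Reader
    n        int
    buf      []byte // data read but not yet handed out
    eof      bool
    Reserved []byte // the withheld trailing bytes, valid after EOF
}

func (rr *reservedReader) Read(p []byte) (int, error) {
    // Buffer until we hold more than n bytes (so something can be
    // delivered) or the underlying reader is exhausted.
    for !rr.eof && len(rr.buf) <= rr.n+len(p) {
        tmp := make([]byte, 4096)
        m, err := rr.r.Read(tmp)
        rr.buf = append(rr.buf, tmp[:m]...)
        if err == io.EOF {
            rr.eof = true
        } else if err != nil {
            return 0, err
        }
    }
    if len(rr.buf) > rr.n {
        // Hand out everything except the trailing n bytes.
        m := copy(p, rr.buf[:len(rr.buf)-rr.n])
        rr.buf = rr.buf[m:]
        return m, nil
    }
    // Only the reserved tail (e.g. the MAC) is left.
    rr.Reserved = rr.buf
    return 0, io.EOF
}

func main() {
    src := bytes.NewReader([]byte("ciphertext bytes...MAC!"))
    rr := &reservedReader{r: src, n: 4} // withhold the trailing 4 bytes
    body, err := ioutil.ReadAll(rr)
    if err != nil {
        panic(err)
    }
    fmt.Printf("body=%q reserved=%q\n", body, rr.Reserved)
}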
It's not exactly apples-to-apples, since poly1305 is not implemented, but here is a quick test with a 256MB index. Note that this does not involve JSON decoding at all; it just writes the data to stdout.
$ du -hs /Volumes/backup1/restic-repo/index/ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
256M /Volumes/backup1/restic-repo/index/ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
$ for i in {1..10}; do xtime ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 > /dev/null ; done
1.77u 0.31s 8.74r 1088172kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.75u 0.31s 2.26r 1089420kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.74u 0.32s 2.24r 1089384kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.74u 0.31s 2.23r 1089284kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.76u 0.31s 2.26r 1089312kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.81u 0.37s 2.37r 1089368kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.77u 0.33s 2.27r 1090124kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.75u 0.32s 2.25r 1090104kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.77u 0.32s 2.26r 1089292kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
1.76u 0.32s 2.26r 1089316kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58
$ for i in {1..10}; do xtime ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream > /dev/null ; done
1.41u 0.10s 1.72r 40468kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.41u 0.11s 1.75r 40428kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40468kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.41u 0.10s 1.72r 40320kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40480kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40504kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.41u 0.10s 1.72r 40296kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40480kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.41u 0.10s 1.72r 40268kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
1.42u 0.10s 1.73r 40416kB ./restic -r /Volumes/backup1/restic-repo cat index ef78688a1dfe063256ad45dce39b1e66c82557b98020ec69958401bb6fa96b58 --stream
Just to be clear, as I hinted in https://github.com/restic/restic/issues/979#issuecomment-343651588 above, we must verify the MAC before attempting to process the contents in any way. In particular, we must not decrypt or even JSON-decode data for which the MAC has not been verified yet.
I don't see how we can do that with a fully streaming architecture (and I've tried; look for DecryptReader in the git history). We can (and should) build a streaming architecture, but only after decryption. That's why I was so interested in the memory usage of the JSON decoder independent of the source of the data (e.g. the []byte).
I've had a quick look at the code in your branch and it looks like the code in there does exactly what we must not do: start decrypting data before verifying the MAC. That's terribly insecure and violates established best practice in crypto. I won't merge any code which does that.
My ideas for improving the situation are:
Split large index files into several smaller ones (e.g. during rebuild-index), which are then processed sequentially. Each index file can then be limited to < 100MiB.
I really appreciate your effort!
Ah, I see. I didn't realize that was a hard and fast rule about verifying the MAC first. I was thinking maybe it could be done as part of closing the stream or something. Sorry that I didn't read that earlier comment more closely, and I apologize for dragging you down this rabbit hole with me!
Is MAC pre-verification done in order to guard against a particular vulnerability (e.g. executable code during JSON decode), or is it more of a best practice simply to validate everything up front before it flows through the program? I'm not questioning your decisions at all; just asking for my own education.
Ah, I see. I didn't realize that was a hard and fast rule about verifying the MAC first. I was thinking maybe it could be done as part of closing the stream or something. Sorry that I didn't read that earlier comment more closely, and I apologize for dragging you down this rabbit hole with me!
Don't worry about it :slightly_smiling_face: I get to see crypto break regularly during my day job as a pentester (sometimes I'm the one breaking it), so I'm very sensitive to these issues. On the other hand I know that most people don't know all the details from the top of their heads (which is fine).
Is MAC pre-verification done in order to guard against a particular vulnerability (e.g. executable code during JSON decode), or is it more of a best practice simply to validate everything up front before it flows through the program?
Checking the MAC only after data has been processed enables vulnerabilities. For example, Padding Oracle vulnerabilities rely on the fact that processing is done before a MAC is checked (if there even is one). For these vulnerabilities, leaking one bit of information about the plaintext enables decrypting the data and even encrypting arbitrary other data, without knowing the encryption key. It's really astonishing. In general, verifying the authenticity of the ciphertext before even attempting to decrypt the data is an established best practice.
For a while, there was a discussion about whether encrypt-then-MAC (the MAC is on the outside, so it can be checked before decryption) or MAC-then-encrypt (the MAC is encrypted, so it needs to be decrypted before it can be checked) is better. From my point of view, this has been settled and the former is clearly better. Here's a question on Crypto Stack Exchange: https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
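(As a concrete illustration of the encrypt-then-MAC ordering in Go, here's a sketch using AES-CTR plus golang.org/x/crypto/poly1305. This is not restic's actual implementation (restic uses Poly1305-AES with per-message nonces and different key handling); the point is only the ordering: verify first, decrypt second.)

package main

import (
	"crypto/aes"
	"crypto/cipher"
	"errors"
	"fmt"

	"golang.org/x/crypto/poly1305"
)

// decryptVerified checks the trailing Poly1305 tag over the ciphertext first
// and only then decrypts (encrypt-then-MAC). Key and nonce handling are
// simplified; iv must be aes.BlockSize bytes.
func decryptVerified(data []byte, macKey *[32]byte, aesKey, iv []byte) ([]byte, error) {
	if len(data) < poly1305.TagSize {
		return nil, errors.New("ciphertext too short")
	}
	body := data[:len(data)-poly1305.TagSize]

	var tag [poly1305.TagSize]byte
	copy(tag[:], data[len(data)-poly1305.TagSize:])

	// Step 1: authenticate the ciphertext. Nothing is decrypted (let alone
	// JSON-decoded) before this check passes.
	if !poly1305.Verify(&tag, body, macKey) {
		return nil, errors.New("MAC verification failed")
	}

	// Step 2: only now decrypt.
	block, err := aes.NewCipher(aesKey)
	if err != nil {
		return nil, err
	}
	plaintext := make([]byte, len(body))
	cipher.NewCTR(block, iv).XORKeyStream(plaintext, body)
	return plaintext, nil
}

func main() {
	var macKey [32]byte
	aesKey := make([]byte, 32)
	iv := make([]byte, aes.BlockSize)

	// Build a toy ciphertext||tag pair so the round trip succeeds.
	block, _ := aes.NewCipher(aesKey)
	ct := make([]byte, 5)
	cipher.NewCTR(block, iv).XORKeyStream(ct, []byte("hello"))
	var tag [poly1305.TagSize]byte
	poly1305.Sum(&tag, ct, &macKey)

	pt, err := decryptVerified(append(ct, tag[:]...), &macKey, aesKey, iv)
	fmt.Println(string(pt), err) // hello <nil>
}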
I've implemented a fix in #1584, I'd be interested in feedback.
Did not really help. In my smaller VMs (the 512MB and 1GB ones) restic still provokes an OOM kill before it even prints anything after "password correct". It only completes in a 4GB VM and on the VM hosts, which have a shitload^T^Tenough memory. So I'm still stuck doing my rsync backups :(
@middelink did you run rebuild-index (on a machine with more RAM) before?
@fd0 not rebuild-index per se, but there is a forget/prune running every Sunday morning, so I have had the multi-index files for a while.
Ah, hm. That's correct.
You could try running restic with loadIndexParallelism = 1; it's a compile-time constant in the repository package:
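(Roughly like the following, i.e. patch the constant and rebuild restic; the exact file and the original value may differ between versions:)

// internal/repository/repository.go (location and original value may vary by version)
const loadIndexParallelism = 1 // load index files sequentially instead of in parallel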
Re: the JSON decoding taking lots of memory, I just stumbled on this (brand new!) repository today: https://github.com/francoispqt/gojay
It claims to be the fastest JSON parser and it supports stream decoding. Could that solve any problems?
Maybe, but someone would need to build a prototype. Although, given the comments/issues (especially the one about unsafe use of unsafe), I'd rather let it stabilize for some time...
Yeah, it definitely needs some work. I wouldn't spend any time on it (yet).
Is there any workaround for this? I run my restic server on a 2GB armv7 NAS, and restic suddenly crashes with the above OOM when I run check on one of the repos.
@donbowman I presume you already tried things like prepending GOGC=20 to the restic command? It helped with one of my very memory-constrained machines.
I tried w/ the above loadIndexParallelism change as well.
# env GOGC=20 RESTIC_PASSWORD=XXXX RESTIC_REPOSITORY=XXX /tmp/restic check
using temporary cache in /tmp/restic-check-cache-488621355
repository e0167288 opened successfully, password is correct
create exclusive lock for repository
load indexes
Killed
I guess the only difference is the lack of the large go stack trace about running out of memory :)
Forgot to mention: I have 4GB of swap to go w/ the 2GB of RAM. Not sure how much gets used when this runs.
I've started running into this problem on a not-so-tiny machine. x86-64 VM with 8GB of RAM, about 2GB of which are in use by applications. restic gets OOM killed after using up the remaining 6GB of RAM.
I am only backing up a directory with 2 files (total size 3843 bytes). The number and size of the files being backed up isn't an issue.
$ du -bc /var/spool/cron/*
3298 /var/spool/cron/foo
545 /var/spool/cron/root
3843 total
I am using AWS S3 as a backend, with a bucket shared by about 10 other machines. Some of the other machines have somewhat large (multi-TB) backup sets, but this machine does not. This machine did back up successfully a few times, but has been unable to create a backup for about a week.
I have only ever used restic 0.9.1 and 0.9.2, so I don't think I suffer from the "large index" problem mentioned earlier in this issue. I listed my S3 bucket's /index/ directory and sorted by file size. The index files range from a few KB to 6.67 MB at the largest. There are only 16 index files >= 5 MB.
Here is the version number.
$ /usr/bin/restic version
restic 0.9.2 compiled with go1.10.3 on linux/amd64
Here is my backup command:
$ GOGC=20 /usr/bin/restic --verbose=1000 backup /var/spool/cron
open repository
repository abcd01234 opened successfully, password is correct
lock repository
load index files
Is there anything at all I can do, other than wait for this situation to improve? I'm an experienced sysadmin (lots of C++ and Python experience, but no Go experience). If there is anything I can do to help troubleshoot this, please ask!
I'm also experiencing this issue using the B2 backend. The repository is right around 1.5TB total; the index directory is 6.8GB (2274 files, the largest one <30MB). restic rapidly takes up ~23.5GB of memory, then it seems to stabilize.
I have the same problem with a 100GB directory. restic gets killed after running for two hours because of low memory.
I looked into the backup data and found a connection between memory usage and file count. I have over 2 million files, most of them very small. Excluding them worked for me. 512MB of RAM was needed for a backup of 100,000 files.