Caddy: Memory leak using file_server with Caddy 2.2.1 (Update: high memory usage, but probably not a leak)

Created on 21 Oct 2020 · 7 Comments · Source: caddyserver/caddy

Hi folks,

I am running what should be the absolute simplest use case for Caddy, and yet I'm experiencing an ongoing memory leak that only lets me run Caddy for 30 minutes at a time.

I am using it as a file_server for two domains, and also handling two redirects from the naked domain to www.

Here is the Caddyfile I am running:

www.mikehearn.com {
    file_server {
        root /home/mikehearn/sites/mikehearn.com
    }
}

mikehearn.com {
    redir https://www.mikehearn.com{uri}
}

www.transparenttextures.com {
    file_server {
        root /home/mikehearn/sites/transparenttextures.com/output
    }
}

transparenttextures.com {
    redir https://www.transparenttextures.com{uri}
}

I'm running this on a DigitalOcean droplet with 1GB of RAM. When started via systemd, Caddy begins at ~1-2% of RAM and then steadily climbs until it exceeds 50% and the server stops responding. If I restart it, the cycle starts anew.

I'm not sure where to begin diagnosing this leak, but I wanted to start the discussion and hopefully someone can point me in the right direction.

Caddy version: v2.2.1 h1:Q62GWHMtztnvyRU+KPOpw6fNfeCD3SkwH7SfT1Tgt2c=
Installed via: GitHub release .deb and apt
OS: Ubuntu 20.04.1 LTS
Kernel: 5.4.0-45-generic

All 7 comments

I tried running the same config via the Docker image (id e4fd2a84cc27) and the memory leak persists.

Hmm, that sounds unawesome...

Is the server busy? What are the requests like? Does the memory usage grow proportionate to the number of requests? Does it happen if you disconnect the server from any public-facing interfaces and you make the requests instead?

Please open localhost:2019/debug/pprof in your browser and take a look at allocations and goroutines. What do you see? (Full output would be ideal.) You could also download a profile, but I'm kinda clumsy with the tooling; usually seeing goroutines and allocations give enough of a clue for starters.

transparenttextures.com gets a fair amount of traffic, most of it consisting of directly serving images. Before switching to Caddy I was using nginx, and at any given time it was serving between 2 and 4 Mbps. Over the course of a month that adds up to about 1TB.

Regarding the RAM increasing as requests grow, I believe the answer is no. The RAM usage was growing too fast to be proportional to requests. Caddy would go from a restart to maxing out the RAM and crashing over the course of about 30-60 minutes, and I reproduced this at various points throughout the day. During those times I believe traffic was stable, with no major spikes.

Yesterday evening I put the server behind Cloudflare which seems to have mitigated enough of the traffic to keep it stable for the time being. It's now running consistently using about 35% of the 1GB of RAM on the droplet.

Here are the details from pprof:


goroutine (count 5474)

goroutine profile: total 5580
2759 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65361f 0x65362a 0x47d2e7 0x6de2c9 0x6de27a 0x6deb45 0x6e9c29 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54     runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44     internal/poll/fd_poll_runtime.go:87
#   0x4b4890    internal/poll.(*pollDesc).waitRead+0x1b0    internal/poll/fd_poll_runtime.go:92
#   0x4b4872    internal/poll.(*FD).Read+0x192          internal/poll/fd_unix.go:159
#   0x58a86e    net.(*netFD).Read+0x4e              net/fd_posix.go:55
#   0x59e4ad    net.(*conn).Read+0x8d               net/net.go:182
#   0x6502e1    crypto/tls.(*atLeastReader).Read+0x61       crypto/tls/conn.go:779
#   0x487890    bytes.(*Buffer).ReadFrom+0xb0           bytes/buffer.go:204
#   0x650532    crypto/tls.(*Conn).readFromUntil+0xf2       crypto/tls/conn.go:801
#   0x64d354    crypto/tls.(*Conn).readRecordOrCCS+0x114    crypto/tls/conn.go:608
#   0x65361e    crypto/tls.(*Conn).readRecord+0x15e     crypto/tls/conn.go:576
#   0x653629    crypto/tls.(*Conn).Read+0x169           crypto/tls/conn.go:1252
#   0x47d2e6    io.ReadAtLeast+0x86             io/io.go:314
#   0x6de2c8    io.ReadFull+0x88                io/io.go:333
#   0x6de279    net/http.http2readFrameHeader+0x39      net/http/h2_bundle.go:1477
#   0x6deb44    net/http.(*http2Framer).ReadFrame+0xa4      net/http/h2_bundle.go:1735
#   0x6e9c28    net/http.(*http2serverConn).readFrames+0xa8 net/http/h2_bundle.go:4314

2759 @ 0x43a2a5 0x44a3e5 0x6ea83c 0x6e8985 0x73d470 0x71b434 0x470001
#   0x6ea83b    net/http.(*http2serverConn).serve+0x59b     net/http/h2_bundle.go:4428
#   0x6e8984    net/http.(*http2Server).ServeConn+0x724     net/http/h2_bundle.go:4038
#   0x73d46f    net/http.http2ConfigureServer.func1+0xef    net/http/h2_bundle.go:3864
#   0x71b433    net/http.(*conn).serve+0x1233           net/http/server.go:1834

23 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65361f 0x65362a 0x714dad 0x4cd3c5 0x4ce11d 0x4ce354 0x691e2c 0x70ef4a 0x70ef79 0x71623a 0x71a905 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54     runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44     internal/poll/fd_poll_runtime.go:87
#   0x4b4890    internal/poll.(*pollDesc).waitRead+0x1b0    internal/poll/fd_poll_runtime.go:92
#   0x4b4872    internal/poll.(*FD).Read+0x192          internal/poll/fd_unix.go:159
#   0x58a86e    net.(*netFD).Read+0x4e              net/fd_posix.go:55
#   0x59e4ad    net.(*conn).Read+0x8d               net/net.go:182
#   0x6502e1    crypto/tls.(*atLeastReader).Read+0x61       crypto/tls/conn.go:779
#   0x487890    bytes.(*Buffer).ReadFrom+0xb0           bytes/buffer.go:204
#   0x650532    crypto/tls.(*Conn).readFromUntil+0xf2       crypto/tls/conn.go:801
#   0x64d354    crypto/tls.(*Conn).readRecordOrCCS+0x114    crypto/tls/conn.go:608
#   0x65361e    crypto/tls.(*Conn).readRecord+0x15e     crypto/tls/conn.go:576
#   0x653629    crypto/tls.(*Conn).Read+0x169           crypto/tls/conn.go:1252
#   0x714dac    net/http.(*connReader).Read+0x1ac       net/http/server.go:798
#   0x4cd3c4    bufio.(*Reader).fill+0x104          bufio/bufio.go:101
#   0x4ce11c    bufio.(*Reader).ReadSlice+0x3c          bufio/bufio.go:360
#   0x4ce353    bufio.(*Reader).ReadLine+0x33           bufio/bufio.go:389
#   0x691e2b    net/textproto.(*Reader).readLineSlice+0x6b  net/textproto/reader.go:58
#   0x70ef49    net/textproto.(*Reader).ReadLine+0xa9       net/textproto/reader.go:39
#   0x70ef78    net/http.readRequest+0xd8           net/http/request.go:1012
#   0x716239    net/http.(*conn).readRequest+0x199      net/http/server.go:984
#   0x71a904    net/http.(*conn).serve+0x704            net/http/server.go:1851

14 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65156d 0x651578 0x66a025 0x669985 0x653c69 0x71a3a5 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54     runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44     internal/poll/fd_poll_runtime.go:87
#   0x4b4890    internal/poll.(*pollDesc).waitRead+0x1b0    internal/poll/fd_poll_runtime.go:92
#   0x4b4872    internal/poll.(*FD).Read+0x192          internal/poll/fd_unix.go:159
#   0x58a86e    net.(*netFD).Read+0x4e              net/fd_posix.go:55
#   0x59e4ad    net.(*conn).Read+0x8d               net/net.go:182
#   0x6502e1    crypto/tls.(*atLeastReader).Read+0x61       crypto/tls/conn.go:779
#   0x487890    bytes.(*Buffer).ReadFrom+0xb0           bytes/buffer.go:204
#   0x650532    crypto/tls.(*Conn).readFromUntil+0xf2       crypto/tls/conn.go:801
#   0x64d354    crypto/tls.(*Conn).readRecordOrCCS+0x114    crypto/tls/conn.go:608
#   0x65156c    crypto/tls.(*Conn).readRecord+0x6c      crypto/tls/conn.go:576
#   0x651577    crypto/tls.(*Conn).readHandshake+0x77       crypto/tls/conn.go:992
#   0x66a024    crypto/tls.(*Conn).readClientHello+0x44     crypto/tls/handshake_server.go:127
#   0x669984    crypto/tls.(*Conn).serverHandshake+0x44     crypto/tls/handshake_server.go:40
#   0x653c68    crypto/tls.(*Conn).Handshake+0xc8       crypto/tls/conn.go:1362
#   0x71a3a4    net/http.(*conn).serve+0x1a4            net/http/server.go:1817

5 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x714dad 0x4cd3c5 0x4ce11d 0x4ce354 0x691e2c 0x70ef4a 0x70ef79 0x71623a 0x71a905 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54     runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44     internal/poll/fd_poll_runtime.go:87
#   0x4b4890    internal/poll.(*pollDesc).waitRead+0x1b0    internal/poll/fd_poll_runtime.go:92
#   0x4b4872    internal/poll.(*FD).Read+0x192          internal/poll/fd_unix.go:159
#   0x58a86e    net.(*netFD).Read+0x4e              net/fd_posix.go:55
#   0x59e4ad    net.(*conn).Read+0x8d               net/net.go:182
#   0x714dac    net/http.(*connReader).Read+0x1ac       net/http/server.go:798
#   0x4cd3c4    bufio.(*Reader).fill+0x104          bufio/bufio.go:101
#   0x4ce11c    bufio.(*Reader).ReadSlice+0x3c          bufio/bufio.go:360
#   0x4ce353    bufio.(*Reader).ReadLine+0x33           bufio/bufio.go:389
#   0x691e2b    net/textproto.(*Reader).readLineSlice+0x6b  net/textproto/reader.go:58
#   0x70ef49    net/textproto.(*Reader).ReadLine+0xa9       net/textproto/reader.go:39
#   0x70ef78    net/http.readRequest+0xd8           net/http/request.go:1012
#   0x716239    net/http.(*conn).readRequest+0x199      net/http/server.go:984
#   0x71a904    net/http.(*conn).serve+0x704            net/http/server.go:1851

3 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65361f 0x65362a 0x47d2e7 0x73d57e 0x73d510 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54         runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44         internal/poll/fd_poll_runtime.go:87
#   0x4b4890    internal/poll.(*pollDesc).waitRead+0x1b0        internal/poll/fd_poll_runtime.go:92
#   0x4b4872    internal/poll.(*FD).Read+0x192              internal/poll/fd_unix.go:159
#   0x58a86e    net.(*netFD).Read+0x4e                  net/fd_posix.go:55
#   0x59e4ad    net.(*conn).Read+0x8d                   net/net.go:182
#   0x6502e1    crypto/tls.(*atLeastReader).Read+0x61           crypto/tls/conn.go:779
#   0x487890    bytes.(*Buffer).ReadFrom+0xb0               bytes/buffer.go:204
#   0x650532    crypto/tls.(*Conn).readFromUntil+0xf2           crypto/tls/conn.go:801
#   0x64d354    crypto/tls.(*Conn).readRecordOrCCS+0x114        crypto/tls/conn.go:608
#   0x65361e    crypto/tls.(*Conn).readRecord+0x15e         crypto/tls/conn.go:576
#   0x653629    crypto/tls.(*Conn).Read+0x169               crypto/tls/conn.go:1252
#   0x47d2e6    io.ReadAtLeast+0x86                 io/io.go:314
#   0x73d57d    io.ReadFull+0xbd                    io/io.go:333
#   0x73d50f    net/http.(*http2serverConn).readPreface.func1+0x4f  net/http/h2_bundle.go:4536

3 @ 0x43a2a5 0x44a3e5 0x6eb6b1 0x6ea585 0x6e8985 0x73d470 0x71b434 0x470001
#   0x6eb6b0    net/http.(*http2serverConn).readPreface+0x150   net/http/h2_bundle.go:4546
#   0x6ea584    net/http.(*http2serverConn).serve+0x2e4     net/http/h2_bundle.go:4404
#   0x6e8984    net/http.(*http2Server).ServeConn+0x724     net/http/h2_bundle.go:4038
#   0x73d46f    net/http.http2ConfigureServer.func1+0xef    net/http/h2_bundle.go:3864
#   0x71b433    net/http.(*conn).serve+0x1233           net/http/server.go:1834

2 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65156d 0x651578 0x67477a 0x66f7c7 0x6699fc 0x653c69 0x71a3a5 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54             runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44             internal/poll/fd_poll_runtime.go:87
#   0x4b4890    internal/poll.(*pollDesc).waitRead+0x1b0            internal/poll/fd_poll_runtime.go:92
#   0x4b4872    internal/poll.(*FD).Read+0x192                  internal/poll/fd_unix.go:159
#   0x58a86e    net.(*netFD).Read+0x4e                      net/fd_posix.go:55
#   0x59e4ad    net.(*conn).Read+0x8d                       net/net.go:182
#   0x6502e1    crypto/tls.(*atLeastReader).Read+0x61               crypto/tls/conn.go:779
#   0x487890    bytes.(*Buffer).ReadFrom+0xb0                   bytes/buffer.go:204
#   0x650532    crypto/tls.(*Conn).readFromUntil+0xf2               crypto/tls/conn.go:801
#   0x64d354    crypto/tls.(*Conn).readRecordOrCCS+0x114            crypto/tls/conn.go:608
#   0x65156c    crypto/tls.(*Conn).readRecord+0x6c              crypto/tls/conn.go:576
#   0x651577    crypto/tls.(*Conn).readHandshake+0x77               crypto/tls/conn.go:992
#   0x674779    crypto/tls.(*serverHandshakeStateTLS13).readClientFinished+0x39 crypto/tls/handshake_server_tls13.go:840
#   0x66f7c6    crypto/tls.(*serverHandshakeStateTLS13).handshake+0x146     crypto/tls/handshake_server_tls13.go:74
#   0x6699fb    crypto/tls.(*Conn).serverHandshake+0xbb             crypto/tls/handshake_server.go:50
#   0x653c68    crypto/tls.(*Conn).Handshake+0xc8               crypto/tls/conn.go:1362
#   0x71a3a4    net/http.(*conn).serve+0x1a4                    net/http/server.go:1817

1 @ 0x40c4f4 0x46c85d 0x9e2165 0x470001
#   0x46c85c    os/signal.signal_recv+0x9c  runtime/sigqueue.go:147
#   0x9e2164    os/signal.loop+0x24     os/signal/signal_unix.go:23

1 @ 0x43a2a5 0x406745 0x40638b 0x9fea89 0x470001
#   0x9fea88    github.com/caddyserver/caddy/v2.trapSignalsCrossPlatform.func1+0x128    github.com/caddyserver/caddy/v2@v2.2.1/sigtrap.go:42

1 @ 0x43a2a5 0x406745 0x4063cb 0x9ff099 0x470001
#   0x9ff098    github.com/caddyserver/caddy/v2.trapSignalsPosix.func1+0x138    github.com/caddyserver/caddy/v2@v2.2.1/sigtrap_posix.go:34

1 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b643c 0x4b641e 0x58bde5 0x5a84d2 0x5a72a5 0x9f1183 0x67c837 0x71f666 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54                 runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44                 internal/poll/fd_poll_runtime.go:87
#   0x4b643b    internal/poll.(*pollDesc).waitRead+0x1fb                internal/poll/fd_poll_runtime.go:92
#   0x4b641d    internal/poll.(*FD).Accept+0x1dd                    internal/poll/fd_unix.go:394
#   0x58bde4    net.(*netFD).accept+0x44                        net/fd_unix.go:172
#   0x5a84d1    net.(*TCPListener).accept+0x31                      net/tcpsock_posix.go:139
#   0x5a72a4    net.(*TCPListener).Accept+0x64                      net/tcpsock.go:261
#   0x9f1182    github.com/caddyserver/caddy/v2.(*fakeCloseListener).Accept+0x42    github.com/caddyserver/caddy/v2@v2.2.1/listeners.go:121
#   0x67c836    crypto/tls.(*listener).Accept+0x36                  crypto/tls/tls.go:67
#   0x71f665    net/http.(*Server).Serve+0x265                      net/http/server.go:2937

1 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b643c 0x4b641e 0x58bde5 0x5a84d2 0x5a72a5 0x9f1183 0x71f666 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54                 runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44                 internal/poll/fd_poll_runtime.go:87
#   0x4b643b    internal/poll.(*pollDesc).waitRead+0x1fb                internal/poll/fd_poll_runtime.go:92
#   0x4b641d    internal/poll.(*FD).Accept+0x1dd                    internal/poll/fd_unix.go:394
#   0x58bde4    net.(*netFD).accept+0x44                        net/fd_unix.go:172
#   0x5a84d1    net.(*TCPListener).accept+0x31                      net/tcpsock_posix.go:139
#   0x5a72a4    net.(*TCPListener).Accept+0x64                      net/tcpsock.go:261
#   0x9f1182    github.com/caddyserver/caddy/v2.(*fakeCloseListener).Accept+0x42    github.com/caddyserver/caddy/v2@v2.2.1/listeners.go:121
#   0x71f665    net/http.(*Server).Serve+0x265                      net/http/server.go:2937

1 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b643c 0x4b641e 0x58bde5 0x5a84d2 0x5a72a5 0x9f1183 0x71f666 0x9fcedc 0x470001
#   0x46a654    internal/poll.runtime_pollWait+0x54                 runtime/netpoll.go:220
#   0x4b3844    internal/poll.(*pollDesc).wait+0x44                 internal/poll/fd_poll_runtime.go:87
#   0x4b643b    internal/poll.(*pollDesc).waitRead+0x1fb                internal/poll/fd_poll_runtime.go:92
#   0x4b641d    internal/poll.(*FD).Accept+0x1dd                    internal/poll/fd_unix.go:394
#   0x58bde4    net.(*netFD).accept+0x44                        net/fd_unix.go:172
#   0x5a84d1    net.(*TCPListener).accept+0x31                      net/tcpsock_posix.go:139
#   0x5a72a4    net.(*TCPListener).Accept+0x64                      net/tcpsock.go:261
#   0x9f1182    github.com/caddyserver/caddy/v2.(*fakeCloseListener).Accept+0x42    github.com/caddyserver/caddy/v2@v2.2.1/listeners.go:121
#   0x71f665    net/http.(*Server).Serve+0x265                      net/http/server.go:2937
#   0x9fcedb    github.com/caddyserver/caddy/v2.replaceAdmin.func2+0x5b         github.com/caddyserver/caddy/v2@v2.2.1/admin.go:261

1 @ 0x43a2a5 0x4496d9 0xa10ef5 0xa163a8 0x1432065 0x439ea9 0x470001
#   0xa10ef4    github.com/caddyserver/caddy/v2/cmd.cmdRun+0x1414   github.com/caddyserver/caddy/v2@v2.2.1/cmd/commandfuncs.go:274
#   0xa163a7    github.com/caddyserver/caddy/v2/cmd.Main+0x247      github.com/caddyserver/caddy/v2@v2.2.1/cmd/main.go:85
#   0x1432064   main.main+0x24                      command-line-arguments/main.go:37
#   0x439ea8    runtime.main+0x208                  runtime/proc.go:204

1 @ 0x43a2a5 0x44a3e5 0x9ccf45 0x470001
#   0x9ccf44    github.com/caddyserver/certmagic.(*Cache).maintainAssets+0x1e4  github.com/caddyserver/[email protected]/maintain.go:70

1 @ 0x43a2a5 0x44a3e5 0x9d60a7 0x9d5668 0x470001
#   0x9d60a6    github.com/caddyserver/certmagic.(*RingBufferRateLimiter).permit+0xe6   github.com/caddyserver/[email protected]/ratelimiter.go:216
#   0x9d5667    github.com/caddyserver/certmagic.(*RingBufferRateLimiter).loop+0xa7 github.com/caddyserver/[email protected]/ratelimiter.go:89

1 @ 0x43a2a5 0x44a3e5 0xcf0395 0x470001
#   0xcf0394    github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).keepStorageClean.func1+0xf4 github.com/caddyserver/caddy/v2@v2.2.1/modules/caddytls/tls.go:397

1 @ 0x46a1fd 0x7df942 0x7df705 0x7dc2d2 0x7ea0a5 0x7eb985 0x71bca4 0x9fe70d 0x71bca4 0x71dbcd 0x9e4da8 0x9e4b70 0x71f2a3 0x71aaad 0x470001
#   0x46a1fc    runtime/pprof.runtime_goroutineProfileWithLabels+0x5c           runtime/mprof.go:716
#   0x7df941    runtime/pprof.writeRuntimeProfile+0xe1                  runtime/pprof/pprof.go:724
#   0x7df704    runtime/pprof.writeGoroutine+0xa4                   runtime/pprof/pprof.go:684
#   0x7dc2d1    runtime/pprof.(*Profile).WriteTo+0x3f1                  runtime/pprof/pprof.go:331
#   0x7ea0a4    net/http/pprof.handler.ServeHTTP+0x384                  net/http/pprof/pprof.go:256
#   0x7eb984    net/http/pprof.Index+0x944                      net/http/pprof/pprof.go:367
#   0x71bca3    net/http.HandlerFunc.ServeHTTP+0x43                 net/http/server.go:2042
#   0x9fe70c    github.com/caddyserver/caddy/v2.instrumentHandlerCounter.func1+0xac github.com/caddyserver/caddy/v2@v2.2.1/metrics.go:46
#   0x71bca3    net/http.HandlerFunc.ServeHTTP+0x43                 net/http/server.go:2042
#   0x71dbcc    net/http.(*ServeMux).ServeHTTP+0x1ac                    net/http/server.go:2417
#   0x9e4da7    github.com/caddyserver/caddy/v2.adminHandler.serveHTTP+0xe7     github.com/caddyserver/caddy/v2@v2.2.1/admin.go:368
#   0x9e4b6f    github.com/caddyserver/caddy/v2.adminHandler.ServeHTTP+0x64f        github.com/caddyserver/caddy/v2@v2.2.1/admin.go:326
#   0x71f2a2    net/http.serverHandler.ServeHTTP+0xa2                   net/http/server.go:2843
#   0x71aaac    net/http.(*conn).serve+0x8ac                        net/http/server.go:1925

1 @ 0x7147e1 0x470001
#   0x7147e0    net/http.(*connReader).backgroundRead+0x0   net/http/server.go:689

And the full goroutine output
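
For anyone reading a profile like the one above, a small helper can make the big groups stand out. The following is only a sketch in Go: it assumes the text format shown above (a "count @ addresses" header followed by "#" frame lines), and the names stackGroup and summarize, as well as the shortened sample input, are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// stackGroup is one "N @ 0x..." group from a text goroutine profile.
type stackGroup struct {
	Count  int      // number of goroutines sharing this stack
	Frames []string // symbolized frames, innermost first
}

// summarize parses the profile text and returns the groups sorted by
// goroutine count, largest first.
func summarize(profile string) []stackGroup {
	var groups []stackGroup
	sc := bufio.NewScanner(strings.NewReader(profile))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // header lines can be long
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Group header: "<count> @ <pc> <pc> ..."
		if len(fields) >= 2 && fields[1] == "@" {
			if n, err := strconv.Atoi(fields[0]); err == nil {
				groups = append(groups, stackGroup{Count: n})
			}
			continue
		}
		// Frame line: "# <pc> <function+offset> <file:line>"
		if len(fields) >= 3 && fields[0] == "#" && len(groups) > 0 {
			g := &groups[len(groups)-1]
			g.Frames = append(g.Frames, fields[2])
		}
	}
	sort.SliceStable(groups, func(i, j int) bool {
		return groups[i].Count > groups[j].Count
	})
	return groups
}

// sample is a shortened, hand-edited excerpt in the same format as the dump.
const sample = `goroutine profile: total 5
3 @ 0x43a2a5 0x46a655
#   0x46a654    internal/poll.runtime_pollWait+0x54     runtime/netpoll.go:220
#   0x6e9c28    net/http.(*http2serverConn).readFrames+0xa8 net/http/h2_bundle.go:4314

2 @ 0x43a2a5 0x6ea83c
#   0x6ea83b    net/http.(*http2serverConn).serve+0x59b     net/http/h2_bundle.go:4428
`

func main() {
	for _, g := range summarize(sample) {
		outer := "(no frames)"
		if len(g.Frames) > 0 {
			outer = g.Frames[len(g.Frames)-1] // outermost frame is usually the most telling
		}
		fmt.Printf("%5d  %s\n", g.Count, outer)
	}
}
```

Applied to the real dump, the two groups of 2759 goroutines (HTTP/2 readFrames and serve) would sort to the top, which suggests roughly 2,759 open HTTP/2 connections at the time of the snapshot.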

With Cloudflare activated I'm seeing the memory staying fairly stable - going up and down, but not consistently rising. It's possible that the "leak" was actually just Caddy's memory usage going up as new connections hit the server, and it would have stabilized at some point, but it just happens that whatever the "stabilization" point would be is too high for the 1GB server to handle.

That being said, I don't totally grasp how or why Caddy would be using near 1GB of memory in order to serve static files, even if they're images and if we're serving a lot of them. Are there timeout defaults that are extremely long, so as connections hit the server, they hang on for longer than necessary? (I'm just spitballing, I don't have a ton of experience w/ diagnosing stuff like this.)

Great, thanks for the details.

There are plenty of busy Caddy instances that serve files with lots of connections and don't have memory issues. Putting Cloudflare in front and having the memory usage become more regulated is somewhat telling. It suggests that your clients are probably misbehaving.

For example, it looks like at any given time there are about 3,000 goroutines serving connections. Not a big deal, but from the full goroutine dump, I see stuff like this:

goroutine 1211831 [IO wait, 241 minutes]:

That goroutine has been in a waiting state for 4 hours. (I found one in there that existed for 750 minutes!) Your clients are likely doing slowloris or are buggy as heck and not closing connections. They might request a file but never read the response, which drains resources.
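
Finding those long waiters by eye in a dump of several thousand goroutines is tedious, so here is a rough Go sketch that picks them out. It assumes header lines shaped like the one quoted above ("goroutine ID [state, N minutes]:"); the names waiter and longWaiters are hypothetical, and apart from the quoted header the sample lines in main are invented.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// header matches lines such as "goroutine 1211831 [IO wait, 241 minutes]:".
// The minutes part is optional; goroutines that only just blocked omit it.
var header = regexp.MustCompile(`^goroutine (\d+) \[([^,\]]+)(?:, (\d+) minutes)?`)

// waiter describes one goroutine that has been blocked for a while.
type waiter struct {
	ID      int
	State   string
	Minutes int
}

// longWaiters returns every goroutine in dump that has been waiting at least
// minMinutes, longest wait first.
func longWaiters(dump string, minMinutes int) []waiter {
	var out []waiter
	for _, line := range strings.Split(dump, "\n") {
		m := header.FindStringSubmatch(line)
		if m == nil || m[3] == "" {
			continue // not a header, or no wait duration reported
		}
		mins, _ := strconv.Atoi(m[3])
		if mins < minMinutes {
			continue
		}
		id, _ := strconv.Atoi(m[1])
		out = append(out, waiter{ID: id, State: m[2], Minutes: mins})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Minutes > out[j].Minutes })
	return out
}

func main() {
	// The first header is the one quoted in this thread; the rest are invented.
	dump := "goroutine 1211831 [IO wait, 241 minutes]:\n" +
		"goroutine 42 [IO wait]:\n" +
		"goroutine 77 [select, 3 minutes]:\n"
	for _, w := range longWaiters(dump, 60) {
		fmt.Printf("goroutine %d: %s for %d minutes\n", w.ID, w.State, w.Minutes)
	}
}
```

A threshold of 60 minutes is arbitrary; anything far beyond how long a legitimate download should take is worth a look.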

Caddy doesn't configure these timeouts by default, though, because doing so breaks a lot of legitimate use cases (e.g. serving large files to clients with legitimately slow Internet connections). You can configure these timeouts easily though, you'll just need to use JSON config for now (because server-level properties don't map well to the Caddyfile structure): https://caddyserver.com/docs/json/apps/http/servers/#read_timeout (notice there are 4 different timeouts).

Try configuring those and see if that helps the memory usage go down (without Cloudflare).

Rebooted the server with all timeouts set to 60s, turned off Cloudflare, and memory usage never got above 5%. Seems like that was the cause!

Thank you for the help. I'll close this issue.

Also, one final thought, I saw you linked to an issue regarding server-level vars in the Caddyfile structure. I wholeheartedly support this. In theory my config should be dead simple: two file_servers, two redirects and now the timeout settings. But the JSON structure makes it seem like an immensely complicated setup. It's now 149 lines compared to what should be something like 25.

I'd love if there was a way to set the timeouts without having to get deep into the JSON. To be totally honest, if caddy adapt didn't exist, I don't think I would have had the patience to write it from scratch.

Just my two cents on that issue. Thanks again.

Yeah, it's a wart with the Caddyfile we're well aware of. We just need to do it right, because we'll need to be happy with the solution. I'll take a crack at implementing my proposal at the end of the thread soon, when I have the motivation 😅
