Hi folks,
I am running what should be the absolute simplest use case for Caddy, and yet I'm experiencing an ongoing memory leak that only lets me run Caddy for about 30 minutes at a time.
I am using it as a file_server for two domains, and also handling redirects from each naked domain to its www subdomain.
Here is the Caddyfile I am running:
www.mikehearn.com {
    file_server {
        root /home/mikehearn/sites/mikehearn.com
    }
}

mikehearn.com {
    redir https://www.mikehearn.com{uri}
}

www.transparenttextures.com {
    file_server {
        root /home/mikehearn/sites/transparenttextures.com/output
    }
}

transparenttextures.com {
    redir https://www.transparenttextures.com{uri}
}
I'm running this on a DigitalOcean droplet with 1 GB of RAM. When started via systemd, Caddy begins at roughly 1-2% of RAM and then climbs steadily until it exceeds 50%, at which point the server stops responding. If I restart it, the cycle starts anew.
I'm not sure where to begin diagnosing this leak, but I wanted to start the discussion and hopefully someone can point me in the right direction.
Caddy version: v2.2.1 h1:Q62GWHMtztnvyRU+KPOpw6fNfeCD3SkwH7SfT1Tgt2c=
Installed via: GitHub release .deb and apt
OS: Ubuntu 20.04.1 LTS
Kernel: 5.4.0-45-generic
I tried running the same config via the Docker image (id e4fd2a84cc27), and the memory leak persisted.
Hmm, that sounds unawesome...
Is the server busy? What are the requests like? Does the memory usage grow in proportion to the number of requests? Does it happen if you disconnect the server from any public-facing interfaces and make the requests yourself instead?
Please open localhost:2019/debug/pprof in your browser and take a look at allocations and goroutines. What do you see? (Full output would be ideal.) You could also download a profile, but I'm kinda clumsy with the tooling; usually the goroutine and allocation views give enough of a clue for starters.
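If the command line is easier than the browser, something like this should also work (a quick sketch, assuming the admin endpoint is at the default localhost:2019; the output file names are just placeholders):

curl -s 'http://localhost:2019/debug/pprof/goroutine?debug=1' > goroutines.txt
curl -s 'http://localhost:2019/debug/pprof/heap' > heap.pprof
go tool pprof -top heap.pprof

The first command produces the same human-readable stack dump you'd see in the browser; the second grabs a binary heap profile that go tool pprof can summarize.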
transparenttextures.com gets a fair amount of traffic, most of it directly served images. Before switching to Caddy I was using nginx, and at any given time it was serving between 2-4 Mbps; over the course of a month that adds up to about 1 TB.
Regarding whether the RAM usage grows with requests, I believe the answer is no; it was growing too fast to be linear in the request rate. Caddy would go from a fresh restart to maxing out RAM and crashing over the course of about 30-60 minutes, and I reproduced this at various points throughout the day. During those times I believe traffic was stable, with no major spikes.
Yesterday evening I put the server behind Cloudflare, which seems to have absorbed enough of the traffic to keep things stable for the time being. It's now running consistently at about 35% of the droplet's 1 GB of RAM.
Here are the details from pprof:
goroutine (count 5474)
goroutine profile: total 5580
2759 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65361f 0x65362a 0x47d2e7 0x6de2c9 0x6de27a 0x6deb45 0x6e9c29 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b4890 internal/poll.(*pollDesc).waitRead+0x1b0 internal/poll/fd_poll_runtime.go:92
# 0x4b4872 internal/poll.(*FD).Read+0x192 internal/poll/fd_unix.go:159
# 0x58a86e net.(*netFD).Read+0x4e net/fd_posix.go:55
# 0x59e4ad net.(*conn).Read+0x8d net/net.go:182
# 0x6502e1 crypto/tls.(*atLeastReader).Read+0x61 crypto/tls/conn.go:779
# 0x487890 bytes.(*Buffer).ReadFrom+0xb0 bytes/buffer.go:204
# 0x650532 crypto/tls.(*Conn).readFromUntil+0xf2 crypto/tls/conn.go:801
# 0x64d354 crypto/tls.(*Conn).readRecordOrCCS+0x114 crypto/tls/conn.go:608
# 0x65361e crypto/tls.(*Conn).readRecord+0x15e crypto/tls/conn.go:576
# 0x653629 crypto/tls.(*Conn).Read+0x169 crypto/tls/conn.go:1252
# 0x47d2e6 io.ReadAtLeast+0x86 io/io.go:314
# 0x6de2c8 io.ReadFull+0x88 io/io.go:333
# 0x6de279 net/http.http2readFrameHeader+0x39 net/http/h2_bundle.go:1477
# 0x6deb44 net/http.(*http2Framer).ReadFrame+0xa4 net/http/h2_bundle.go:1735
# 0x6e9c28 net/http.(*http2serverConn).readFrames+0xa8 net/http/h2_bundle.go:4314
2759 @ 0x43a2a5 0x44a3e5 0x6ea83c 0x6e8985 0x73d470 0x71b434 0x470001
# 0x6ea83b net/http.(*http2serverConn).serve+0x59b net/http/h2_bundle.go:4428
# 0x6e8984 net/http.(*http2Server).ServeConn+0x724 net/http/h2_bundle.go:4038
# 0x73d46f net/http.http2ConfigureServer.func1+0xef net/http/h2_bundle.go:3864
# 0x71b433 net/http.(*conn).serve+0x1233 net/http/server.go:1834
23 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65361f 0x65362a 0x714dad 0x4cd3c5 0x4ce11d 0x4ce354 0x691e2c 0x70ef4a 0x70ef79 0x71623a 0x71a905 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b4890 internal/poll.(*pollDesc).waitRead+0x1b0 internal/poll/fd_poll_runtime.go:92
# 0x4b4872 internal/poll.(*FD).Read+0x192 internal/poll/fd_unix.go:159
# 0x58a86e net.(*netFD).Read+0x4e net/fd_posix.go:55
# 0x59e4ad net.(*conn).Read+0x8d net/net.go:182
# 0x6502e1 crypto/tls.(*atLeastReader).Read+0x61 crypto/tls/conn.go:779
# 0x487890 bytes.(*Buffer).ReadFrom+0xb0 bytes/buffer.go:204
# 0x650532 crypto/tls.(*Conn).readFromUntil+0xf2 crypto/tls/conn.go:801
# 0x64d354 crypto/tls.(*Conn).readRecordOrCCS+0x114 crypto/tls/conn.go:608
# 0x65361e crypto/tls.(*Conn).readRecord+0x15e crypto/tls/conn.go:576
# 0x653629 crypto/tls.(*Conn).Read+0x169 crypto/tls/conn.go:1252
# 0x714dac net/http.(*connReader).Read+0x1ac net/http/server.go:798
# 0x4cd3c4 bufio.(*Reader).fill+0x104 bufio/bufio.go:101
# 0x4ce11c bufio.(*Reader).ReadSlice+0x3c bufio/bufio.go:360
# 0x4ce353 bufio.(*Reader).ReadLine+0x33 bufio/bufio.go:389
# 0x691e2b net/textproto.(*Reader).readLineSlice+0x6b net/textproto/reader.go:58
# 0x70ef49 net/textproto.(*Reader).ReadLine+0xa9 net/textproto/reader.go:39
# 0x70ef78 net/http.readRequest+0xd8 net/http/request.go:1012
# 0x716239 net/http.(*conn).readRequest+0x199 net/http/server.go:984
# 0x71a904 net/http.(*conn).serve+0x704 net/http/server.go:1851
14 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65156d 0x651578 0x66a025 0x669985 0x653c69 0x71a3a5 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b4890 internal/poll.(*pollDesc).waitRead+0x1b0 internal/poll/fd_poll_runtime.go:92
# 0x4b4872 internal/poll.(*FD).Read+0x192 internal/poll/fd_unix.go:159
# 0x58a86e net.(*netFD).Read+0x4e net/fd_posix.go:55
# 0x59e4ad net.(*conn).Read+0x8d net/net.go:182
# 0x6502e1 crypto/tls.(*atLeastReader).Read+0x61 crypto/tls/conn.go:779
# 0x487890 bytes.(*Buffer).ReadFrom+0xb0 bytes/buffer.go:204
# 0x650532 crypto/tls.(*Conn).readFromUntil+0xf2 crypto/tls/conn.go:801
# 0x64d354 crypto/tls.(*Conn).readRecordOrCCS+0x114 crypto/tls/conn.go:608
# 0x65156c crypto/tls.(*Conn).readRecord+0x6c crypto/tls/conn.go:576
# 0x651577 crypto/tls.(*Conn).readHandshake+0x77 crypto/tls/conn.go:992
# 0x66a024 crypto/tls.(*Conn).readClientHello+0x44 crypto/tls/handshake_server.go:127
# 0x669984 crypto/tls.(*Conn).serverHandshake+0x44 crypto/tls/handshake_server.go:40
# 0x653c68 crypto/tls.(*Conn).Handshake+0xc8 crypto/tls/conn.go:1362
# 0x71a3a4 net/http.(*conn).serve+0x1a4 net/http/server.go:1817
5 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x714dad 0x4cd3c5 0x4ce11d 0x4ce354 0x691e2c 0x70ef4a 0x70ef79 0x71623a 0x71a905 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b4890 internal/poll.(*pollDesc).waitRead+0x1b0 internal/poll/fd_poll_runtime.go:92
# 0x4b4872 internal/poll.(*FD).Read+0x192 internal/poll/fd_unix.go:159
# 0x58a86e net.(*netFD).Read+0x4e net/fd_posix.go:55
# 0x59e4ad net.(*conn).Read+0x8d net/net.go:182
# 0x714dac net/http.(*connReader).Read+0x1ac net/http/server.go:798
# 0x4cd3c4 bufio.(*Reader).fill+0x104 bufio/bufio.go:101
# 0x4ce11c bufio.(*Reader).ReadSlice+0x3c bufio/bufio.go:360
# 0x4ce353 bufio.(*Reader).ReadLine+0x33 bufio/bufio.go:389
# 0x691e2b net/textproto.(*Reader).readLineSlice+0x6b net/textproto/reader.go:58
# 0x70ef49 net/textproto.(*Reader).ReadLine+0xa9 net/textproto/reader.go:39
# 0x70ef78 net/http.readRequest+0xd8 net/http/request.go:1012
# 0x716239 net/http.(*conn).readRequest+0x199 net/http/server.go:984
# 0x71a904 net/http.(*conn).serve+0x704 net/http/server.go:1851
3 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65361f 0x65362a 0x47d2e7 0x73d57e 0x73d510 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b4890 internal/poll.(*pollDesc).waitRead+0x1b0 internal/poll/fd_poll_runtime.go:92
# 0x4b4872 internal/poll.(*FD).Read+0x192 internal/poll/fd_unix.go:159
# 0x58a86e net.(*netFD).Read+0x4e net/fd_posix.go:55
# 0x59e4ad net.(*conn).Read+0x8d net/net.go:182
# 0x6502e1 crypto/tls.(*atLeastReader).Read+0x61 crypto/tls/conn.go:779
# 0x487890 bytes.(*Buffer).ReadFrom+0xb0 bytes/buffer.go:204
# 0x650532 crypto/tls.(*Conn).readFromUntil+0xf2 crypto/tls/conn.go:801
# 0x64d354 crypto/tls.(*Conn).readRecordOrCCS+0x114 crypto/tls/conn.go:608
# 0x65361e crypto/tls.(*Conn).readRecord+0x15e crypto/tls/conn.go:576
# 0x653629 crypto/tls.(*Conn).Read+0x169 crypto/tls/conn.go:1252
# 0x47d2e6 io.ReadAtLeast+0x86 io/io.go:314
# 0x73d57d io.ReadFull+0xbd io/io.go:333
# 0x73d50f net/http.(*http2serverConn).readPreface.func1+0x4f net/http/h2_bundle.go:4536
3 @ 0x43a2a5 0x44a3e5 0x6eb6b1 0x6ea585 0x6e8985 0x73d470 0x71b434 0x470001
# 0x6eb6b0 net/http.(*http2serverConn).readPreface+0x150 net/http/h2_bundle.go:4546
# 0x6ea584 net/http.(*http2serverConn).serve+0x2e4 net/http/h2_bundle.go:4404
# 0x6e8984 net/http.(*http2Server).ServeConn+0x724 net/http/h2_bundle.go:4038
# 0x73d46f net/http.http2ConfigureServer.func1+0xef net/http/h2_bundle.go:3864
# 0x71b433 net/http.(*conn).serve+0x1233 net/http/server.go:1834
2 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b4891 0x4b4873 0x58a86f 0x59e4ae 0x6502e2 0x487891 0x650533 0x64d355 0x65156d 0x651578 0x67477a 0x66f7c7 0x6699fc 0x653c69 0x71a3a5 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b4890 internal/poll.(*pollDesc).waitRead+0x1b0 internal/poll/fd_poll_runtime.go:92
# 0x4b4872 internal/poll.(*FD).Read+0x192 internal/poll/fd_unix.go:159
# 0x58a86e net.(*netFD).Read+0x4e net/fd_posix.go:55
# 0x59e4ad net.(*conn).Read+0x8d net/net.go:182
# 0x6502e1 crypto/tls.(*atLeastReader).Read+0x61 crypto/tls/conn.go:779
# 0x487890 bytes.(*Buffer).ReadFrom+0xb0 bytes/buffer.go:204
# 0x650532 crypto/tls.(*Conn).readFromUntil+0xf2 crypto/tls/conn.go:801
# 0x64d354 crypto/tls.(*Conn).readRecordOrCCS+0x114 crypto/tls/conn.go:608
# 0x65156c crypto/tls.(*Conn).readRecord+0x6c crypto/tls/conn.go:576
# 0x651577 crypto/tls.(*Conn).readHandshake+0x77 crypto/tls/conn.go:992
# 0x674779 crypto/tls.(*serverHandshakeStateTLS13).readClientFinished+0x39 crypto/tls/handshake_server_tls13.go:840
# 0x66f7c6 crypto/tls.(*serverHandshakeStateTLS13).handshake+0x146 crypto/tls/handshake_server_tls13.go:74
# 0x6699fb crypto/tls.(*Conn).serverHandshake+0xbb crypto/tls/handshake_server.go:50
# 0x653c68 crypto/tls.(*Conn).Handshake+0xc8 crypto/tls/conn.go:1362
# 0x71a3a4 net/http.(*conn).serve+0x1a4 net/http/server.go:1817
1 @ 0x40c4f4 0x46c85d 0x9e2165 0x470001
# 0x46c85c os/signal.signal_recv+0x9c runtime/sigqueue.go:147
# 0x9e2164 os/signal.loop+0x24 os/signal/signal_unix.go:23
1 @ 0x43a2a5 0x406745 0x40638b 0x9fea89 0x470001
# 0x9fea88 github.com/caddyserver/caddy/v2.trapSignalsCrossPlatform.func1+0x128 github.com/caddyserver/caddy/v2@v2.2.1/sigtrap.go:42
1 @ 0x43a2a5 0x406745 0x4063cb 0x9ff099 0x470001
# 0x9ff098 github.com/caddyserver/caddy/v2.trapSignalsPosix.func1+0x138 github.com/caddyserver/caddy/v2@v2.2.1/sigtrap_posix.go:34
1 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b643c 0x4b641e 0x58bde5 0x5a84d2 0x5a72a5 0x9f1183 0x67c837 0x71f666 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b643b internal/poll.(*pollDesc).waitRead+0x1fb internal/poll/fd_poll_runtime.go:92
# 0x4b641d internal/poll.(*FD).Accept+0x1dd internal/poll/fd_unix.go:394
# 0x58bde4 net.(*netFD).accept+0x44 net/fd_unix.go:172
# 0x5a84d1 net.(*TCPListener).accept+0x31 net/tcpsock_posix.go:139
# 0x5a72a4 net.(*TCPListener).Accept+0x64 net/tcpsock.go:261
# 0x9f1182 github.com/caddyserver/caddy/v2.(*fakeCloseListener).Accept+0x42 github.com/caddyserver/caddy/v2@v2.2.1/listeners.go:121
# 0x67c836 crypto/tls.(*listener).Accept+0x36 crypto/tls/tls.go:67
# 0x71f665 net/http.(*Server).Serve+0x265 net/http/server.go:2937
1 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b643c 0x4b641e 0x58bde5 0x5a84d2 0x5a72a5 0x9f1183 0x71f666 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b643b internal/poll.(*pollDesc).waitRead+0x1fb internal/poll/fd_poll_runtime.go:92
# 0x4b641d internal/poll.(*FD).Accept+0x1dd internal/poll/fd_unix.go:394
# 0x58bde4 net.(*netFD).accept+0x44 net/fd_unix.go:172
# 0x5a84d1 net.(*TCPListener).accept+0x31 net/tcpsock_posix.go:139
# 0x5a72a4 net.(*TCPListener).Accept+0x64 net/tcpsock.go:261
# 0x9f1182 github.com/caddyserver/caddy/v2.(*fakeCloseListener).Accept+0x42 github.com/caddyserver/caddy/v2@v2.2.1/listeners.go:121
# 0x71f665 net/http.(*Server).Serve+0x265 net/http/server.go:2937
1 @ 0x43a2a5 0x432bbb 0x46a655 0x4b3845 0x4b643c 0x4b641e 0x58bde5 0x5a84d2 0x5a72a5 0x9f1183 0x71f666 0x9fcedc 0x470001
# 0x46a654 internal/poll.runtime_pollWait+0x54 runtime/netpoll.go:220
# 0x4b3844 internal/poll.(*pollDesc).wait+0x44 internal/poll/fd_poll_runtime.go:87
# 0x4b643b internal/poll.(*pollDesc).waitRead+0x1fb internal/poll/fd_poll_runtime.go:92
# 0x4b641d internal/poll.(*FD).Accept+0x1dd internal/poll/fd_unix.go:394
# 0x58bde4 net.(*netFD).accept+0x44 net/fd_unix.go:172
# 0x5a84d1 net.(*TCPListener).accept+0x31 net/tcpsock_posix.go:139
# 0x5a72a4 net.(*TCPListener).Accept+0x64 net/tcpsock.go:261
# 0x9f1182 github.com/caddyserver/caddy/v2.(*fakeCloseListener).Accept+0x42 github.com/caddyserver/caddy/v2@v2.2.1/listeners.go:121
# 0x71f665 net/http.(*Server).Serve+0x265 net/http/server.go:2937
# 0x9fcedb github.com/caddyserver/caddy/v2.replaceAdmin.func2+0x5b github.com/caddyserver/caddy/v2@v2.2.1/admin.go:261
1 @ 0x43a2a5 0x4496d9 0xa10ef5 0xa163a8 0x1432065 0x439ea9 0x470001
# 0xa10ef4 github.com/caddyserver/caddy/v2/cmd.cmdRun+0x1414 github.com/caddyserver/caddy/v2@v2.2.1/cmd/commandfuncs.go:274
# 0xa163a7 github.com/caddyserver/caddy/v2/cmd.Main+0x247 github.com/caddyserver/caddy/v2@v2.2.1/cmd/main.go:85
# 0x1432064 main.main+0x24 command-line-arguments/main.go:37
# 0x439ea8 runtime.main+0x208 runtime/proc.go:204
1 @ 0x43a2a5 0x44a3e5 0x9ccf45 0x470001
# 0x9ccf44 github.com/caddyserver/certmagic.(*Cache).maintainAssets+0x1e4 github.com/caddyserver/[email protected]/maintain.go:70
1 @ 0x43a2a5 0x44a3e5 0x9d60a7 0x9d5668 0x470001
# 0x9d60a6 github.com/caddyserver/certmagic.(*RingBufferRateLimiter).permit+0xe6 github.com/caddyserver/[email protected]/ratelimiter.go:216
# 0x9d5667 github.com/caddyserver/certmagic.(*RingBufferRateLimiter).loop+0xa7 github.com/caddyserver/[email protected]/ratelimiter.go:89
1 @ 0x43a2a5 0x44a3e5 0xcf0395 0x470001
# 0xcf0394 github.com/caddyserver/caddy/v2/modules/caddytls.(*TLS).keepStorageClean.func1+0xf4 github.com/caddyserver/caddy/v2@v2.2.1/modules/caddytls/tls.go:397
1 @ 0x46a1fd 0x7df942 0x7df705 0x7dc2d2 0x7ea0a5 0x7eb985 0x71bca4 0x9fe70d 0x71bca4 0x71dbcd 0x9e4da8 0x9e4b70 0x71f2a3 0x71aaad 0x470001
# 0x46a1fc runtime/pprof.runtime_goroutineProfileWithLabels+0x5c runtime/mprof.go:716
# 0x7df941 runtime/pprof.writeRuntimeProfile+0xe1 runtime/pprof/pprof.go:724
# 0x7df704 runtime/pprof.writeGoroutine+0xa4 runtime/pprof/pprof.go:684
# 0x7dc2d1 runtime/pprof.(*Profile).WriteTo+0x3f1 runtime/pprof/pprof.go:331
# 0x7ea0a4 net/http/pprof.handler.ServeHTTP+0x384 net/http/pprof/pprof.go:256
# 0x7eb984 net/http/pprof.Index+0x944 net/http/pprof/pprof.go:367
# 0x71bca3 net/http.HandlerFunc.ServeHTTP+0x43 net/http/server.go:2042
# 0x9fe70c github.com/caddyserver/caddy/v2.instrumentHandlerCounter.func1+0xac github.com/caddyserver/caddy/v2@v2.2.1/metrics.go:46
# 0x71bca3 net/http.HandlerFunc.ServeHTTP+0x43 net/http/server.go:2042
# 0x71dbcc net/http.(*ServeMux).ServeHTTP+0x1ac net/http/server.go:2417
# 0x9e4da7 github.com/caddyserver/caddy/v2.adminHandler.serveHTTP+0xe7 github.com/caddyserver/caddy/v2@v2.2.1/admin.go:368
# 0x9e4b6f github.com/caddyserver/caddy/v2.adminHandler.ServeHTTP+0x64f github.com/caddyserver/caddy/v2@v2.2.1/admin.go:326
# 0x71f2a2 net/http.serverHandler.ServeHTTP+0xa2 net/http/server.go:2843
# 0x71aaac net/http.(*conn).serve+0x8ac net/http/server.go:1925
1 @ 0x7147e1 0x470001
# 0x7147e0 net/http.(*connReader).backgroundRead+0x0 net/http/server.go:689
And the full goroutine output
With Cloudflare activated the memory stays fairly stable: it goes up and down, but doesn't consistently rise. It's possible the "leak" was actually just Caddy's memory usage growing as new connections hit the server, and that it would have stabilized eventually; it just happens that the stabilization point is higher than a 1 GB server can handle.
That said, I don't totally grasp how or why Caddy would use nearly 1 GB of memory to serve static files, even if they're images and we're serving a lot of them. Are there timeout defaults that are extremely long, so that connections hang around longer than necessary? (I'm just spitballing; I don't have a ton of experience with diagnosing stuff like this.)
Great, thanks for the details.
There are plenty of busy Caddy instances that serve files over lots of connections without memory issues. Putting Cloudflare in front and seeing the memory usage become more regulated is somewhat telling: it suggests your clients are probably misbehaving.
For example, it looks like at any given time there are about 3,000 goroutines serving connections. Not a big deal, but from the full goroutine dump, I see stuff like this:
goroutine 1211831 [IO wait, 241 minutes]:
That goroutine has been in a waiting state for about 4 hours. (I found one in there that has existed for 750 minutes!) Your clients are likely doing slowloris attacks, or are buggy as heck and not closing connections; they might request a file but never read the response, which ties up resources.
Caddy doesn't set these timeouts by default, though, because doing so breaks a lot of legitimate use cases (e.g. serving large files to clients on genuinely slow Internet connections). You can configure them easily; you'll just need to use the JSON config for now, because server-level properties don't map well onto the Caddyfile structure: https://caddyserver.com/docs/json/apps/http/servers/#read_timeout (note that there are four different timeouts).
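For a rough idea of the shape (a minimal sketch, not a full config: srv0 is just the default server name caddy adapt generates, the 60s values are placeholders, and your routes are omitted):

{
    "apps": {
        "http": {
            "servers": {
                "srv0": {
                    "listen": [":443"],
                    "read_timeout": "60s",
                    "read_header_timeout": "60s",
                    "write_timeout": "60s",
                    "idle_timeout": "60s"
                }
            }
        }
    }
}

Each value is a duration string, so you can tune them independently; something like 60s across the board is a reasonable starting point for a static file server.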
Try configuring those and see if that helps the memory usage go down (without Cloudflare).
Rebooted the server with all timeouts set to 60s, turned off Cloudflare, and memory usage never got above 5%. Seems like that was the cause!
Thank you for the help. I'll close this issue.
Also, one final thought: I saw you linked to an issue regarding server-level vars in the Caddyfile structure. I wholeheartedly support this. In theory my config should be dead simple: two file servers, two redirects, and now the timeout settings. But the JSON structure makes it seem like an immensely complicated setup; it's now 149 lines, compared to what should be something like 25.
I'd love a way to set the timeouts without having to get deep into the JSON. To be totally honest, if caddy adapt didn't exist, I don't think I would have had the patience to write it from scratch.
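(For anyone landing here later: I generated the JSON starting point by adapting my Caddyfile with something along these lines; the path is just wherever your Caddyfile lives.)

caddy adapt --config /etc/caddy/Caddyfile --pretty > caddy.json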
Just my two cents on that issue. Thanks again.
Yeah, it's a wart with the Caddyfile we're well aware of. We just need to do it right, because we'll need to be happy with the solution. I'll take a crack at implementing my proposal at the end of the thread soon, when I have the motivation 😅