Caddy stops listening on port 80 on FreeBSD after some amount of time

Created on 31 Jan 2020 · 24 comments · Source: caddyserver/caddy

Caddy v1.0.4 (via FreeBSD pkg)

I have a Caddyfile which serves several TLS domains, plus HTTP->HTTPS redirects for them. After some period of time (exactly how long is unknown to me right now; I last restarted Caddy at 12:48 GMT today and noticed the issue at 19:54 GMT), Caddy stops listening on port 80. This has happened several times, and results in "connection refused" on the client when making a plaintext HTTP request.

This is not a NAT issue as the server has a public IP, and is not an external firewall issue as I see the SYN received in tcpdump on the webserver. There is a pf firewall running on the server itself, however this is configured to allow port 80 and port 443 from all source IPs, as befits a webserver.

This issue appears to be the same as #2804 however:

  1. I'm running v1.0.4 which should have the referenced PR/commit (#2787) in it
  2. That commit also appears to be specific to OpenBSD, so I don't really see how the previous issue could actually have been fixed by it. If the two BSDs share behaviour in this area, then it seems the fix should be more generic and cover more BSD variants?

There don't appear to be any relevant log entries, although I don't know the exact time to look at. I now have monitoring in place so I will know within a couple of minutes the next time Caddy stops responding and I can check the logs next time this happens.
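
For anyone wanting to watch for the same failure, a minimal probe along these lines is enough to catch it (plain stdlib Python; the hostname and interval below are just placeholders, not necessarily what my monitoring runs):

```python
#!/usr/bin/env python3
"""Minimal port-80 liveness probe: flags the moment the plaintext listener
stops accepting connections ("connection refused")."""
import socket
import time

HOST = "www.example.com"   # placeholder - use the affected vhost
INTERVAL = 120             # seconds between checks

def port_80_accepts(host: str, timeout: float = 5.0) -> bool:
    try:
        with socket.create_connection((host, 80), timeout=timeout):
            return True
    except ConnectionRefusedError:
        return False          # listener is gone - this is the failure mode here
    except OSError:
        return True           # timeouts/other errors are not this bug

while True:
    if not port_80_accepts(HOST):
        print(f"{time.ctime()}: port 80 refused on {HOST}")
    time.sleep(INTERVAL)
```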

I'm sure you will ask for a sample site - http://www.couragetherapies.co.uk is one such. It should redirect you to the HTTPS version of that page, but at some point it will stop doing so for no obvious reason.

Labels: help wanted, v1

All 24 comments

I also found this thread: https://caddy.community/t/stops-responding-on-port-80-443-setsockopt-error/6793

I decided to try randomly nmap scanning the webserver. After one of the scans, I got a similar "setsockopt" error:

Jan 31 20:54:05 server caddy[63741]: 2020/01/31 20:54:05 set tcp x.x.x.x:80->y.y.y.y:42247: setsockopt: connection reset by peer

And port 80 closed according to sockstat

Also see #2694 (potentially)

Pinging @jungle-boogie and @didil in case either of you has any ideas; I don't know enough about BSD here...

For what it's worth, I did the same nmap scan again (I believe it was -sA that killed it off) after the restart. Nothing happened. So I did a few more of the more unusual nmap scans, and nothing. As of now, Caddy has stayed running for HTTP redirects since I opened this issue 3 days ago. So the problem is not predictable, and is not (so far) reproducible on demand, however it is still some kind of problem.

Happened again this morning around 3:39am - no setsockopt error in the logs this time, but as before the port is closed in sockstat so Caddy has just stopped listening on 80 entirely.

I decided to try a graceful USR1 restart, but that didn't work:

Feb  4 06:33:50 server caddy[64978]: 2020/02/04 06:33:50 [INFO] SIGUSR1: Reloading
Feb  4 06:33:50 server caddy[64978]: 2020/02/04 06:33:50 [INFO] Reloading
Feb  4 06:33:50 server caddy[64978]: 2020/02/04 06:33:50 [INFO][cache:0xc0003aeeb0] Started certificate maintenance routine
Feb  4 06:33:52 server caddy[64978]: 2020/02/04 06:33:52 [ERROR] Restart failed: getting old listener file: file tcp [::]:80: use of closed network connection
Feb  4 06:33:52 server caddy[64978]: 2020/02/04 06:33:52 [ERROR] SIGUSR1: starting with listener file descriptors: getting old listener file: file tcp [::]:80: use of closed network connection

So I had to do a complete restart to get port 80 back up.

Happened again just now, this time there was a setsockopt message again

Feb  4 21:58:55 server caddy[81993]: 2020/02/04 21:58:55 set tcp 185.157.234.59:80->51.77.110.48:53465: setsockopt: connection reset by peer

And the same error when I tried a graceful restart:

Feb  4 22:05:40 server caddy[81993]: 2020/02/04 22:05:40 [INFO] SIGUSR1: Reloading
Feb  4 22:05:40 server caddy[81993]: 2020/02/04 22:05:40 [INFO] Reloading
Feb  4 22:05:40 server caddy[81993]: 2020/02/04 22:05:40 [INFO][cache:0xc000228ff0] Started certificate maintenance routine
Feb  4 22:05:40 server caddy[81993]: 2020/02/04 22:05:40 [ERROR] Restart failed: getting old listener file: file tcp [::]:80: use of closed network connection
Feb  4 22:05:40 server caddy[81993]: 2020/02/04 22:05:40 [ERROR] SIGUSR1: starting with listener file descriptors: getting old listener file: file tcp [::]:80: use of closed network connection

So there's nothing consistent about the timing (it happened twice in around 7 hours the day I opened the issue, then not for several days, then twice in around 16 hours today).

Does this also happen with v2?

> I also found this thread: https://caddy.community/t/stops-responding-on-port-80-443-setsockopt-error/6793

Hi, I posted that issue. To rule out hardware problems on the server, I swapped my production and staging servers a couple weeks ago. It's been running without incident until earlier tonight when the server stopped responding on 443, and the last entry in the log was

2020/02/04 21:07:14 set tcp [::d8a8:8252]:443->185.153.199.246:53787: setsockopt: invalid argument

I haven't tried Caddy v2 yet.

I haven't tried Caddy 2 yet, as it isn't a drop-in replacement configuration-wise, and it's not (yet) packaged for FreeBSD, so I'd have to re-teach myself how to write an rc.d script for it.

If you think v2 may help then I'm happy to give it a try when I have a few minutes spare to port the config and the startup script.

Same thing for me.

Caddy 1.0.4
Many freebsd servers

Stops after random time, from days to hours of use.

"service caddy restart" brings it back to life.

Another data point for what it's worth: My caddy log file goes back to 2017, and the first appearance of the setsockopt log entry isn't until 2019/08/16. (I've been updating via homebrew on macOS, my guess is that would have been 1.0.1, but I don't know for sure.)

Hi @g-a-c,

Does this still occur? Have you tried having caddy run in the foreground and seeing if anything is printed to the standard output when/if caddy dies?

Have you figured out how to make this a predictable crash? It doesn't look like nmap _always_ causes the crash from what you said above. Do too many requests cause a problem? Maybe make a simple curl script to the http -> https site.

Have you tried earlier and/or later versions of caddy?
Check https://github.com/caddyserver/caddy/releases and notice there are 77 releases available!

Sorry, I just have a bunch of questions and no solutions for you. You could try a different OS (like OpenBSD) and see if the crash occurs with your same caddy config.

It hasn't happened for a while, but it has always been an intermittent and sporadic problem - it's one of those that may happen 4 times in a night, or not happen for 4 weeks.

Nmap is _not_ a reliable way to crash it, that only happened the first time and could not be repeated. The websites are not heavily used (double digit hits per week, if that) so I don't think it's a load/concurrency issue. I've run cURL requests against the HTTP redirect and it always works for me - so I have no reliable reproducer.

I haven't tried different versions yet; I don't like running outdated software when there's no obvious end date, and Caddy v2 currently has quite a high barrier for entry in that it would need the configuration replacing. Plus I like the fact that I can geofence the WordPress admin pages to limit them to the UK, and the ipfilter module doesn't seem to be available for Caddy v2 yet?

You're almost certainly right that this is an OS issue - one of the issues referenced above mentions OpenBSD by name, and I've regretted the choice of FreeBSD on a few occasions (it's been running so long I can't remember why it was FreeBSD in the first place) - but this is a favour for a friend, so I never seem to get around to finding the day or so it would take to back everything up and migrate it to Linux...

Heh, would you believe it happened while I was writing that message? I clicked comment and checked my mailbox, only to find two notifications that the site had stopped responding. The log file shows the usual:

Mar 31 23:14:51 server caddy[53892]: 2020/03/31 23:14:51 set tcp 185.157.234.59:80->208.91.109.18:52957: setsockopt: connection reset by peer

> and the ipfilter module doesn't seem to be available for Caddy v2 yet?

Unless you're using a geolocation database, it's not needed -- Caddy 2 has native IP/CIDR matching capabilities (along with many other ways of matching requests!): https://caddyserver.com/docs/modules/http.matchers.remote_ip

I'm using the Maxmind lite DB - I only care about country granularity so I can restrict /wp-admin to the UK (since the author never travels and the last thing I need is some automated WP exploit from some Chinese/US/other botnet)

Hi @g-a-c,

What crazy luck! Matt pinged me nearly two months ago and it's been fine, except for today.

I agree about outdated software. It was only as a way to see if you can have the crash occur. That kind of test could even be done on your LAN, disconnected from the internet. Provided we knew of a way to reproduce this problem.

Are you using a recent version of FreeBSD? I don't know the latest/supported release build, 12.x something.

It's probably worth asking on https://www.reddit.com/r/freebsd/ or a freebsd mailing list for assistance.

Yeah - the timing is... impeccable? I only mentioned it in case someone had used the website in the original post and managed to reproduce it. If that's anyone's IP, do whatever you just did and let's see if it happens again...

This machine isn't on my LAN; I'd have to build a FreeBSD box/VM, which I don't really have the time for given a non-guaranteed repro, and unless I actually start hosting the website out of the house, I'm not likely to pick up whatever random traffic triggers this. Given that the errors always seem to be related to "connection reset by peer", I just tried opening a port 80 connection with netcat and then terminating it on my firewall before sending anything. The RST gets received by the server, but it's handled properly. I've also tried sending the first line of an HTTP request and then terminating; that didn't kill it either. So perhaps sending a RST at some kind of exact time during the process will crash it?
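
For reference, the same test without netcat is a few lines of Python: setting SO_LINGER to zero makes close() send a RST instead of a normal FIN (the hostname is a placeholder):

```python
#!/usr/bin/env python3
"""Open a connection to port 80 and abort it with a RST, optionally after
sending the first line of a request - mirrors the netcat test above."""
import socket
import struct

HOST = "www.example.com"   # placeholder

def rst_probe(send_request_line: bool = False) -> None:
    s = socket.create_connection((HOST, 80), timeout=5)
    if send_request_line:
        s.sendall(b"GET / HTTP/1.1\r\n")      # partial request, no final CRLF
    # SO_LINGER with l_onoff=1, l_linger=0 makes close() emit a RST, not a FIN
    s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                 struct.pack("ii", 1, 0))
    s.close()

rst_probe(False)   # bare connect + RST
rst_probe(True)    # request line + RST
```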

It's the latest version of FreeBSD (12.1), and the latest packaged version of Caddy (1.0.4). I'll definitely try Reddit/mailing lists/forums if you folks don't have enough FreeBSD experience - cos frankly, nor do I with anything on this level.

This happened three times the other night while there was some activity on here, then not for a couple of days.

I'm not a Go/C developer so I can't touch the source of Caddy or FreeBSD, but I am a low-mid-level network engineer so I decided to break out the tool I _do_ know, and started a rolling tcpdump capture in hourly chunks.

There have been two such crashes in the last 2 days, both caused by the same host, and both with the same traffic (I'll upload the PCAPs). The long and short of it is that from the point of view of my server running Caddy, for the most recent crash I see:
| Direction | Flags | Gap since previous packet (ms) |
|-----------|---------|--------------------------------|
| In | SYN | 0 |
| Out | SYN/ACK | 0.083 |
| In | ACK | 149.2 |
| In | RST/ACK | 0.019 |

Given that the troublesome host is around 150ms RTT away (I'm in the UK, this IP seems to be listed in Delaware), the remote end must have sent its RST/ACK at pretty much the same time as its ACK. I tried doing this exact thing using scapy (sketch below), but I still couldn't reproduce the issue. I intend to dig out the documentation for the netem Linux module and see if I can abuse it to introduce an artificial delay on my firewall, in case that somehow makes any difference... it is at least maybe useful context. It seems to point more to an OS bug to me at this point, though? It's a very small pattern so far, but the tcpdump is still running (there's low enough traffic and high enough disk space) so I'll see what future incidents look like.
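
The scapy attempt looked roughly like this (target IP is a placeholder; it needs root, and note that the local kernel will usually fire its own RST at the unexpected SYN/ACK unless you filter that out):

```python
#!/usr/bin/env python3
"""Rough scapy attempt at the handshake-then-immediate-RST pattern from the
capture: SYN, wait for the SYN/ACK, then send the ACK and a RST/ACK back to
back. This did NOT reproduce the failure for me."""
from scapy.all import IP, TCP, sr1, send, RandShort

TARGET = "192.0.2.10"          # placeholder IP of the server under test
SPORT = int(RandShort())       # pin a random source port for the whole exchange

ip = IP(dst=TARGET)
syn = ip / TCP(sport=SPORT, dport=80, flags="S", seq=1000)
synack = sr1(syn, timeout=5)   # server's SYN/ACK (None if nothing came back)

ack_seq = synack.ack           # our next sequence number
ack = ip / TCP(sport=SPORT, dport=80, flags="A",
               seq=ack_seq, ack=synack.seq + 1)
rstack = ip / TCP(sport=SPORT, dport=80, flags="RA",
                  seq=ack_seq, ack=synack.seq + 1)
send([ack, rstack])            # ACK and RST/ACK sent back to back
```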

Do you run any other websites on different hosts with the same version of Caddy?
It might be worth setting up a similar tcpdump capture on those, to compare connection attempts and see how traffic from that host is treated.

Linux has a tool called tc for traffic manipulation:
https://bencane.com/2012/07/16/tc-adding-simulated-network-latency-to-your-linux-server/
(a random blog post about it).

I don't, but maybe @conectia could do the same thing if they're still receiving notifications from this issue, as they said they were also affected. For me it's just this one box with one personal site (more of an archive, it gets almost zero traffic) and a couple of small business sites for a friend. tc is what I plan to use; I just need to get the netem module built for my firewall (it's not included by default), then I can add the same 75ms each way and try again.

The capture I used is:

tcpdump -nni vtnet0 -U -w /tmp/http-80-pcaps/trace-%m-%d-%H-%M-%S-%s -W 24 -G 3600 -C 1 '(src host my.ip.address and src port 80) or (dst host my.ip.address and dst port 80)'

edit - I should point out that since this server doesn't host anything except 302 redirects on port 80, and doesn't get a lot of traffic, in my case this doesn't take a lot of disk space. It's been generating less than 1MB/day of pcaps on average. YMMV...
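
If it helps anyone doing the same, pulling the direction/flags/gap columns out of one of those hourly files is only a few lines of scapy (the file name and server IP below are placeholders):

```python
#!/usr/bin/env python3
"""Dump TCP flags and inter-packet gaps from one of the rolling captures,
which is how the table above was put together."""
from scapy.all import rdpcap, IP, TCP

MY_IP = "192.0.2.10"                      # placeholder server address
packets = rdpcap("/tmp/http-80-pcaps/trace-example.pcap")

prev_time = None
for pkt in packets:
    if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
        continue
    direction = "Out" if pkt[IP].src == MY_IP else "In"
    gap_ms = 0.0 if prev_time is None else (pkt.time - prev_time) * 1000
    print(f"{direction:3s}  {pkt[TCP].flags!s:8s}  {gap_ms:8.3f} ms")
    prev_time = pkt.time
```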

I have Caddy on FreeBSD 12.1 set up to listen to specific sockets, not any wildcard address. Every few days one of its IPv4 sockets just disappears, with nothing in the logs. This started not long after I upgraded from 0.11.5 to 1.0.4 (using pkg, so the FreeBSD ports tree-supplied binary).

For what it's worth: Caddy v2 (built from source without any changes) doesn't seem to be facing this issue. At least it's been running fairly smoothly over here for a while now on FreeBSD 12.1.

Yeah - at some point I put the effort into making my v2 Caddyfile (I would still like geoip filtering at some point...) and set it up, and the monitoring hasn't alarmed since. So if we're all happy that v1.x is "dead" then this can probably be closed, along with the FreeBSD bug I raised.

And I guess I should log in and stop that tcpdump...

I suppose I'm relieved that it hasn't happened (yet) with v2 on FreeBSD, although it might still be a matter of time? Nothing that I can think of is different about our networking stack, except that by now we're using a newer version of Go, and config reloads are done differently in v2 than they are in v1 (including the recycling of network sockets -- but that only applies during config reloads).

Will close this tentatively for now, and if it definitely reoccurs in v2 we can reopen... thanks for the teamwork with investigating!
