Caddy: Bug in active health checker in reverse_proxy using health_port?

Created on 28 Aug 2020  ·  9Comments  ·  Source: caddyserver/caddy

I have this weird problem that I cannot track down, even when building Caddy from source and adding debug information. I'm not sure if it's a bug in Caddy or my misunderstanding, but here goes.

I'm using a reverse proxy with active health checking, where the port is different than for the upstream. I'm getting a 404 from the active health checker:

http.handlers.reverse_proxy.health_checker.active   status code out of tolerances   {"status_code": 404, "host": "localhost:2020"}

This is my Caddyfile:

http://localhost {
  reverse_proxy {
    to localhost:2020

    health_path /health
    health_port 2021
    health_interval 10s
    health_timeout 5s
  }
}

This is the output from upstream when calling http://localhost:2021/health directly in another terminal:

$ http http://localhost:2021/health
HTTP/1.1 200 OK
Content-Length: 2
Content-Type: text/plain; charset=utf-8
Date: Fri, 28 Aug 2020 12:22:06 GMT

OK

Note that the host mentioned in the log line is upstream, not where the health check is happening (this confused me at first).

I've verified (by building Caddy from source) that Caddy is actually trying to connect to the right URL, in the code here: https://github.com/caddyserver/caddy/blob/744d04c2585d50f64cf3d43d139c310a18e78f73/modules/caddyhttp/reverseproxy/healthchecks.go#L226

A fmt.Println of the request right before Do:

&{Method:GET URL:http://localhost:2021/health Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[] Body:<nil> GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:localhost:2021 Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr: RequestURI: TLS:<nil> Cancel:<nil> Response:<nil> ctx:0xc0005a8510}

Can anyone help me understand what's going on here? I'd be happy to help with a PR if the issue is in Caddy.

bug

All 9 comments

Looks like Caddy is actually trying on the wrong port, 2020. When using nc to listen on 2020:

$ nc -l 2020
GET /health HTTP/1.1
Host: localhost:2021
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip

From the Caddy code, I don't understand why though. Note that the Host header has the correct port (2021), but it seems it actually connects on 2020.

I'm guessing it has something to do with the http client the active health checker is using?

Okay, it works as expected when commenting out the custom transport in the http client here:

https://github.com/caddyserver/caddy/blob/744d04c2585d50f64cf3d43d139c310a18e78f73/modules/caddyhttp/reverseproxy/reverseproxy.go#L260

I'm not sure how to proceed, because I don't know how you want to handle the transport settings in the client for the health checker.

Maybe we should provision a transport per unique host? I guess with a map that stores the transports by "host:port" as the key?

I'm not sure what the best approach is, so I'll let @mholt answer.

Thanks a ton @markuswustenberg for digging into the code to figure out what's going on, it's very much appreciated!

@Mohammed90 thanks for the quick turnaround! 😊

@markuswustenberg your pointer made it easy to figure out the fix :D

From the Caddy code, I don't understand why though. Note that the Host header has the correct port (2021), but it seems it actually connects on 2020.

Okay, it works as expected when commenting out the custom transport in the http client here:

This might be because the Transports we use read the dial info from the context.

This is where the health check port is set in the Host header: https://github.com/caddyserver/caddy/blob/494032584436338896cd275d7563bc7b325cd1c4/modules/caddyhttp/reverseproxy/healthchecks.go#L209-L217

And this is where the dialInfo is added to the context; notice that it uses the original Host value (with the unchanged port): https://github.com/caddyserver/caddy/blob/494032584436338896cd275d7563bc7b325cd1c4/modules/caddyhttp/reverseproxy/healthchecks.go#L221-L226

@Mohammed90 Thanks for the PR! However I'm not sure that using a new Transport is quite correct? Don't we just need to adjust the dialInfo to have the right port for the health check?

@Mohammed90 Thanks for the PR! However I'm not sure that using a new Transport is quite correct? Don't we just need to adjust the dialInfo to have the right port for the health check?

Oops, you're right 🤦‍♂️ I'll re-work it. Hold on.

No problem, thank you!! it's definitely not obvious. Sorry I wasn't available to chime in earlier.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ericmdantas picture ericmdantas  ·  3Comments

dafanasiev picture dafanasiev  ·  3Comments

la0wei picture la0wei  ·  3Comments

muhammadmuzzammil1998 picture muhammadmuzzammil1998  ·  3Comments

xfzka picture xfzka  ·  3Comments