Caddy version: v2.2.1
The feature introduced in https://github.com/caddyserver/caddy/pull/3479 is not using the specified addresses for upstream DNS resolution.
Here is an error log entry that occurs every minute in my setup:
{"level":"error","ts":1602812552.1035602,"logger":"http.log.error.log0","msg":"making dial info: lookup app.service.consul on 169.254.169.253:53: dial udp 169.254.169.253:53: operation was canceled","request":{"remote_addr":"24.88.156.180:4608","proto":"HTTP/2.0","method":"OPTIONS","host":"my.app.site","uri":"/registry/user/ribbon","headers":{"Accept-Language":["en-US,en;q=0.5"],"Accept-Encoding":["gzip, deflate, br"],"User-Agent":["Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"],"Accept":["*/*"],"Access-Control-Request-Method":["GET"],"Access-Control-Request-Headers":["authorization,x-client-app"],"Referer":["https://app2.site/computers/29a8240d-6fd2-4464-a187-3227276b398f"],"Origin":["https://app2.site"],"Dnt":["1"],"Te":["trailers"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.app.site"}},"duration":0.000193545,"status":502,"err_id":"3jv1hi6bp","err_trace":"reverseproxy.(*Handler).ServeHTTP (reverseproxy.go:388)"}
The 169.254.169.253 IP is only present in my /etc/resolv.conf and should never be used once I have specified resolvers in the transport config.
My config:
{
  "admin": {
    "disabled": true
  },
  "apps": {
    "http": {
      "http_port": 80,
      "servers": {
        "srv0": {
          "automatic_https": {
            "disable": true,
            "disable_redirects": true
          },
          "listen": [
            ":80"
          ],
          "logs": {
            "default_logger_name": "log0"
          },
          "routes": [
            {
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "group": "app",
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "transport": {
                            "protocol": "http",
                            "resolver": {
                              "addresses": [
                                "tcp/127.0.0.1:53"
                              ]
                            }
                          },
                          "upstreams": [
                            {
                              "lookup_srv": "app.service.consul"
                            }
                          ]
                        }
                      ],
                      "match": [
                        {
                          "path": [
                            "/registry/investigation*"
                          ]
                        }
                      ]
                    },
                    {
                      "group": "app",
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "transport": {
                            "protocol": "http",
                            "resolver": {
                              "addresses": [
                                "tcp/127.0.0.1:53"
                              ]
                            }
                          },
                          "upstreams": [
                            {
                              "lookup_srv": "app2.service.consul"
                            }
                          ]
                        }
                      ],
                      "match": [
                        {
                          "path": [
                            "/registry*"
                          ]
                        }
                      ]
                    }
                  ]
                }
              ],
              "match": [
                {
                  "host": [
                    "my.app.site"
                  ]
                }
              ],
              "terminal": true
            }
          ]
        }
      }
    }
  }
}
The dial call to 169.254.169.253 fails because that DNS server cannot resolve any .service.consul query.
Note: there is no way for me to stop these error logs from leaking sensitive information sent via HTTP headers, since Caddy logs the full context of the request when the DNS query fails. There is also no way to filter them.
That's because the resolver is specifically for the http transport's dialer. The SRV lookup always uses net.DefaultResolver (currently).
The transports and upstreams are separate from each other; the chosen upstreams are passed through the transport. There would probably need to be a separate resolver config for upstreams, but that seems annoying to configure.
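To illustrate the difference (a minimal sketch, not Caddy's actual code): the transport's `resolver` config effectively builds a custom `net.Resolver` whose `Dial` function targets the configured DNS server, while the SRV upstream path calls `net.DefaultResolver`, which reads /etc/resolv.conf:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Roughly what the http transport does with "resolver": {"addresses": ["tcp/127.0.0.1:53"]}:
	// a custom resolver that ignores /etc/resolv.conf and dials the configured server.
	custom := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, "tcp", "127.0.0.1:53")
		},
	}
	_, addrs, err := custom.LookupSRV(context.Background(), "", "", "app.service.consul")
	fmt.Println(addrs, err)

	// What the SRV upstream path currently does: the default resolver,
	// so /etc/resolv.conf wins no matter what the transport config says.
	_, addrs, err = net.DefaultResolver.LookupSRV(context.Background(), "", "", "app.service.consul")
	fmt.Println(addrs, err)
}
```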
/cc @Mohammed90
But is there a way for me to not log those to stderr?
Not really. You'll need to filter the logs out yourself with some other tool. Pipe the Caddy logs through jq or something, excluding any lines that match some query, and output the resulting logs somewhere else.
Ok, I think these errors are related to context-canceled requests: the cancellation bubbles up the stack and cancels the DNS lookups as well. But that's just an assumption on my part.
Regarding the logs, there's no way we can pipe them through jq with our current stack without major restructuring.
@Mohammed90 I think what we should do is move the Resolver config up to the reverse_proxy handler instead, then push it down to the transport on provision (try to type-assert for HTTPTransport and, if it matches, set Resolver; or, to generalize, we can set up a NeedsResolver type with a SetResolver function, which HTTPTransport can implement), and also pass it to fillDialInfo to be used for SRV lookups. We should also deprecate the transport's resolver config and warn that it should be moved to the handler.
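Roughly like this (a sketch with toy stand-ins for the real types; `NeedsResolver`, `toyTransport`, and `toyHandler` are hypothetical names, not merged code):

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// NeedsResolver generalizes "a transport that can accept a custom resolver"
// (hypothetical interface, per the proposal above).
type NeedsResolver interface {
	SetResolver(r *net.Resolver)
}

// toyTransport stands in for HTTPTransport.
type toyTransport struct{ resolver *net.Resolver }

func (t *toyTransport) SetResolver(r *net.Resolver) { t.resolver = r }

// toyHandler stands in for the reverse_proxy handler: it owns the resolver
// config, pushes it down to the transport at provision time, and keeps a
// copy for SRV lookups (what fillDialInfo would use).
type toyHandler struct {
	transport interface{}
	resolver  *net.Resolver
}

func (h *toyHandler) provision(res *net.Resolver) {
	if nr, ok := h.transport.(NeedsResolver); ok {
		nr.SetResolver(res)
	}
	h.resolver = res
}

func (h *toyHandler) lookupSRV(ctx context.Context, name string) ([]*net.SRV, error) {
	r := h.resolver
	if r == nil {
		r = net.DefaultResolver // fall back when nothing is configured
	}
	_, addrs, err := r.LookupSRV(ctx, "", "", name)
	return addrs, err
}

func main() {
	h := &toyHandler{transport: &toyTransport{}}
	h.provision(net.DefaultResolver) // a custom *net.Resolver in real life
	addrs, err := h.lookupSRV(context.Background(), "app.service.consul")
	fmt.Println(addrs, err)
}
```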
A bit more insight on this issue... I logged into all 3 of my nodes and manually edited /etc/resolv.conf to point to the DNS server I wanted. The issue persists... so I believe it is related to requests being canceled on the client side: the server cancels the request's context, which ends up cancelling the DNS lookup, because of _, records, err := net.DefaultResolver.LookupSRV(r.Context(), "", "", srvName). The DNS server IP in the error logs just seems to always be the last entry in the resolv.conf file.
I have no reports of normal requests failing at all, and I can't see any issues in my unbound setup.
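For what it's worth, the cancellation hypothesis is easy to reproduce in isolation (a minimal sketch; canceling the context before the lookup stands in for the client disconnecting):

```go
package main

import (
	"context"
	"fmt"
	"net"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the client disconnecting, which cancels the request context

	// The same call the SRV upstream path makes with r.Context():
	_, _, err := net.DefaultResolver.LookupSRV(ctx, "", "", "app.service.consul")
	fmt.Println(err) // e.g. "lookup app.service.consul: operation was canceled"
}
```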
I think we can use this issue to get the resolvers applied to both dial and lookup_srv... meanwhile, I'll track the error reported in this thread in the appropriate issue here.
Is the lookup_srv resolver still a problem? Do we need to keep this issue open, and if so, what's the current status? Sorry, it looks like this issue diverged into two separate things. It looks like https://github.com/caddyserver/caddy/pull/3816 addresses it?
Hey @mholt... so, I think this issue is still valid, since @francislavoie already confirmed that the custom resolver is only applied to the http transport's dialer, not to LookupSRV. IMO, both dial and lookup_srv should use the specified resolver (if any).