I came across this problem while using one of the DNS-over-HTTPS proxies that use Cloudflare's 1.1.1.1 service. Every time the PC came out of hibernation, DNS requests kept timing out until I restarted the proxy.
To illustrate the issue I wrote a simple client:
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 1 * time.Second,
	}
	for {
		resp, err := client.Get("https://1.1.1.1")
		if err != nil {
			fmt.Printf("Error: %v\n", err)
		} else {
			// Drain and close the body so the underlying connection can be reused.
			io.Copy(ioutil.Discard, resp.Body)
			resp.Body.Close()
			fmt.Println("Success")
			time.Sleep(1 * time.Second)
		}
	}
}
I started the client and, after a few seconds, simulated a NAT timeout by adding a firewall rule that blocks outbound traffic on that connection:
# netstat -anp | grep 1.1.1.1
...
tcp 0 0 192.168.0.39:47782 1.1.1.1:443 ESTABLISHED 23236/t
# iptables -I OUTPUT -p tcp -d 1.1.1.1 --dport 443 --sport 47782 -j DROP
If I run it with GODEBUG=http2client=0 (i.e. with HTTP/2 disabled), I get a single failed request, after which the connection is killed; subsequent requests open a new connection and everything continues to work:
...
Success
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Success
Success
Success
...

With HTTP/2 enabled (the default), however, once the connection is blocked every subsequent request fails with the same timeout, and the dead connection is never replaced:
Success
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
...
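As an aside, HTTP/2 can be disabled programmatically as well as via GODEBUG: per the net/http documentation, a client Transport whose TLSNextProto map is non-nil but empty will never negotiate HTTP/2. A minimal sketch of that workaround (the helper name is illustrative):

package main

import (
	"crypto/tls"
	"net/http"
	"time"
)

// newHTTP1OnlyClient (illustrative name) behaves like running the program
// with GODEBUG=http2client=0: the non-nil, empty TLSNextProto map stops the
// Transport from upgrading connections to HTTP/2.
func newHTTP1OnlyClient() *http.Client {
	return &http.Client{
		Timeout: 1 * time.Second,
		Transport: &http.Transport{
			TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
		},
	}
}

This only sidesteps the problem by falling back to HTTP/1.1 recovery behavior, of course.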
go version go1.12 linux/amd64
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/dop/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/dop/workspace"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
GOROOT/bin/go version: go version go1.12 linux/amd64
GOROOT/bin/go tool compile -V: compile version go1.12
uname -sr: Linux 4.4.0-142-generic
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
/lib/x86_64-linux-gnu/libc.so.6: GNU C Library (Ubuntu EGLIBC 2.19-0ubuntu6.14) stable release version 2.19, by Roland McGrath et al.
gdb --version: GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
I've been tracking down a similar-looking issue in production tonight, where a server in a load-balancer pool went away. Many Go HTTP/2 clients did not reconnect to the pool and instead kept reporting:
net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I can reproduce the errors described by @dop251, and the ones I observed locally, by running the test case and disrupting the network connection to 1.1.1.1 (for example, with an iptables DROP rule like the one above).
I've also expanded the test-case code to set the Dialer and Transport timeouts below the Client timeout, expecting Go to recover from the situation, but this resulted in the same behavior.
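A representative sketch of that kind of configuration (the helper name and exact values are illustrative, not the code actually used):

package main

import (
	"net"
	"net/http"
	"time"
)

// newTunedClient (illustrative name): all dial/transport timeouts are set
// below the 1s Client.Timeout, in the hope that the transport gives up on
// the dead connection before the overall request deadline hits.
func newTunedClient() *http.Client {
	return &http.Client{
		Timeout: 1 * time.Second,
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout:   500 * time.Millisecond,
				KeepAlive: 30 * time.Second,
			}).DialContext,
			TLSHandshakeTimeout:   500 * time.Millisecond,
			ResponseHeaderTimeout: 500 * time.Millisecond,
			// Go 1.13+: keep HTTP/2 enabled despite the custom DialContext.
			ForceAttemptHTTP2: true,
		},
	}
}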
Running with GODEBUG=http2debug=2:
2019/12/05 23:53:01 http2: Transport encoding header ":authority" = "1.1.1.1"
2019/12/05 23:53:01 http2: Transport encoding header ":method" = "GET"
2019/12/05 23:53:01 http2: Transport encoding header ":path" = "/"
2019/12/05 23:53:01 http2: Transport encoding header ":scheme" = "https"
2019/12/05 23:53:01 http2: Transport encoding header "accept-encoding" = "gzip"
2019/12/05 23:53:01 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2019/12/05 23:53:01 http2: Framer 0xc000316540: wrote HEADERS flags=END_STREAM|END_HEADERS stream=93 len=6
2019/12/05 23:53:11 http2: Framer 0xc000316540: wrote RST_STREAM stream=93 len=4 ErrCode=CANCEL
2019/12/05 23:53:11 RoundTrip failure: net/http: request canceled
Error: Get https://1.1.1.1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
The only timeout that seemed to change any behavior was ResponseHeaderTimeout, and it only changed which error is reported.
The client does eventually recover once the kernel marks the TCP connection as timed out (that is, when tcp_retries2 is exhausted), but the Linux defaults for that are very long (well over ten minutes). I'd expect HTTP/2 to recover as quickly and reliably as Go's HTTP/1 transport does.
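A possible mitigation, assuming a golang.org/x/net/http2 version new enough to expose ReadIdleTimeout and PingTimeout: the transport can be told to health-check connections with HTTP/2 PING frames and drop ones that stop responding, so a broken connection is discarded within seconds rather than waiting on tcp_retries2. A minimal sketch (helper name and values are illustrative):

package main

import (
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

// newSelfHealingClient (illustrative name) assumes an x/net/http2 version
// that has ReadIdleTimeout and PingTimeout. The transport sends a PING on
// any connection on which no frame has been received for ReadIdleTimeout
// and closes it if no reply arrives within PingTimeout.
func newSelfHealingClient() (*http.Client, error) {
	t1 := &http.Transport{}
	t2, err := http2.ConfigureTransports(t1) // wires HTTP/2 into t1
	if err != nil {
		return nil, err
	}
	t2.ReadIdleTimeout = 10 * time.Second
	t2.PingTimeout = 5 * time.Second
	return &http.Client{Transport: t1, Timeout: 1 * time.Second}, nil
}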
/cc @bradfitz
Is there any update on this?