go version)?go version go1.9.2 darwin/amd64
Yes.
go env)?GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/tt"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.9.2/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.9.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/v3/slr36r597rn0rrfzq_my6nv00000gn/T/go-build723069031=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
package main
import (
"context"
"fmt"
"net"
"testing"
)
func TestLookupIPAddr(t *testing.T) {
ctx := context.Background()
canceledCtx, cancel := context.WithCancel(ctx)
cancel()
var resolver net.Resolver
_, err := resolver.LookupIPAddr(canceledCtx, "example.com")
if fmt.Sprintf("%#v", err) != `&errors.errorString{s:"operation was canceled"}` {
t.Fatalf("unexpected error: %q", err)
}
_, err = resolver.LookupIPAddr(ctx, "example.com")
if err != nil {
t.Fatalf("unexpected error: %q", err)
}
}
(This is a reduced example; it happened to me through the use of net/http.)
The test should produce no output.
Resolving the IP address using a canceled context should fail and the following attempt to resolve the IP address using a non-canceled context should succeed.
The test produces the following output:
--- FAIL: TestLookupIPAddr (0.00s)
lookup_test.go:24: unexpected error: "lookup example.com on 89.233.43.71:53: dial udp 89.233.43.71:53: operation was canceled"
FAIL
That is, the second attempt to resolve the IP address using the non-canceled context returns the result of the previous invocation.
This is happening as parallel lookups for the same hosts only execute once:
https://github.com/golang/go/blob/bb3be403e79731b208c41bd170a6a87642d988da/src/net/lookup.go#L220-L224
... and following the first lookup, this goroutine exists:
1 @ 0x101298e 0x100535e 0x1188023 0x117c53d 0x118d9f6 0x1188982 0x118ae0c 0x115fade 0x105ca41
# 0x1188022 net.cgoLookupIP+0x72 /usr/local/Cellar/go/1.9.2/libexec/src/net/cgo_unix.go:212
# 0x117c53c net.(*Resolver).lookupIP+0x12c /usr/local/Cellar/go/1.9.2/libexec/src/net/lookup_unix.go:95
# 0x118d9f5 net.(*Resolver).(net.lookupIP)-fm+0x55 /usr/local/Cellar/go/1.9.2/libexec/src/net/lookup.go:187
# 0x1188981 net.glob..func10+0x51 /usr/local/Cellar/go/1.9.2/libexec/src/net/hook.go:19
# 0x118ae0b net.(*Resolver).LookupIPAddr.func1+0x5b /usr/local/Cellar/go/1.9.2/libexec/src/net/lookup.go:193
# 0x115fadd internal/singleflight.(*Group).doCall+0x2d /usr/local/Cellar/go/1.9.2/libexec/src/internal/singleflight/singleflight.go:93
Sleeping for just a millisecond between the two lookups allow the goroutine to complete and therefore resolves the IP address properly.
(It makes no difference if you re-initialize net.Resolver as the singleflight.Group variable is shared across instances.)
It's easy to fix this specific example (a sequential lookup) by "forgetting" the result if the context is canceled (similar to the solution applied in 77595e462be07b8229f88cbdf947e320bfc7e639 for #8602). I have a change for this and will submit it.
I do wonder if it would be more correct to never pass the context to the inner lookup; otherwise a goroutine that is canceled or times out will be able to affect other goroutines waiting for that same response.
Change https://golang.org/cl/77670 mentions this issue: net: Forget lookups for canceled contexts
@bradfitz why did you reopen this? Does the CL not actually fix the problem?
It was reverted due to build breakages and code review comments.
Change https://golang.org/cl/79715 mentions this issue: net: Forget lookups for canceled contexts
Ping @tt, please take a look at the feedback that @mikioh posted plus the gentle ping from @bradfitz. Thank you.
This issue is very serious for us. We have a lot of canceled contexts and are flooded by this 'operation was canceled' error. For us this is a blocker. Can you please consider this to fix in Go 1.10?
At the moment we don't even have an approved patch for this problem. The earlier attempt to fix it (https://golang.org/cl/77670) had to be rolled back, as described in the comments there. The problem exists in 1.9, so it's not new in 1.10. The simple fix for this--just checking whether the context has been canceled, as is done in https://golang.org/cl/79715 --will re-break #20703. We need a more sophisticated fix.
So I think it is too late to fix this for 1.10. Sorry. I will mark this as a release blocker for 1.11.
This is a hard problem. As I just wrote on CL 79715, I see now that the earlier change (CL 45999) is not right. We should not be memoizing the result of a context cancellation. And simply reverting CL 45999 isn't enough. After all, another DNS lookup could arrive between the context cancellation and the lookupGroup.Forget, could get folded into the existing singleflight lookup, and could see the cancellation.
The problem is that singleflight is designed for tasks that either succeed or fail. It is not designed for tasks that get canceled, where the cancelation of one task should not affect another task using the same singleflight key. I think we need to separate the cancelation of the task from the cancelation of the singleflight operation.
Change https://golang.org/cl/100840 mentions this issue: net: don't let cancelation of a DNS lookup affect another lookup
Reopening for 1.10.1
CL 100840 OK for Go 1.10.1
Change https://golang.org/cl/102787 mentions this issue: [release-branch.go1.10] net: don't let cancelation of a DNS lookup affect another lookup
Change https://golang.org/cl/103215 mentions this issue: [release-branch.go1.9] net: don't let cancelation of a DNS lookup affect another lookup
@andybons is this has been fixed in 1.10.1?
@andybons ama let you finish ;)
@AliGrab yes, please and Go1.10.1 was released last week https://twitter.com/golang/status/979377544196755456, please get it and try again.
@odeke-em Any one can confirm this fix is in the latest golang 1.10.3? how about 1.10.0?