Go: net: A lookup performed by a canceled context might affect subsequent lookups

Created on 14 Nov 2017  路  16Comments  路  Source: golang/go

What version of Go are you using (go version)?

go version go1.9.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/tt"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.9.2/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.9.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/v3/slr36r597rn0rrfzq_my6nv00000gn/T/go-build723069031=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

package main

import (
    "context"
    "fmt"
    "net"
    "testing"
)

func TestLookupIPAddr(t *testing.T) {
    ctx := context.Background()
    canceledCtx, cancel := context.WithCancel(ctx)
    cancel()

    var resolver net.Resolver

    _, err := resolver.LookupIPAddr(canceledCtx, "example.com")
    if fmt.Sprintf("%#v", err) != `&errors.errorString{s:"operation was canceled"}` {
        t.Fatalf("unexpected error: %q", err)
    }

    _, err = resolver.LookupIPAddr(ctx, "example.com")
    if err != nil {
        t.Fatalf("unexpected error: %q", err)
    }
}

(This is a reduced example; it happened to me through the use of net/http.)

What did you expect to see?

The test should produce no output.

Resolving the IP address using a canceled context should fail and the following attempt to resolve the IP address using a non-canceled context should succeed.

What did you see instead?

The test produces the following output:

--- FAIL: TestLookupIPAddr (0.00s)
    lookup_test.go:24: unexpected error: "lookup example.com on 89.233.43.71:53: dial udp 89.233.43.71:53: operation was canceled"
FAIL

That is, the second attempt to resolve the IP address using the non-canceled context returns the result of the previous invocation.

This is happening as parallel lookups for the same hosts only execute once:

https://github.com/golang/go/blob/bb3be403e79731b208c41bd170a6a87642d988da/src/net/lookup.go#L220-L224

... and following the first lookup, this goroutine exists:

1 @ 0x101298e 0x100535e 0x1188023 0x117c53d 0x118d9f6 0x1188982 0x118ae0c 0x115fade 0x105ca41
#   0x1188022   net.cgoLookupIP+0x72                /usr/local/Cellar/go/1.9.2/libexec/src/net/cgo_unix.go:212
#   0x117c53c   net.(*Resolver).lookupIP+0x12c          /usr/local/Cellar/go/1.9.2/libexec/src/net/lookup_unix.go:95
#   0x118d9f5   net.(*Resolver).(net.lookupIP)-fm+0x55      /usr/local/Cellar/go/1.9.2/libexec/src/net/lookup.go:187
#   0x1188981   net.glob..func10+0x51               /usr/local/Cellar/go/1.9.2/libexec/src/net/hook.go:19
#   0x118ae0b   net.(*Resolver).LookupIPAddr.func1+0x5b     /usr/local/Cellar/go/1.9.2/libexec/src/net/lookup.go:193
#   0x115fadd   internal/singleflight.(*Group).doCall+0x2d  /usr/local/Cellar/go/1.9.2/libexec/src/internal/singleflight/singleflight.go:93

Sleeping for just a millisecond between the two lookups allow the goroutine to complete and therefore resolves the IP address properly.

(It makes no difference if you re-initialize net.Resolver as the singleflight.Group variable is shared across instances.)

It's easy to fix this specific example (a sequential lookup) by "forgetting" the result if the context is canceled (similar to the solution applied in 77595e462be07b8229f88cbdf947e320bfc7e639 for #8602). I have a change for this and will submit it.

I do wonder if it would be more correct to never pass the context to the inner lookup; otherwise a goroutine that is canceled or times out will be able to affect other goroutines waiting for that same response.

CherryPickApproved FrozenDueToAge release-blocker

All 16 comments

Change https://golang.org/cl/77670 mentions this issue: net: Forget lookups for canceled contexts

@bradfitz why did you reopen this? Does the CL not actually fix the problem?

It was reverted due to build breakages and code review comments.

Change https://golang.org/cl/79715 mentions this issue: net: Forget lookups for canceled contexts

Ping @tt, please take a look at the feedback that @mikioh posted plus the gentle ping from @bradfitz. Thank you.

This issue is very serious for us. We have a lot of canceled contexts and are flooded by this 'operation was canceled' error. For us this is a blocker. Can you please consider this to fix in Go 1.10?

At the moment we don't even have an approved patch for this problem. The earlier attempt to fix it (https://golang.org/cl/77670) had to be rolled back, as described in the comments there. The problem exists in 1.9, so it's not new in 1.10. The simple fix for this--just checking whether the context has been canceled, as is done in https://golang.org/cl/79715 --will re-break #20703. We need a more sophisticated fix.

So I think it is too late to fix this for 1.10. Sorry. I will mark this as a release blocker for 1.11.

This is a hard problem. As I just wrote on CL 79715, I see now that the earlier change (CL 45999) is not right. We should not be memoizing the result of a context cancellation. And simply reverting CL 45999 isn't enough. After all, another DNS lookup could arrive between the context cancellation and the lookupGroup.Forget, could get folded into the existing singleflight lookup, and could see the cancellation.

The problem is that singleflight is designed for tasks that either succeed or fail. It is not designed for tasks that get canceled, where the cancelation of one task should not affect another task using the same singleflight key. I think we need to separate the cancelation of the task from the cancelation of the singleflight operation.

Change https://golang.org/cl/100840 mentions this issue: net: don't let cancelation of a DNS lookup affect another lookup

Reopening for 1.10.1

CL 100840 OK for Go 1.10.1

Change https://golang.org/cl/102787 mentions this issue: [release-branch.go1.10] net: don't let cancelation of a DNS lookup affect another lookup

Change https://golang.org/cl/103215 mentions this issue: [release-branch.go1.9] net: don't let cancelation of a DNS lookup affect another lookup

@andybons is this has been fixed in 1.10.1?

@andybons ama let you finish ;)

@AliGrab yes, please and Go1.10.1 was released last week https://twitter.com/golang/status/979377544196755456, please get it and try again.

@odeke-em Any one can confirm this fix is in the latest golang 1.10.3? how about 1.10.0?

Was this page helpful?
0 / 5 - 0 ratings