Go: context: define that Err() == nil before Done is closed

Created on 5 Apr 2017  Â·  21Comments  Â·  Source: golang/go

Currently, context.emptyContext (the type of context.Background() and context.TODO()) has an Err() of nil. This is fine, since the context also has nil Done() and so the value of Err() is unspecified.

However, much code ends up erroneously expecting Err() to be nil before Done() is closed (if ever). If emptyContext were to give a non-nil Err(), which is permitted in the interface definition (since, again, it is unspecified), then this incorrect code would likely be caught more frequently. Since there is no requirement that Err() even be context.Canceled or context.DeadlineExceeded in this case, we could make emptyContext.Err() return something like fmt.Errorf("checking %s.Err() without checking %s.Done() first", ctx, ctx) to helpfully inform users of the mistake they are making.

This would possibly break existing code, but only code that was already incorrect. If necessary, an alternate context could be provided which preserves the current behavior for people unwilling or unable to correct their code.

FrozenDueToAge Proposal Proposal-Accepted

Most helpful comment

Over on CL 40291, @sajmani and @ianlancetaylor were discussing ctx.Err() check as being racy. Moving discussion here.

At best the check err := ctx.Err(); err != nil is only half-racy: if err != nil then you know that's not going to change. If err == nil, then it might change the next moment, but that's OK and fundamental to the fact that contexts are for notification of cancellation. There's always a propagation delay between the top-level request for cancellation and the pieces of the computation noticing that request. That's not really a race. I think it's perfectly fine to do such a check before kicking off some expensive computation, or even at intervals during such a computation.

As for how to do that, I guess we could go to some trouble that finds places where code says:

err := ctx.Err()

and helpfully remind people that they should instead be writing:

var err error
select {
case <-ctx.Done():
    err = ctx.Err()
default:
   // ok
}

That doesn't seem like an improvement though. Better to redefine ctx.Err() to match existing usage, implementations, and expectations.

All 21 comments

CC @Sajmani

Maybe instead we should define that until Done, Err returns nil. Are there any implementations where that's not true?

Put a different way, what problem are you trying to solve? The solution would cause pain for (some) people. What's the benefit?

That would probably be better, since it appears to be a common mis-idiom, but I think that would also more explicitly break the compatibility promise since valid existing Context implementations would become invalid.

The problem I'm trying to solve is that I was working with a dependency which was cancelling early
because the context I was working with had non-nil Err() prior to Done() being closed, and I wasted a fair bit of time debugging it.

Yes, my main concern with changing the spec in this way is invalidating
existing implementations.
On Wed, Apr 5, 2017 at 5:26 PM Alexis Hunt notifications@github.com wrote:

That would probably be better, since it appears to be a common mis-idiom,
but I think that would also more explicitly break the compatibility promise
since valid existing Context implementations would become invalid.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/golang/go/issues/19856#issuecomment-292002146, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AJSK3QNVxGbrkhj0DHsqt7jSgTjEeOgfks5rtAcTgaJpZM4M0trn
.

I can only find five Context implementations in go-corpus v0.01:

github.com/coreos/etcd/client/client_test.go
    fakeCancelContext is always Done, always has Err.

github.com/coreos/etcd/clientv3/watch.go
    valCtx is never Done, never has Err

github.com/gin-gonic/gin/context.go
    Context is never Done, never has Err

github.com/kataras/iris/context.go
    Context is never Done, never has Err

github.com/onsi/gomega/format/format_test.go
    ctx is never Done, never has Err

I searched for ' Done() <-chan struct{}' and sifted through all the non-context uses of that.

All these implementations follow the 'Err is nil until Done' rule. I suspect the main implementations do too, but I haven't checked. All evidence for changing

Err's return value is unspecified before Done is closed.

to

Err's return value should be nil before Done is closed.

and _not_ non-nil.

This would be a better solution from my view; I didn't suggest it because I assumed it was an unacceptable breaking change.

I also looked in the go-corpus for checks of Err() against nil and they are all over the place (I've filtered out vendored copies of these, which would also need to be updated, and I've also exclude *_test.go files). It's telling that we can't even get this right in our own code (first two):

cloud.google.com/go/bigtable/internal/gax/invoke.go

// If the deadline is exceeded...
if ctx.Err() != nil {
    return ctx.Err()
}

golang.org/x/crypto/acme/autocert/cache.go

// prevent overwriting the file if the context was cancelled
if ctx.Err() != nil {
    return // no need to set err
}

github.com/coreos/etcd/client/client.go

case <-hctx.Done():
    // cancel and wait for request to actually exit before continuing
    reqcancel()
    rtresp := <-rtchan
    resp = rtresp.resp
    switch {
    case ctx.Err() != nil: <<< not ok
        err = ctx.Err()
    case hctx.Err() != nil: <<< ok
        err = fmt.Errorf("client: endpoint %s exceeded header timeout", c.endpoint.String())
    default:
        panic("failed to get error from context")
    }

github.com/coreos/etcd/clientv3/client.go

if ctx != nil && ctx.Err() != nil {
    return true
}

github.com/coreos/etcd/mvcc/kvstore.go

if ctx == nil || ctx.Err() != nil {
    ...
    return
}

github.com/coreos/etcd/tools/benchmark/cmd/watch_get.go

for {
    st := time.Now()
    _, err := getClient[0].Get(ctx, "abc", v3.WithSerializable())
    if ctx.Err() != nil {
        break
    }
    r.Results() <- report.Result{Err: err, Start: st, End: time.Now()}
}

github.com/docker/docker/daemon/cluster/cluster.go

func initClusterSpec(node *swarmnode.Node, spec types.Spec) error {
    ctx, _ := context.WithTimeout(context.Background(), 5*time.Second)
    for conn := range node.ListenControlSocket(ctx) {
        if ctx.Err() != nil {
            return ctx.Err()
        }
        ...
    }
    return ctx.Err()
}

github.com/docker/docker/daemon/cluster/executor/container/adapter.go

func (c *containerAdapter) inspect(ctx context.Context) (types.ContainerJSON, error) {
    cs, err := c.backend.ContainerInspectCurrent(c.container.name(), false)
    if ctx.Err() != nil {
        return types.ContainerJSON{}, ctx.Err()
    }
    if err != nil {
        return types.ContainerJSON{}, err
    }
    return *cs, nil
}

github.com/docker/docker/daemon/cluster/executor/container/controller.go

err := r.adapter.wait(ctx)
if ctx.Err() != nil {
    return ctx.Err()
}

github.com/docker/swarmkit/node/node.go

for {
    if ctx.Err() != nil {
        return
    }
    ...
}

Based on the go-corpus results, I sent out CL 40291 narrowing the Context interface semantics. @Sajmani, the main designer of Context, is away this week, so I will leave this until he's back.

CL https://golang.org/cl/40291 mentions this issue.

I think your corpus results argue more that this should be a vet check than that we should change the definition of the interface. Even with the change, many (all?) of the examples you found should be waiting on ctx.Done instead.

@quentinmit, no they are not trying to wait, they are only doing a polling check. What they "should" be doing is:

var err error
select {
case <-ctx.Done():
    err = ctx.Err()
default:
   // ok
}
if err != nil

instead of if ctx.Err() != nil. But that's going to be a hard sell.

CL https://golang.org/cl/40396 mentions this issue.

In addition to the definition change, how about a one-shot stderr complaint from package _context_ if Err() is called before Done() is closed?

I don't think so, for two reasons: (1) if we change the definition, then it's fine to call Err before Done, and (2) Go programs don't chatter to standard error except for very serious problems.

See also https://github.com/golang/go/issues/14292#issuecomment-273391110.

The scalability implications (introducing a contention point due to the need to synchronize Err) might at least merit a lint check.

But someone doing the correct behavior is already synchronizing when
receiving on Done().

I don't know how channels are implemented, but if this is cheaper than,
say, a mutex check (e.g. if closed channels can be checked with an atomic
load rather than a lock), then Context implementers could use Done() as
the synchronization for Err(), so there should be little cost for correct
programs.

And if I understand the memory model correctly, an implementation that
isn't currently synchronizing Err() is erroneous, because a change to its
value may not be observed by a goroutine that does not call Done() to
synchronize with the close.

On Wed, Apr 12, 2017, 16:45 Bryan C. Mills, notifications@github.com
wrote:

See also #14292 (comment)
https://github.com/golang/go/issues/14292#issuecomment-273391110.

The scalability implications (introducing a contention point due to the
need to synchronize Err) might at least merit a lint check.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/golang/go/issues/19856#issuecomment-293702163, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AT4HVTEPofYioncqFO44rqu3g88p2LLVks5rvTgFgaJpZM4M0trn
.

@alercah You're correct that Err can use exactly the same synchronization as Done under the hood, and in that respect it's not really any worse. The fact that they're so similar is the problem, though: I would bet that to the median programmer ctx.Err() looks like it should be a lot cheaper than select.

Thinking about this some more, the appropriate lint warning would probably be something like:
Warn if ctx.Err() is called in a loop for which, along all codepaths, either ctx is being passed to some other function or there is already a select statement.

That condition may be a bit advanced for lint to try to resolve, though.

Over on CL 40291, @sajmani and @ianlancetaylor were discussing ctx.Err() check as being racy. Moving discussion here.

At best the check err := ctx.Err(); err != nil is only half-racy: if err != nil then you know that's not going to change. If err == nil, then it might change the next moment, but that's OK and fundamental to the fact that contexts are for notification of cancellation. There's always a propagation delay between the top-level request for cancellation and the pieces of the computation noticing that request. That's not really a race. I think it's perfectly fine to do such a check before kicking off some expensive computation, or even at intervals during such a computation.

As for how to do that, I guess we could go to some trouble that finds places where code says:

err := ctx.Err()

and helpfully remind people that they should instead be writing:

var err error
select {
case <-ctx.Done():
    err = ctx.Err()
default:
   // ok
}

That doesn't seem like an improvement though. Better to redefine ctx.Err() to match existing usage, implementations, and expectations.

"There's always a propagation delay between the top-level request for cancellation and the pieces of the computation noticing that request."

I agree. For lengthy computations done in-process, this is the correct idiom and desirable behaviour. Because of the cooperative cancellation model in Go, there is no way to actually directly interrupt a computation short of quitting the process. A goroutine cannot simultaneously be block on <-Done() and do computation, so instead it has to periodically check whether or not the context has been cancelled or timed out. This will necessarily introduce unavoidable real-time delay. It's up to the implementer of the process (which is under no obligation to check the Context in any case) to decide how long between checks.

The only time you're going to get an immediate response on a cancellation is when the goroutine checking cancellation has nothing else to do because it otherwise blocked. In that case it will select on both <-Done() and at least one other channel, and it will return as soon as the channel is closed (and the goroutine can be scheduled). Even this isn't foolproof, however, because if one of the other operations returns before the select statement is reached, the random behaviour of select means that the cancellation may go undetected unless and until the goroutine checks again.

Sameer is happy with this and I haven't seen any strong arguments against, so I'm going to accept this proposal.

Was this page helpful?
0 / 5 - 0 ratings