Go: proposal: context: add a way to detach from parent cancellation

Created on 15 Jul 2020  Â·  14Comments  Â·  Source: golang/go

What

I would like to propose to add a function in context that, given a parent Context, returns a new child Context with the same values of the parent but that will not be cancelled when the parent is cancelled.

Why

This is useful in multiple common and important scenarios:

  • handling of rollback/cleanup operations in the context of an event (e.g. http request) that have to continue regardless of whether the triggering event is cancelled (e.g. due to timeout, or the client going away)
  • handling of long-running operations triggered by an event (e.g. http request) that terminates before the termination of the long-running operation

This is doable today by not propagating the context of the triggering event, and replacing it instead with a new context obtained from context.Background(). This is problematic though, as the new context does not contain any of the values attached to the context of the triggering event, and these values are important to e.g. ensure correct logging/tracing/error recovery functionality (a common scenario when using a middleware-based approach to request/event handling).

As noted below with @davecheney, a nice consequence that naturally falls out of this approach is that it effectively turns cancellation into a regular context value that can be overridden in children contexts. It doesn't solve the problem of cancellation being conflated with the intent of context being just a "bag of values" (that would almost certainly require breaking changes to solve) but it's an effective step into alleviating the situation, and it's backwards compatible.

As noted further below with @martisch the benefit of this approach is that it's pretty much as minimal, composable and inline with the current design of the context package as possible, requiring a single new public API and eschewing conflating additional mechanisms (goroutines).

An important point to be made is that all existing implementations of this (see below) rely on an internal/undocumented guarantee of cancelCtx. If this proposal is shot down at least that guarantee should be explicitly documented in the exported API.

Existing implementations

Looking around it is possible to find multiple reimplementations of this proposal, almost identicals but with different names. I'm not advocating for a specific name here.

  • An implementation already exists here: https://godoc.org/github.com/golang/tools/internal/xcontext#Detach (technically not optimal, since Value() can be implemented by embedding the parent Context) - this implementation is also duplicated at https://godoc.org/golang.org/x/pkgsite/internal/xcontext#Detach
  • Just internally to our company I counted 3 similar implementations (the "canonical" one is called contextutil.WithoutCancellation, with the package name chosen only to avoid clashes with the context package from the standard library)
  • @martisch mentioned below seeing a detach.FromContext implementation
  • I expect other non-public codebases elsewhere to contain additional implementations. Please let me know if you're aware of additional ones.

Implementations are trivial and would add a single public function to context.

Proposal

Most helpful comment

I can't help but feel that this request is a direct by product of context's conflation of cancellation and a skiplist of values. I agree with the rationale for this, but also feel that this is pushing the use case of context as a bag of values beyond its intention.

All 14 comments

I can't help but feel that this request is a direct by product of context's conflation of cancellation and a skiplist of values. I agree with the rationale for this, but also feel that this is pushing the use case of context as a bag of values beyond its intention.

I can't help but feel that this request is a direct by product of context's conflation of cancellation and a skiplist of values.

Yes, without said conflation this proposal wouldn't be needed.

I agree with the rationale for this, but also feel that this is pushing the use case of context as a bag of values beyond its intention.

While my primary intent here (that is much more immediate) wasn't to move that discussion forward... in a way, this proposal could also be seen as a first step in sidestepping (undoing?) the conflation, since it effectively turns cancellation into a regular copy-on-write value. It doesn't magically solve all the other problems with cancellation, but it's nevertheless a step in the direction of alleviating the problems with conflation.

I also have seen a detach package before with a function FromContext that takes a parent context and creates a new context without propagation of the cancelation deadline and returns a new context that keeps all the values of the parent.

It might however be better to encapsulate the specifc use case of detached background tasks by having a Task struct and a Go(ctx, f(ctx)) Task method that takes a function f to execute in a new goroutine. That Task struct can then offer methods itself e.g. useful for canceling and checking if the task is finished. The Go function can have an internal mechanisms with registration or an additional parameter to decide what values to expose in the detached context.

This might also avoid adding more exposed API to context itself.

@martisch i wouldn't add goroutines to the mix. There are legitimate use cases (rollback/cleanup operations that can run in a defer or inside an if err != nil {} being an example) where they are not needed.

Also, your counterproposal seem to be way more complex in terms of number of new exposed APIs (one new package, one new struct/interface and at least one new method) compared to the one proposed here. You're right though it wouldn't need to be in context, although I question the intuitiveness of that (since all packages in the wild are basically called some close variation of context, I have to assume that's where people do expect such a function to live).

To be 100% clear though, I completely agree that the current design of context is less than ideal due to the conflation mentioned above. And it would be ideal if a new context design solved that.

But, as mentioned above, my intent here was not really to propose a new design for context: it's just to address an omission that fails to cover important use cases, _within_ the framework of the _current_ context design. To that end, creating a new/separated "task" package with a design that seem to differ significantly from that of context could lead to a significantly less intuitive and discoverable mechanism (but maybe this is just because I misunderstood your idea... in which case an example/sketch of what you have in mind may help clarify that).

It's worth noting that a new context design that solved the conflation would naturally address the scenarios mentioned in the proposal. Therefore, even if this proposal was accepted, it would not pose additional issues in case users had to migrate to the new design. As such, there's not much for the argument that these scenarios must wait for a context redesign to be addressed.

Note that the detach package mentioned by @martisch is essentially the same API as was proposed in #19643 (thanks @navytux for the cross-reference!). See https://github.com/golang/go/issues/19643#issuecomment-288477679 in particular for some discussion of why that API is needed.

To summarize: values stored in a Context, such as logging or tracing entries, may be scoped to a specific call or operation and flushed or finalized at the end of that call or operation. The API for detaching from such a Context should provide a means to extend or annotate such a value while it is still valid.

values stored in a Context, such as logging or tracing entries, may be scoped to a specific call or operation and flushed or finalized at the end of that call or operation

Do we have any actual example of this need? I am thinking at all middlewares we are using and can't think of any that would need that (including tracing and logging).

Furthermore, even if it was really needed, middleware (again, thinking mostly of logging and tracing) will normally already happily let you override the current values (e.g. a logger or trace/span), so I'm not really convinced about the need for a specific, more complicated to cover it. Maybe a concrete example may help understand.

Do we have any actual example of this need?

...and presumably others: anything with a local buffer of logs or traces, or a local tree of spans or regions, that needs the buffer or parts of the tree to be flushed to a disk or a remote log server when they are “complete”.

Furthermore, even if it was really needed, middleware (again, thinking mostly of logging and tracing) will normally already happily let you override the current values (e.g. a logger or trace/span)

You can only override values that you know about. The point of Context.Value is that most parts of the program _don't_ need to know what's in it. You should be able to use a library without telling that library in particular about which (if any) logging and/or tracing libraries the rest of your program (above and below it) happens to be using.

In practice, that means one of two things:

  1. Synchronous functions and operations, as (potentially) in the case of your “rollback/cleanup operations” use-case. But those should generally be agnostic to cancellation anyway: either they cannot fail or block (in which case there is nothing to cancel), or they must be robust to failure to clean up (in which case why bother cleaning up at all?), or they leak arbitrary resources (in which case we are back to “failure to clean up” by way of an OOM signal).

    • “Detach” would be a misnomer for such an API, because it is not actually detaching from the parent context or the parent operation — rather, it is ignoring cancellation for a task that is still logically _part of_ and _scoped within_ the same operation.
  2. Some global, standardized notion of “the [non-]copyable parts of a context.Context” that decouples ”a library spawning an asynchronous task” from “the set of scoped logging systems in use in the program”.

    • This is exactly #19643. You could perhaps swap the defaults (“keep by default” vs. “discard by default”), but I don't see a way to make the API fundamentally _simpler_.

Do we have any actual example of this need?

...and presumably others: anything with a local buffer of logs or traces, or a local tree of spans or regions, that needs the buffer or parts of the tree to be flushed to a disk or a remote log server when they are “complete”.

Thanks for the list. I am familiar with some, but not all, of these... can you help me understand exactly what you wouldn't be able to do with the current proposal?

My current impression is that flushing a logger when the triggering request completes is not a very convincing example, as whatever middleware added the logger to the context could trivially flush the logger when the request terminates; similarly the asynchronous task knows which logger it is using, so it can flush it when the task completes (I am setting aside for a second the fact that flushing logs at every request is probably not such a common requirement; all loggers and tracers I worked with normally flush based on time or buffer size, and at process shutdown).

The other use-case I read somewhere is IIUC ensuring that values attached to the context that are needed before cancellation but not after are not retained while the derived child context stays alive. I must say I never faced this problem... maybe some concrete examples of where this was needed could help me understand better.

“Detach” would be a misnomer for such an API, because it is not actually detaching from the parent context or the parent operation — rather, it is ignoring cancellation for a task that is still logically part of and scoped within the same operation.

Clarified I did not specifically argue for a specific name, thanks! (our internal impl is not called Detach, just to give context)

You can only override values that you know about. The point of Context.Value is that most parts of the program don't need to know what's in it. You should be able to use a library without telling that library in particular about which (if any) logging and/or tracing libraries the rest of your program (above and below it) happens to be using.

Doesn't this apply in reverse to https://github.com/golang/go/issues/19643 as well? The code that registers a value as "preservable" should be aware of all possible uses of a certain context value.

Also, in that design, what would happen if different code paths were to require different values to be preserved?

Synchronous functions and operations, as (potentially) in the case of your “rollback/cleanup operations” use-case. But those should generally be agnostic to cancellation anyway: either they cannot fail or block (in which case there is nothing to cancel), or they must be robust to failure to clean up (in which case why bother cleaning up at all?), or they leak arbitrary resources (in which case we are back to “failure to clean up” by way of an OOM signal).

I'm not sure about the point here; the use case I'm trying to cover is the non-synchronous one (w.r.t. parent cancellation; if the parent has already been cancelled because the client connection broke there's no real difference between running in a separate goroutine or not)

Some global, standardized notion of “the [non-]copyable parts of a context.Context” that decouples ”a library spawning an asynchronous task” from “the set of scoped logging systems in use in the program”.

I understand that there's a pre-existing design in https://github.com/golang/go/issues/19643, but I think it's worth noting that it was declined because of lack of evidence it was the correct one. Has this changed?

Thanks for the list. I am familiar with some, but not all, of these... can you help me understand exactly what you wouldn't be able to do with the current proposal?

Depends on the library, but in general you end up with some form of use-after-Close bug when you invoke some library that adds a logging or trace span during an asynchronous call. That could manifest as a silent failure to log something you intended to log, or an explicit error return from a call you expected to always succeed, or in extreme cases a run-time panic.

Doesn't this apply in reverse to #19643 as well? The code that registers a value as "preservable" should be aware of all possible uses of a certain context value.

Also, in that design, what would happen if different code paths were to require different values to be preserved?

Different code paths cannot require different values. (Values should be preserved by intersecting the set of preservable keys with the set of keys found in the parent context.)

The preservation semantics are based on the meaning of the context value, not any detail about its usage, so the “preserver” code is independent of the point of use. For example, if you have a tracing library, “detaching” for an asynchronous operation should be an event in the parent trace that results in its own (related but distinct) tracing span: that is the only coherent way to “preserve” continuity of the trace, and it should be correct to do any time a context is used asynchronously.

I understand that there's a pre-existing design in #19643, but I think it's worth noting that it was declined because of lack of evidence it was the correct one. Has this changed?

I think we have pretty solid evidence that using a context asynchronously is _incorrect_, or at least error-prone, but I'm not sure that we have any more evidence than we did about whether the API proposed in #19643 is _ideal_.

FWIW, speaking for the Frameworks Go team within Google, having a way to detach from the parent context would be enormously beneficial for us. We want the trace context, request IDs and some other values preserved in all work that is started from a request handler, or from the framework-specific Task primitive.

In fact, we have static analysis that flags the use of context.Background in this kind of handler, and it keeps tripping up on open-source code that the internal stuff depends on.

Was this page helpful?
0 / 5 - 0 ratings