Go: runtime: non-cooperative goroutine preemption

Created on 26 Mar 2018  ·  122 Comments  ·  Source: golang/go

I propose that we solve #10958 (preemption of tight loops) using non-cooperative preemption techniques. I have a detailed design proposal, which I will post shortly. This issue will track this specific implementation approach, as opposed to the general problem.

Edit: Design doc

Go currently uses compiler-inserted cooperative preemption points in function prologues. The majority of the time, this is good enough to allow Go developers to ignore preemption and focus on writing clear parallel code, but it has sharp edges that we've seen degrade the developer experience time and time again. When it goes wrong, it goes spectacularly wrong, leading to mysterious system-wide latency issues (#17831, #19241) and sometimes complete freezes (#543, #12553, #13546, #14561, #15442, #17174, #20793, #21053). And because this is a language implementation issue that exists outside of Go's language semantics, these failures are surprising and very difficult to debug.
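
To make the failure mode concrete, here is a minimal reproducer of the kind behind those freeze reports (a sketch, not taken from any of the cited issues). Under purely cooperative preemption it hangs forever, because the empty loop body contains no safe-point:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1) // a single P makes the hang deterministic

    go func() {
        for {
            // Tight loop with no function calls: preemption checks are
            // inserted only in function prologues, so the scheduler and
            // GC have no safe-point at which to stop this goroutine.
        }
    }()

    time.Sleep(100 * time.Millisecond) // parks main; the spinner takes the only P
    fmt.Println("never printed under purely cooperative preemption")
}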

@dr2chase has put significant effort into prototyping cooperative preemption points in loops, which is one way to solve this problem. However, even sophisticated approaches to this led to unacceptable slow-downs in tight loops (where slow-downs are generally least acceptable).

I propose that the Go implementation switch to non-cooperative preemption using stack and register maps at (essentially) every instruction. This would allow goroutines to be preempted without explicit preemption checks. This approach will solve the problem of delayed preemption with zero run-time overhead and have side benefits for debugger function calls (#21678).

I've already prototyped significant components of this solution, including constructing register maps and recording stack and register maps at every instruction, and so far the results are quite promising.

/cc @drchase @RLH @randall77 @minux

Labels: Proposal, Proposal-Accepted

Most helpful comment

@networkimprov, I'm working hard to land it for 1.13, but I'm also uncovering a lot of dark corners of the scheduler, so it's slow going. This issue will be closed when it lands on tip, so you should see that.

All 122 comments

Change https://golang.org/cl/102600 mentions this issue: design: add 24543-non-cooperative-preemption

Change https://golang.org/cl/102603 mentions this issue: cmd/compile: detect simple inductive facts in prove

Change https://golang.org/cl/102604 mentions this issue: cmd/compile: don't produce a past-the-end pointer in range loops

Forwarding some questions from @hyangah on the CL:

Is code in cgo (or outside Go) considered a non-safe point?

All of cgo is currently considered a safe-point (one of the reasons it's relatively expensive to enter and exit cgo) and this won't change.

Or will the runtime be careful not to send signals to threads that may be in cgo land?

I don't think the runtime can avoid sending signals to threads that may be in cgo without expensive synchronization on common paths, but I don't think it matters. When it enters the runtime signal handler it can recognize that it was in cgo and do the appropriate thing (which will probably be to just ignore it, or maybe queue up an action like stack scanning).

Should users or cgo code avoid using the signal?

It should be okay if cgo code uses the signal, as long as it's correctly chained. I'm hoping to use POSIX real-time signals on systems where they're available, so the runtime will attempt to find one that's unused (which is usually all of them anyway), though that isn't an option on Darwin.

And a question from @randall77 (which I answered on the CL, but should have answered here):

Will we stop using the current preemption technique (the dummy large stack bound) altogether, or will the non-coop preemption just be a backstop?

There's really no cost to the current technique and we'll continue to rely on it in the runtime for the foreseeable future, so my current plan is to leave it in. However, we could be much more aggressive about removing stack bounds checks (for example if we can prove that a whole call tree will fit in the nosplit zone).
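
For reference, a sketch of the kind of function that reasoning applies to (the //go:nosplit directive is real; the function is made up):

//go:nosplit
func add(x, y int) int {
    // No stack-bound check is emitted in this function's prologue, so it,
    // together with everything it calls, must fit in the fixed nosplit zone.
    return x + y
}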

So it is still possible to make a goroutine non-preemptible with something like:
sha256.Sum256(make([]byte, 1000000000))
where the inner loop is written in asm?

Yes, that would still make a goroutine non-preemptible. However, with some extra annotations in the assembly to indicate registers containing pointers, it will become preemptible without any extra work or run-time overhead to reach an explicit safe-point. In the case of sha256.Sum256 these annotations would probably be trivial, since it will never construct a pointer that isn't shadowed by the arguments (so it can claim there are no pointers in registers).
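
As a concrete sketch of the scenario in the question (the buffer size is illustrative):

package main

import "crypto/sha256"

func main() {
    data := make([]byte, 1<<30) // ~1 GiB: a single hash call runs for a long time
    // The compression loop is hand-written assembly on most architectures;
    // without pointer annotations for its registers there is no safe-point
    // inside it, so a stop-the-world must wait for the call to return.
    _ = sha256.Sum256(data)
}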

I'll add a paragraph to the design doc about this.

Will the design doc be posted here?

The design doc is under review here: https://golang.org/cl/102600

(As a reminder, please only post editing comments to the CL itself and keep technical discussion on the GitHub issue.)

Disclaimer: I'm not a platform expert, or an expert on language implementations, or involved with go aside from having written a few toy programs in it. That said:

There's a (potentially) fatal flaw here: GetThreadContext doesn't actually work on Windows (see here for details). There are several lisp implementations that have exhibited crashes on that platform because they tried to use GetThreadContext/SetThreadContext to implement preemptive signals on Windows.

As some old notes for SBCL point out, Windows has no working version of preemptive signals without loading a kernel driver, which is generally prohibitive for applications.

I think the example code to avoid creating a past-the-end pointer has a problem if the slice has a capacity of 0. You need to declare _p after the first if statement.
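
To illustrate that caveat, here is a hand-written sketch (not the actual compiler lowering) of a loop shape that never materializes a past-the-end pointer, with the element pointer derived only after the emptiness check:

func sum(s []int32) int32 {
    var b int32
    n := len(s)
    if n == 0 {
        return b // deriving an element pointer here could already be past the end when cap(s) == 0
    }
    _p := &s[0] // declared only after the first if statement
    i := 0
    for {
        b += *_p
        i++
        if i == n {
            break // stop before computing &s[n], a past-the-end pointer
        }
        _p = &s[i]
    }
    return b
}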

@mtstickney Looks like it's true, but we can look at how other implementations go about the same problem. CoreCLR talks about the same problem: they need to preempt threads for GC, mention the same bugs with wrong thread contexts, and describe how they solve it without ditching SuspendThread altogether by using redirection.

I'm not an expert in this kind of stuff so I'm sorry if this has nothing to do with solving the problem here.

@creker Nor me, so we're in the same boat there. I hadn't seen the CoreCLR reference before, but that's the same idea as the lisp approach: SuspendThread, retrieve the current register set with GetThreadContext, change IP to point to the signal code to be run, ResumeThread, then when the handler is finished restore the original registers with SetThreadContext.

The trick is capturing the original register set: you can either do it with an OS primitive (GetThreadContext, which is buggy), or roll your own code for it. If you do the latter, you're at risk of getting a bogus set of registers, because your register-collecting code is in user mode and could be preempted by a kernel APC.

It looks like on some Windows versions, some of the time, you can detect and avoid the race conditions with GetThreadContext (see this post, particularly the comments concerning CONTEXT_EXCEPTION_REQUEST). The CoreCLR code seems to make some attempts to work around the race condition, although I don't know if it's suitable here.

Thanks for the pointers about GetThreadContext! That's really interesting and good to know, but I think it's actually not a problem.

For GC preemption, we can always resume the same goroutine on the same thread after preemption, so there's no need to call SetThreadContext to hijack the thread. We just need to observe its state, not run something else on that thread. Furthermore, my understanding is that GetThreadContext doesn't reliably return all registers if the thread is in a syscall, but in this case there won't be any live pointers in registers anyway (any pointer arguments to the syscall are shadowed on the Go wrapper's stack). Hence, we only need to retrieve the PC and SP in this case. Even this may not matter, since we currently treat a syscall as a giant GC safe-point, so we already save the information we need on the way in to the syscall.

For scheduler preemption, things are a bit more complicated, but I think still okay. In this case we would need to call SetThreadContext to hijack the thread, but we would only do this to threads at Go safe-points, meaning we'd never preempt something in a syscall. Today, if a goroutine has been in a syscall for too long, we don't hijack the thread; we simply flag that it should block upon returning from the syscall and schedule the next goroutine on a different thread (creating a new one or going to a pool). We would keep using that mechanism for rescheduling goroutines that are in system calls.

Change https://golang.org/cl/108497 mentions this issue: cmd/compile: teach Haspointer about TSSA and TTUPLE

Change https://golang.org/cl/108496 mentions this issue: cmd/compile: don't lower OpConvert

Change https://golang.org/cl/108498 mentions this issue: cmd/compile: don't compact liveness maps in place

Change https://golang.org/cl/109351 mentions this issue: cmd/compile: dense numbering for GP registers

Change https://golang.org/cl/109353 mentions this issue: cmd/compile, cmd/internal/obj: record register maps in binary

Change https://golang.org/cl/109347 mentions this issue: cmd/compile: introduce LivenessMap and LivenessIndex

Change https://golang.org/cl/109349 mentions this issue: runtime: use entry stack map at function entry

Change https://golang.org/cl/109350 mentions this issue: cmd/compile: output stack map index everywhere it changes

Change https://golang.org/cl/109348 mentions this issue: cmd/compile: enable stack maps everywhere except unsafe points

Change https://golang.org/cl/109352 mentions this issue: cmd/compile: compute register liveness maps

Change https://golang.org/cl/109346 mentions this issue: cmd/internal/obj: consolidate emitting entry stack map

No alternative to cgo? cgo has blocked the development of Go.

Change https://golang.org/cl/110177 mentions this issue: cmd/compile: incrementally compact liveness maps

Change https://golang.org/cl/110176 mentions this issue: cmd/compile: abstract bvec sets

Change https://golang.org/cl/110175 mentions this issue: cmd/compile: single pass over Blocks in Liveness.epilogue

Change https://golang.org/cl/110179 mentions this issue: cmd/compile: reuse liveness structures

Change https://golang.org/cl/110178 mentions this issue: cmd/compile: make LivenessMap dense

Change https://golang.org/cl/114076 mentions this issue: cmd/compile: fix unsafe-point analysis with -N

Hi all, a question about this design. First, I've only read it at a high level, and haven't followed the use cases where cooperative preemption is problematic.

But the big picture question I have is: what about code which depends on cooperative preemption?
I don't think we can know the extent of it.

But I can give an example where non-cooperative preemption might be problematic, on a project I am working on. The context is communication with an OS/C thread via atomics and not cgo, where timing is very sensitive: real time deadline misses will render things useless (audio i/o).

If, currently, some code controls pre-emption via spinning and runtime.Gosched(), then it seems to me this proposal will break that code, because it will introduce pre-emption and hence delay the thing that is programmed not to be pre-empted.

Again, there is no way for us to know how much such code there is, and without an assessment of that, it seems to me this proposal risks entering a game of whack-a-mole w.r.t. Go scheduling, where you solve one problem and as a result another pops up.

Please don't take away programmer control of pre-emption.

Last question: what good could runtime.Gosched() possibly serve with per-instruction pre-emption?

Again, sorry I don't know the details of this proposal, but that might be the case with a lot of other code that uses runtime.Gosched() under the assumption of cooperative pre-emption.

@wsc1 can you provide a code example which breaks without cooperative scheduling?

@networkimprov please see

#10958

for the status and rationale of the problem, and for problems related to testing for it via the cloud.

@wsc1
No one said the test program should run via cloud.

@crvv Testing for audio latency violations is hard. Results aren't expected to agree except on the same hardware under the same operating conditions, or under OS simulation, and the scheduler's random seed can come into play. Re-creating those conditions to placate the interests of pre-emptive scheduling is not my job, although I'm happy to help along the way. No one said what hardware or OS operating conditions to use, either.

There are also myriad reasons, documented there by audio processing professionals, why pre-emption in a real-time audio processing loop would cause problems. The Go runtime uses locks, and its memory management can lock the system. These are widely accepted causes of glitches in real-time audio, because they can take longer than the wall-clock time allocated to a low-latency application. It is also widely accepted that the glitches "will eventually" happen, meaning it is very hard to create a test in which they appear on demand, because they depend on the state of the whole system (OS, hardware, Go runtime).

I do not find it credible to quote out of context, against the grain of best practice in the field. You could also provide a reason why you want a test on the GitHub issue tracker. Do you believe that the worst-case wall-clock time of pre-emption work inserted into a critical segment of real-time audio processing code shouldn't cause problems? Why?

To me, the burden of proof lies there. On my end, I'll continue to provide what info and alternatives I can to help inform the discussion.

But I can give an example where non-cooperative preemption might be problematic, on a project I am working on. The context is communication with an OS/C thread via atomics and not cgo, where timing is very sensitive: real time deadline misses will render things useless (audio i/o).

Non-cooperative preemption as proposed shouldn't introduce any more overhead or jitter than you would already see from any general-purpose operating system handling hardware and timer interrupts. If the system is not overloaded, which is already a requirement for such work, then it will simply resume from the preemption after perhaps a few thousand instructions. And we could probably optimize the scheduler to not even attempt preemption if it would be a no-op. On the flip side, tight atomic loops like you describe currently run the very real risk of deadlocking the entire Go system, which would certainly violate any latency target.

Again, there is no way for us to know how much such code there is ...

There's quite a lot of evidence that cooperative preemption has caused a large number of issues for many users (see #543, #12553, #13546, #14561, #15442, #17174, #20793, #21053 for a sampling).

@aclements It's easy to write a micro-program that shows this infinite loop, like all the issues you mentioned, but hard to find a real case in a larger project. I can't see what useful purpose such a goroutine, running without calling any non-inlinable functions, could serve.

@ayanamist This bug has been reported multiple times based on real code in real projects, including cases mentioned in the issues cited above. It's true that the cut down versions look like unimportant toys, and it's true that the root cause is often a bug in the original code, but this is still surprising behavior in the language that has tripped up many people over the years.

I'd just like to add some data to this discussion.

I gave a talk at several conferences [1,2,3,4,5] about this topic and collected some user experiences and feedback that can be summarized as:

  • Lack of preemption leads to very hard to debug behaviors (some latency spikes would only manifest if a GC happened in the right moment).
  • Some security issues (mostly DoS) were caused by this.
  • This is a surprising trait of the language, which is generally bad.

I believe this should be addressed, as long as some way to annotate code as not preemptible is provided (surprises should be opt-in, and not opt-out).

Thanks very much for the well balanced data.

I wanted to especially emphasise agreement with your last statement:

I believe this should be addressed, as long as some way to annotate code as not preemptible is provided (surprises should be opt-in, and not opt-out).

because not allowing un-preemptible code:

  1. reduces the expressivity and control of the programmer.
  2. is a major headache to take into account for certain problem domains, such as low-latency audio or timed interactions with a device driver.
  3. renders many requirements that OS-supplied callback interfaces impose impossible to fulfill (e.g. don't allocate memory, don't make syscalls, etc.).

I am not arguing that the above list is more important to take into account than the problems caused by the surprise, just emphasising that a way to annotate code as not pre-emptible should be a requirement.

I would like to ask that the title of this issue be changed so that it is not in the form of an opinion about the solution but rather in the form of stated problems. I think that might help guide the discussion toward consensus, as there is plenty of evidence of a lack of consensus on pre-emptive vs. cooperative scheduling.

For example, "scheduling improvements" may be a less divisive choice.

Change https://golang.org/cl/158857 mentions this issue: design/24543-non-cooperative-preemption: more alternatives

Change https://golang.org/cl/158861 mentions this issue: design/24543-non-cooperative-preemption: conservative inner-frame scanning design

Change https://golang.org/cl/158858 mentions this issue: design/24543-non-cooperative-preemption: discuss signal selection

Change https://golang.org/cl/158859 mentions this issue: design/24543-non-cooperative-preemption: split out safe-points-everywhere

Change https://golang.org/cl/158860 mentions this issue: design/24543-non-cooperative-preemption: general preemption discussion

I propose that the Go implementation switch to non-cooperative preemption using stack and register maps at (essentially) every instruction. This would allow goroutines to be preempted without explicit preemption checks.

As far as I understand, Java VMs have essentially the same issue for GC safepoints, because they need to stop the world for GC heap compaction. Most JVMs use safepoints at function calls, just like Go, and on backward jumps (i.e. at the ends of loops). Wouldn't this be sufficient?

@Cyberax Go doesn't run in a VM. See this comment in the original issue:

@dr2chase has put significant effort into prototyping cooperative preemption points in loops, which is one way to solve this problem. However, even sophisticated approaches to this led to unacceptable slow-downs in tight loops (where slow-downs are generally least acceptable).

@aclements Maybe a dumb question: what does the proposal mean for system call handling? Does it mean that every call in the syscall package needs to be retried, since it might be interrupted by the scheduler?

@bergwolf We register all signal handlers with SA_RESTART, so in principle all signal handlers should restart anyhow. And for the several cases where they don't, we already need to retry them, since otherwise turning on the profiler breaks the program. So I don't think there is any fundamental change here with regard to signal handling.
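
A user-level sketch of the retry pattern for those remaining cases (illustrative; this is not the runtime's actual code):

package fdio

import "syscall"

// readRetry retries a read that a signal (such as a preemption or
// profiling signal) interrupted with EINTR.
func readRetry(fd int, buf []byte) (int, error) {
    for {
        n, err := syscall.Read(fd, buf)
        if err != syscall.EINTR {
            return n, err
        }
    }
}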

Change https://golang.org/cl/171762 mentions this issue: runtime: abstract initializing and destroying Ps out of procresize

Change https://golang.org/cl/172281 mentions this issue: runtime: combine preemption reasons into one bit-field

Change https://golang.org/cl/172282 mentions this issue: runtime: move preemption task handling into scheduler

Change https://golang.org/cl/172284 mentions this issue: runtime: make preemption requests robust

@aclements, will this land in 1.13? If so, could you post here when it's working in tip?

@networkimprov, I'm working hard to land it for 1.13, but I'm also uncovering a lot of dark corners of the scheduler, so it's slow going. This issue will be closed when it lands on tip, so you should see that.

Change https://golang.org/cl/172857 mentions this issue: runtime: document P statuses

Change https://golang.org/cl/172983 mentions this issue: runtime: introduce _Gdead|_Gscan g status

Change https://golang.org/cl/172984 mentions this issue: runtime: make preemption requests prompt

Change https://golang.org/cl/172982 mentions this issue: runtime: make copystack/sudog synchronization more explicit

Change https://golang.org/cl/172989 mentions this issue: runtime: decouple stack shrinking from stack scanning

Change https://golang.org/cl/172991 mentions this issue: runtime: use "fire and forget" preemption for stack scans

Change https://golang.org/cl/173724 mentions this issue: runtime: make m.libcallsp check in shrinkstack panic

Change https://golang.org/cl/173941 mentions this issue: runtime: switch to P 0 before destroying current P

Change https://golang.org/cl/173942 mentions this issue: runtime: make _Pidle strictly mean on the idle list

Change https://golang.org/cl/173943 mentions this issue: runtime: make pidle list doubly-linked

Just stumbled across a bug caused by cooperative preemption in real-life production code. The code looks essentially like this:

func (c *queue) wait() {
    for {
        c.RLock() // uncontended fast path can be fully inlined: no preemption point
        flushed := c.flushed
        c.RUnlock()
        if flushed {
            return
        }
        // busy-wait: no non-inlined call remains in this loop body
    }
}

We recently started inlining the fast-paths of sync.RWMutex.RLock and RUnlock, which means the fast path of this loop can execute without preemption points. If that causes the goroutine that should be setting c.flushed to not run, then this leads to a deadlock.

Obviously this busy loop isn't great, but it's interesting because it was in production code, and only started to fail when both RLock and RUnlock were inlined, the RWMutex was uncontended (otherwise it falls back to the slow path, which has a preemption point), and the goroutine schedule led to the goroutine that was supposed to set flushed getting blocked behind the spinning goroutine.
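
For comparison, a sketch of one conventional fix (flushedCh is a hypothetical replacement field): block instead of spinning, so the waiter does not depend on preemption at all:

type flushQueue struct {
    flushedCh chan struct{} // closed exactly once, when the queue is flushed
}

func (c *flushQueue) wait() {
    <-c.flushedCh // parks the goroutine until the flush; no spinning
}

func (c *flushQueue) flush() {
    // ... do the flush work ...
    close(c.flushedCh) // wakes every waiter
}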

Change https://golang.org/cl/190415 mentions this issue: cmd/compile: enable loop preemption for problematic loops

For the CL above, the change in binary sizes is surprisingly small. Benchmarking for time is in progress.
https://perf.golang.org/search?q=upload:20190815.1
A superficial test demonstrates that the filtered preemption points are inserted where expected.

The first round of results is not bad. I forgot to also test against 1.12 to see if everything is at least moving forward; am running that now.
https://perf.golang.org/search?q=upload:20190816.1

Change https://golang.org/cl/190577 mentions this issue: cmd/compile: add intrinsic sync.checkPreempt

Change https://golang.org/cl/200877 mentions this issue: cmd/internal/obj/x86: correct pcsp for ADJSP

Change https://golang.org/cl/201139 mentions this issue: runtime: remove g.gcscanvalid

Change https://golang.org/cl/201137 mentions this issue: runtime: add general suspendG/resumeG

Change https://golang.org/cl/201138 mentions this issue: runtime: remove old stack scanning code

Change https://golang.org/cl/201440 mentions this issue: runtime: ensure _Grunning Gs have a valid g.m and g.m.p

Change https://golang.org/cl/201438 mentions this issue: runtime: only shrink stacks at synchronous safe points

Change https://golang.org/cl/201439 mentions this issue: runtime: abstract M preemption check into a function

Change https://golang.org/cl/201401 mentions this issue: runtime: M-targeted signals for Linux

Change https://golang.org/cl/201403 mentions this issue: runtime: M-targeted signals for libc-based OSes

Change https://golang.org/cl/201402 mentions this issue: runtime: M-targeted signals for BSDs

Change https://golang.org/cl/201404 mentions this issue: runtime: add test for signalM

Change https://golang.org/cl/201760 mentions this issue: runtime: use signals to preempt Gs for suspendG

Change https://golang.org/cl/201759 mentions this issue: runtime: asynchronous preemption function for x86

Change https://golang.org/cl/201762 mentions this issue: runtime: implement async scheduler preemption

Change https://golang.org/cl/201757 mentions this issue: runtime: add GODEBUG=asyncpreemptoff=1

Change https://golang.org/cl/201758 mentions this issue: runtime: support for injecting calls at signals on x86

Change https://golang.org/cl/201761 mentions this issue: runtime: scan stacks at async safe points

Can we have an environment variable in 1.14 to en/disable preemptive scheduling?

Change https://golang.org/cl/201777 mentions this issue: runtime: add a test for asynchronous safe points

@networkimprov, GODEBUG=asyncpreemptoff=1 will disable preemptive scheduling.

That said, if it does cause problems, please do report them (assuming this all gets in for 1.14). This ought to be invisible beyond eliminating bizarre deadlocks and improving throughput in some cases. If it's not invisible, we'd love to know why and better understand that.

Ha, I must have been subliminally tickled by the CL that just appeared, which I skimmed over :-)

I've gathered that a few apps rely on non-preemption, but I'm mostly concerned about unintended consequences on platforms that get fewer Go deployments, e.g. Windows and MacOS laptops.

That's certainly fair. Part of why I put in the environment variable is because this has a risk of a long bug tail, so it's nice to have a debugging knob (and an emergency escape hatch!).

Change https://golang.org/cl/202084 mentions this issue: cmd/compile: fix missing unsafe-points

Change https://golang.org/cl/203286 mentions this issue: runtime: atomically set span state and use as publication barrier

Change https://golang.org/cl/203283 mentions this issue: runtime/internal/atomic: add Store8

Change https://golang.org/cl/203285 mentions this issue: runtime: fully initialize span in alloc_m

Change https://golang.org/cl/203284 mentions this issue: cmd/compile: intrinsics for runtime/internal/atomic.Store8

Change https://golang.org/cl/203778 mentions this issue: runtime: change TestPreemptM from SIGUSR1 to sigPreempt

Change https://golang.org/cl/203957 mentions this issue: runtime: unblock SIGUSR1 for TestPreemptm

Change https://golang.org/cl/203977 mentions this issue: cmd/compile: fix missing lowering of atomic {Load,Store}8

@aclements what's the effect of runtime.Gosched() after this?

@networkimprov, this doesn't change the behavior of Gosched at all. (I'm not sure how it would... did you have a particular concern in mind?)

It occurred to me that Gosched() might become a no-op, since many (though not all) uses of it will be unnecessary, and maybe detrimental.

Wouldn't it be nicer to handle that through software, with tight loops declaring a dependency on goX.Y and then removing the Goscheds that were there purely to break up tight computational loops?

The functionality of Gosched (give others a chance) might still be useful for anything resembling a spinlock, CAS, or such.

I want to use Gosched() as a minimal Sleep(), which is why I brought it up. So yes, we need it.

But if it's mostly detrimental under the preemptive regime, making it a no-op and adding a new API (Goyield?) could make sense.
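
For the record, the Gosched-as-minimal-sleep pattern under discussion looks roughly like this (a sketch; done is a hypothetical flag):

package spin

import (
    "runtime"
    "sync/atomic"
)

var done int32 // set to 1 by another goroutine when the work is finished

func spinWait() {
    for atomic.LoadInt32(&done) == 0 {
        runtime.Gosched() // yield to other runnable goroutines between checks
    }
}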

Change https://golang.org/cl/204957 mentions this issue: runtime: clear preemptStop in dropm

Change https://golang.org/cl/204340 mentions this issue: runtime: support preemption on windows/{386,amd64}

Change https://golang.org/cl/207779 mentions this issue: runtime: ensure thread handle is valid in profileloop1

Change https://golang.org/cl/207961 mentions this issue: runtime: support non-cooperative preemption on windows/arm

Change https://golang.org/cl/208218 mentions this issue: runtime: stress testing for non-cooperative preemption

Change https://golang.org/cl/213837 mentions this issue: runtime: protect against external code calling ExitProcess

List of todo items posted in #36365

Is there a reason to leave this issue open, given the existence of #36365?

Nope! Closing.

Is there any way to disable preemption altogether? In most situations I want a simple single-threaded asynchronous model similar to node.js, where locks are mostly not needed.

Now even if I specify runtime.GOMAXPROCS(1), I have to protect against things like panic: concurrent map iteration and map write.

@szmcdull have you tried this runtime switch? Note that Go had preemption before 1.14...

$ GODEBUG=asyncpreemptoff=1 ./your_app arguments ...

Yes I tried. But still got panic: concurrent map iteration and map write

I think you were just lucky that you didn't see that before 1.14 :-)

Try https://golang.org/pkg/sync/#Map
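
A minimal sketch of that suggestion:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var m sync.Map // safe for concurrent use, unlike the built-in map
    m.Store("key", 42)
    if v, ok := m.Load("key"); ok {
        fmt.Println(v) // 42
    }
    // Range is safe even while other goroutines read and write m.
    m.Range(func(k, v interface{}) bool {
        fmt.Println(k, v)
        return true
    })
}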

For further Q's, I refer you to golang-nuts. You'll get more & faster responses there, generally.
