go version)?1.8.1
go env)?darwin amd64, but this is a general issue I also encounter in docker containers
I'm trying to reliably gracefully shut down an http Server: https://play.golang.org/p/ESmgJ6O_WU
However, because ListenAndServe() blocks while the Server is running, there is no way to ensure its call _happens before_ a call to Shutdown(), so Shutdown() may be called before ListenAndServe(), in which case it returns nil and the later call to ListenAndServe() still starts a server that is then never shut down.
I expect to have some way of knowing that the server is now up and running such that Shutdown() will stop it, short of sending it http requests. Maybe an interface closer to httptest.Server.
It seems I either need to repeatedly do http requests to the server to ensure it is up before calling Shutdown() or repeatedly call Shutdown() until ListenAndServe() returns, like this: https://play.golang.org/p/LPfpz0LaBY
Note that due to the pre-emptive nature of Goroutines, since Shutdown() can never be called on the same Goroutine as ListenAndServe(), this theoretically affects _all_ programs using Shutdown() without a loop.
Hi @mrwonko, in order to follow Go's conventions on issues: could you please change the subject of the issue to: net/http: No simple way to definitely shut down an http.Server
Thanks
I'm busy this week and don't have time to look at this. Can somebody distill this down to a short summary for me?
tldr: Because ListenAndServe() never returns, there's no way to ensure a call to Shutdown() happens after the server is started, so it may be a noop and afterwards the server still starts and runs forever. Non-returning functions are very prone to race conditions.
ListenAndServe is a convenience function. If you don't find it convenient, do the Listen and Serve (https://golang.org/pkg/net/http/#Server.Serve) steps separately.
But I still haven't read this enough to understand the problem so I might be replying prematurely, though.
Serve() has the same problems as it does not return (until the server is shut down) either.
I believe the issue is that there's no simple way to know if the server has started.
If the shutdown is happening immediately after the Serve() goroutine is started there's a race and Shutdown() may be called on a server which has not started.
This could be worked around with a custom listener (https://play.golang.org/p/x15ZPYYmfo), but I think it would be useful if there were a Started() method (or something similar) on a server that blocked until the server has started.
As @quentinmit wrote in https://github.com/golang/go/issues/16220#issuecomment-236285946
If we were designing net/http from scratch, I would say that ListenAndServe should take a Context argument. But we can't add that now without breaking the Go1 compatibility guarantee, and I don't think it's worth adding a second ListenAndServeWithContext. :/
Turns out adding a ServeWithContext() might now be necessary given the raciness of current Shutdown() solution
@bradfitz Here is a version which just uses Listen and Serve and it deadlocks on play.golang.org when you close the server immediately. I'm running into this with some tests.
https://play.golang.org/p/LOD878em-R
When you trigger a context switch it works.
2009/11/10 23:00:00 waiting
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x107814dc, 0x5516)
/usr/local/go/src/runtime/sema.go:47 +0x40
sync.(*WaitGroup).Wait(0x107814d0, 0x1)
/usr/local/go/src/sync/waitgroup.go:131 +0x80
main.main()
/tmp/sandbox649309351/main.go:29 +0x200
goroutine 5 [semacquire]:
sync.runtime_notifyListWait(0x107504e0, 0x0)
/usr/local/go/src/runtime/sema.go:297 +0x140
sync.(*Cond).Wait(0x107504d8, 0x107504d0)
/usr/local/go/src/sync/cond.go:57 +0xc0
syscall.(*queue).waitRead(0x107504d0, 0x1, 0x0, 0x0, 0x9f6c0, 0x5516, 0x3fd980, 0x0)
/usr/local/go/src/syscall/net_nacl.go:284 +0x120
syscall.(*msgq).read(0x107504d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/syscall/net_nacl.go:401 +0xc0
syscall.(*netFile).accept(0x10734240, 0x5516, 0x10734240, 0x0, 0x0, 0x1070e030, 0x800000, 0x0)
/usr/local/go/src/syscall/net_nacl.go:563 +0x60
syscall.Accept(0x3, 0x107421e8, 0x4f700, 0x5516, 0x107321fc, 0x10715e98, 0x8, 0x5516)
/usr/local/go/src/syscall/net_nacl.go:783 +0xa0
net.accept(0x3, 0x2fbec0, 0x107421e0, 0x0, 0x18, 0x2bf020, 0x0, 0x1070e030)
/usr/local/go/src/net/sys_cloexec.go:45 +0x40
net.(*netFD).accept(0x107421e0, 0x5516, 0x0, 0x0, 0x0, 0x1070e030)
/usr/local/go/src/net/fd_unix.go:422 +0xe0
net.(*TCPListener).accept(0x1070c518, 0x5516, 0x1073213c, 0x10740d40, 0x2a1140, 0x3f5100)
/usr/local/go/src/net/tcpsock_posix.go:136 +0x40
net.(*TCPListener).Accept(0x1070c518, 0x10740d20, 0x2a1140, 0x3f5100, 0x2c2760, 0x107409c0)
/usr/local/go/src/net/tcpsock.go:228 +0x60
net/http.(*Server).Serve(0x1073e680, 0x3d2ae0, 0x1070c518, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/server.go:2643 +0x260
main.main.func1(0x107814d0, 0x1073e680, 0x3d2ae0, 0x1070c518)
/tmp/sandbox649309351/main.go:22 +0x80
created by main.main
/tmp/sandbox649309351/main.go:25 +0x140
One way to work around this for now is to use a wait group in reverse. I think you can still have a context switch between isUp.Done() and srv.Serve so this shouldn't be 100% safe but better than sprinkled context switches. At least code from the right go routine was executed.
```go
srv := &http.Server{}
var isUp sync.WaitGroup
isUp.Add(1)
go func() {
isUp.Done()
if err := srv.Serve(l); err != nil {...}
}()
isUp.Wait()
// safe to call srv.Shutdown() from here on
I think there is another, related race condition: a connection can be accepted in Serve but not closed by Shutdown. The sequence is something like this:
Server.Serve: ln.Accept()
Server.Shutdown: s.closeListenersLocked()
Server.Shutdown: s.closeIdleConns() == true
Server.Serve: c.setState(StateNew)
The fix would be to wait for Serve to exit before returning from Shutdown.
I stumbled across this issue, so I took a few hours taking a look. I investigated the interaction between Serve() and Shutdown()/Close() instead of ListenAndServe() as the original report discusses, as that's where the main logic is.
First, I did see a somewhat surprising behavior when I tried to run code similar to https://github.com/golang/go/issues/20239#issuecomment-299245151. You would expect the server to stop and Serve() to bail out upon calling Close(), but Serve() just blocks on l.Accept(), waiting for connections.
The reason for this is that while Close() kills off all net.Listeners that were registered before the call to Close() is made, it does not necessarily "stop" the Serve() call. The order of event that leads to Serve() to block is as follows:
srv.Close() gets called, kills all previous connectionssrv.Serve(l) gets calledsrv.trackListeners(l) gets called, registers the listener lsrv.trackListeners(l) sees that the number of registered listeners == 0, so clears srv.doneChan, and basically resets the state of the Server as running.l.Accept(). It will even go into the server process loop again, because the srv.doneChan says that we're running.The really tricky part is that srv.trackListeners() actually resets srv.doneChan.
From the perspective of a user who is creating a simple Web server, you would expect Close() to set a flag in the Server to not allow it to process anymore requests, so the fact that Serve() could block should be understandably confusing.
However, this reset is there to allow re-using the Server even after Close() was called. In pseudocode, this is allowed in the current http.Server:
var srv http.Server
l1, err := newListener(...)
go srv.Serve(l1)
srv.Close()
// ... later ...
l2, err := newListener(...)
go srv.Serve(l2)
It's fairly easy to make Serve() not block when Close() is called before it, but a simple fix would actually make the Server not work when Serve() is called again.
We have a situation where we are sharing http.Server (global state) between multiple goroutines, and their behavior is timing dependent as to when you call Close() and Serve().
Given this fact and that we have a backwards compatible guarantee, I think we should be looking at introducing a context.Context aware API. This would free the user from having to know the timing when the server is ready as the OP posted alltogether
I expect to have some way of knowing that the server is now up and running such that Shutdown() will stop it, short of sending it http requests.
l1, ... := newListener(addr1)
l2, ... := newListener(addr2)
ctx, cancel := context.WithCancel()
go srv.ServeCtx(ctx, l1)
go srv.ServeCtx(ctx, l2)
// ... later ...
cancel() // kills both previous
// if you need to reuse srv, you can, by using a new context
go srv.ServeCtx(ctx2, l3)
Maybe inclusion of such API is planned, I don't know.
But meanwhile, without a new API, I don't think this an easy to fix problem. So I'd like to suggest documenting that it is the users' responsibility to synchronize calling Close()/Shutdown() and Serve() (or equivalent), or make sure to call Close()/Shutdown() repeatedly until the call to Serve() returns (this is not an elegant solution, but it should work 99% of the times).
If people are up for (1) documentation and (2) context.Context aware API, I'm willing to work on either.
This is a fundamental mistake in the Shutdown API. The idea that you can Shutdown a Server and then once it's done call Serve again is wrong. We should make it be that once you Shutdown a server, it's shut down. Future calls to Serve simply return errShutdown.
If this is too invasive for now it would be fine to punt to Go 1.11 but there's really no other way it can work. Any other way, you have this race between concurrent Shutdown and Serve behaving unpredictably.
It's only a few lines of code to make shutdown sticky, and there's a decent argument for making this possibly-breaking change now instead of letting Shutdown last another release in the racy form (it was introduced in Go 1.8), just to head off 6 more months of people potentially thinking they can Serve again after Shutdown.
/cc @tombergan @bradfitz to decide about Go 1.10 vs Go 1.11
(The needed decision is what milestone.)
Change https://golang.org/cl/81778 mentions this issue: net/http: prevent Server reuse after a Shutdown
I'm trying to understand better this issue and I have a few questions:
(All identifiers are fields or methods of http.Server)
1) This CL address Shutdown, but what about Close? Just an oversight or there is some reason to still allow reuse after Close? It seems to me that Close suffers from exactly the same problems.
2) Is it correct that we want to be able to unblock any (present and future) ListenAndServe/Serve reliably with just one call to Close/Shutdown?
That would be useful and spare the need for the aforementioned loop workaround.
This CL doesn't seem to achieve it:
Close/Shutdown will leave ListenAndServe/Serve blocked;Close/Shutdown will make a ListenAndServe/Serve return "use of closed network connection" instead of "server closed" (see test below).3) I'm trying to understand why there is a need for doneChan to be replaced during the server life cycle (besides supporting reuse).
Can someone explain the following piece of code, please?
How, closing doneChan "closes http2 conns" here?
func (srv *Server) SetKeepAlivesEnabled(v bool) {
// [CUT]
// Close HTTP/2 conns, as soon as they become idle, but reset
// the chan so future conns (if the listener is still active)
// still work and don't get a GOAWAY immediately, before their
// first request:
srv.mu.Lock()
defer srv.mu.Unlock()
srv.closeDoneChanLocked() // closes http2 conns
srv.doneChan = nil
Proposal:
inShutdown to http.atomicBool, to make it explicit that it goes from false to true and never back.shuttingDown with inShutdown.isSet (they do the same thing).srv.inShutdown.setTrue() on top of both Close and Shutdown.doKeepAlives return false not only after Shutdown but after Close too; I don't suppose it would be a problem but correct me if I'm wrong.Serve, add if srv.inShutdown.isSet() { return ErrServerClosed } between the call to trackListener and the for loop.l.Accept is reached, then the listener l was already tracked before any Close/Shutdown had the chance to run.Serve, replace the select with if srv.inShutdown.isSet() { return ErrServerClosed }.closeDoneChanLocked, getDoneChan, getDoneChanLocked, shuttingDown, and field doneChan.This should fix mentioned problems and simplify a bit, without changing any behavior that is currently predictable.
But I may be missing something, specially about the above code block.
I run a test program spawning 3 goroutines per http.Server (2 calling Serve and 1 calling Close).
In one test run with the current version (100,000 iterations):
Serve returned http.ErrServerClosedServe returned "use of closed network connection"Serve never returnedWith the proposed modifications all 200,000 Serve returned http.ErrServerClosed.
@pam4, let's do that conversion to atomicBool in August when the Go 1.12 tree opens. I added a TODO for that in the just-updated https://go-review.googlesource.com/c/go/+/81778
@bradfitz Thanks for your reply, I have a few concerns about the CL:
1) The serveDone channel in method Serve seems to be unused. Is there a purpose to it?
serveDone := make(chan struct{})
defer close(serveDone)
2) This code in the trackListener method should probably be removed, since we are now preventing Server reuse:
// If the *Server is being reused after a previous
// Close or Shutdown, reset its doneChan:
if len(s.listeners) == 0 && len(s.activeConn) == 0 {
s.doneChan = nil
}
3) Actually, I don't understand why we are still using Server.doneChan at all. Is it because at the moment we want to change as little as possible?
My suggestion was to replace the select in the Serve method with if srv.shuttingDown() { return ErrServerClosed }.
Is there any reason why doneChan is not completely redundant?
4) doneChan is closed and reset in the SetKeepAlivesEnabled method, but I can't think of any useful effect in doing so.
The only effect I can imagine is to cause Serve to terminate unexpectedly, and that could be a problem.
How can it affect http2 connections, as the comment says?
// Close HTTP/2 conns, as soon as they become idle, but reset
// the chan so future conns (if the listener is still active)
// still work and don't get a GOAWAY immediately, before their
// first request:
srv.mu.Lock()
defer srv.mu.Unlock()
srv.closeDoneChanLocked() // closes http2 conns
srv.doneChan = nil
5) (A minor observation) I see that you put the shuttingDown check inside trackListener (changing trackListener signature).
I see no problem with that code, I just wanted to mention that doing the check anywhere between trackListener and the Accept should be equally safe for our purposes (there is no need for the check to be in the critical section, or for the listener to be tracked conditionally).
Thanks for your reply, I have a few concerns about the CL:
Replies on Gerrit would be easier, but thanks for the comments.
The serveDone channel in method Serve seems to be unused. Is there a purpose to it?
No, looks unused. I'll send out a change to delete it.
This code in the trackListener method should probably be removed, since we are now preventing Server reuse:
Nice catch. Will remove.
And yes, the doneChan stuff is for http2. Or, it was. That changed in https://go-review.googlesource.com/43230 which is only for Go 1.9+.
So, yeah, the "closes http2 conns" comment doesn't seem accurate anymore. I'll file a bug.
Change https://golang.org/cl/122820 mentions this issue: net/http: remove dead code noted in post-submit review of CL 81778
Most helpful comment
This is a fundamental mistake in the Shutdown API. The idea that you can Shutdown a Server and then once it's done call Serve again is wrong. We should make it be that once you Shutdown a server, it's shut down. Future calls to Serve simply return errShutdown.
If this is too invasive for now it would be fine to punt to Go 1.11 but there's really no other way it can work. Any other way, you have this race between concurrent Shutdown and Serve behaving unpredictably.
It's only a few lines of code to make shutdown sticky, and there's a decent argument for making this possibly-breaking change now instead of letting Shutdown last another release in the racy form (it was introduced in Go 1.8), just to head off 6 more months of people potentially thinking they can Serve again after Shutdown.
/cc @tombergan @bradfitz to decide about Go 1.10 vs Go 1.11