Rocket: Clean shutdown?

Created on 8 Feb 2017 · 23Comments · Source: SergioBenitez/Rocket

Is there any way to perform a "clean" shutdown of Rocket, such that any requests currently being handled can finish (and such that managed state can be cleanly dropped)? I'm assuming right now that pressing ^C just kills the program without any graceful shutdown, since I would expect Rocket to print something to stdout otherwise.

Besides not wanting to have a bad experience for users, I'm primarily concerned with the fact that the program I'm writing is using SQLite and therefore killing it abruptly in the middle of any database mutations is probably a bad idea.

accepted feedback wanted request

Source

lilyball

👍41

Most helpful comment

I would really like this as well. Let's get this in at some point. Slating for 0.7 for now.

SergioBenitez on 9 Feb 2017

🎉14

All 23 comments

It seems that ::run_until is intended for this purpose in Hyper ( https://github.com/hyperium/hyper/blob/2fa414fb5fe6dbc922da25cca9960652edf32591/tests/server.rs#L170 ) added in master.

Rocket uses 0.9.x and ::handle_threads which doesn't exist in master on Hyper. It's not possible to do this in Hyper 0.9.x.

I double :+1: this issue though. It's critical for production applications. I use the concept (in other languages/frameworks) all the time during a zero-downtime upgrade.

mehcode on 8 Feb 2017

@mehcode Rocket is on Hyper 0.10, not 0.9.

SergioBenitez on 8 Feb 2017

Oops. I had the versions wrong. Well.. the rest of the stuff still applies. Non-async Hyper never got closing a server fixed.

mehcode on 8 Feb 2017

Argh. Would it be possible for Rocket to enter a mode where it just returns a 503 Service Unavailable to any new connections, as it waits for existing requests to finish, so it can do a nearly-graceful shutdown? As long as the currently-open requests finish processing and the managed state is dropped, it's not a huge problem if the server continues accepting new requests and returns errors.

lilyball on 8 Feb 2017

As long as the currently-open requests finish processing and the managed state is dropped, it's not a huge problem if the server continues accepting new requests and returns errors.

Good point.

However my concern is I need to stop the process. I can just stop sending new requests to a Rocket process and wait maybe N minutes before killing the process. It's a bit hacky but it could work.

mehcode on 8 Feb 2017

I would really like this as well. Let's get this in at some point. Slating for 0.7 for now.

SergioBenitez on 9 Feb 2017

🎉14

So right now hyper allow keep-alive to be turned off for a connection. https://github.com/hyperium/hyper/pull/1390/
https://github.com/hyperium/hyper/issues/1365

karuna on 5 Jan 2018

This needs a new version of hyper: https://github.com/hyperium/hyper/issues/338#issuecomment-347044047

tcmal on 13 Oct 2018

Hi! This feature is a show stopper for me, as I am trying to write hardware-in-the-loop testing framework for IoT devices - I'd need to be able to spawn a server for each test case / test fixture and close it on demand.

Does this feature require a ton of porting activities or is it just bumping the version and adding some kind of close() method that calls hyper's close? I might like to try to contribute but I am rust newbie...

mchodzikiewicz on 27 Mar 2019

@mchodzikiewicz rocket currently uses hyper 0.10, which doesn't support this. Adding it would require changes in both hyper (the unmaintained branch) and Rocket. Upgrading to the latest version of hyper (which does support clean shutdown) is desired for many other reasons, but is also a lot of work and changes Rocket significantly.

jebrosen on 30 Mar 2019

@jebrosen thank you for the clarification. Is there any timeline for this? I do not see any obviously related issues.
Can you recommend any workaround for this? Is there a better way than using platform specific std::os::unix::thread::JoinHandleExt and call libc::pthread_cancel ? Will it cause any problems on spawning rocket one more time?

mchodzikiewicz on 1 Apr 2019

Killing the thread or process that rocket is running on is the most obvious solution that comes to mind (EDIT: cancelling individual threads is very unwise, see discussion further on). Managing a separate process should be cleaner and more portable than a thread, but also more complex to set up.

jebrosen on 4 Apr 2019

Once Rocket finishes the conversion to async, I think this will be pretty simple. Rocket could spawn a future on its thread that will only resolve once a key combination (ctrl+C most likely) is pressed. It could then use the auto-shutdown feature of a runtime to keep the server running until the keybinding is pressed.

The downside of this is that we'd have to check for a binary state on every request so we know whether to fulfil it or reject it outright (as we're shutting down).

jhpratt on 2 Aug 2019

Rocket could spawn a future on its thread that will only resolve once a key combination (Ctrl+C most likely) is pressed.

Yikes, signal handling! But yes, setting a "please shut down" flag on Ctrl+C should be possible. We should consider exposing some kind of rocket.shutdown() method to set that flag as well, or something accessible to routes.

The downside of this is that we'd have to check for a binary state on every request so we know whether to fulfil it or reject it outright (as we're shutting down).

hyper should be able to do this part for us - https://docs.rs/hyper/0.12.33/hyper/server/struct.Server.html#method.with_graceful_shutdown. But I have not actually tried it yet.

jebrosen on 2 Aug 2019

Just from the documentation, I'd agree it seems Hyper should be able to handle that. It _could_ also avoid the flag check on each request, depending on whether Hyper already sends a 503 in that situation.

A rocket.shutdown() method sounds sensible, there's no reason it should be limited to a key combo.

This sounds like a useful feature that many would like and wouldn't need a ton of code, so I'll take a look into implementing this, likely early next week.

jhpratt on 2 Aug 2019

An implementation of this (#1079) of this feature will land in the async branch soon. A ShutdownHandle can be retrieved before launch - e.g. in an attach fairing - via rocket.get_shutdown_handle() or in a route as a regular request guard. It provides one method, shutdown_handle.shutdown(), which will terminate the server with a success code. Additionally, Ctrl-C will be listened for by default and will shut down the server.

To support a server that can successfully terminate, launch() now returns a Result<(), Error>. Error is a new type that represents errors either at launch time or a fatal error while the server is running.

It works as intended in the main scenarios tested, but there may be some corner cases.

I'm going to leave this issue open, though, because there are a few things we should decide before release:

Making Ctrl-C handling optional. It is currently a whole (enabled by default) feature to itself, and it will likely be folded into another feature as we resolve some of choices around runtime. Should Ctrl-C handling be a cargo feature, a configuration option, a bool parameter to launch(), both, or perhaps something else?
Whether and how to prevent libraries from "injecting" shutdown into an application that may not want it. Being able to shut down the server programmatically should be opt-in or opt-out at the very least, and perhaps we can make it application-controlled in some way instead of available everywhere.

jebrosen on 8 Sep 2019

👍6 ❤4

I haven't played with the async branch yet but here are my thoughts.

On Ctrl-C Handling

I think rocket's configuration model needs to be iterated on a bit (#852). For now, SIGTERM handling configuration make sense as a rocket::Config option gated behind a cargo feature which is on by default so users can opt out of any dependencies this brings in if they really want to.

Injecting Shutdown

I don't completely understand how shutdown is done on the async branch. Isn't a ShutdownHandle provided at some point before or at startup? Isn't this enough to let users decide whether or not they want libraries to control shutdown by passing the ShutdownHandle to the library?

I can't think of a case where it makes sense for a library to be able to terminate the application. If the library wants the application to shut down, it should probably be communicated through an abstraction boundary which the user sets up handling for. I don't think this is a common enough case to justify allowing it automatically.

0xcaff on 22 Nov 2019

Yes, Ctrl-C handling needs to be reworked. I want it to be (part of?) a cargo feature because it depends on the tokio runtime, and it should also be a configuration option.

As for shutdown injection, I realized this morning that we're only talking about whether libraries should have access to graceful shutdown; libraries have unblockable access to ungraceful shutdown via exit, a signal to the current process, or by stalling the application via panics or infinite loops in an on_request fairing. From that perspective, I'm not so sure it's worth making it hard for libraries to get a ShutdownHandle if it makes it just as inconvenient for applications to get one or for Rocket to provide one.

jebrosen on 22 Nov 2019

I don't have an official source on this (cc @RalfJung @cramertj) but last I heard calling pthread_cancel is unsound: it can break safety of your program, depending on what the cancelled thread was doing.

So you'd have to abort the whole program, even if only in some of the cases (if you could even determine which cases are safe to cancel just the thread).

eddyb on 19 Feb 2020

pthread_cancel is unsound in general, yes. It's pretty easy to construct a situation where another thread still has a reference into this thread's stack (e.g. via rayon::join). Also it is unsound to deallocate memory without calling destructors (various things rely on this, e.g. scoped threads and pinning). Also see https://github.com/rust-lang/unsafe-code-guidelines/issues/211.

RalfJung on 19 Feb 2020

👍3

Thanks for pointing out those pitfalls of pthread_cancel. This is the current situation of the underlying shutdown mechanism(s), for clarity:

0.4: No shutdown support, and quite difficult to implement. You would instead have to cancel whole threads/processes, assuming that you can even soundly do so - and in most (maybe even all?) cases the answer is "no, it's not sound."
The async branch: ShutdownHandle can be used to request the server to shut down. This means no new connections are accepted, and hyper's implementation waits for all in-flight requests and responses to complete. Either of the following would "stall" a clean shutdown:
- Any route code that blocks for a long time without using something like tokio::spawn_blocking(). This causes issues beyond just shutdown and should be avoided anyway.
- Any "infinite" response, such as an SSE event stream or stream::repeat. This case should be solvable within Rocket but would need a decision on what to do - cancel all active response streams immediately, wait for a timeout, provide a mechanism to cooperatively stop responses, etc.

jebrosen on 19 Feb 2020

👍5

Any "infinite" response, such as an SSE event stream or stream::repeat. This case should be solvable within Rocket but would need a decision on what to do - cancel all active response streams immediately, wait for a timeout, provide a mechanism to cooperatively stop responses, etc.

I can think of one alternative, which would have a simpler API than my #1237 pull request:

A response could set a flag; let's call it cancel_on_shutdown. It would default to true, but built-in responders that are known to eventually exit (like File and Template) would set it to false. If cancel_shutdown is true, then it gets instantly canceled, but if it's set to false, Rocket will instead wait for it to finish.

That way, in-progress files and templates (which probably cover most boring websites) don't get torn, but SSE and WebSocket connections get abruptly cut off, which for extremely long-lived connections like this the client would need to handle anyway.

notriddle on 18 Mar 2020

👍2

i noticed that this is also causing problems when trying to run a rocket server in a containerised environment (kuberentes/openshift): when trying to scale down the pod openshift it will first send a SIGTERM and, after a grace period it'll send a SIGKILL. since rocket applications don't react on SIGTERM they always get killed which isn't exactly nice.
so this isn't just about someone manually pressing Ctrl+C, it's also about handling SIGTERM in general.

in my understanding with SIGTERM the server should stop accepting new requests/connections, finish the pending ones and then gracefully shut down (close open resources, flush outstanding buffers, etc.). i'm not sure how much of this can be automated (stopping to accept new connections should probably be fully handled in rocket, the rest will probably be callbacks to the app using rocket since it's maintaining the state?).

out of interest: it was mentioned here that this should all change once async lands in rocket (and it seems to be planned for 0.5). does anyone already have some insight on whether it will now be possible to implement this with the async version of rocket?

for reference here's the kubernetes documentation on pod termination: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods