Akka-http: Deal better with backpressure on the server-side

Created on 19 Oct 2016 · 6 comments · Source: akka/akka-http

Right now, "backpressure" is applied only through the max-connections setting. This is the simplest solution, but not the best possible one. Here are a few problems that don't have any good solutions yet:

There are two streams: the stream of connections and the stream of requests on each connection. How should backpressure be reconciled between them? For example, there is a conflict between answering another request on an existing persistent connection vs. closing that connection after a while vs. accepting new connections with new requests.
How can load be measured meaningfully? In many cases, most of the request-handling time is spent waiting for asynchronous requests to external systems to complete; during that time, the only resource in use is the memory that holds the context of the request and the connection. In other cases, request handling is CPU-bound. Keeping a simple counter of "currently handled requests" is the simplest solution, but it may lead to underutilization of the system.
What should the rules be for letting an incoming connection wait a little longer (which might fill the accept backlog and cause further TCP connection attempts to be rejected) vs. accepting the connection immediately and sending a 503 response if it cannot be handed to the downstream handler?
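To make the underutilization concern in the load-measurement question concrete, here is Little's law, a standard queueing result (not part of the original issue): the average number of requests in flight L equals the arrival rate λ times the average time W a request spends in the server. If concurrency is capped at C requests, throughput cannot exceed C / W. The numbers below are illustrative only.

```latex
L = \lambda W, \qquad L \le C \;\Rightarrow\; \lambda_{\max} = \frac{C}{W}
\qquad \text{e.g. } C = 50,\; W = 0.5\,\text{s} \;\Rightarrow\; \lambda_{\max} = \frac{50}{0.5\,\text{s}} = 100\ \text{req/s}
```

If W is dominated by waiting for external systems, that 100 req/s ceiling applies even while the CPU is mostly idle; if handling is CPU-bound, W itself grows as load rises, so the same cap behaves quite differently.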

Also, the current default of accepting 1024 concurrent connections (= 1024 concurrent requests) might be overly optimistic; many applications should probably start backpressuring sooner than that. If most of those are idle persistent connections, that might not be a problem in itself. On the other hand, 1024 idle persistent connections might prevent any other users from connecting at all.



jrudolph

Most helpful comment

Hi @ktoso, let me give you a bit of context here: we have a CPU-intensive app and we are using akka-http to expose the API. Behind the scenes we have actors doing the processing. We use onComplete(someActor ? myRequest) in our route to process incoming requests asynchronously. Our issue is that, because the futures take a while to complete, the number of pending futures keeps growing as requests arrive, until things stop working.

The solution we are considering is pretty naive: keep a counter of 'active' requests (using an AtomicLong or something similar) and a threshold. Once the threshold is reached, we reject further requests with 503 Service Unavailable. Obviously this is not ideal, but it does apply some form of backpressure. Ideally, this would be driven by CPU load.
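A minimal, stdlib-only sketch of the counter-plus-threshold idea described above. InFlightLimiter, tryAcquire and guarded are hypothetical names, not akka-http API. The limit is enforced with a compareAndSet loop, so a request is admitted only while fewer than maxInFlight requests are pending, and the slot is released when the request's Future completes, whether it succeeds or fails.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.annotation.tailrec
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Hypothetical helper (not part of akka-http): caps the number of in-flight requests.
final class InFlightLimiter(maxInFlight: Long) {
  private val active = new AtomicLong(0L)

  // Reserve a slot without locking; returns false once maxInFlight slots are taken.
  def tryAcquire(): Boolean = {
    @tailrec def loop(): Boolean = {
      val n = active.get()
      if (n >= maxInFlight) false                    // limit reached: reject
      else if (active.compareAndSet(n, n + 1)) true  // slot reserved
      else loop()                                    // lost a race with another thread: retry
    }
    loop()
  }

  def release(): Unit = { active.decrementAndGet(); () }

  def inFlight: Long = active.get()

  // Run `work` only if a slot is free; otherwise answer immediately with a 503 status code.
  def guarded[T](work: => Future[T])(implicit ec: ExecutionContext): Either[Int, Future[T]] =
    if (!tryAcquire()) Left(503)
    else {
      // A synchronous throw must also release the slot, so turn it into a failed Future.
      val f = try work catch { case NonFatal(e) => Future.failed(e) }
      f.onComplete(_ => release())
      Right(f)
    }
}
```

In a route, the Left branch could be mapped to complete(StatusCodes.ServiceUnavailable) and the Right branch handed to onComplete. Note that this sheds load rather than propagating backpressure, and a fixed cap ignores whether requests are CPU-bound or mostly waiting.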

Is there another solution we could implement here? Is there any way to pipe all incoming requests for a server route through an Akka stream that handles backpressure by itself (so that we can call our actors using mapAsync)?

Maybe we are missing something obvious here, so any help is appreciated.

hveiga on 27 Jun 2017


All 6 comments

I came across this while thinking about ways to implement API rate limiting, which needs to kick in _after_ the request headers are parsed, in order to know which "user" is making the request. By then the connection has already been accepted, so the limiting would need to happen near the actual Flow that handles the request.

One thought I had was to extend bindAndHandle to take a BidiFlow[HttpRequest,HttpRequest,HttpResponse,HttpResponse], and then internally drop that into fuseServerFlow as an additional step before the handler. The BidiFlow would then keep state across materializations, in order to do the rate limiting, e.g. stalling or failing requests that don't match the allowance.
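The BidiFlow hook proposed above does not exist in akka-http; this stdlib-only sketch only illustrates the kind of shared state such a stage might consult across materializations. It uses a per-user token bucket, a standard rate-limiting technique swapped in here rather than anything specified in this thread. TokenBucketLimiter, its parameters and the injectable clock are hypothetical. Each bucket is updated with compareAndSet instead of a lock, so callers never block, although under heavy contention they may retry repeatedly.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical per-user token bucket (not an akka-http API). One instance would be shared by
// every materialization of the rate-limiting stage, so its state must be thread-safe.
final class TokenBucketLimiter(
    capacity: Double,        // maximum burst size, in requests
    refillPerSecond: Double, // sustained allowance, in requests per second
    nowNanos: () => Long = () => System.nanoTime()
) {
  private final case class Bucket(tokens: Double, lastNanos: Long)

  private val buckets = new ConcurrentHashMap[String, AtomicReference[Bucket]]()

  // true: `user` is within its allowance; false: the caller should stall or fail the request.
  def tryAcquire(user: String): Boolean = {
    val ref = buckets.computeIfAbsent(
      user,
      (_: String) => new AtomicReference(Bucket(capacity, nowNanos())))

    @tailrec def loop(): Boolean = {
      val old = ref.get()
      val now = nowNanos()
      val elapsedSeconds = (now - old.lastNanos).toDouble / 1e9
      val available = math.min(capacity, old.tokens + elapsedSeconds * refillPerSecond)
      if (available < 1.0) false                                          // over the allowance
      else if (ref.compareAndSet(old, Bucket(available - 1.0, now))) true // token taken
      else loop()                                    // concurrent update: retry, no lock held
    }
    loop()
  }
}
```

A stage could call tryAcquire with a user identifier taken from the parsed request headers and, on false, either delay the request or respond with an error. The map is never pruned here, so a real implementation would also need to evict idle users.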

I actually think the existing load figures are quite OK the way they are... and on a highly loaded, CPU-bound system, I'd rather have the kernel drop SYN packets than have to render a proper 503 for each overloaded request. That's more of a role for a proxy.

jypma on 17 Nov 2016

> The BidiFlow would then keep state across materializations, in order to do the rate limiting, e.g. stalling or failing requests that don't match the allowance.

How would it keep state across materializations without synchronization and blocking? In any case, under load this state could become heavily contended, so we need to make sure it doesn't introduce a new bottleneck.

> I actually think the existing load figures are quite OK the way they are... and on a highly loaded, CPU-bound system, I'd rather have the kernel drop SYN packets than have to render a proper 503 for each overloaded request.

I think these are different concerns. We can drop SYN packets by putting backpressure on the connection stream. But we may also have already accepted some requests that we then need to deal with, so having multiple guards in place would still make sense.

jrudolph on 18 Nov 2016

Hi guys, has there been any update on this?

hveiga on 27 Jun 2017

Do you have an immediate need for it, or are you just asking?