Issue Template
Title: *Http2: Need connection policy to allow and control multiple upstream connections.*
Description:
This is related to https://github.com/envoyproxy/envoy/issues/7217.
The current default upstream connection scheme doesn't allow you to set a maximum number of streams per connection. This causes upstream load balancing issues because some connections can carry more streams than others. For example, when the upstream is a multiprocess server with a shared socket, only a few workers receive all the streams.
A connection policy with configurable parameters to manage the maximum number of streams per upstream connection is needed to solve this problem.
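To make the desired knobs concrete, here is a minimal sketch of the kind of parameters such a policy could expose (hypothetical names only, not existing Envoy configuration):

```cpp
#include <cstdint>

// Hypothetical policy knobs for illustration only; these are not existing
// Envoy configuration fields.
struct UpstreamConnectionPolicy {
  // Maximum concurrent streams a single upstream HTTP/2 connection may carry
  // before new streams spill over to another connection.
  uint32_t max_concurrent_streams_per_connection{100};
  // Maximum total streams over the lifetime of a connection; once reached,
  // the connection should be drained and eventually closed.
  uint32_t max_lifetime_streams_per_connection{1000};
};
```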
Some implementation approaches:
- Use the current circuit breaking codepath and use the onPoolFailure method to add a case for maxStreamsOverflow.
- Add another callback in Http::ConnectionPool::Callbacks to handle this case.
Essentially we need to:
@conqerAtapple I think what you are asking for is to allow the HTTP/2 connection pool to use more than 1 connection, and also to limit the maximum requests per connection, right? If so, this is something that we have talked about off and on for a long time (cc @alyssawilk @PiotrSikora).
My feeling is to add this capability to the HTTP/2 connection pool code that exists today. We can have a configuration that specifies some policy in terms of when to create a new connection in the pool. Perhaps this is the number of requests already on a connection, the maximum requests per connection, etc. and then we can bind new requests to a new HTTP/2 pool connection.
I think there are some details that need to be worked out here in terms of what knobs we want to provide the user on controlling the connection expansion (and of course connections should be capped via standard circuit breakers).
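To illustrate the idea, a rough sketch of the selection behavior could look like the following (hypothetical types and names, not Envoy's actual connection pool code): reuse a connection with spare stream capacity, otherwise open a new one, bounded by the circuit breaker's connection cap.

```cpp
#include <cstdint>
#include <list>
#include <memory>

// Hypothetical sketch only; ActiveClient and the pool class below are
// illustrative, not Envoy's actual connection pool types.
struct ActiveClient {
  uint32_t active_streams{0};
};

class Http2ConnPoolSketch {
public:
  Http2ConnPoolSketch(uint32_t max_concurrent_streams, uint32_t max_connections)
      : max_concurrent_streams_(max_concurrent_streams), max_connections_(max_connections) {}

  // Reuse an existing connection with spare stream capacity; otherwise open a
  // new one, capped by the cluster's connection circuit breaker.
  ActiveClient* clientForNewStream() {
    for (auto& client : clients_) {
      if (client->active_streams < max_concurrent_streams_) {
        return client.get();
      }
    }
    if (clients_.size() < max_connections_) {
      clients_.push_back(std::make_unique<ActiveClient>());
      return clients_.back().get();
    }
    return nullptr; // Saturated: the caller would queue or fail the request.
  }

private:
  const uint32_t max_concurrent_streams_;
  const uint32_t max_connections_;
  std::list<std::unique_ptr<ActiveClient>> clients_;
};
```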
@conqerAtapple if this is what you want can you potentially rephrase the title/description of the issue and then propose a more detailed solution after looking at both the HTTP/1 and HTTP/2 connection pool as a reference?
@alyssawilk @PiotrSikora any further thoughts or comments on this?
Yeah, I'd definitely love to see the H2 connection pool handle multiple upstream connections, it's been long on our wish list as well.
cc @antoniovicente as interested in this feature as well.
@mattklein123 @alyssawilk Although I have the general idea of how this needs to be done, I will need some pointers initially on where to look etc in the code.
IMO, the general flow should be:
- Keep active connections in an active list.
- When an active connection exceeds the configured policy's max allowed streams, create a new active connection and move the overflow connection to a drain list.

@conqerAtapple effectively you are going to make the HTTP/2 connection pool look more like the HTTP/1 connection pool, where you will have a list of active connections, and using "some algorithm" you will decide which connection to use. As I suggested above, I think we want this algorithm to probably be tunable and allow the user to specify the target max simultaneous requests per connection before a new connection is spun up. There is definitely more thought that needs to go into how this will work from an algorithmic perspective, so my suggestion is to think about this for a bit and propose a rough algorithm with what tuning knobs we want, then we can talk about the code changes. I can help with this also, but not until Monday. I'm sure @alyssawilk will have lots of good thoughts here also and I would like her to help with the policy/knob design.
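As a rough sketch of the list handling described above (hypothetical types, not an actual implementation): once a connection crosses the configured stream limit it stops receiving new streams and is moved to a drain list.

```cpp
#include <cstdint>
#include <list>
#include <memory>

// Hypothetical sketch of the active/drain list bookkeeping; not Envoy code.
struct PooledConnection {
  uint32_t lifetime_streams{0}; // total streams ever bound to this connection
  uint32_t active_streams{0};   // streams currently in flight
};

class DrainingPoolSketch {
public:
  explicit DrainingPoolSketch(uint32_t max_streams_per_connection)
      : max_streams_per_connection_(max_streams_per_connection) {}

  void bindStream() {
    if (active_.empty()) {
      active_.push_back(std::make_unique<PooledConnection>());
    }
    PooledConnection& conn = *active_.front();
    conn.lifetime_streams++;
    conn.active_streams++;
    // Overflowed the policy: stop giving this connection new streams and let
    // it drain; the next stream triggers creation of a fresh connection.
    if (conn.lifetime_streams >= max_streams_per_connection_) {
      draining_.splice(draining_.end(), active_, active_.begin());
    }
  }

private:
  const uint32_t max_streams_per_connection_;
  std::list<std::unique_ptr<PooledConnection>> active_;
  std::list<std::unique_ptr<PooledConnection>> draining_;
};
```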
@mattklein123 @alyssawilk @antoniovicente
I have started a document to go over the proposals and design details.
https://docs.google.com/document/d/1gzY1y8CdbvZZ_82y9Fb6kNXAlzOBC0VIeiuMdn_xCRs/edit#heading=h.zd70r91epk53
@conqerAtapple I left a few small comments but in general this is exactly what I was hoping for. I think this will be great so if you can implement that would be fantastic. @alyssawilk @antoniovicente and I can help review. Feel free to post a partial PR if you want a sanity check before working on all the tests.
@mattklein123 @alyssawilk @antoniovicente Thanks for reviewing the doc. I will update the doc and add some more details this week. I can start on the initial implementation next week.
@mattklein123 @alyssawilk Had a question before I start the implementation: it looks like ProdClusterManagerFactory::allocateConnPool has the ability to fetch an HTTP/1 connection pool if the upstream.use_http2 feature is disabled. Isn't this what we would use if we just cared about 1 request/stream per upstream connection?
Isn't this what we would use if we just cared about 1 request/stream per upstream connection?
I think we are talking about 2 things here. If you want to use HTTP/1.1 just use HTTP/1.1. :)
Yes, looking at the code in detail, the connection pool for HTTP/1 is coupled with its codec. I thought/was hoping that I could use the HTTP/1 connection pool policy with the HTTP/2 codec. So yeah, I guess it won't work. Thanks!
Hi,
Just want to make sure, will this enhancement also be applicable to Original destination cluster?
Thanks a lot,
Wally
Just want to make sure, will this enhancement also be applicable to Original destination cluster?
Yes it should be once implemented.
Cross linking an experimental approach / PoC from Nighthawk for those who would like to have a look-see: https://github.com/envoyproxy/nighthawk/pull/200
That adds a relatively simple new pool implementation, which round robins over multiple vanilla H/2 pools under the hood to leverage more than one active connection. In Nighthawk the number of pools equals the configured number of --connections, so the semantics for that option are on par with H1.
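Conceptually the PoC behaves roughly like the following sketch (hypothetical types, not Nighthawk's actual code): new streams are rotated across several single-connection H/2 pools.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a single-connection HTTP/2 pool.
class H2Pool {
public:
  void newStream() { /* bind a stream to this pool's connection */ }
};

// Round robins new streams over several vanilla H/2 pools so that more than
// one upstream connection is active, mirroring the PoC idea.
class RoundRobinPool {
public:
  explicit RoundRobinPool(size_t num_connections) {
    for (size_t i = 0; i < num_connections; ++i) {
      pools_.push_back(std::make_unique<H2Pool>());
    }
  }

  void newStream() {
    pools_[next_ % pools_.size()]->newStream();
    ++next_;
  }

private:
  std::vector<std::unique_ptr<H2Pool>> pools_;
  size_t next_{0};
};
```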
Just want to make sure, will this enhancement also be applicable to Original destination cluster?
Yes it should be once implemented.
Thanks a lot for confirmation.
Is this still under active development? It seems the pull request was closed a month ago.
FYI I'm kinda playing around with refactoring the HTTP conn pools to share more code between HTTP/1 and HTTP/2, so hopefully that will make some incremental improvements that make supporting this feature easier.
I have a PR to polish up and submit in the next few weeks, continuing where @conqerAtapple left off. @snowp we should sync up on if these will conflict.
Hi @ggreenway, welcome back! :) I was planning on working on this also, so happy to see that much interest in this. It might be worth syncing up at some point on plans along with @snowp
I was just kind of starting to pull things up into the base class + introduce a base class for the ActiveStream implementations. My anticipation was to have the base class manage the ready/busy lists, perhaps also introducing additional lists as necessary.
Looking at the PR it seems to be fairly compatible in that it attempts to use a similar set of lists to track the various states, but I imagine it would cause quite a few merge conflicts, so it might make sense for me to hold off on the refactors.
Here's a WIP commit that does some of the safer refactors: https://github.com/snowp/envoy/commit/a38c2151a9addf4dfebc4f1036165f2d133b946f
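For reference, a rough sketch of the shape this could take (hypothetical, not the code in the commit above): the base owns the ready/busy lists and asks each client how many concurrent streams it can carry.

```cpp
#include <cstdint>
#include <list>
#include <memory>

// Hypothetical sketch of a shared connection pool base class; not the code in
// the linked commit.
class ActiveClientBase {
public:
  virtual ~ActiveClientBase() = default;
  // An HTTP/1 client can carry one concurrent stream; an HTTP/2 client can
  // carry up to its configured maximum.
  virtual uint32_t maxConcurrentStreams() const = 0;
  uint32_t active_streams{0};
};

class ConnPoolImplBase {
protected:
  using ClientList = std::list<std::unique_ptr<ActiveClientBase>>;

  // Move a client from the ready list to the busy list once it is saturated.
  void onStreamBound(ClientList::iterator it) {
    (*it)->active_streams++;
    if ((*it)->active_streams >= (*it)->maxConcurrentStreams()) {
      busy_.splice(busy_.end(), ready_, it);
    }
  }

  ClientList ready_;
  ClientList busy_;
};
```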
Hi,
Should we really pull the ready/busy clients up into the base? A busy client should be HTTP/1.x specific, since HTTP/1.x does not support concurrent requests/streams; that is not applicable to HTTP/2, so we would not have a busy client for H2.
In my mind you'd treat an H2 client as "busy" when you reach the threshold for maximum streams per upstream connection described in this issue. This means that the handling for HTTP/1.1 busy connections just becomes a special case (max streams = 1) of the HTTP/2 behavior.
+1 to what @snowp said.
OK, I get it: here max streams means max concurrent streams, not max total streams.
Right; there's max_requests_per_connection on the cluster, and http2_protocol_options.max_concurrent_streams in the H2 options. For the description above, though, I'd say the H2 client is "busy" if it has reached either of those thresholds. Basically, any time it cannot take any more requests, it is "busy".
I think if max_requests_per_connection is reached, it should be moved to draining, not busy.
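Putting the two knobs together, the state decision could be sketched like this (illustrative only; max_requests_per_connection and max_concurrent_streams are the configuration names mentioned above, everything else is hypothetical):

```cpp
#include <cstdint>

enum class ClientState { Ready, Busy, Draining };

// Hypothetical sketch of the state decision discussed above. A client is
// "busy" when it temporarily cannot accept more streams (max concurrent
// streams reached) and "draining" once it has used up its lifetime request
// budget, after which it should never take new streams again.
ClientState classifyClient(uint32_t active_streams, uint32_t lifetime_requests,
                           uint32_t max_concurrent_streams,
                           uint32_t max_requests_per_connection) {
  if (lifetime_requests >= max_requests_per_connection) {
    return ClientState::Draining;
  }
  if (active_streams >= max_concurrent_streams) {
    return ClientState::Busy;
  }
  return ClientState::Ready;
}
```

For HTTP/1.1 the same check applies with max_concurrent_streams fixed at 1.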