Issue Template
Title: *Http2: Need connection policy to allow and control multiple upstream connections.*
Description:
This is related to https://github.com/envoyproxy/envoy/issues/7217.
The current default upstream connection scheme doesn't allow you to set a maximum number of streams per connection. This causes upstream load balancing issues because some connections can carry more streams than others. For example, when the upstream is a multiprocess server with a shared socket, only a few workers receive all the streams.
A connection policy with configurable parameters to manage the maximum number of streams per upstream connection is needed to solve this problem.
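To make the desired knobs concrete, here is a minimal sketch of the kind of parameters such a policy could expose (hypothetical names only, not existing Envoy configuration):

```cpp
#include <cstdint>

// Hypothetical policy knobs for illustration only; these are not existing
// Envoy configuration fields.
struct UpstreamConnectionPolicy {
  // Maximum concurrent streams a single upstream HTTP/2 connection may carry
  // before new streams spill over to another connection.
  uint32_t max_concurrent_streams_per_connection{100};
  // Maximum total streams over the lifetime of a connection; once reached,
  // the connection should be drained and eventually closed.
  uint32_t max_lifetime_streams_per_connection{1000};
};
```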
Some implementation approaches:
- Use the current circuit breaking codepath and use the onPoolFailure method to add a case for maxStreamsOverflow.
- Add another callback in Http::ConnectionPool::Callbacks to handle this case.
Essentially we need to:
@conqerAtapple I think what you are asking for is to allow the HTTP/2 connection pool to use more than 1 connection, and also to limit the maximum requests per connection, right? If so, this is something that we have talked about off and on for a long time (cc @alyssawilk @PiotrSikora).
My feeling is to add this capability to the HTTP/2 connection pool code that exists today. We can have a configuration that specifies some policy in terms of when to create a new connection in the pool. Perhaps this is the number of requests already on a connection, the maximum requests per connection, etc. and then we can bind new requests to a new HTTP/2 pool connection.
I think there are some details that need to be worked out here in terms of what knobs we want to provide the user on controlling the connection expansion (and of course connections should be capped via standard circuit breakers).
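To illustrate the idea, a rough sketch of the selection behavior could look like the following (hypothetical types and names, not Envoy's actual connection pool code): reuse a connection with spare stream capacity, otherwise open a new one, bounded by the circuit breaker's connection cap.

```cpp
#include <cstdint>
#include <list>
#include <memory>

// Hypothetical sketch only; ActiveClient and the pool class below are
// illustrative, not Envoy's actual connection pool types.
struct ActiveClient {
  uint32_t active_streams{0};
};

class Http2ConnPoolSketch {
public:
  Http2ConnPoolSketch(uint32_t max_concurrent_streams, uint32_t max_connections)
      : max_concurrent_streams_(max_concurrent_streams), max_connections_(max_connections) {}

  // Reuse an existing connection with spare stream capacity; otherwise open a
  // new one, capped by the cluster's connection circuit breaker.
  ActiveClient* clientForNewStream() {
    for (auto& client : clients_) {
      if (client->active_streams < max_concurrent_streams_) {
        return client.get();
      }
    }
    if (clients_.size() < max_connections_) {
      clients_.push_back(std::make_unique<ActiveClient>());
      return clients_.back().get();
    }
    return nullptr; // Saturated: the caller would queue or fail the request.
  }

private:
  const uint32_t max_concurrent_streams_;
  const uint32_t max_connections_;
  std::list<std::unique_ptr<ActiveClient>> clients_;
};
```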
@conqerAtapple if this is what you want can you potentially rephrase the title/description of the issue and then propose a more detailed solution after looking at both the HTTP/1 and HTTP/2 connection pool as a reference?
@alyssawilk @PiotrSikora any further thoughts or comments on this?
Yeah, I'd definitely love to see the H2 connection pool handle multiple upstream connections, it's been long on our wish list as well.
cc @antoniovicente as interested in this feature as well.
@mattklein123 @alyssawilk Although I have the general idea of how this needs to be done, I will need some pointers initially on where to look etc in the code.
IMO, the general flow should be:
- Keep active connections in an active list.
- When an active connection exceeds the configured policy's max allowed streams, create a new active connection and move the overflow connection to a drain list.

@conqerAtapple effectively you are going to make the HTTP/2 connection pool look more like the HTTP/1 connection pool, where you will have a list of active connections, and using "some algorithm" you will decide which connection to use. As I suggested above, I think we want this algorithm to probably be tunable and allow the user to specify the target max simultaneous requests per connection before a new connection is spun up. There is definitely more thought that needs to go into how this will work from an algorithmic perspective, so my suggestion is to think about this for a bit and propose a rough algorithm with what tuning knobs we want, then we can talk about the code changes. I can help with this also, but not until Monday. I'm sure @alyssawilk will have lots of good thoughts here also and I would like her to help with the policy/knob design.
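As a rough sketch of the list handling described above (hypothetical types, not an actual implementation): once a connection crosses the configured stream limit it stops receiving new streams and is moved to a drain list.

```cpp
#include <cstdint>
#include <list>
#include <memory>

// Hypothetical sketch of the active/drain list bookkeeping; not Envoy code.
struct PooledConnection {
  uint32_t lifetime_streams{0}; // total streams ever bound to this connection
  uint32_t active_streams{0};   // streams currently in flight
};

class DrainingPoolSketch {
public:
  explicit DrainingPoolSketch(uint32_t max_streams_per_connection)
      : max_streams_per_connection_(max_streams_per_connection) {}

  void bindStream() {
    if (active_.empty()) {
      active_.push_back(std::make_unique<PooledConnection>());
    }
    PooledConnection& conn = *active_.front();
    conn.lifetime_streams++;
    conn.active_streams++;
    // Overflowed the policy: stop giving this connection new streams and let
    // it drain; the next stream triggers creation of a fresh connection.
    if (conn.lifetime_streams >= max_streams_per_connection_) {
      draining_.splice(draining_.end(), active_, active_.begin());
    }
  }

private:
  const uint32_t max_streams_per_connection_;
  std::list<std::unique_ptr<PooledConnection>> active_;
  std::list<std::unique_ptr<PooledConnection>> draining_;
};
```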
@mattklein123 @alyssawilk @antoniovicente
I have started a document to go over the proposals and design details.
https://docs.google.com/document/d/1gzY1y8CdbvZZ_82y9Fb6kNXAlzOBC0VIeiuMdn_xCRs/edit#heading=h.zd70r91epk53
@conqerAtapple I left a few small comments but in general this is exactly what I was hoping for. I think this will be great so if you can implement that would be fantastic. @alyssawilk @antoniovicente and I can help review. Feel free to post a partial PR if you want a sanity check before working on all the tests.
@mattklein123 @alyssawilk @antoniovicente Thanks for reviewing the doc. I will update the doc and add some more details this week. I can start on the initial implementation next week.
@mattklein123 @alyssawilk Had a question before I start the implementation: it looks like ProdClusterManagerFactory::allocateConnPool has the ability to fetch an HTTP/1 connection pool if the upstream.use_http2 feature is disabled. Isn't this what we would use if we just cared about 1 request/stream per upstream connection?
Isn't this what we would use if we just cared about 1 request/stream per upstream connection?
I think we are talking about 2 things here. If you want to use HTTP/1.1 just use HTTP/1.1. :)
Yes, looking at the code in detail, the connection pool for HTTP/1 is coupled with its codec. I thought/was hoping that I could use the HTTP/1 connection pool policy with the HTTP/2 codec. So yeah, I guess it won't work. Thanks!
Hi,
Just want to make sure, will this enhancement also be applicable to Original destination cluster?
Thanks a lot,
Wally
Just want to make sure, will this enhancement also be applicable to Original destination cluster?
Yes it should be once implemented.
Cross linking an experimental approach / PoC from Nighthawk for those who would like to have a look-see: https://github.com/envoyproxy/nighthawk/pull/200
That adds a relatively simple new pool implementation, which round robins over multiple vanilla H/2 pools under the hood to leverage more than one active connection. In Nighthawk the number of pools equals the configured number of --connections, so the semantics for that option are on par with H1.
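Conceptually the PoC behaves roughly like the following sketch (hypothetical types, not Nighthawk's actual code): new streams are rotated across several single-connection H/2 pools.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a single-connection HTTP/2 pool.
class H2Pool {
public:
  void newStream() { /* bind a stream to this pool's connection */ }
};

// Round robins new streams over several vanilla H/2 pools so that more than
// one upstream connection is active, mirroring the PoC idea.
class RoundRobinPool {
public:
  explicit RoundRobinPool(size_t num_connections) {
    for (size_t i = 0; i < num_connections; ++i) {
      pools_.push_back(std::make_unique<H2Pool>());
    }
  }

  void newStream() {
    pools_[next_ % pools_.size()]->newStream();
    ++next_;
  }

private:
  std::vector<std::unique_ptr<H2Pool>> pools_;
  size_t next_{0};
};
```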
Just want to make sure, will this enhancement also be applicable to Original destination cluster?
Yes it should be once implemented.
Thanks a lot for confirmation.
Is this still under active development? It seems the pull request was closed a month ago.
FYI I'm kinda playing around with refactoring the HTTP conn pools to share more code between HTTP/1 and HTTP/2, so hopefully that will make some incremental improvements that make supporting this feature easier.
I have a PR to polish up and submit in the next few weeks, continuing where @conqerAtapple left off. @snowp we should sync up on if these will conflict.
Hi @ggreenway, welcome back! :) I was planning on working on this also, so happy to see that much interest in this. It might be worth syncing up at some point on plans along with @snowp
I was just kind of starting to pull things up into the base class + introduce a base class for the ActiveStream implementations. My anticipation was to have the base class manage the ready/busy lists, perhaps also introducing additional lists as necessary.
Looking at the PR it seems to be fairly compatible in that it attempts to use a similar set of lists to track the various states, but I imagine it would cause quite a few merge conflicts, so it might make sense for me to hold off on the refactors.
Here's a WIP commit that does some of the safer refactors: https://github.com/snowp/envoy/commit/a38c2151a9addf4dfebc4f1036165f2d133b946f
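For reference, a rough sketch of the shape this could take (hypothetical, not the code in the commit above): the base owns the ready/busy lists and asks each client how many concurrent streams it can carry.

```cpp
#include <cstdint>
#include <list>
#include <memory>

// Hypothetical sketch of a shared connection pool base class; not the code in
// the linked commit.
class ActiveClientBase {
public:
  virtual ~ActiveClientBase() = default;
  // An HTTP/1 client can carry one concurrent stream; an HTTP/2 client can
  // carry up to its configured maximum.
  virtual uint32_t maxConcurrentStreams() const = 0;
  uint32_t active_streams{0};
};

class ConnPoolImplBase {
protected:
  using ClientList = std::list<std::unique_ptr<ActiveClientBase>>;

  // Move a client from the ready list to the busy list once it is saturated.
  void onStreamBound(ClientList::iterator it) {
    (*it)->active_streams++;
    if ((*it)->active_streams >= (*it)->maxConcurrentStreams()) {
      busy_.splice(busy_.end(), ready_, it);
    }
  }

  ClientList ready_;
  ClientList busy_;
};
```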
Hi,
Should we really pull the ready/busy clients up into the base? A busy client should be HTTP/1.x specific, since HTTP/1.x does not support concurrent requests/streams; that is not applicable to HTTP/2, so we would not have a busy client for H2.
In my mind you'd treat an H2 client as "busy" when you reach the threshold for maximum streams per upstream connection described in this issue. This means that the handling for HTTP/1.1 busy connections just becomes a special case (max streams = 1) of the HTTP/2 behavior.
+1 to what @snowp said.
OK, I get it: here max streams means max concurrent streams, not max total streams.
Right; there's max_requests_per_connection on the cluster, and http2_protocol_options.max_concurrent_streams in the H2 options. For the description above, though, I'd say the H2 client is "busy" if it has reached either of those thresholds. Basically, any time it cannot take any more requests, it is "busy".
I think if max_requests_per_connection is reached, it should be moved to draining, not busy.
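Putting the two knobs together, the state decision could be sketched like this (illustrative only; max_requests_per_connection and max_concurrent_streams are the configuration names mentioned above, everything else is hypothetical):

```cpp
#include <cstdint>

enum class ClientState { Ready, Busy, Draining };

// Hypothetical sketch of the state decision discussed above. A client is
// "busy" when it temporarily cannot accept more streams (max concurrent
// streams reached) and "draining" once it has used up its lifetime request
// budget, after which it should never take new streams again.
ClientState classifyClient(uint32_t active_streams, uint32_t lifetime_requests,
                           uint32_t max_concurrent_streams,
                           uint32_t max_requests_per_connection) {
  if (lifetime_requests >= max_requests_per_connection) {
    return ClientState::Draining;
  }
  if (active_streams >= max_concurrent_streams) {
    return ClientState::Busy;
  }
  return ClientState::Ready;
}
```

For HTTP/1.1 the same check applies with max_concurrent_streams fixed at 1.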