Title: Is there a way to limit the connections based on incoming IP or the request's SNI?
Description:
Is there a way to use the network local/global rate limiter to limit the number of connections based on their incoming IP address, destination cluster, or SNI? So far, the descriptors for network-level RLS seem to be populated with the static values specified in the config.
Is there a way to get dynamically substituted descriptors for the network-level rate limiter? Or is there a way to achieve the above with the network local rate limiter? From what I understand, the SNI info should be available after the downstream TLS inspector listener filter. I'm looking for something similar to the functionality provided by NGINX: https://docs.nginx.com/nginx/admin-guide/security-controls/controlling-access-proxied-http/#limit_conn
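For illustration, a minimal network-level RLS config of the kind I'm describing (cluster, domain, and descriptor names are made up); as far as I can tell, the descriptor entries can only carry static strings:

```yaml
filter_chains:
- filters:
  - name: envoy.filters.network.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.ratelimit.v3.RateLimit
      stat_prefix: tcp_ratelimit
      domain: ingress_tcp
      descriptors:
      - entries:
        - key: destination
          value: some-static-value   # static only; no substitution of source IP or SNI
      rate_limit_service:
        grpc_service:
          envoy_grpc:
            cluster_name: rate_limit_service   # hypothetical RLS cluster
        transport_api_version: V3
```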
At L7 I think the router can do this. For L4 I don't think that works out of the box, but I could be wrong. @junr03 any thoughts on this?
Yes, I also wonder how NGINX might have implemented this functionality. Is there an L7 filter capable of closing the connection after determining that the request doesn't satisfy the configured connection limit?
L7 filters typically operate on requests rather than connections. I think there is a real need for Envoy to add a listener filter rate limiter (based on some of my anecdotal experience); this would be able to reject very early, close to TCP accept, and potentially leverage SNI information as well.
Thanks @htuch. A follow-up question: let's say I'm implementing such a filter and want to support different limits for different SNI values. Do you have any recommendations on how those config limits should be specified? In an HTTP filter, I could leverage a per-route config (at which point it probably doesn't even need the SNI or other request info, since the route matching will take care of giving the right limits). But I'm looking for some best practices for providing such a config map for listener and network filters.
If it's just SNI, I think a proto3 map would be fine, mapping SNI name to rate limits. But, you might want to have other features too. You might want to look at how FilterChainMatch works for inspiration on the most general case. It's probably fine to start with just SNI, but you might want to leave room to grow towards FilterChainMatch parity in the future. Are you building a local rate limiter or RLS?
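As a rough sketch of what I mean (these field names are purely hypothetical, not an existing Envoy API), the config could be a proto3 map from SNI name to a token bucket, with a default bucket for non-matching names:

```yaml
# hypothetical SNI-keyed local rate limit config; none of these fields exist today
stat_prefix: sni_rate_limit
default_token_bucket:            # applied when no SNI entry matches
  max_tokens: 50
  tokens_per_fill: 50
  fill_interval: 1s
per_sni_token_buckets:           # proto3 map<string, TokenBucket>
  a.example.com:
    max_tokens: 100
    tokens_per_fill: 100
    fill_interval: 1s
  b.example.com:
    max_tokens: 10
    tokens_per_fill: 10
    fill_interval: 1s
```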
I plan to build a local rate limiter unless there's a way to pass the SNI info in descriptors for the global rate limiter (which was the intention of my original question).
> I plan to build a local rate limiter unless there's a way to pass the SNI info in descriptors for the global rate limiter (which was the intention of my original question).
This is not supported today (IIRC) but it would be very easy to add.
I see. I'm also concerned about the performance impact of the global limiter and wonder if doing it locally might be cheaper. Our goal is to actually protect the ingress Envoy instances (not the upstream) from overuse and attacks.
If you are going to build it into a filter, I would look at the existing L4 local rate limiting filter: https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/network_filters/local_rate_limit_filter. You could either use the existing filter chain match to match on SNI and then get a different filter chain (rate limit) per chain, or you could potentially add matching config into the rate limit filter itself. If we do that, we should do it with the generic matching functionality that @snowp is working on.
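Roughly, the first option would look like this (names and numbers are illustrative; the TLS transport socket is omitted for brevity). The TLS inspector extracts the SNI, each filter chain matches a server name, and each chain carries its own local rate limit filter:

```yaml
listener_filters:
- name: envoy.filters.listener.tls_inspector   # populates SNI before filter chain selection
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
filter_chains:
- filter_chain_match:
    server_names: ["a.example.com"]            # this chain only serves this SNI
  filters:
  - name: envoy.filters.network.local_ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.local_ratelimit.v3.LocalRateLimit
      stat_prefix: local_rl_a
      token_bucket:                            # bounds the rate of new connections on this chain
        max_tokens: 100
        tokens_per_fill: 100
        fill_interval: 1s
  - name: envoy.filters.network.tcp_proxy
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
      stat_prefix: tcp_a
      cluster: upstream_a                      # hypothetical upstream cluster
# a second chain with different server_names and its own token_bucket
# gives a different limit per SNI
```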
Arguably there is value in doing the connection-limit matching before the network filter chain, since you can avoid a TLS handshake when rejecting. I mention this as I've seen real-world situations in which that would pay off.
Thank you. Those are great points. We did actually consider using SNI-based filter chain match criteria. The concern is that it would cause the entire listener to be recreated any time a filter chain is added/modified via the LDS API, and that might not be good for performance. Is that a valid concern? Or is the listener transition process fairly optimized, without a significant perf impact?
We are also considering leveraging iptables to rate limit before even entering userspace, but that is based on source IP and we need something based on destination/SNI.
Regarding the L4 local rate limiter, how is it different from using this config: https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/runtime? I'm guessing the filter enforces the rate of new connections while the runtime config is a flat max number of connections.
> Arguably there is value in doing the connection-limit matching before the network filter chain, since you can avoid a TLS handshake when rejecting. I mention this as I've seen real-world situations in which that would pay off.
It's counter-intuitive (and somewhat related to @ggreenway's comments on merging transport sockets, listener filters, and network filters), but network filters can still rate limit pre-TLS, because the "on new connection" event happens immediately, before the handshake completes.
> Regarding the L4 local rate limiter, how is it different from using this config: https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/runtime? I'm guessing the filter enforces the rate of new connections while the runtime config is a flat max number of connections.
Correct.
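Concretely (listener name and value are illustrative), the runtime knob is a flat cap on concurrent connections for a listener, e.g. via a static runtime layer in the bootstrap, while the local rate limit filter's token bucket bounds the rate of new connections:

```yaml
layered_runtime:
  layers:
  - name: static_layer
    static_layer:
      # flat cap on concurrent (not per-second) connections for this listener
      envoy.resource_limits.listener.ingress_listener.connection_limit: 1000
```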
I think if you are concerned about filter chain scalability, adding matching into the L4 rate limit filter is probably the way to go.
I think this is related: https://github.com/envoyproxy/envoy/issues/6502
I'm confused by the code for the L4 local rate limit filter. Let's say we do use the SNI-based filter chain match. That means the local rate limit filter's limits will be specific to that SNI. But the actual token bucket is inside the shared config, which seems to be shared across all instances of the filter. Does that mean the token bucket used is the same one irrespective of the filter chain, or does a separate shared config object get created for each filter chain?
The config object is shared by all instances of that filter in that filter chain. A filter in a different filter chain has its own config.
Thanks @ggreenway. I guess that means the filter's instances for different connections will share the config object as long as they share the same SNI (with SNI-based filter chain match). Is this statement correct? My confusion stems from the relation between a filter chain and a filter chain instance. If it's 1:1, then this all makes sense.
> You could either use the existing filter chain match to match on SNI and then get a different filter chain (rate limit) per chain, or you could potentially add matching config into the rate limit filter itself. If we do that, we should do it with the generic matching functionality that @snowp is working on.
I'm leaning towards using the filter chain match based on SNI after learning that filter-chain-only updates don't affect the entire listener, only the filter chains that were updated.
Follow-up question: is an update to a filter considered a filter chain update? Let's say you update the token bucket config for the local rate limit filter; will it cause a connection drain for that filter chain?
> Follow-up question: is an update to a filter considered a filter chain update? Let's say you update the token bucket config for the local rate limit filter; will it cause a connection drain for that filter chain?
Yes. A better long-term solution would be to implement ECDS for network filters. This is needed for a lot of different scenarios, so I'm sure someone will implement this at some point, but I'm not sure when. cc @kyessenov
What about HTTP filters? Is it the same for them, since the HTTP connection manager is a network filter and any update to HTTP filters would count as an update to that network filter?
Yes, but HTTP filters already support ECDS if you can make use of that.
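For example (the filter and cluster names here are illustrative), an HTTP filter opts into ECDS by replacing its typed_config with a config_discovery entry, so its config can then be updated without triggering a listener update:

```yaml
http_filters:
- name: my.dynamic.ratelimit                 # filter whose config is fetched via ECDS
  config_discovery:
    config_source:
      api_config_source:
        api_type: GRPC
        transport_api_version: V3
        grpc_services:
        - envoy_grpc:
            cluster_name: ecds_cluster       # hypothetical xDS server cluster
    type_urls:
    - type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```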
Is there an existing way to limit the max number of active connections (not the rate of new connections) per unique SNI, i.e. per filter chain? I could only find ways to limit the number of active connections at the listener level or at the global level, but not at the filter chain level.
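For reference, these are the two existing knobs I mean (listener name and values are illustrative); both are runtime keys rather than per-filter-chain settings:

```yaml
layered_runtime:
  layers:
  - name: static_layer
    static_layer:
      # per-listener cap on active connections
      envoy.resource_limits.listener.ingress_listener.connection_limit: 10000
      # process-wide cap across all listeners
      overload.global_downstream_max_connections: 50000
```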