We were reviewing the change log of v1.15.0 and we noted some changes that could improve our rate limiting setup. We decided to spike on one of the new features and make sure we understood it correctly:
When we first read that line and the linked documentation, we thought it would make our lives much easier. Let me explain succinctly what we mean by that.
We map the path of a request to an operation identifier (for example, a PUT request to the "/admin/user" path is associated with the operation identifier "create-user"), and we then use this operation identifier in Envoy’s rate limit filter.
To configure that, we perform quite a “pasodoble”. We temporarily add a header with the operation identifier for the route that matches the path. This header is matched by the rate limit action (header matcher) configuration and passed as a descriptor key in the gRPC call to the rate limit service. To add and remove that header on the request, we use HTTP Lua filters. So when we saw that change, we thought we could simplify our configuration. Joy was in our hearts.
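For context, here is a minimal sketch of that Lua “pasodoble” (the x-operation-id header name and the metadata layout are ours, nothing standard; a second Lua filter placed after the rate limit filter removes the header again):

- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        -- read the operation identifier from this route's metadata
        -- (filter_metadata under the envoy.filters.http.lua key)
        local operation = request_handle:metadata():get("operation")
        if operation ~= nil then
          -- temporary header matched by a request_headers rate limit action
          request_handle:headers():add("x-operation-id", operation)
        end
      end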
We thought we could set metadata at the route level with filter_metadata, and use this metadata to fill the operation descriptor key with the dynamic_metadata rate limit action, like the following:
virtual_hosts:
- name: backend
  domains:
  - "*"
  rate_limits:
  - actions:
    - dynamic_metadata:
        descriptor_key: "operation"
        metadata_key:
          key: "envoy.filters.http.ratelimit"
          path:
          - key: "descriptor"
          - key: "operation"
  - actions:
    - request_headers:
        header_name: "x-auth-client"
        descriptor_key: "client"
  routes:
  - name: example-operation-1
    match:
      prefix: "/anything/shouldRateLimit"
    metadata:
      filter_metadata:
        envoy.filters.http.ratelimit:
          descriptor:
            operation: "operation-1"
    route:
      cluster: "httpbin"
      include_vh_rate_limits: true
  - name: default-route
    match:
      prefix: "/"
    route:
      cluster: httpbin
      include_vh_rate_limits: false
We tested this rate limit configuration approach on a vanilla Envoy setup, eliminating anything related to our internal setup. We used an Envoy use case examples repository to “learn Envoy's core feature set and how to configure it by hand”. You can find the Envoy configuration corresponding to the extract above, and a small docker-compose repository to reproduce what we mean.
To our surprise, the descriptor key "operation" was not filled when Envoy's rate limit filter called the rate limit service mock.
In a previous troubleshooting investigation, we had learned how to set up remote gdb debugging of an Envoy binary with debug symbols in our local docker-compose world. We decided to have a look at what was happening in the Envoy code base in a live debugging session.
In source/common/router/router_ratelimit.cc, we could display the configured values of the rate limit action in metadata_key_ and the expected filter_metadata in dynamic_metadata. We observed that no metadata was present in dynamic_metadata (it was empty). Something was wrong with how we had configured the filter_metadata.
Our next step was to look at the integration tests in the Envoy code base and how they configure dynamic metadata. We find this is often a good place to look, since they use YAML configuration samples. We could not find anything obvious we were doing wrong, even when literally copying and pasting the YAML used there. We think those integration tests misled us: they use filter_metadata, but we believe they should not; more on this later.
Our last bit of procrastination, before asking for help publicly, was to review the Envoy Slack channel history and prior GitHub issues. That's the minimum you can do; you don't want to waste people's time. That's when we found a crucial clue to this story. We found a discussion in the Envoy Slack that referred to a GitHub issue where a user had a problem with metadata set via filter_metadata in a route configuration that could not be found by the access log filter. That scenario is almost identical to ours, just with the access log filter in place of the rate limit filter. The comments on the GitHub issue in particular triggered the idea that route metadata is potentially different from dynamic metadata.
We think the route metadata set by filter_metadata is different "data" from the dynamic metadata. They are different abstractions. We understand that route metadata is used, for example, by HTTP Lua filters. We knew that for sure because we already use it in our approach that adds and removes the operation header with Lua filters. If we are correct, the documentation is not really clear about the distinction (and we are not alone in being confused). We found the best explanation in the HTTP Lua filter section: it is one of the rare places where you can work with both route metadata and dynamic metadata, so the separation is obvious there.
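To make the distinction concrete, here is a sketch of a Lua filter reading both kinds of metadata (the key names are illustrative):

- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        -- route metadata: static, declared in the route's metadata.filter_metadata
        local route_meta = request_handle:metadata():get("operation")
        -- dynamic metadata: per-request state emitted by filters via StreamInfo
        local dyn_meta = request_handle:streamInfo():dynamicMetadata():get("envoy.filters.http.ratelimit")
        request_handle:logInfo("route metadata operation: " .. tostring(route_meta))
      end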
But the change adds support for dynamic metadata as a rate limit action, not route metadata. Fair enough; so how can we set dynamic metadata for a route, we asked ourselves. From Dio's comments in the GitHub issue, we understand there are several ways: implement an Envoy filter that matches a route and uses the StreamInfo API, use the header-to-metadata filter (but that would mean we need a header again, which is not a big win for us; remember we are trying to get rid of that), or use HTTP Lua filters directly (same, not a big win).
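For completeness, a sketch of the header-to-metadata option (it still relies on a header, so it does not remove our “pasodoble”; the header name is ours):

- name: envoy.filters.http.header_to_metadata
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.header_to_metadata.v3.Config
    request_rules:
    - header: x-operation-id
      on_header_present:
        # copy the header value into dynamic metadata, where the
        # dynamic_metadata rate limit action can then pick it up
        metadata_namespace: envoy.filters.http.ratelimit
        key: operation
        type: STRING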
At this point, we were deeply confused about this dynamic metadata, and even more confused by the change: why would somebody add support for dynamic metadata as a rate limit action when, in our minds, most actions in Envoy are driven by an HTTP route? That did not make sense. We were obviously biased. We decided to take a step back, have a little walk, and brew some coffee. We asked ourselves: what is dynamic metadata, really? For which purposes do people use dynamic metadata?
We looked at the PR that brought the change, and found that it originated from this proposal. When we read the description, we understood the intent of the change. The users wanted to pass information directly from the dynamic metadata to the rate limit action. They wanted to avoid using a Lua filter that reads the dynamic metadata and appends it to a forged header, which a rate limit action would then match. The key element is that they already have the information in the dynamic metadata. There is no mention of a route setting any dynamic metadata.
Dynamic metadata is actually a great idea (you folks rock). We found an advanced section in the documentation that completed our understanding. We understand that filters can emit dynamic metadata that is consumed by other filters. Take, for example, the network Postgres filter. It sets the table names and SQL operations seen in the SQL statements as dynamic metadata. That is fantastic: you can now have, for example, a rate limit action that finds the table name in the dynamic metadata, and you can rate limit based on this descriptor key.
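For instance, a hedged sketch of a rate limit action consuming such filter-emitted metadata (the metadata namespace and path below are assumptions; check the Postgres filter documentation for the actual keys it publishes):

rate_limits:
- actions:
  - dynamic_metadata:
      descriptor_key: "table"
      metadata_key:
        # assumed namespace and path, for illustration only
        key: "envoy.filters.network.postgres_proxy"
        path:
        - key: "table"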
Sorry for the long context setup, but we think it helps to understand where we come from. Our question is more or less to confirm or refute our understanding, detailed in the previous paragraphs, of what dynamic metadata is, and of the difference between (route) metadata and dynamic metadata.
cc @clarakosi @Pchelolo
Hi @jeremybaumont
You are correct in your understanding: dynamic_metadata != filter_metadata. They reuse the same structure internally, hence the confusing integration tests. In my understanding, the difference is that filter_metadata is configured statically, while dynamic_metadata can be set/modified within each request context by the filters.
Our original goal was to implement support for dynamic_metadata, since in our use case we are getting the descriptors out of the JWT token via the JWT Authentication filter, which sets the JWT payload in the dynamic metadata.
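For illustration, a minimal sketch of that flow (the provider, issuer, and claim names here are hypothetical): the JWT Authentication filter writes the token payload into dynamic metadata, and the dynamic_metadata action reads a claim from it.

# 1) the JWT filter emits the token payload into dynamic metadata
- name: envoy.filters.http.jwt_authn
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
    providers:
      example_provider:                          # hypothetical provider
        issuer: "https://issuer.example.com"     # hypothetical issuer
        payload_in_metadata: "jwt_payload"       # claims land in dynamic metadata
        # jwks source omitted for brevity

# 2) the rate limit action reads a claim from that metadata
rate_limits:
- actions:
  - dynamic_metadata:
      descriptor_key: "user"
      metadata_key:
        key: "envoy.filters.http.jwt_authn"
        path:
        - key: "jwt_payload"
        - key: "sub"   # hypothetical claim used as the descriptor value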
In your case, what prevents you from configuring the rate_limits on the route level instead of the virtual host level? That way you would be able to just use the GenericKey action and configure the operation-id directly without passing it around via the metadata? Although, this has the downside of being much more verbose.
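Something along these lines per route (a sketch, reusing the names from your example):

routes:
- name: example-operation-1
  match:
    prefix: "/anything/shouldRateLimit"
  route:
    cluster: httpbin
    rate_limits:
    - actions:
      - generic_key:
          # the operation-id goes directly in the descriptor value,
          # at the cost of repeating this block on every route
          descriptor_value: "operation-1"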
It seems pretty easy to implement a filter_metadata action type: a new RateLimitAction, filter_metadata, following the existing MetadataKey schema:
RateLimitAction.proto:
// The following descriptor entry is appended when the filter metadata contains a key value:
//
// .. code-block:: cpp
//
//   ("<descriptor_key>", "<value_queried_from_metadata>")
message FilterMetaData {
  // The key to use in the descriptor entry.
  string descriptor_key = 1 [(validate.rules).string = {min_bytes: 1}];

  // Metadata struct that defines the key and path to retrieve the string value.
  type.metadata.v3.MetadataKey metadata_key = 2 [(validate.rules).message = {required: true}];

  // An optional value to use if *metadata_key* is empty. If not set and
  // no value is present under the metadata_key then no descriptor is generated.
  string default_value = 3;
}

oneof action_specifier {
  ...
  FilterMetaData filter_metadata = 8;
}
Usage example:
actions:
- { filter_metadata: { key: "…", path: [] } }
This would make it possible to pass information about routes to a virtual-host level rate-limit configuration.
Hey @Pchelolo
Thank you for the quick answer, we really appreciate it :smile:
what prevents you from configuring the rate_limits on the route level instead of the virtual host level?
That way you would be able to just use the GenericKey action and configure the operation-id directly without passing it around via the metadata?
Yes, we considered that at the beginning. The problem with the GenericKey rate limit action is that we cannot choose the descriptor key name. We would need to ask the rate limit service team to support the mapping between the descriptor key "generic_key" and "operation" (in the gRPC service, but also in the rate limit policy definitions). It could also be a bit confusing when troubleshooting issues if you are not aware of the mapping. Nevertheless, I agree it is not the end of the world and that it is a valid approach.
We could add the ability to set the descriptor key on the GenericKey rate limit action, defaulting to "generic_key" for backward compatibility.
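Something like this, as a sketch (the descriptor_key field on generic_key is the proposed addition; it does not exist today):

rate_limits:
- actions:
  - generic_key:
      descriptor_key: "operation"     # proposed: override the default "generic_key"
      descriptor_value: "operation-1"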
It seems pretty easy to implement a filter_metadata action type: a new RateLimitAction, filter_metadata, following the existing MetadataKey schema.
Yes, that is a good idea, I agree. I am a wannabe C++ programmer :smile:, but this could be a good way to learn, and it seems feasible. I'll pursue this option.
I am trying to do the same as the OP and struggled for a couple of days to achieve it by setting the route metadata and using the dynamic metadata action for rate limiting. Thank you @jeremybaumont for writing up this long story and pointing me toward the Lua filter way. :+1:
Also I would like to add that implementing a FilterMetadataAction as @Pchelolo suggested would simplify this use case even more by eliminating the Lua filter.
For us, the route-specific actions would be really verbose and brittle, as we have many VirtualServices in Istio which should produce the same composite descriptor structure: API+path+method+clientIP, where API is specific to the VirtualService. So for every route we would need to provide the same actions list, where the only difference is the API value.
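To illustrate the verbosity, the actions list would look something like this on every single route (a sketch; only the API value changes between routes):

rate_limits:
- actions:
  - generic_key:
      descriptor_value: "api-foo"   # the only per-route difference
  - request_headers:
      header_name: ":path"
      descriptor_key: "path"
  - request_headers:
      header_name: ":method"
      descriptor_key: "method"
  - remote_address: {}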
I've been trying to set up an override limit, but I can't seem to find any place where it is possible to configure dynamic metadata, even statically. I don't quite understand why you're forced to go through this very new and under-documented dynamic metadata for something that should be more easily configurable.
This is what my config looks like. Maybe I missed something...
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: auto
          stat_prefix: envoy_ratelimit_http
          http_filters:
          - name: envoy.filters.http.ratelimit
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
              domain: default
              timeout: 0.02s # 20 ms
              rate_limited_as_resource_exhausted: true
              failure_mode_deny: true
              enable_x_ratelimit_headers: DRAFT_VERSION_03
              rate_limit_service:
                transport_api_version: v3
                grpc_service:
                  envoy_grpc:
                    cluster_name: ratelimit
          - name: envoy.filters.http.router
          route_config:
            name: default_route
            # config.route.v3.RouteConfiguration
            virtual_hosts:
            # config.route.v3.VirtualHost
            - name: default
              domains: ["*"]
              routes:
              # config.route.v3.Route
              - match:
                  prefix: "/"
                metadata:
                  filter_metadata:
                    envoy.filters.http.ratelimit.override:
                      limit:
                        unit: HOUR
                        requests_per_unit: 42
                route: # config.route.v3.RouteAction
                  cluster: upstream
                  rate_limits:
                  # config.route.v3.RateLimit
                  - actions: # config.route.v3.RateLimit.Action[]
                    - destination_cluster: {}
                    - remote_address: {}
                    - generic_key:
                        descriptor_key: hardcoded
                        descriptor_value: hard-value
                  # config.route.v3.RateLimit
                  - actions:
                    - destination_cluster: {}
                    - generic_key:
                        descriptor_key: max_rate
                        descriptor_value: 'true'
                    limit:
                      dynamic_metadata:
                        metadata_key:
                          key: envoy.filters.http.ratelimit.override
                          path:
                          - key: limit
  clusters:
  - name: upstream
    connect_timeout: 5s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: hello-secret
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: hello-secret
                port_value: 8080
  - name: ratelimit
    type: STRICT_DNS
    connect_timeout: 1s
    lb_policy: ROUND_ROBIN
    protocol_selection: USE_CONFIGURED_PROTOCOL
    http2_protocol_options: {}
    load_assignment:
      cluster_name: ratelimit
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: hermes
                port_value: 9090
You are defining your limits in the route metadata section, but that is different from dynamic_metadata. You need a Lua filter before your rate limit filter to fetch the route metadata and inject it as dynamic metadata for the request.
Something similar to the following. We are using it to provide input for actions, but the concept is the same. It is a workaround until the generic MetadataAction is released:
- name: envoy.filters.http.lua
  typed_config:
    "@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua"
    inline_code: |
      function envoy_on_request(request_handle)
        local api_group = request_handle:metadata():get("api_group");
        api_group = (api_group == nil) and 'undefined' or api_group;
        request_handle:logDebug(" api_group filter metadata: " .. api_group);
        request_handle:streamInfo():dynamicMetadata():set("envoy.rate_limit", "api_group", api_group);
      end
Probably the override configuration could be enhanced similarly to how the generic MetadataAction was introduced in favor of the DynamicMetadataAction.
Thanks a lot! I managed to make it work by adding this to my filters before my rate limit filter; pasting it here for posterity. Might I suggest a new feature: either a new filter type to set some dynamic metadata, or enabling access to filter_metadata in the limit override?
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    inline_code: |
      function envoy_on_request(request_handle)
        request_handle:streamInfo():dynamicMetadata():set("envoy.filters.http.ratelimit.override", "limit", {
          unit = "HOUR",
          requests_per_unit = 42
        })
      end
Great that the workaround works for you!
to enable access to filter_metadata in the limit override
Probably the override configuration could be enhanced similarly to how the generic MetadataAction was introduced in favor of the DynamicMetadataAction.
My proposal is exactly the suggestion you made, just in a bit more formalized form.