Title: gRPC-web filter does not clear the route cache
Description:
source/extensions/filters/http/grpc_web/grpc_web_filter.ccchecks if the incoming request is a gRPC-web request and transforms it to gRPC proper. The transformation involves changing the content type toapplication/grpc. Unlike the JSON transcoding filter, gRPC-web one does not however clear the route cache making it harder to route the "upgraded" request the same way one would route a native gRPC request.
Envoy version: master @ 97d259d43a4f9b7c6d798804b4b2c9ffbd2d285d (02/11/2019)
Repro steps:
Define a route explicitly matching gRPC traffic:
...
"match": {
"prefix": "/",
"grpc": {}
},
"route": {
"cluster": "lo_svc_grpc",
"max_grpc_timeout": {
"seconds": 0
}
},
{
"match": {
"prefix": "/"
},
"route": {
"cluster": "lo_svc_http",
"timeout": {
"seconds": 60
}
}
}
...
Define the http filters in the following order:
...
"http_filters": [
{
"name": "envoy.grpc_web"
},
{
"name": "envoy.router"
}]
...
Send a gRPC-web request through Envoy
Expected: the request is upgraded to gRPC proper and routed to lo_svc_grpc
Actual: the request is upgraded to gRPC proper and routed to lo_svc_http (the cached route)
Why is this a bug?
1) JSON transcoding filter clears the cache and supports such routing - consistency issue.
2) The documentation states the filters are executed in order of their definition, since the router was defined as the last one, it should pick up the changes made by the previous filters.
3) The routing supports matching on gRPC requests and when the gRPC-web request reaches the router, it is a gRPC request at that point and should be routed as such.
Workaround:
Insert a route matching the initial request on gRPC-web explicitly:
{
"match": {
"prefix": "/",
"headers": [
{
"name": "content-type",
"prefix_match": "application/grpc-web"
}
]
},
"route": {
"cluster": "lo_svc_grpc",
"max_grpc_timeout": {
"seconds": 0
}
}
},
Proposed fix:
decoder_callbacks_->clearRouteCache();
after the line https://github.com/envoyproxy/envoy/blob/master/source/extensions/filters/http/grpc_web/grpc_web_filter.cc#L60
Agreed sounds like a bug, cc @lizan @fengli79
@mattklein123 Agree the user facing behavior sounds not right. Regarding the impl, shouldn't clearRouteCache() something transparent to filters? It sounds error prone if we need each filter to manage the route cache at some point to make the proxying right.
@mattklein123 Agree the user facing behavior sounds not right. Regarding the impl, shouldn't clearRouteCache() something transparent to filters? It sounds error prone if we need each filter to manage the route cache at some point to make the proxying right.
This is something that has come up before, and there is no great solution. Recomputing the route can be very expensive, and it's hard for the connection manager to know if a filter does something that might effect the route without a huge amount of work. I agree it's worth thinking about more, but for now we should just fix in the filter.
If it's very expensive, we should not clear it for each request.
Compare to lost efficiency and get such a route config be correct, I would prefer to get efficiency.
Is there a better place to support such routing? Like somewhere at config time?
after the line
https://github.com/envoyproxy/envoy/blob/master/source/extensions/filters/http/grpc_web/grpc_web_filter.cc#L60
Is there a better place to support such routing? Like somewhere at config time?
Sorry what do you mean?
I mean this is something should be resolved by the caching at config time. For example, if we can enhance the caching knows that a specific filter will produce a specific content-type, and this make the route cache able.
In this case, the user already has a workaround by defining a grpc mapping separately. If we change the filter to clear the cache, yes, this upgrade case will be supported at a performance cost, but for people who care about performance, they have no workaround anymore, they are always with a performance lost.
I would leave this to you to decide how important to support such upgrade case.
For this particular case, IMO we can add a flag into GrpcRouteMatchOptions, such as include_grpc_web so the route will match grpc-web request too before the filter, WDYT?
@lizan I don't think we should add any flag for this. It will be harder to deprecate them further.
@mattklein123 I list the potential solutions here:
@wenbozhu to find someone can actively work on this.
Fix the route caching and remove the clearRouteCache() API from filters. In this case, if the request is modified by a filter, maybe it should trigger a route cache miss and gets re-compute once.
I would love to discuss ways of doing this (I can think of some, basically wrapping all mutating things that a filter can do, etc.) but I don't think there is anything easy here.
I would suggest either doing (1) or (3) as an easy fix.
To give some context from user's perspective, say I have a service, I put Envoy as a reverse proxy in front of it and want to direct all traffic through Envoy.
For historical reasons, the service listens on multiple ports for different types of traffic (e.g. gRPC, regular HTTP), Envoy present an opportunity to unify this and get rid of this port per traffic type pattern.
In addition to that, I'd like to enrich the gRPC experience by allowing JSON HTTP/1.1 -> gRPC and gRPC-web -> gRPC. I believe this is not out of ordinary since gRPC-web alone is often insufficient (does not support ad-hoc use cases well like a simple curl).
Now to do that, one needs to:
1) Add gRPC-web and JSON gRPC transcoding filters (let's assume in this order)
2) Define routes to match on (in this order)
grpc and route to gRPC portcontent-type for application/grpc-web and route to gRPC portAs you may see, there's no need to rely on filters mutating the request nature and if fact it makes the routing more confusing, from where I stand I'd be perfectly fine to actually remove the ability for filters to clear the route cache, but I'd definitely want to be able to detect the incoming traffic during the routing the same way the filters do.
This means adding native traffic detectors like the grpc one to the Route Match config.
In this particular case it would mean that for gRPC-web a new route match option is added grpc-web which executes the same logic as the filter.
And for the transcoding one, a new route match option is added json-grpc which executes the filter logic.
(and ideally for most other filters using the same convention)
For the transcoding one, I'd make a change to require a custom header requesting the transcoding (x-envoy-json-grpc) because currently it is very aggressive and will attempt to match any non-gRPC content type HTTP request with the proto descriptor and execute the transcoding, which may clash with the native HTTP endpoint of the service.
TL;DR: enrich Route Match with more native detectors for various types of traffic, ideally bringing it to par with the filters supported and making filters more explicit about when they are triggered (json grpc) by using custom headers (or something else?). This should eliminate the need for the route cache invalidating filters altogether.
I know the docs mention this mutation as a feature but the justification seems very weak, if what I suggest is implemented (better traffic route match), one can just add an explicit match for the transcoding and direct it to the same cluster as gRPC if desired.
"grpc" in the route match is more like a shortcut for varies application/grpc content-type headers.
For grpc-web, did you try?
"match": {
"prefix": "/",
"headers": [{
"name": "content-type",
"prefix_match": "application/grpc-web"
}]
},
"grpc" in the route match is more like a shortcut for varies application/grpc content-type headers.
True, but it gives a valuable benefit of executing the same logic as the actual filter, one does not need to make sure it's up to date with the filter, that it matches all of the headers in the same way etc. It's a useful abstraction.
For grpc-web, did you try?
Yes, it's in the original post (under work around).
I'd rather not match on headers manually and be worried I did not get everything right, but have these mini-options letting me match the common traffic, potentially internally exposed by filters so it's shared logic. Case in point - I read the filter code to make sure I got the header matching right.
I just wanted to stress again that for the transcoding filter (and maybe some others) you can't even match on the content-type, there's simply nothing to match on, the filter is super aggressive about the transcoding.
@troshko111 transcoding filter does clear route cache by default as it changes path/method and mostly change the route, so you can just match the request as normal grpc.
Yes, that's exactly what I'm doing, I was saying that in the context of my comment about potentially getting rid of the cache invalidation and using explicit matching.
In both cases, I feel like the transcoding filter's behavior is not ideal as it tries to transcode everything non application/grpc which may potentially clash with pure HTTP endpoints on the upstream (if the user happens to define a conflicting mapping).
Let me know if the comment linked above is ambiguous, I'll try to clarify it. I was just offering an opinion on how to potentially improve the user experience and consistency of filters and routing.
"grpc" in the route match is more like a shortcut for varies application/grpc content-type headers.
True, but it gives a valuable benefit of executing the same logic as the actual filter, one does not need to make sure it's up to date with the filter, that it matches all of the headers in the same way etc. It's a useful abstraction.
Maybe even not true for grpc. For example, if you want to blacklist a specific buggy client version, etc. A clear spec/doc about how to identify grpc/grpc-web is simpler and more transparent.
For grpc-web, did you try?
Yes, it's in the original post (under work around).
Sorry, I missed that part. I would suggest to use that work around as the official way to match grpc-web traffics. Clear the route cache doesn't sound a meaningful fix anymore, due to the recompute cost.
I'd rather not match on headers manually and be worried I did not get everything right, but have these mini-options letting me match the common traffic, potentially internally exposed by filters so it's shared logic. Case in point - I read the filter code to make sure I got the header matching right.
I just wanted to stress again that for the transcoding filter (and maybe some others) you can't even match on the content-type, there's simply nothing to match on, the filter is super aggressive about the transcoding.
For the transcoding one, I'd make a change to require a custom header requesting the transcoding (
x-envoy-json-grpc) because currently it is very aggressive and will attempt to match any non-gRPC content type HTTP request with the proto descriptor and execute the transcoding, which may clash with the native HTTP endpoint of the service.
In both cases, I feel like the transcoding filter's behavior is not ideal as it tries to transcode everything non
application/grpcwhich may potentially clash with pure HTTP endpoints on the upstream (if the user happens to define a conflicting mapping).
Feel free to open an PR as long as requiring header is opt-in thus not breaking existing users. I don't feel much value as it is a config problem in general, and if you have a proto descriptor defined transcoding clashes with the native HTTP endpoint, you will have your own problem anyway.
Transcoding filter try to match incoming request to proto descriptor but if it doesn't, it is basically no-op.
@mattklein123 @lizan any news on this? Or a choice of decision to implement?