When using Envoy as a general egress proxy, there doesn't seem to be a way to have http_connection_manager routes send traffic to a cluster that uses the :authority header to select the destination. I expected a cluster with type: ORIGINAL_DST lb_policy: ORIGINAL_DST_LB to do this, but it seems original_dst is actually something else (iptables level?).
http_connection_manager supports routing to different _clusters_ based on a header (RouteAction::cluster_header), but we can't plumb this to the cluster level unless each hostname somehow dynamically generated a cluster.
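For concreteness, this is roughly what I tried and what exists today (simplified sketches; the names are just placeholders):

```yaml
clusters:
# What I expected to give :authority-based forwarding:
- name: egress
  connect_timeout: 5s
  type: ORIGINAL_DST
  lb_policy: ORIGINAL_DST_LB   # but this forwards to the original (pre-redirect) destination IP,
                               # not to whatever :authority names
```

```yaml
# The header-based routing that does exist only selects among pre-declared clusters:
route_config:
  virtual_hosts:
  - name: egress
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        cluster_header: ":authority"   # the header value must name an already-configured cluster
```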
I'm not sure what to do about DNS in this case -- the process behind Envoy can't do the resolution because it doesn't have a network, but having Envoy resolve would violate the documented behavior "Envoy never synchronously resolves DNS in the forwarding path". Presumably ORIGINAL_DST doesn't worry about this if its forwarding is IP-level.
@jmillikin-stripe I think you are asking for the feature discussed in this thread? https://groups.google.com/forum/#!topic/envoy-dev/xVvy3Q26VNM
Yes, it looks like that thread is about pretty much the same feature. I see the DNS resolution issue was already being considered. As to your many-cluster-vs-single-cluster question there, we definitely want (2). Not just for simplicity of implementation, but also because it prevents unbounded growth of metric tags.
OK. I meant to open an issue to track that thread so here we are. This feature is not easy to add but I think will be useful. I will mark this as "help wanted." If someone wants to work on it please let me know and we can go into the details.
Retitled from: Support something like ORIGINAL_DST but based on the :authority header
FYI unless this somehow gets picked up soon I may do this as my "holiday programming project."
This would be an amazing feature to have.
Enough people have asked for this (including at Lyft) that I'm going to implement this. My rough plan is a trio of HTTP filter, cluster, and load balancer that all work together. Roughly:
1) An HTTP filter that will pause filter iteration when needed, do an async DNS resolution of :authority, cache the result, and then continue iteration.
2) A custom cluster whose hosts are populated from the DNS results cached in (1).
3) A custom load balancer that uses :authority in the load balancer context to select the right host, which should be available after being cached in (1).

So the idea is that the user will have to install all three things (filter, cluster, load balancer type) together to get this functionality. At face value, this sounds convoluted, but IMO it's simpler than the alternative, which basically involves making load balancer host selection asynchronous. The repercussions from that are wide and IMO will make the rest of the code substantially more complicated and harder to reason about vs. what I'm proposing here. I may also look into this being turned on via a router config flag and then implicitly installing a new filter prior to the router filter on behalf of the user. I will look into this.
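To make the shape concrete, the user-facing config might end up looking roughly like this (everything below, including the filter and cluster type names, is hypothetical at this point):

```yaml
# Sketch only: http_filters lives under the HCM, clusters at the bootstrap level.
http_filters:
# (1) hypothetical filter: pause the request, asynchronously resolve :authority, cache the result
- name: envoy.http_dynamic_forward_proxy
  config:
    dns_cache_name: egress_dns_cache
- name: envoy.router

clusters:
# (2) + (3) hypothetical custom cluster/LB pair that serves hosts out of the same DNS cache
- name: egress_dynamic
  connect_timeout: 5s
  cluster_type:
    name: envoy.clusters.dynamic_forward_proxy
    config:
      dns_cache_name: egress_dns_cache
  lb_policy: CLUSTER_PROVIDED
```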
I can't promise when I will deliver this, but I will work on it in my "spare time" over the next couple of months.
cc @rshriram
@rshriram from the Istio perspective, how will cluster configuration work? E.g., things like HTTP/1 vs. HTTP/2, etc.? It seems like for HTTP stuff in general we would have to use HTTP/1 in most cases, until we eventually implement upgrade and/or potentially use ALPN for TLS connections (with potentially some automated way to turn on SAN checking based on authority).
The Istio use case is primarily the egress proxy, where all external traffic out of the mesh (e.g., consuming AWS APIs) has to transit through a dedicated proxy node. H1 should suffice for the near term.
But we have a need for TCP connections as well. I think TCP is going to require some funny setup (will have to look into the CONNECT semantics?).
Yeah for TCP the only way to make this sanely work would be via CONNECT, which will be a separate work item, tracked here: https://github.com/envoyproxy/envoy/issues/1451
I will think through the CONNECT case fully to make sure that will be supported. My rough intuition is to potentially handle CONNECT in a new filter that would run before the router but after the "inline_DNS" filter.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
Marking help wanted to avoid auto close. I still plan on working on this, though I'm not sure when, in case someone wants to do this before me.
Just pinging this issue with my use-case, which is somewhat related. (Though an alternative design might be more related to #2500)
Effectively, we need a way to do the equivalent of x-envoy-original-dst-host with DNS names. This is going to be used in a similar way to linkerd dtab overrides.
In our environment, developer playgrounds are registered in DNS so that they can be routed to. We'd like to enable developers to insert their instance of a service into the service call graph via dynamic, per-request routing (I understand the security implications of this). We would pass the override via a header similar to x-envoy-original-dst-host but with the hostname override. This would probably require some custom filter work, but hopefully a minimal amount.
One alternative option is to add a cluster for each of these registrations. However these are high cardinality. #2500 would fix this.
@bplotnick I added x-envoy-original-dst-host exactly as a replacement for dtab overrides in linkerd. However, in our case we fortunately know the IP address, so we limited it to IPs. Before that we were experimenting with adding a cluster dynamically on the request path for the host that the request needs to be routed to, which is somewhat complicated. Here is the partial PR https://github.com/envoyproxy/envoy/pull/3479 if you are interested in taking a look.
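For reference, the IP-only version looks roughly like this (sketch; v2 API field names, placeholder cluster name):

```yaml
clusters:
- name: dynamic_original_dst
  connect_timeout: 5s
  type: ORIGINAL_DST
  lb_policy: ORIGINAL_DST_LB
  original_dst_lb_config:
    use_http_header: true   # honor x-envoy-original-dst-host, which today must be <ip>:<port>
```

A request then carries e.g. x-envoy-original-dst-host: 10.1.2.3:8080; the ask here is to be able to put a DNS name there (or use :authority) instead.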
@mattklein123 I have recently considered this as a solution to a challenge we're facing, and may be able to help at some point after I have completed the original_src work. Do you have a breakdown of the work remaining/a design at all? Maybe we could split up the work.
I haven't gotten back to this, and haven't refreshed my memory on the proper design given recent overall system changes. @klarose maybe LMK when you are ready to work on this and we can sync up and brainstorm?
Any workaround or progress on this? This would enable the proper use of an egress-gateway in Istio as outlined in the question here: https://groups.google.com/forum/#!topic/envoy-dev/xVvy3Q26VNM
No promises but I'm going on vacation for the next two weeks and this might be my vacation coding project. I will report back if I get started.
Awesome! Hope you have a great vacation :)
To all watchers, I have a plan to implement this and will start soon. This will be a combination of an HTTP filter and a custom cluster/LB pair.
My question is what should this be called? There is no clearly good name that I can think of that will make it clear what the filter and cluster/LB do. Some options:
1) http_proxy
2) outbound_http_proxy
3) generic_http_proxy
4) ?
Please LMK if any of you have any thoughts on naming. cc @envoyproxy/maintainers
FWIW the similar feature in nginx is usually referred to as forward_proxy, although the word "forward" is not mentioned in the configuration required to achieve this:
https://umbrella.cisco.com/blog/2015/11/03/lets-talk-about-proxies-pt-2-nginx-as-a-forward-http-proxy/
https://ef.gy/using-nginx-as-a-proxy-server
I second 'forward proxy'. Wikipedia calls it that as well to distinguish it from reverse proxies.
+1 'forward proxy'.
BTW, do we really need a new set of an HTTP filter and a custom cluster/LB pair? The only really missing part is resolving DNS for the current original_dst, IIRC.
The reason I don't think "forward proxy" by itself is a great name is that Envoy can already be used as a forward proxy, just a statically configured one. Perhaps "http_forward_proxy" ?
@lizan re: original_dst, the issue there is that we do not have any async mechanisms around chooseHost(). So the options there would be to a) make chooseHost() async, which is a very non-trivial change, or b) actually return a wrapper connection that internally resolves DNS and then works as a normal connection. I've considered both options, but IMO, these are both more difficult than adding a filter which holds requests while DNS is being resolved if needed, and then adding the DNS to a special cluster which then returns the new host. So basically it ends up looking like a combination of original_dst and logical_dns. IMO it's simpler to just build a new cluster to do this, though I may end up factoring out some base code if it makes sense. WDYT?
Hmm. Perhaps "dynamic_forward_proxy" or "generic_forward_proxy" since the distinguishing feature here is that the destination cluster is not pre-configured.
I like "dynamic_forward_proxy" +1 to that.
I've considered both options, but IMO, these are both more difficult than adding a filter which holds requests while DNS is being resolved if needed, and then adding the DNS to a special cluster which then returns the new host.
I see, thanks for the explanation. I didn't take a deeper look at this. Just a random thought: will "a) make chooseHost() async" be a more generic solution? I.e., in the future some cluster/LB extension will need it anyway, rather than requiring those extensions, like the generic proxy here, to be a combination of filter + cluster?
Just a random thought: will "a) make chooseHost() async" be a more generic solution? I.e., in the future some cluster/LB extension will need it anyway, rather than requiring those extensions, like the generic proxy here, to be a combination of filter + cluster?
Yeah, I've had the same thoughts here. I do agree that figuring out how to (optionally) make chooseHost() async would be a good capability to have, but it will be an extremely complicated change, and I'm not sure it's worth doing unless we know for sure that we need this in multiple places. WDYT?
I've been playing around with a hackish solution to this myself. The general design is:
I had a few problems with it:
I think we'd want to make sure that the connection pooling was efficient, as well as well-reported on.
It'd also be nice if we could make sure that the CONNECT method worked, but I understand that that could be solved as a separate work item (indeed, it's a separate issue (https://github.com/envoyproxy/envoy/issues/1451)).
@klarose I have a complete plan for a non-hacky solution, so will move forward with that. Connection pooling should work properly since each DNS target will have its own logical host, similar to how logical DNS works.
CONNECT support is orthogonal, as you point out, and similar to WebSocket. With that said, I think the new filter I am going to write can easily support CONNECT and just be the terminal filter in that case vs. going to the router.
@mattklein123 Awesome, thanks. I didn't mean to propose my hack as an alternative. More that it might validate some of the assumptions (async dns resolution), and point out some drawbacks I found with it.
Do you have the plan documented anywhere? I'm not sure how much time I'll have over the next month (about to start sprinting hard towards something), but I wouldn't mind staying in the loop/helping out where I can.
Do you have the plan documented anywhere?
Only in my head right now. I can write it down if you want, or I can just go ahead and implement the basic version with a bunch of TODOs (host cleanup via TTL, max DNS, etc.) and then maybe you can help out with some of the follow ups? WDYT?
In thinking about this a bit more, I think CONNECT support is going to require being implemented directly in the HCM since it needs to upgrade to TCP much like WebSocket does. Given this, it's possible that the dynamic forward proxy DNS caching system may need to be used by the HCM also if we don't somehow allow a filter to effectively upgrade to raw TCP. My feeling is to cross this bridge when we come to it. cc @alyssawilk for any thoughts on where to put CONNECT support, taking as a given that we can implement a dynamic DNS cache. I'm thinking that maybe we can just treat CONNECT as a special type of upgrade and still have the filter deal with it.
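I.e., conceptually something like this, though it's purely speculative at this point:

```yaml
http_connection_manager:
  upgrade_configs:
  - upgrade_type: CONNECT   # speculative: treat CONNECT like a WebSocket-style upgrade,
                            # with the dynamic forward proxy DNS cache available to the HCM
```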
Yeah, I'd envisioned CONNECT going over the same upgrade path, and maybe eventually allowing filter chain overrides (I think some day we'll want per-route filter chains for other reasons and they could be used for this as well) for folks who want to not apply L7 filters to them.
Initial PR out. Tracking additional work items here to call this issue done for v1:
- <IP>:<port> host names

@lizan @alyssawilk and others LMK if I'm missing anything.
Any updates on this, @mattklein123 @davidben @lizan @PiotrSikora?
I am looking for the same kind of implementation; please point me in the right direction.
Anything in this space (Envoy as a generic forward proxy) would be very helpful.
@davidben @lizan @PiotrSikora I have a question about IP certs which I'm looking to handle in my final PR for this issue:
I assume an IP cert would be issued using an IP alt name, right?
In looking at our verification code I don't think we are checking for SAN types of GEN_IPADD, so I would need to add that. Is that right? Thank you.
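For example, I'd expect upstream validation config along these lines to work once GEN_IPADD is handled (sketch, untested; the address is just an example):

```yaml
tls_context:
  common_tls_context:
    validation_context:
      trusted_ca:
        filename: /etc/ssl/certs/ca-certificates.crt
      verify_subject_alt_name:
      - "198.51.100.10"   # would need to match an IP (GEN_IPADD) SAN in the upstream cert
```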
@mattklein123 correct.