Envoy: Support generic outbound proxy

Created on 7 Sep 2017 · 38Comments · Source: envoyproxy/envoy

When using Envoy as a general egress proxy, there doesn't seem to be a way to have http_connection_manager routes send to a cluster and have the :authority header be used to select the destination. I expected a cluster with type: ORIGINAL_DST lb_policy: ORIGINAL_DST_LB to do this, but it seems original_dst is actually something else (iptables level?).

http_connection_manager supports routing to different _clusters_ based on a header (RouteAction::cluster_header), but we can't plumb this to the cluster level unless each hostname somehow dynamically generated a cluster.

I'm not sure what to do about DNS in this case -- the process behind Envoy can't do the resolution because it doesn't have a network, but having Envoy resolve would violate the documented behavior "Envoy never synchronously resolves DNS in the forwarding path". Presumably ORIGINAL_DST doesn't worry about this if its forwarding is IP-level.

enhancement no stalebot

Source

jmillikin-stripe

👍13 ❤4

Most helpful comment

Enough people have asked for this (including at Lyft) that I'm going to implement this. My rough plan is a trio of HTTP filter, cluster, and load balancer that all work together. Roughly:

The filter will work similarly to what I described in https://github.com/envoyproxy/envoy/issues/2500#issuecomment-362404328 with request holding, but instead will look in a DNS cache. If a request needs to be held while initial DNS resolution is taking place it will be done.
The cluster will by very similar to logical_dns in operation. Perhaps it will be called "inline_dns" or something.
The special load balancer will use :authority in the load balancer context to select the right host which should be available after being cached in (1).

So the idea is that the user will have to install all three things (filter, cluster, load balancer type) together to get this functionality. At face value, this sounds convoluted, but IMO it's simpler than the alternative which basically involves making load balancer host selection asynchronous. The repercussions from that are wide and IMO will make the rest of the code substantially more complicated and harder to reason about vs. what I'm proposing here. I may also look into this being turned on via a router config flag and then implicitly installing a new filter prior to the router filter on behalf of the user. I will look into this.

I can't promise when I will deliver this but I will work on it in my "spare time" over next couple of months.

cc @rshriram

mattklein123 on 2 Feb 2018

❤14

All 38 comments

@jmillikin-stripe I think you are asking for the feature discussed in this thread? https://groups.google.com/forum/#!topic/envoy-dev/xVvy3Q26VNM

mattklein123 on 7 Sep 2017

Yes, it looks like that thread is about pretty much the same feature. I see the DNS resolution issue was already being considered. As to your many-cluster-vs-single-cluster question there, we definitely want (2). Not just for simplicity of implementation, but also because it prevents unbounded growth of metric tags.

jmillikin-stripe on 7 Sep 2017

OK. I meant to open an issue to track that thread so here we are. This feature is not easy to add but I think will be useful. I will mark this as "help wanted." If someone wants to work on it please let me know and we can go into the details.

mattklein123 on 7 Sep 2017

Retitled from: Support something like ORIGINAL_DST but based on the :authority header

mattklein123 on 31 Oct 2017

FYI unless this somehow gets picked up soon I may do this as my "holiday programming project."

mattklein123 on 31 Oct 2017

This would be amazing feature to have

jippi on 22 Dec 2017

Enough people have asked for this (including at Lyft) that I'm going to implement this. My rough plan is a trio of HTTP filter, cluster, and load balancer that all work together. Roughly:

The filter will work similarly to what I described in https://github.com/envoyproxy/envoy/issues/2500#issuecomment-362404328 with request holding, but instead will look in a DNS cache. If a request needs to be held while initial DNS resolution is taking place it will be done.
The cluster will by very similar to logical_dns in operation. Perhaps it will be called "inline_dns" or something.
The special load balancer will use :authority in the load balancer context to select the right host which should be available after being cached in (1).

I can't promise when I will deliver this but I will work on it in my "spare time" over next couple of months.

cc @rshriram

mattklein123 on 2 Feb 2018

❤14

@rshriram from Istio perspective, how will cluster configuration work? E.g., things like HTTP/1 vs. HTTP/2, etc.? It seems like for HTTP stuff in general we would have to use HTTP/1 in most cases, until we eventually implement upgrade and/or potentially using ALPN for TLS connections (with potentially some automated way to turn on SAN checking based on authority).

mattklein123 on 2 Feb 2018

Istio use case is primarily for the egress proxy, where all external traffic out of the mesh (e.g., consuming AWS apis) has to transit through a dedicated proxy node. H1 should suffice for the near term.

But we have a need for TCP connections as well. I think the TCP is going to require some funny setup (Will have to look into the CONNECT semantics?).

rshriram on 4 Feb 2018

Yeah for TCP the only way to make this sanely work would be via CONNECT, which will be a separate work item, tracked here: https://github.com/envoyproxy/envoy/issues/1451

I will think through the CONNECT case fully to make sure that will be supported. My rough intuition is to potentially handle CONNECT in a new filter that would run before the router but after the "inline_DNS" filter.

mattklein123 on 4 Feb 2018

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale[bot] on 28 Jun 2018

Marking help wanted to avoid auto close. I still plan on working on this, not sure when though in case someone wants to do this before me.

mattklein123 on 28 Jun 2018

Just pinging this issue with my use-case, which is somewhat related. (Though an alternative design might be more related to #2500)

Effectively, we need a way to do the equivalent of x-envoy-original-dst-host with dns names. This is going to be used in a similar way as linkerd dtab overrides.

In our environment, developer playgrounds are registered in DNS so that they can be routed to. We'd like to enable developers to insert their instance of a service into the service call graph via dynamic, per-request routing (I understand the security implications of this). We would pass the override via a header similar to x-envoy-original-dst-host but with the hostname override. This would probably require some custom filter work, but hopefully a minimal amount.

One alternative option is to add a cluster for each of these registrations. However these are high cardinality. #2500 would fix this.

bplotnick on 4 Oct 2018

@bplotnick I have added x-envoy-original-dst-host exactly as a replacement for dtab overrides in linkerd .However, in our case fortunately we know the IP address. So limited it to use IPs. Before that we were experimenting by adding a cluster dynamically on the request path with the host that request needs to be routed to, which is some what complicated. Here is the partial PR https://github.com/envoyproxy/envoy/pull/3479 if you are interested in looking at.

ramaraochavali on 6 Oct 2018

@mattklein123 I have recently considered this as a solution to a challenge we're facing, and may be able to help at some point after I have completed the original_src work. Do you have a breakdown of the work remaining/a design at all? Maybe we could split up the work.

klarose on 30 Jan 2019

I haven't gotten back to this, and haven't refreshed my memory on the proper design given recent overall system changes. @klarose maybe LMK when you are ready to work on this and we can sync up and brainstorm?

mattklein123 on 30 Jan 2019

Any workaround or progress on this? This would enable the proper use of an egress-gateway in Istio as outlined in the question here: https://groups.google.com/forum/#!topic/envoy-dev/xVvy3Q26VNM

skydoctor on 16 May 2019

No promises but I'm going on vacation for the next two weeks and this might be my vacation coding project. I will report back if I get started.

mattklein123 on 16 May 2019

Awesome! Hope you have a great vacation :)

skydoctor on 16 May 2019

To all watchers, I have a plan to implement this and will start soon. This will be a combination of an HTTP filter and a customer cluster/LB pair.

My question is what should this be called? There is no clearly good name that I can think of that will make it clear what the filter and cluster/LB do. Some options:

1) http_proxy
2) outbound_http_proxy
3) generic_http_proxy
4) ?

Please LMK if any of you have any thoughts on naming. cc @envoyproxy/maintainers

mattklein123 on 31 May 2019

🎉8

FWIW similar feature in nginx is usually referred as forward_proxy although the word "forward" is not mentioned in the configuration required to achieve this:

https://umbrella.cisco.com/blog/2015/11/03/lets-talk-about-proxies-pt-2-nginx-as-a-forward-http-proxy/
https://ef.gy/using-nginx-as-a-proxy-server

skydoctor on 31 May 2019

I second 'forward proxy'. Wikipedia calls it that as well to distinguish it from reverse proxies.

klarose on 31 May 2019

+1 'forward proxy'.

btw do we really need a new set of an HTTP filter and a customer cluster/LB pair? The really only missing part is resolving DNS for current original_dst IIRC.

lizan on 31 May 2019

The reason I don't think "forward proxy" by itself is a great name is that Envoy can already be used as a forward proxy, just a statically configured one. Perhaps "http_forward_proxy" ?

@lizan re: original_dst, the issue there is we do not have any async mechanisms around chooseHost. So the options there would be to a) make chooseHost() async, which is a very non-trivial change or b) Actually return a wrapper connection that internally resolves DNS and then works as a normal connection. I've considered both options, but IMO, these are both more difficult than adding a filter which holds requests while DNS is being resolved if needed, and then adding the DNS to a special cluster which then returns the new host. So basically it ends up looking like a combination of original_dst and logical_dns. IMO it's simpler to just build a new cluster to do this, though I may end up factoring out some base code if it makes sense. WDYT?

mattklein123 on 31 May 2019

Hmm. Perhaps "dynamic_forward_proxy" or "generic_forward_proxy" since the distinguishing feature here is that the destination cluster is not pre-configured.

skydoctor on 31 May 2019

I like "dynamic_forward_proxy" +1 to that.

mattklein123 on 31 May 2019

I've considered both options, but IMO, these are both more difficult than adding a filter which holds requests while DNS is being resolved if needed, and then adding the DNS to a special cluster which then returns the new host.

I see, thanks for the explanation. I didn't take at deeper look on this. Just a random thought, will "a) make chooseHost() async" be a more generic solution? i.e. in the future some cluster/lb extension will need it anyway, rather than requiring those extension, like the generic proxy here, to be a combination of filter + cluster?

lizan on 31 May 2019

Just a random thought, will "a) make chooseHost() async" be a more generic solution? i.e. in the future some cluster/lb extension will need it anyway, rather than requiring those extension, like the generic proxy here, to be a combination of filter + cluster?

Yeah, I've had the same thoughts here. I do agree that figuring out how to (optionally) make choostHost() async would be a good capability to have, but it will be an extremely complicated change, and I'm not sure it's worth doing unless we know for sure that we need this in multiple places. WDYT?

mattklein123 on 31 May 2019

I've been playing around with a hackish solution to this myself. The general design is:

Original Dst cluster
Use an external service (via the ext_authz header) to resolve the requested host to an IP

I had a few problems with it:

The connection pooling of the original_dst cluster didn't seem ideal, since the resolved IP may change with each request.
One thing I couldn't get working was an HTTPS-style proxy, where we just get a CONNECT method along with the desired host.

I think we'd want to make sure that the connection pooling was efficient, as well as well-reported on.

It'd also be nice if we could make sure that the CONNECT method worked, but I understand that that could be solved as a separate work item (indeed, it's a separate issue (https://github.com/envoyproxy/envoy/issues/1451)).

klarose on 31 May 2019

@klarose I have a complete plan for a non-hacky solution, so will move forward with that. Connection pooling should work properly since each DNS target will have its on logical host, similar to how logical DNS works.

CONNECT support is orthogonal as you point you, and similar to web socket. With that said, the new filter I am going to write I think can easily support CONNECT and just be the terminal filter in that case vs. going to the router.

mattklein123 on 31 May 2019

@mattklein123 Awesome, thanks. I didn't mean to propose my hack as an alternative. More that it might validate some of the assumptions (async dns resolution), and point out some drawbacks I found with it.

Do you have the plan documented anywhere? I'm not sure how much time I'll have over the next month (about to start sprinting hard towards something), but I wouldn't mind staying in the loop/helping out where I can.

klarose on 31 May 2019

Do you have the plan documented anywhere?

Only in my head right now. I can write it down if you want, or I can just go ahead and implement the basic version with a bunch of TODOs (host cleanup via TTL, max DNS, etc.) and then maybe you can help out with some of the follow ups? WDYT?

mattklein123 on 31 May 2019

In thinking about this a bit more, I think CONNECT support is going to require being implemented directly in the HCM since it needs to upgrade to TCP much like web socket does. Given this, it's possible that the dynamic forward proxy DNS caching system may need to be used by the HCM also if we don't somehow allow a filter to effectively upgrade to raw TCP. My feeling is to cross this bridge when we come to it. cc @alyssawilk for any thoughts on where to put CONNECT support, taking for a given that we can implement a dynamic DNS cache. I'm thinking that maybe we can just treat CONNECT as a special type of upgrade and still have the filter deal with it.

mattklein123 on 1 Jun 2019

Yeah, I'd envisioned CONNECT going over the same upgrade path, and maybe eventually allowing filter chain overrides (I think some day we'll want per-route filter chains for other reasons and they could be used for this as well) for folks who want to not apply L7 filters to them.

alyssawilk on 3 Jun 2019

Initial PR out. Tracking additional work items here to call this issue done for v1:

[x] Configurable DNS cache circuit breakers and maximums
[x] Stats
[ ] <IP>:<port> host names
[x] Show host name in admin output
[x] Dynamic per-host TLS, at least for SNI and verifying subject alt name

@lizan @alyssawilk and others lmk if I'm missing anything.

mattklein123 on 19 Jun 2019

🎉2

Any updates on this @mattklein123 @davidben @lizan @PiotrSikora
i am looking for the same kind of implementation, pls.. direct me here.

anything on this space (envoy as generic forward proxy) is very helpful.

ravikumarkgit on 9 Jul 2019

@davidben @lizan @PiotrSikora I have a question about IP certs which I'm looking to handle in my final PR for this issue:

https://github.com/envoyproxy/envoy/blob/0d26d05fe353315113b193bd0c53a768fbdd1c4f/test/config/integration/certs/upstreamcert.cfg#L35-L38

I assume an IP cert would be issued using an IP alt name, right?

In looking at our verification code I don't think we are checking for SAN types of GEN_IPADD, so I would need to add that. Is that right? Thank you.

mattklein123 on 10 Jul 2019

@mattklein123 correct.

PiotrSikora on 10 Jul 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Envoy in container does not get PID 1 / does not receive SIGTERM

anatolebeuzon · 3Comments

http2 to external upstream fails

dstrelau · 3Comments

Envoy does not modify `content-length` header after applying BodyFormat to error responses from External Authorization service

alkov-ibm · 3Comments

Admin endpoints not responding

justConfused · 3Comments

Is there a way to expose a gRPC service as only HTTP and not as a gRPC service?

lps0535 · 3Comments