In order to support filter chain updates without forcing a full listener drain, it has been suggested that we add a FilterChain Discovery Service (FCDS). This would allow individual FilterChains belonging to a Listener to be independently discovered and updated.
This would also lend support to finer-grained Listener resource naming for incremental and on-demand xDS (generalizing https://github.com/envoyproxy/envoy/issues/2500).
Opening this issue to track any design work and ownership.
@ggreenway @andraxylia @lizan @PiotrSikora @mattklein123
We need this fix for Istio for long-lived connections with SNI. @rshriram @costinm
Related add-on feature: lazy-loading of FilterChains, triggered by a listener-filter that is watching for unknown SNI values.
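As a rough sketch of what that lazy-loading hook could look like (all names here -- `FcdsClient`, `SniInspectorFilter`, `onSniParsed` -- are hypothetical placeholders, not the actual Envoy listener-filter API): a listener filter peeks at the SNI from the ClientHello and, if no loaded FilterChain matches, requests the chain on demand before letting filter chain selection continue.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical sketch only; these are illustrative stand-ins for real Envoy types.
struct FilterChain { /* filter factories, TLS context, etc. */ };

// Client for an on-demand FilterChain discovery request.
class FcdsClient {
public:
  using Callback = std::function<void(std::shared_ptr<const FilterChain>)>;
  // Ask the management server for the chain matching this SNI; invoke cb when it
  // arrives (or with nullptr on failure/timeout).
  void requestChain(const std::string& sni, Callback cb) {
    (void)sni;
    cb(nullptr);  // Stub: a real client would issue an async xDS request.
  }
};

class SniInspectorFilter {
public:
  SniInspectorFilter(
      std::unordered_map<std::string, std::shared_ptr<const FilterChain>>& known,
      FcdsClient& fcds)
      : known_chains_(known), fcds_(fcds) {}

  // Called once the SNI has been parsed from the ClientHello. Returns true if the
  // connection can continue immediately, false if it must wait for FCDS.
  bool onSniParsed(const std::string& sni, std::function<void()> resume) {
    if (known_chains_.count(sni) != 0) {
      return true;  // A matching chain is already loaded.
    }
    // Unknown SNI: lazily fetch the chain, then resume filter chain selection.
    fcds_.requestChain(sni, [this, sni, resume](std::shared_ptr<const FilterChain> chain) {
      if (chain != nullptr) {
        known_chains_[sni] = std::move(chain);
      }
      resume();  // Re-run matching; fall back to the default chain if still absent.
    });
    return false;
  }

private:
  std::unordered_map<std::string, std::shared_ptr<const FilterChain>>& known_chains_;
  FcdsClient& fcds_;
};
```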
@htuch does the drain happen when a particular FilterChain is updated, and also when any FilterChain is added or deleted in the listener?
We need this fix for Istio for long-lived connections with SNI
@andraxylia the intention here is to only affect new connections, right? Obviously previous connections will use the old filter chain? I just want to make sure we are on the same page. If so this sounds like a useful and straightforward feature to me.
@mattklein123 All is good if a change in the FilterChain does not affect existing TCP and gRPC connections - for instance, the addition of a host in server_names in the FilterChainMatch.
@rshriram @costinm told me that because of issues with full drain we cannot move Istio from multiple listeners to 1 listener with multiple FilterChains.
I brainstormed with @htuch and here's what we propose for now:
Any comments or concerns with this plan?
I don't understand how filter chains interact with draining? Can't old connections use the old filter chain and new ones use the new filter chain? If we want to add draining on filter chain change I think that's totally orthogonal?
@mattklein123 Yes, that's basically the conclusion @htuch and I came to. There are two orthogonal feature requests/requirements here:
1) Be able to add/remove filter chains without draining. Right now this is not possible, purely due to the implementation of listener config changes. The API does not need to change to fix this.
2) Lazy-load filter chains.
The implementation of these two features is mostly/entirely non-intersecting.
OK, thanks for the clarification. IMO we don't ever need to implement draining on a per-filter-chain basis. I would track that as a totally separate issue? Fixing (1) should be simple compared to also having to do draining (not simple).
Actually, thinking through this, I think (2) would depend on (1) to work as expected. Modifying one of the filter chains via FCDS would need to update the listener and drain connections on that filter chain, but not drain connections on unchanged filter chains.
But even with lazy load, why is draining required? It seems like we can write the code so that a filter chain is latched onto a connection at initial connect time? Why is it required to drain? (I can see why someone might want that, but it seems not required).
That's an interesting point. I suppose it's not required. But it would be inconsistent with current behavior for listeners: there's not currently a way to modify/remove a listener without draining. But the answer to (1) may be as simple as adding a new DrainType of "hot-restart only". That's not quite as fine-grained, but would be much less effort for most of the potential gains. It would entirely solve the case of filterchain add/remove, but would not allow forcing draining for a filterchain modify.
FWIW, my view of filter chain is that it's something that is only consulted when a connection is created. As such, draining is not required. The reason I wrote draining into listeners is that when a listener goes away, by definition it destroys all owned connections (this wouldn't be the case for filter chains). I agree that we could add a configuration option to listeners to not drain and just hard swap, though it seems like few would want this behavior?
The reason I wrote draining into listeners is that when a listener goes away, by definition it destroys all owned connections (this wouldn't be the case for filter chains).
I don't understand why filter chains would be any different in this way. We still need to ensure that all of the filter configs, etc are not destroyed until all connections using them have closed. This could be done with shared_ptr or similar, but the same could be said for listeners (I'm not arguing that we want to do that for listeners, just that we could). I don't think I understand what you have in mind for the different lifetimes of listeners vs filter chains and why they're fundamentally different.
I don't think I understand what you have in mind for the different lifetimes of listeners vs filter chains and why they're fundamentally different.
The reason that I added draining for the listener implementation was honestly mainly simplicity, due to connection ownership. I agree that connections could live beyond the life of a listener, but it would be a very large change. Obviously doing nothing at all (neither draining nor shared_ptr ownership) would not be good for many users.
For filter chains, I think keeping the filter chain alive via shared pointer if necessary is substantially simpler, and if it were me that is how I would start. I don't object to adding draining in any way; I just don't think it adds much and it would be pretty complicated.
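To make the "latch at connect time, keep alive via shared_ptr" idea concrete, here is a minimal sketch (hypothetical classes, not the real Envoy `Listener`/`Connection`; real Envoy would deliver the current config through ThreadLocal slots rather than the mutex used here to keep the sketch self-contained):

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-ins for the real classes.
struct FilterChain { /* immutable filter factories, TLS context, ... */ };

class Connection {
public:
  explicit Connection(std::shared_ptr<const FilterChain> chain)
      : chain_(std::move(chain)) {}
  // The chain (and its filter configs) stays alive as long as this connection
  // holds the shared_ptr, even if the listener has since replaced it.
private:
  std::shared_ptr<const FilterChain> chain_;
};

class Listener {
public:
  // Called for each accepted socket: the connection latches whatever chain is
  // current at accept time.
  std::unique_ptr<Connection> onAccept() {
    std::shared_ptr<const FilterChain> chain;
    {
      std::lock_guard<std::mutex> lock(mu_);
      chain = current_chain_;
    }
    return std::make_unique<Connection>(std::move(chain));
  }

  // Called when FCDS (or an LDS update) replaces the chain: new connections see
  // the new config, existing connections keep the old one; no drain needed.
  void updateFilterChain(std::shared_ptr<const FilterChain> updated) {
    std::lock_guard<std::mutex> lock(mu_);
    current_chain_ = std::move(updated);
  }

private:
  std::mutex mu_;
  std::shared_ptr<const FilterChain> current_chain_ = std::make_shared<FilterChain>();
};
```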
Ok, that makes sense. I just wanted to make sure I understood your reasoning.
What do you think about changing listeners so that they do not drain when their config is modified (but not deleted) if the only part of the config that changed is the filter chains?
What do you think about changing listeners so that they do not drain when their config is modified (but not deleted) if the only part of the config that changed is the filter chains?
Yup, makes sense, and I don't think it would be that hard.
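A minimal sketch of how that check could look (the config struct below is a simplified hypothetical stand-in, not the real Listener proto): compare everything except the filter chains, and only fall back to a full drain when something else changed.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified listener config for illustration only.
struct FilterChainConfig { std::string match; std::string filters; };
struct ListenerConfig {
  std::string name;
  std::string address;
  std::vector<FilterChainConfig> filter_chains;
};

// True if the two configs differ only in their filter_chains, in which case the
// listener can swap chains in place and skip the drain entirely.
bool onlyFilterChainsChanged(const ListenerConfig& old_cfg, const ListenerConfig& new_cfg) {
  // Real listeners have many more fields; the idea is simply: compare every
  // field except filter_chains.
  return old_cfg.name == new_cfg.name && old_cfg.address == new_cfg.address;
}
```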
SGTM as well. FWIW, I hadn't grokked the full tradeoff between shared_ptr and drain in the Listener implementation, so thanks @mattklein123 for explaining this for posterity.
@mattklein123 I've been thinking this through, and simply using a shared_ptr for the filter chain will run into other issues, because right now ThreadLocal slots must be destroyed on the main thread. But I can probably make it post deletions to the main thread when the refcount goes to zero somehow.
Hmm, yeah, that's definitely a complexity with the stored lambdas. I think we could have some type of shared_ptr wrapper that on destruction posts back to the main thread for deletion. It's definitely not trivial but doable. I suppose if TLS is the only issue you could fix that to allow deletion from any thread, but I'm not sure whether that is worth it.
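A sketch of the wrapper being discussed, with a hypothetical `MainThreadDispatcher` standing in for the main thread's event loop (not Envoy's actual `Event::Dispatcher` API): the custom deleter posts the actual destruction back to the main thread, so a worker dropping the last reference never tears down ThreadLocal-owning state in place.

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-in for the main thread's event loop. The only assumption
// is that post() queues the closure to run on the main thread later.
class MainThreadDispatcher {
public:
  void post(std::function<void()> work);
};

// Wrap an object so that, when the last shared_ptr reference is released on any
// worker thread, its destructor actually runs on the main thread. The dispatcher
// must outlive every wrapped object.
template <class T>
std::shared_ptr<T> makeMainThreadDeleted(std::unique_ptr<T> obj,
                                         MainThreadDispatcher& main) {
  T* raw = obj.release();
  return std::shared_ptr<T>(raw, [&main](T* ptr) {
    // Last reference dropped (possibly on a worker): defer the delete, which may
    // destroy ThreadLocal slots, to the main thread.
    main.post([ptr]() { delete ptr; });
  });
}
```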
My two cents: the draining is totally optional. If we can have the old filter chain with the old connection live forever in the same listener, then, as Matt says, I see no reason to drain. Unless it's a security issue and I want to terminate old connections in a timely fashion, and not have any connections with expired TLS certs lingering around for ages.
Wouldn't not draining lead to a situation where we have long-running connections (e.g. HTTP/2, gRPC) that are latched to a filter chain that no longer exists, along with its outdated configuration, with no way to force those clients to re-connect and re-select a new filter chain?
It does, but the point is that unless these configurations pertain to some security concern, this "forcing" should be optional. In a way, we put the onus on the end user - if they want to take advantage of the newer config, they should reconnect. It gives them the option of staying with the old config or reconnecting.
End users don't know that there is a new config, how would they?
The old config might be pointing to non-existent RDS/CDS/EDS targets, in which case already-established connections will be permanently broken at the stream level, whereas new connections will work perfectly fine. I don't believe that's an acceptable state to be in.
@PiotrSikora I don't think anyone is saying we shouldn't implement filter chain draining. All we are saying is that it's extra complexity that is orthogonal to the main implementation. I would suggest we track as a separate issue once the primary feature is implemented.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
Ack: working on this
@silentdai would you be able to post a brief status update on this issue for followers? Thanks, Dave.
I have several draft PRs on the benchmark and the FCDS proto. Blocked by #7246.
I also have a draft design doc that I expect to publish by the end of this week.
One more thing to consider as part of this work is decoupling the filter chains from the filter chain matchers. This would not only help with the future on-demand work, it would also allow multiple filter chain matchers to refer to a single filter chain, instead of keeping a separate filter chain instance per matcher, since those instances can be exact copies in some cases.
fwiw, for our use case it would be great to allow multiple matchers to refer to the same chain and avoid the extra copies.
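One way to picture that decoupling (hypothetical structures, purely illustrative, not a concrete API proposal): matchers map match criteria to a chain name, and the listener keeps a single pool of named chains, so two matchers that reference the same name share one FilterChain instance instead of holding two copies.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of matcher/chain decoupling.
struct FilterChain { /* filter factories, TLS context, ... */ };

struct FilterChainMatcher {
  std::vector<std::string> server_names;  // e.g. SNI values to match.
  std::string chain_name;                 // Reference into the shared pool.
};

class FilterChainPool {
public:
  void addChain(const std::string& name, std::shared_ptr<const FilterChain> chain) {
    chains_[name] = std::move(chain);
  }

  // Several matchers can resolve to the same underlying chain instance.
  std::shared_ptr<const FilterChain> resolve(const FilterChainMatcher& m) const {
    auto it = chains_.find(m.chain_name);
    return it == chains_.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<std::string, std::shared_ptr<const FilterChain>> chains_;
};
```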
@mattklein123 @lambdai do you think this is needed for 1.12.0 or can we do this as a v3 add-on once shipped?
Add-on is fine with me.
I was curious about the timeline for this feature (FCDS). Looks like a lot of the work has already been committed.