Opening separate issue for visibility.
It's noteworthy that migrating away from non-binding listeners to a single listener with multiple filter chains is going to be painful in highly dynamic systems, because, due to a missing FilterChain Discovery Service (FDS), each update will require draining of the listener. Perhaps it should be a blocker for the removal?
cc @mattklein123 @alyssawilk @htuch
@PiotrSikora I'd like to hear someone state that the drain behavior in a real-world situation is going to make FDS necessary before making it a strict blocker. So far, we're just guessing I think? OTOH I totally agree we should do FDS sooner rather than later, this will be needed for on-demand LDS and solves the high churn story.
@htuch yes, it's not a strict blocker (hence the question mark), but I'm a bit concerned about the reliability of constant draining and it's effect on long-lived connections.
This might be a moot point, though, considering that @ggreenway already started working on F(C)DS.
Its not just that though. If we want incremental updates, having a single large listener blob negates the benefit of updating a specific listener.
Also keep in mind all the long lived gRPC streams or persistent DB connections will end up being killed everytime there is a listener update. And listener updates are frequent in K8S
@rshriram the point of FDS is that the listener blob delivered via LDS is just the top-level configuration; it's relatively stable and never updates. Instead, individual filter chains update, none of which requires a drain.
FYI, @mattklein123, it seems that I might have been a bit too optimistic when I said "late Q1". At this point, it looks that we won't start working on this before FDS is ready (partially to avoid rewriting this logic twice in a short period of time, but mostly because everybody is already committed to other work this quarter). Unfortunately, I don't have any firm dates to offer. @duderino will keep us posted.
FWIW, this also covers the use of use_original_dst, another deprecated field that was slated for upcoming default fatal-on-startup. We need to preserve both until we have Istio migrate to this.
@PiotrSikora @costinm can you provide an updated statement on what time frames we should be intending to drive this deprecation in FTR?
@silentdai is working on this.
Yep, I'm just wondering if we can get an agreement on which release version we're all comfortable with deprecation taking place in?
We would be comfortable with use_original_dst and bind_to_port to be marked as fatal-by-default on July 1st 2019. Would it be possible to keep these fields exempt from deprecation process until that date?
Jul 2019 would match Istio plans - if 1.2 and 1.3 don't get delayed. Given vacations and history - I would be more comfortable with Dec ( or as late as we can get ).
fatal by default "in july" means it'd actually flip when we cut the September Envoy build, and the code would be safe to remove in 2020, when it was slated by removal in 2018. I think we can do better than that.
I think we should make it fatal-by-default either in this or the next Envoy release (likely the next to allow for tooling), make it super easy for the Istio's Envoy build to override that fatal-by-default (since we're in good communication with you about deprecation/removal timeline but that'd force any lingering users to move over and start the clock for removal) and then when you folks hit 1.2 we can remove the code as quickly as the Envoy policy allows. Does this sound plausible from your end?
I'm going to just close this. This will live until v2 dies and then it's gone.
@lambdai @costinm now that we have drainless listener updates, are we good to deprecate bind_to_port and remove this on the Envoy minor version cycle? Reopening to track this for v3.
Per the chat today, we're going to un-deprecate bind_to_port and use_original_dst.
cc @tbarrella
@costinm @PiotrSikora can you folks summarize the key issues around why Istio has been unable to move to non-virtual listeners? Since we're making a fairly big exception here, it would be good to record for posterity why this deprecation didn't work as planned.
It looks like bind_to_port is a hidden field that has been part of DeprecatedV1 ever since it was introduced to the proto API (https://github.com/envoyproxy/data-plane-api/pull/143). Just to make sure, it should be exposed and moved to the top level of config.listener.v3.Listener as in #329? Does this need extra API approval? Should the DeprecatedV1 block remain for now to avoid a breaking change?
Undeprecating use_original_dst will cause hidden_envoy_deprecated_use_original_dst to be renamed. I'm assuming this breaking change is fine since anyone using a hidden_envoy_deprecated_ field should expect a breaking change at any point, but want to make sure.
I think there are a few primary issues:
1) Draining granularity - this is already fixed by what Yuchen did to make draining per filter chain
2) Time + Risk - we simply haven't had the time to migrate everything over, and when we do it will be a massive change that introduces substantial risk, which also likely means two code+testing paths for quite some time
3) I suspect, but have not proven either way, that filter_chain_match trie logic (https://github.com/envoyproxy/envoy/issues/12572) may make things more challenging for the reasons mentioned in that issue.
4) NACK granularity - now the entirety of listener config will be rejected together since we have a single massive listener.
5) Scalability of XDS - this is probably not an issue. Today, we send N listeners with M filter chains. On any change, we send all N*M listeners. With this change, we will send one listener with N*M filter chains, which is the same. Theoretically if we implemented delta XDS we could optimize this with bind_to_port=false but not with single listener without something like FilterChainDS
(2) is by far the largest thing blocking us
I suspect (4) is a major problem - we really only have 2 listeners, one for iptables inbound and one for outbound.
Even if Envoy can still start the listener and ignore filter chains that are broken ( which I don't think happens now), the
error reporting and metrics will be off, meaning users will need to adjust alerts and debugging.
@howardjohn @PiotrSikora @costinm @htuch could one of you tell @tbarrella how the undeprecation should work? I take it that the 'hidden_envoy_deprecated_' fields should themselves go through a deprecation process?
I don't think renaming a field breaks protobuf compatibility (just like adding hidden_envoy_deprecated_ didn't break it in the first place), so it should be fine to just rename it I would guess, without being familiar with Envoy api process.
Renaming a field breaks loading the config via yaml. Documentation on envoy processes around this are here: https://github.com/envoyproxy/envoy/blob/master/api/API_VERSIONING.md#backwards-compatibility
Thank you, yeah, maybe it was an easy fix but I noticed the field name showed up in Istio code (https://github.com/istio/istio/pull/25159)
So then according to the policy the hidden field should only be removed in v4alpha. If it's important, maybe the non-hidden field could be added in v3 by removing the reserve field here and replacing it with this? This would be kind of confusing though and the newly-added non-hidden field would have a different ID
Similarly, the DeprecatedV1 shouldn't be removed till v4alpha, but are there any approval extra steps needed to add bind_to_port to v3?
and the newly-added non-hidden field would have a different ID
I think if you do this you break wire compatibility. We don't want a new field, right? We just want the existing one to no longer be deprecated.
Renaming a field breaks loading the config via yaml
Does this still apply for a field that already requires an explicit runtime override to be able to even configure it? It was already broken in this manner, to change it back would just be to "un-break" it from my point of view
@howardjohn @costinm thanks for the explanation, super helpful.
I think we can just rename the hidden_ fields; any use of them outside of Envoy is illegal, and you can update their names in the Envoy source. Istio's current use of this is outside-of-contract. Please do this in the v3 API. For the v2 API, I'm not sure if we need to make any changes, since the fields still exist there, they are just marked deprecated.
Thank you. What about deprecated_v1.bind_to_port? Should that and a new bind_to_port coexist in v3, with both being valid, or with the old option being ignored? deprecated_v1.bind_to_port wasn't marked deprecated in v2 proto, so if I try to remove it in v3, proto_format.sh fix complains
Edit: It seems like bind_to_port should be added with the deprecated_v1 option remaining but marked as deprecated. If either bind_to_port or deprecated_v1.bind_to_port is set to false, that value can be used, since true is the default
@tbarrella I guess the question is what is the long term story here. Does https://github.com/envoyproxy/envoy/issues/5355#issuecomment-741245451 suggest that we're better off long term retaining bind_to_port (i.e. we tried and realized this is really more hassle than its worth full stop) or that we're better off in the short-medium term (i.e. past v2 removal but not indefinitely) with bind_to_port but still plan on having Istio migrate?
Hm, I don't really have context on this beyond what @PiotrSikora @howardjohn @costinm already said; they'd probably be able to answer better than me. My understanding though is that bind_to_port should be retained in the long term
We use bind_to_port on the outbound chain - and if we are allowed we'll likely keep it. We are exploring other modes
of interception besides iptables REDIRECT - like raw binding to the IP, which will look exactly like the current config but with
real binding to each IP:port. And it's cleaner/simpler.
On the inbound chain - we will make big changes when we move to BTS, i.e. everything will move to vhosts and routes.
@htuch do you think it's fine to proceed with the plan at the end of https://github.com/envoyproxy/envoy/issues/5355#issuecomment-743466041?
Seems like a reasonable plan, but I'd like to be deliberative about this undeprecation if we live with it long term. Tagging folks who have done significant listener work and are effective owners. @lambdai @mattklein123 @PiotrSikora WDYT?
I just learned that another factor is there's an attempt to totally preserve backward compatibility on the wire; it seems the other Istio folks have more context. For example, renaming deprecated_v1 but not changing the message structure or field numbers. I'm not sure how realistic of a goal this is
My two cents:
Pros:
Cons:
I'm fine with keeping bind to port right now. I agree it adds some complexity, but it's mostly limited to a single place in the code (connection handler) and the code has already been there for a very long time. We can always deprecate it later if we feel that it becomes truly redundant.
In terms of the actual proto, I would have to look at it myself, but I think it's fine to just add a new field in v3 if that is what it takes.
Yep, new v3 field, +1 on going ahead with reintroducing.
Great, thank you all, I'm happy that alignment is clear now!
To summarize and add some context from discussing offline with @howardjohn @costinm,
bind_to_port will be added with the deprecated_v1 option remaining but marked as deprecated in v3bind_to_port or deprecated_v1.bind_to_port is set to false, this will be interpreted as false, since true is the defaultOriginally there was a hope (among Istio developers) that wire compatibility could be preserved, based on an assumption that previous deprecation of this field could just be reverted. But it turns out that this field was always "deprecated" ever since it was migrated from json to proto (https://github.com/envoyproxy/data-plane-api/pull/143). This is fine for Istio though as long as the old field isn't removed in v3, and it won't be, since that would be against Envoy's breaking change policy
Most helpful comment
fatal by default "in july" means it'd actually flip when we cut the September Envoy build, and the code would be safe to remove in 2020, when it was slated by removal in 2018. I think we can do better than that.
I think we should make it fatal-by-default either in this or the next Envoy release (likely the next to allow for tooling), make it super easy for the Istio's Envoy build to override that fatal-by-default (since we're in good communication with you about deprecation/removal timeline but that'd force any lingering users to move over and start the clock for removal) and then when you folks hit 1.2 we can remove the code as quickly as the Envoy policy allows. Does this sound plausible from your end?