envoy 🚀 - filter_chain_match difference between {} and null is confusing/under documented

Actually original issue was not fully accurate. If my matches are like:

{
  "transportProtocol": "raw_buffer",
  "applicationProtocols": [
    "http/1.0",
    "http/1.1",
    "h2c"
  ]
}
null

Raw TCP traffic traffic will match neither

In order to get it to match I need to set up two matches:

{
  "transportProtocol": "raw_buffer",
  "applicationProtocols": [
    "http/1.0",
    "http/1.1",
    "h2c"
  ]
}
{
  "transportProtocol": "tls"
}
{
  "transportProtocol": "raw_buffer"
}

I have not yet found a way to have a real "match everything" FCM

howardjohn on 10 Aug 2020

Yes I was confused a lot by this, I think we also need to revisit whole FCM proto. cc @envoyproxy/api-shepherds

lizan on 11 Aug 2020

Even more contradicting is that changing one FCM (seems to) impacts the matching logic of another FCM!

It's the natural of "best match" implemented by decision tree. I can explain match mechanism using sieve mode which is less confusing than
https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/listener/v3/listener_components.proto.html?highlight=filterchainmatch#config-listener-v3-filterchainmatch

Let me write an formal one

lambdai on 11 Aug 2020

I will admit I don't fully grok the details here but just from my quick read it sounds confusing. If there is any way we can fix this without proto changes that would be really appreciated.

mattklein123 on 11 Aug 2020

I'm not sure why this is so complicated or hard to understand. The current documentation seems clear. You first match transport protocol, and then match on ALPN. In your first example, definitely the ALPN one doesn't work for pure TCP. I'm not sure about the null; this is not part of the FCM match algorithm. We should clarify semantics around that.

htuch on 11 Aug 2020

@htuch all I want to do is have a "fallback" filter chain that will match everything not matched by a previous chain. As far as I can tell this is not actually possible although the doc implies it is by leaving all fields empty and it does not mention that someone changing a different FC impacts others matching behavior.

The doc suggests an empty application protocol will match any application protocol, however this doesn't seem to be the case as my tcp traffic matches neither the h2 one (expected) or the empty one (unexpected)

howardjohn on 11 Aug 2020

From istio's point of view, a fallback filter chain could save lots of logic.

I am not sure once we introduce the "fallback" filter chain to the listener, how could we support the case to reject a connection

lambdai on 11 Aug 2020

@lambdai the whole point of having a "match everything"/"fallback" filter chain is that we do not ever want to reject it

howardjohn on 11 Aug 2020

So based on the sieve model in https://github.com/envoyproxy/envoy/pull/12587, my understand is if I want two filter chains:

FC 1 matches on some complex criteria, say destination_port=80,transport_protocol=tls,application_protocols=[h2]
FC 2 matches everything else

If I want to do this, instead of a single FC2, I need to make every possible permutation of FCMs? So I need 65535 (destination ports) * 2 (transport_protocols) * infinity (all permutations of ALPN, which is unbounded?

Am I understanding this right? @lambdai

howardjohn on 11 Aug 2020

If I want to do this, instead of a single FC2, I need to make every possible permutation of FCMs? So I need 65535 (destination ports) * 2 (transport_protocols) * infinity (all permutations of ALPN, which is unbounded?

A little bit less: we need

FC 2 with no destination port. So FC2 matches any port other than 80.
FC 3 with destination_port=80, nothing else. FC3 matches destination_port=80 and transport_protocol != tls
FC 4 with destination_port=80, transport_protocol=tls, and nothing else. FC3 matches destination_port=80 and transport_protocol = tls, and alpn != h2

lambdai on 11 Aug 2020

So when we move to a single outbound listener, and we have 100s or 1000s of different FCM in play, in order to get a real "match all" will we need some huge combinatoric explosion of FCMs? Because at that point, we will have FCs for a bunch of different ports, server names, etc.

Put another way, I don't see how we can stop using the use_original_dst option unless FCM is changed in some way

howardjohn on 11 Aug 2020

So when we move to a single outbound listener, and we have 100s or 1000s of different FCM in play, in order to get a real "match all" will we need some huge combinatoric explosion of FCMs? Because at that point, we will have FCs for a bunch of different ports, server names, etc.

Put another way, I don't see how we can stop using the use_original_dst option unless FCM is changed in some way

Fallback filter chain is preferred not only b/c single outbound listener.

We even without single outbound listener, a 0.0.0.0:80 outbound has multiple filter chains and an imagined fallback filter chain.

The fallback filter chain is "imagined" because in massive cases that filter chain works as a fallback. However, in the cases you mentioned, none filter chain is chosen.

In all, in istio(especially at sidecar) fallback filter chain is nice to have. But it is not a blocker of single outbound listener.

lambdai on 11 Aug 2020

FWIW, if FCM matching has the existing sequencing logic but was structured in the API like a trie (including how CIDR ranges are treated), i.e. as a bunch of nested trees, rather than linear list of alternative FCM criteria, we would not have any combinatorial explosion as we scale up.

htuch on 12 Aug 2020

I think we should just standardize something for catch-all; empty match criteria matches all.

htuch on 12 Aug 2020

I am a bit concerned if we just special case catch-all we are just delaying the problem until we end up with a more complicated match where we end up in a similar state.

What would be more intuitive/flexible to me, as well as consistent with the (current) docs is to add some form of "backtracking". Hopefully I can explain this right:

Today, for each criteria we decide if its a match and go down the match logic:

                       dest port match
          /no match (a)                  \found a match (b)
      ip match                                 ip match
    /no match (c) \found a match (d)     /no match (e) \found a match (f)

And consider I have these FCMs:

match:
  port: 80
  ip: 1.2.3.4
match:
  ip: 2.3.4.5

just showing 2 levels to avoid massive graph.

My intent is this matches the second FC; currently it does not.

At each step, if we follow "no match" path, we carry on, but if we follow "found a match" we filter out all non matching FCs.

The issue is if we ever follow a "found a match" path, we can never match the FCs we filter out. This may lead to us finding no match (part e on the graph) when we expect a match.

My proposed change is that if we ever get to the bottom of the matching logic (Source port in real code, e here), we start back tracking. In this case we know we found a matching port but not a matching port+ip, so now we want to see if we can find a matching ip (which we know will not match the port). So we can take the a path, filtering out the FCs that did match the dest port, which could eventually lead us to a matching FC.

For example, if I send a request on 2.3.4.5:80 - we first filter on port, going from {FC1, FC2} -> {FC 1}, but then end up at e since the ip does not match. At this point, we backtrack and look at {FC2} and checking ip match. From here, FC2 does match, and we are at the bottom of the tree so we return it.

Hopefully this makes sense

howardjohn on 12 Aug 2020

I think back tracking is pretty hard to reason about and would introduce unpredictable match cost. Looking at the example, we end up basically abandoning the strict matching criteria you were after. Is the issue really that port match happens before IP match? I seem to recall some discussion we had around possibly allowing flexible ordering of matchers, which might mitigate this.

htuch on 12 Aug 2020

@htuch I am not sure its hard to reason about for the user, which is what I am optimizing for. Yes, my explanation is complicated, as its based on the implementation - but the user facing results are not. We have a ton of bugs reviewed by a ton of people, including many very familiar with Envoy, because of this "quirk" in the API - no one identified that it was totally broken, and we spent many hours tracking down issues that we now realized are due to this.

Reading the docs, I actually would assume that matching logic is like:

Any filter chain that matches all 8 criteria
Any filter chain that matches criteria 2-8
Any filter chain that matches criteria 3-8
...
Any filter chain that matches no criteria

If there are multiple filter chains matching at any given stage, we then filter them out by "closest" match for things that have wildcards, for example

This is the same behavior as the backtracking I defined, I think, just a different way of looking at it. Envoy can implement it however, it wants, but right now it feels like the implementation choice of a trie is biasing the user facing behavior in a negative way.

howardjohn on 12 Aug 2020

@howardjohn I think this makes sense, but I think there is a simpler explanation. Can we just say that if a match criteria is not set, it's effectively wildcard, and we always match on the most specific criteria first, with precedence towards the earlier matchers as tie breaker?

htuch on 12 Aug 2020

@htuch That makes sense to me. If we throw an example or 2 in the doc I think it will clear things up as well

howardjohn on 12 Aug 2020

@lambdai @PiotrSikora WDYT?

htuch on 12 Aug 2020

Adding more examples and explanation sounds fine. I think the missing piece of information is that once the criteria is matched, only filter chains containing that criteria are going to be considered. I thought it was documented, but it doesn't look like it is.

I have no idea what {} vs null is. Shouldn't FilterChainMatch be always present (i.e. non-null) in the proto?

PiotrSikora on 12 Aug 2020

@howardjohn I think this makes sense, but I think there is a simpler explanation. Can we just say that if a match criteria is not set, it's effectively wildcard, and we always match on the most specific criteria first, with precedence towards the earlier matchers as tie breaker?

This is the semantics that we had assumed. But it does not seem to be the case. i.e.

{
  "transportProtocol": "raw_buffer",
  "applicationProtocols": [
    "http/1.0",
    "http/1.1",
    "h2c"
  ]
}
{
  "transportProtocol": "tls"
}
{
  "transportProtocol": "raw_buffer"
}

indicates that one needs to enumerate the state space of values for the NOT-MATCH in order to emulate the wildcard catch all. Note that wildcard catch all is not just one FC. There are different levels of wildcards (any plaintext http but not tls, any tcp but not tls, any tls irrespective of alpns set by client, etc). The current match semantics just doesn't match the intuition we have (i.e. missing matches ==> wildcard). Throw in ports, IPs, and the entire filter chain blows up. This is what @howardjohn is alluding to.

I think the missing piece of information is that once the criteria is matched, only filter chains containing that criteria are going to be considered.

This right here is the root of the problem. If we want to match on port 80+IP, and just port 80, we have two filter chains. Then when we throw in ALPN based matches, we double the number of filter chains with and without ALPNs. Throw in transport protocol matches, we double it again. So the number of filter chains grows exponentially with each match condition we add. How is this expected to scale in a cluster that has 1000+ services, 5000+ pods/VMs, with arbitrary ports of different protocols?

rshriram on 12 Aug 2020

This right here is the root of the problem. If we want to match on port 80+IP, and just port 80, we have two filter chains. Then when we throw in ALPN based matches, we double the number of filter chains with and without ALPNs. Throw in transport protocol matches, we double it again. So the number of filter chains grows exponentially with each match condition we add. How is this expected to scale in a cluster that has 1000+ services, 5000+ pods/VMs, with arbitrary ports of different protocols?

It sounds like all your problems could be solved with a default / fallback filter chain that's used in case of no_filter_chain_match. Is that correct?

PiotrSikora on 12 Aug 2020

@PiotrSikora that solves a specific issue, but not all issues. We could just as easily have different case. For example:

I start with filter chains

match:
  alpn: [h2]
  source_port: 80
match:
  alpn: [h1]
  source_port: 80
match:
  source_port: 80

Then I decide I want to add a transportProtocol=tls to the first match. You would think you could just add that to the first match -- but that is not the case anymore. I need to duplicate ALL my other filters and come up with:

match:
  alpn: [h2]
  source_port: 80
  transport_protocol: tls
match:
  alpn: [h1]
  source_port: 80
  transport_protocol: tls
match:
  source_port: 80
  transport_protocol: tls
match:
  alpn: [h1]
  source_port: 80
  transport_protocol: raw_buffer
match:
  source_port: 80
  transport_protocol: raw_buffer

The no_filter_chain_match case is just a specialization of the problem, I would encourage us to not focus on that too much or I fear in a few weeks when our filter chains get more complicated we will be back at square one (or, in reality, it will probably still be broken and we just may not notice).

howardjohn on 12 Aug 2020

Maybe we should just have an actual if/then/else lists? At least it'll be comprehensible by regular souls. I'm thinking of something like

select:
- match: <predicate1>
  filter_chain: _
- match: <predicate2>
  select:
  - match: <predicate3>
    filter_chain: _
    # no fall through here
# last clause is a fallthrough
- filter_chain: _

kyessenov on 12 Aug 2020

You mean nested matches in an ordered list? Linear search won't work with on-demand / incremental FCDS.

PiotrSikora on 12 Aug 2020

@PiotrSikora You can represent a trie of predicates in this form and materialize filter chains on demand. It's just not automatic. I really dislike magic in the config, so in my opinion being explicit and letting then control plane plan smartly the decision tree is better. I think incremental updates would be scoped to sub-trees in this representation.

As John says, current incremental FCDS has a serious issue with effect scoping. Adding a filter chain should not alter decisions for other filter chains drastically. It means that incremental delta need to be aware of the state of the world, which defeats the points on incremental updates.

kyessenov on 12 Aug 2020

As John says, current incremental FCDS has a serious issue with effect scoping. Adding a filter chain should not alter decisions for other filter chains drastically. It means that incremental delta need to be aware of the state of the world, which defeats the points on incremental updates.

Where is this expectation coming from?

Imagine that you have:

match:
- server_names: ["*.example.com"]

now, you add another filter chain with:

match:
- server_names: ["www.example.com"]

in what world would 2nd filter chain be not expected to take over all the traffic destined to www.example.com, which was previously routed via 1st filter chain?

PiotrSikora on 12 Aug 2020

@PiotrSikora I don't think there is confusion about more specific matches taking place over a different. think its more that something matches, then we add a new match and nothing matches:

I have a

match: alpn=h2
match: {}

And TCP request matches the 2nd filter chain

Now I add

match: alpn=h2,transport=raw_buffer
match: {}

And the same TCP request now matches no filter chain

howardjohn on 12 Aug 2020

@PiotrSikora in a world where the first control plane that sends listener update does not expect the second control plane that sends a FCDS delta to take over.

kyessenov on 12 Aug 2020

@PiotrSikora in a world where the first control plane that sends listener update does not expect the second control plane that sends a FCDS delta to take over.

That's not a real world, though, is it?

PiotrSikora on 12 Aug 2020

@PiotrSikora Let's not digress. I just provided a reason why the ability to scope effects is important with federating control planes. I don't know if that's the world that Envoy want to bring to reality, or how it applies specifically to FCDS and LDS interaction. I just see a problem that it's getting many people confused and two developers working on Pilot code may not always communicate on specific layout of match conditions.

kyessenov on 12 Aug 2020

@PiotrSikora I don't think there is confusion about more specific matches taking place over a different. think its more that something matches, then we add a new match and _nothing_ matches:

I have a
match: alpn=h2
match: {}
And TCP request matches the 2nd filter chain

Now I add
match: alpn=h2,transport=raw_buffer
match: {}
And the same TCP request now matches no filter chain

This is a decision tree, the 2nd update added a valid branch for raw_buffer which previously didn't exist which only accepts HTTP/2 traffic. There is no "magic" here.

Also, there are options to make sure that other filter chain updates won't affect others, including specifying listening port and/or CIDR ranges... or always defining all criteria, instead of matching on transport_protocol in one filter chain, but not specifying it in others.

PiotrSikora on 12 Aug 2020

@PiotrSikora I understand how it works (now, after trying to wrap my head around it for quite a while). My primary issues are that:

I would be surprised if anyone other than the people on the issue/that implemented the code understand this
The behavior makes it functionally impossible to use filter chain match in any complex scenarios

or always defining all criteria, instead of matching on transport_protocol in one filter chain, but not specifying it in others.

How? We want to match any transport protocol. The way to do this, per the doc, is to leave it empty, which does not work as seen here. How exactly am I supposed to do that? For a field like source_ports, am I supposed to have a list of 65535 entries? For destination_port, which is not a list, do I need to create 65535 duplicate filter chains

it quickly gets VERY out of hand (see https://github.com/envoyproxy/envoy/issues/12572#issuecomment-672916226).

Why are we so dead set on the current sieving implementation? To myself (and others, if I may project) it has only cost in terms of cognitive load and control plane/envoy cost of duplicate FCs. Is it solving real problems, or is it just the current implementation so the default option is to keep it as is? Put another way, if we implement FCM today, would we make the same decisions.

My short term goal here is really to fix a bug in Istio where we need to match all traffic. The rest here is more hypothetical, although I think its very important. Despite all of this discussion, I still don't see a way we can possibly do that without having a massive regression in duplicated filter chains

howardjohn on 12 Aug 2020

@PiotrSikora Let's not digress. I just provided a reason why the ability to scope effects is important with federating control planes. I don't know if that's the world that Envoy want to bring to reality, or how it applies specifically to FCDS and LDS interaction. I just see a problem that it's getting many people confused and two developers working on Pilot code may not always communicate on specific layout of match conditions.

Why Pilot cannot generate complete and/or non-overlapping filter chain matches?

Also, I don't see how the if/else/then list changes anything, e.g. using John's example, you could write it either as:

select:
- match: application_protocols=h2
  filter_chain: _
- match: transport_protocol=raw_buffer
  select:
  - match: application_protocols=h2
    filter_chain: _
    # no fall through here
# last clause is a fallthrough
- filter_chain: _

which is _exactly_ the same as the existing decision making tree.

or as:

select:
- match: application_protocols=h2
  filter_chain: _
- match: application_protocols=h2
  select:
  - match: transport_protocol=raw_buffer
    filter_chain: _
    # no fall through here
# last clause is a fallthrough
- filter_chain: _

which is illegal, because you have 2 match: application_protocols=h2 at the top level.

PiotrSikora on 12 Aug 2020

Why Pilot cannot generate complete and/or non-overlapping filter chain matches?

If we want to do this, we need to have a huge explosion of duplicate filter chains - see https://github.com/envoyproxy/envoy/issues/12572#issuecomment-672916226 for a clear explanation of this. Its not impossible - but it will be extremely painful in terms of the complexity for devs/users and the performance cost on both envoy and the control plane. I strongly suspect these are not Istio only concerns either, although I don't have evidence for that

howardjohn on 12 Aug 2020

Why are we so dead set on the current sieving implementation? To myself (and others, if I may project) it has only cost in terms of cognitive load and control plane/envoy cost of duplicate FCs. Is it solving real problems, or is it just the current implementation so the default option is to keep it as is? Put another way, if we implement FCM today, would we make the same decisions.

You might be misreading my comments. I don't really care, if you have anything better, then go for it. I'm just trying to help you achieve what you need using the existing implementation.

If we want to do this, we need to have a huge explosion of duplicate filter chains - see #12572 (comment) for a clear explanation of this. Its not _impossible_ - but it will be extremely painful in terms of the complexity for devs/users and the performance cost on both envoy and the control plane. I strongly suspect these are not Istio only concerns either, although I don't have evidence for that

That specific case, again, could be solved with a filter chain for no_filter_chain_match.

Do you have a _real_ example that we could use and that would not be solved with the default filter?

PiotrSikora on 12 Aug 2020

That specific case, again, could be solved with a filter chain for no_filter_chain_match.

Do you have a real example that we could use and that would not be solved with the default filter?

I don't have a specific case right now for how we would use this in Istio today - I bet @rshriram does though, and I strongly suspect we will in the near future, especially if we move to a single listener which seems inevitable. For example, same example as above but applying ports:

match: port=90,alpn=h2
match: port=90

And TCP request on port 90 matches the 2nd filter chain

Now I add

match: port=90,alpn=h2,transport=raw_buffer
match: port=90

And the same TCP request on port 90 now matches no filter chain

The no_filter_chain_match is just a specialization of this problem. If I had to chose between doing nothing and doing no_filter_chain_match, I would chose the latter, but I suspect doing a more comprehensive change may be preferable for the broader Envoy community. I could be entirely wrong here though

howardjohn on 12 Aug 2020

I don't have a specific case right now for how we would use this in Istio today - I bet @rshriram does though, and I strongly suspect we will in the near future, especially if we move to a single listener which seems inevitable. For example, same example as above but applying ports:
match: port=90,alpn=h2
match: port=90
And TCP request on port 90 matches the 2nd filter chain

Now I add
match: port=90,alpn=h2,transport=raw_buffer
match: port=90

That's not a full sub-tree, though, what you should add is:

match: port=90,transport=raw_buffer,alpn=h2
match: port=90,transport=raw_buffer

But what are you trying to achieve adding this 2nd filter chain independently from the 1st filter chain?

PiotrSikora on 12 Aug 2020

@PiotrSikora

match: port=90,transport=raw_buffer,alpn=h2
match: port=90,transport=raw_buffer

is not correct, we want to match all non plaintext,h2,port=90 traffic. So to do this, we need a third filter chain:

match: port=90,transport=raw_buffer,alpn=h2
match: port=90,transport=raw_buffer
match: port=90,transport=tls

double the filter chains isn't the end of the world, but it is easy to see how it can get more complicated.

Btw, we allow adding arbitrary filters, etc, so its a bit of a split brain scenario. We essentially need go through all filter chains, see what fields are set, then duplicate our existing filter chains to cover all cases. It gets complicated pretty quickly

howardjohn on 12 Aug 2020

is not correct, we want to match all non plaintext,h2,port=90 traffic. So to do this, we need a third filter chain:
match: port=90,transport=raw_buffer,alpn=h2
match: port=90,transport=raw_buffer
match: port=90,transport=tls

I'm pretty sure that match: port=90,transport=tls case is covered by match: port=90 from the 1st filter chain... unless your intention was to add a separate filter chain, but then you don't need match: port=90.

double the filter chains isn't the end of the world, but it is easy to see how it can get more complicated.

Double != huge explosion.

Btw, we allow adding arbitrary filters, etc, so its a bit of a split brain scenario. We essentially need go through all filter chains, see what fields are set, then duplicate our existing filter chains to cover all cases. It gets complicated pretty quickly

You mean via EnvoyFilter? That's Istio's self-inflicted issue.

PiotrSikora on 12 Aug 2020

Double != huge explosion.

double is an example. Just because Istio doesn't currently use all 8 of the match types doesn't mean its not a valid use case of Envoy. Maybe 2x is fine, but is 4x? 64x?

howardjohn on 12 Aug 2020

double is an example. Just because Istio doesn't currently use all 8 of the match types doesn't mean its not a valid use case of Envoy. Maybe 2x is fine, but is 4x? 64x?

But that explosion is only happening because you're insisting on creating fallback "catch-all" filter chain matches covering all possible combinations...

PiotrSikora on 12 Aug 2020

@PiotrSikora He is not talking about Envoy filter. He is talking about TLS authentication, authz, etc. Long story short, what you are advocating is to have a very very long trie and that envoy will ensure that the memory usage and performance remains within the 5% threshold of what we have today with orig dst listeners?

The basic numbers we have is about this:
1000 services, about 100 unique ports in total wherein,
given an outbound port, it could be HTTP or TLS or TCP (linkerd style protocol sniffing)
for N services using the same outbound port (say 443),
p have marked protocol as http (so it will be 0.0.0.0:port)
q have unknown protocol (so sniffing required)
r have marked protocol as TCP (so we need a non wildcard listener x.x.x.x:port) [mostly k8s internal services]
s have protocol set as TLS, with one or more of them having CIDR match [traffic exiting to external services on the web]. Note that traffic to these will have ALPNs set by the client libraries.
finally one final fallback on 0.0.0.0:port that catches any unmatched traffic (the telemetry will be very different here)

today, we have 1 listener for set p with two filter chains (one sniffed as HTTP and the other as catch all traffic), q listeners for the set q with two filter chains each, r listeners for the set r, and 1 <= i <= s listeners for the set s depending on whether we know the VIP/cidr of the external service or not.

total listeners = 1+q+r+s (upper bound is N)
total filter chains across all listeners = 2+2q+r+s

I may be off the mark in some numbers here (pretty sure we have more filter chains).

So with this info, @PiotrSikora can you elaborate how many filter chains we would need in the outbound path for a given port that N services are sharing?

After having gone through relevant code in Linkerd's proxy, the envoy filter chain matching design seems to be a lot more complicated for the exact same functionality I described above, and requires lot more configuration for control planes to program the desired behavior. Judging by the intense back and forth in this thread and the push backs, it seems there is a design rationale for this complexity tradeoff. So it would be good to understand this. We are doing all this so that we can transparently insert the sidecar into any application pod/VM without any modification required at the application/pod level. Linkerd proxy seems to be achieving this goal of transparent proxy in performant manner while we are still hitting rough edges in the data plane trying to configure Envoy to behave as a transparent proxy [case in point is this issue. Our fallback logic is completely broken as the Salesforce people discovered in production]. So, if there is something we can do to configure envoy without having to pump out 100s of MBs of config for each sidecar, it would be really helpful.

rshriram on 13 Aug 2020

Again, there is no pushback, I'm just trying to explain that what you need can be achieved using existing implementation.

You're more than welcome to either suggest a better selection algorithm or send a PR with the implementation.

PiotrSikora on 13 Aug 2020

@PiotrSikora I have suggested a better (or different, at least :slightly_smiling_face: ) selection mechanism: https://github.com/envoyproxy/envoy/issues/12572#issuecomment-672355207. I think @htuch may have responded "I think this makes sense" in response to that, but I am not sure if it was about changing the implementation or about the docs comment, so I could be wrong. If we can get some consensus on what the desired behavior is then I think we can work on changing the implementation.

One concern I do have with changing the implementation is this seems like a breaking change - not sure how Envoy typically handles things like this, but we would probably need some flag or new field or similar I would imagine?

I am not very familiar with contributing to Envoy - what are the next steps? A more formal proposal, discuss in community meeting, send out a PR, ..?

howardjohn on 13 Aug 2020

@PiotrSikora I have suggested a better (or different, at least 🙂 ) selection mechanism: #12572 (comment). I think @htuch may have responded "I think this makes sense" in response to that, but I am not sure if it was about changing the implementation or about the docs comment, so I could be wrong. If we can get some consensus on what the desired behavior is then I think we can work on changing the implementation.

I think that @htuch said "this makes sense" about the improved docs, not about backtracking. But he can confirm that.

One concern I do have with changing the implementation is this seems like a breaking change - not sure how Envoy typically handles things like this, but we would probably need some flag or new field or similar I would imagine?

I am not very familiar with contributing to Envoy - what are the next steps? A more formal proposal, discuss in community meeting, send out a PR, ..?

Usually, breaking changes are introduced by adding a new field (e.g. filter_chain_backtracking_match). It's definitely worth discussing the algorithm and implementation before working on a PR.

PiotrSikora on 13 Aug 2020

Yeah, I think we can both improve docs and consider a new field/algorithm for folks who want efficient support for wildcards at arbitrary levels, e.g. filter chains with destination port with literals and wildcards, most specific matches. The wildcard matching would support some of the intuition that the Istio folks are looking for and avoid some potential combinatorial blowout in the configuration.

However, I'd caution that we're setting ourselves up for a harder to reason about FCM evaluation. Today it is O(N) where N is the number of levels. I think that in the worst case with the wildcard semantics, we might end up being O(2^N).

htuch on 13 Aug 2020

If we do a dumb sequential match mode within FCM, wont the complexity be O(N) (where N is number of filter chains) ? We can add most specific filter chains first, and then the least specific ones in the end

rshriram on 13 Aug 2020

to clarify, with proper wildcard semantics, if we have sequential match, then we get the same behavior as the listeners today, except with more conditions

rshriram on 13 Aug 2020

The current implementation is O(level) time complexity which utilize the assumption that all the subtree are equally good.

What is the case that
FC1 dest port 80, dest ip range 10.0.0.0/24
vs
FC2 dest port empty, dest ip range is [10.0.0.1, 10.0.0.2, 10.0.0.3] (ip range is repeated field)
when the traffic is
{dest port = 80, dest ip = 10.0.0.1}

Still FC1 is preferred IMHO.

This is saying, if we follow the preprocessing impl, FC2 could propagate the space complexity from the ip ranges to all the filter chains which has dedicated port.
Think of the below 100 chains
FC i (i in 1...99) {dest port i, dest ip range 10.0.0.0/24}
FC 100 {dest port empty, dest ip range has 200 unique ip]

With the current implementation, the overall decision tree have 300 branches, 100 from FC1 to FC 99, 200 branches from FC 100
With the new proposal the overall decision tree have 100 * 200 = 20000 branches

lambdai on 13 Aug 2020

If we do a dumb sequential match mode within FCM, wont the complexity be O(N) (where N is number of filter chains) ? We can add most specific filter chains first, and then the least specific ones in the end

No, it would be O(N * X), where X is the cost of 3 int + 2 CIDR + 3 string (inc. 1 wildcard) comparisons.

Also, a dumb sequential match is even worse at cross-polluting other filter chain rules, and it is order dependent, i.e. it would solve the "human readability" problem, but it would be a regression for everything else.

PiotrSikora on 13 Aug 2020

to clarify, with proper wildcard semantics, if we have sequential match, then we get the same behavior as the listeners today, except with more conditions

That's not true, unless you write the sequential match in an order that matches the logic of the current decision tree, but then I'm not sure what would be the benefit of doing that.

PiotrSikora on 13 Aug 2020

In the Istio meeting today I proposed something like this:

We still do sieve for the 8 criteria in same order, instead of just keep the best match ones, we keep all match (i.e. including wildcard)
When all sieves are done, there will be multiple filter chains left, chose the most matched FCM in the sieve (need a definition of "most matched"). If there's no FCMs left, reject with "no match".

This would be same user-facing behavior to back-tracking, can address wildcard in levels. But does not complicate implementation a lot. WDYT?

lizan on 13 Aug 2020

👍1

OK, what problem are we trying to solve here? Incremental / federation, fallback filter chains or "human readability"?

PiotrSikora on 13 Aug 2020

Sorry I meant same user-facing behavior to back-tracking, the goal here is for fallback filter chains, without adding much complexity. I don't think we aimed for "human readability".

lizan on 13 Aug 2020

Reorder make some cases easier but will definitely surface other corner cases. Even Envoy could support self define order, it should be the low priority.

lambdai on 13 Aug 2020

What about cross-polluting (that @howardjohn mentioned) and federation (that @kyessenov mentioned)? Are those not problems we want to solve? Backtracking makes both worse, IMHO.

PiotrSikora on 13 Aug 2020

I am not concerned with the cross-polluting case if we do Lizan's proposal. My main concern is that if someone injects a new match suddenly the others behave differently - someone injecting a new match and it taking precedent is perfectly fine, because they are explicitly defining the match so if the make it broad enough to capture certain types of traffic we should respect that

howardjohn on 13 Aug 2020

👍1

If we put complexity aside for now,
is there any solid proof that istio will benefit from a new implementation?

We only have piece of potential cases implying alternative match might be better.
However, from human's pespective, there is no straightforward answer to the below question
which is better FC to a traffic attrA = X and attrB = Y.

FC1: attrA = X
FC2: attrB = Y

lambdai on 13 Aug 2020

Hi folks, friendly request for a doc? This issue/comment thread is getting very long. :)

mattklein123 on 13 Aug 2020

I don't think we aimed for "human readability".

From the Istio doc, the main issue with the current implementation (which we presumably want to solve) is literally this:

It is not human friendly. There is a frequent question that a particular filter chain match satisfied the incoming traffic but the chain is not selected.

PiotrSikora on 13 Aug 2020

@mattklein123 https://docs.google.com/document/d/1Uda_lFRD2pKPWmSStMmba3lLYKVHIajBL1t9Mcyiod4/edit

PiotrSikora on 13 Aug 2020

@PiotrSikora "human-friendly" != "human readability", that question is literally just about how to address fallback/wildcard in levels.

lizan on 13 Aug 2020

Envoy: filter_chain_match difference between {} and null is confusing/under documented

All 64 comments

Related issues