Envoy: Revisit how we configure RE2 regex engine

Created on 29 Nov 2019  Â·  20Comments  Â·  Source: envoyproxy/envoy

See https://github.com/envoyproxy/envoy/pull/9171.

It turns out that program size can change between RE2 versions. Our original intention here was to have some kind of knob that would let users know if a configured regex is "too expensive." We should revisit if there are better ways of doing this.

cc @junyer @ramaraochavali @jmarantz

aresecurity enhancement help wanted

Most helpful comment

I'm very open to changes here, but it seems like it's going to be hard to satisfy everyone so maybe we need some more config? What about an option to just warn if a program is above a certain size and/or have a stat which outputs the max loaded regex program size for monitoring?

All 20 comments

@howardjohn FYI. We may have to revisit how we configure this in Pilot depending on how this goes.

Just to be clear, my intention was not to stop Envoy from using program size. The RE2 fuzzer, for example, uses program size and program fanout as "cost" signals. It's barely more advanced than looking at the length of the pattern string, but it does enable one to draw a line in the sand.

The memory budget is another mechanism of constraint. Most of it goes to the DFA state cache, which does permit a CPU–RAM tradeoff to be made, but that's just another way of saying that an expensive regular expression must be paid for in some way. It's worth configuring in combination with the other limits – especially if the default (8MiB) seems unnecessarily generous to you.

FWIW, I did some experiments and each .* increases the program size by 15, so one rule of thumb is that a program costs ~50 + 15 for each .*. I don't know what that means for execution time.

At the risk of derailing into inside baseball, https://github.com/google/re2/commit/6a86f6b3f4a6d797886cd4cf6bca73ed25b2d00a reduced the cost of . and will be in the next release of RE2. (I wanted https://github.com/envoyproxy/envoy/pull/9171 in place so that the Envoy tests wouldn't suddenly break.)

FYI the max program size defaulting to 100 was detrimental for github.com/GoogleCloudPlatform/esp-v2.

We were using GoogleRE2 from RouteMatch in route.proto to match OpenAPI v2 path templates. With the current regex logic we have, a customer ran into the max program size in a URL that had just 3 path templates. Our suite of integration tests did not catch this, as they used shorter URLs.

In the short term, perhaps we can temporarily raise the max program size. Or add a warning to the documentation mentioning that the entire Envoy config will be rejected if the program size is exceeded. This field can be problematic and is difficult to catch via testing.

I guess there are two questions @nareddyt

  1. when there is an RE failure due to program size, does the error message propagate nicely through the subsystems, enabling the user to bump up the program size if they dare?
  2. is the regex really that simple? Maybe it can be rewritten to be more efficient. Disclaimer: I haven't clicked through your links.

For question 2: Here is the path the user was trying to configure, where the terms in {} are path templates. (Modified to hide the user's real path):

/api/v1/resource1/{param1}/resource2/{param2}/resource3/{param3}/action

ESPv2 substitutes path templates (ex: {template1}) with the regex [^\/]+ so Envoy can match any path templates. It ends up being this regular expression. ([^\/]+ means any character except /):

/api/v1/resource1/[^\/]+/resource2/[^\/]+/resource3/[^\/]+/action$

This doesn't seem too complex to me. Is there a simpler regex we could have used?

For question 1: We didn't realize the consequences of that field, so we did not expose any way to configure the field. The user was able to see the error message in the logs, but unable to fix it. The fact that Envoy will reject the entire config due to one regex being long made this error even worse.

For reference, here is the full failure the user saw (modified to hide the user's real path):

2020-01-30 15:39:51.138 GETW0130 11:39:51.024 19 envoy] [19][config][external/envoy/source/common/config/grpc_mux_subscription_impl.cc:82] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) : regex \'/api/v1/resource1/[^\\/]+/resource2/[^\\/]+/resource3/[^\\/]+/action$\' RE2 program size of 104 > max program size of 100. Increase configured max program size if necessary.

Also note that this is logged as a warning (not error) by Envoy. This makes it a little harder to notice.

If this is causing customer issues I would suggest that we raise the default until we figure out a better way of doing what we want. Or perhaps raising the value but also warning if above a certain value?

There's nothing wrong with @nareddyt's regular expression, but here's how to reduce the program size:

Firstly, anchor the regular expression with ^. Any literal string prefix (e.g. /api/v1/resource1/) will be detached and so will no longer form part of the automaton, which means that it will take up zero program instructions.

Secondly, set the encoding option to Latin-1. Any character class (e.g. [^\/]) will no longer handle multibyte characters, which means that it will take up fewer program instructions.

Thanks @junyer -- I knew about anchoring. In fact elsewhere in Envoy we exploit the presence of anchoring in a regex pattern to build a map of first-word --> possibly-matching-regexes to greatly accelerate the evaluation of a collection of regexes. It's good to know that the RE2 engine itself also exploits this.

I did not know about setting the encoding option to latin-1. That looks pretty impactful too. It might be hard to know though if there's a multi-byte utf-8 input.

encoding could be set via a template parameter, I suppose, if some callers could be using UTF-8.

dot_nl (AKA (?s)) is worth setting when appropriate. Combined with Latin-1 encoding, it means that .* ≡ \C*, so it short-circuits the match instead of stepping over the rest of the input string. (Under such circumstances, ^prefix.* becomes a memcmp(3) call and a short-circuited match.)

never_capture is also worth setting when appropriate. It means that ( … ) ≡ (?: … ), so users writing the former can't block various optimisations with capturing groups that will never be used.

Another example from the field:

[2020-02-10 07:28:58.295][1][warning][config] [source/common/config/grpc_mux_subscription_impl.cc:82] gRPC config for type.googleapis.com/envoy.api.v2.RouteConfiguration rejected: regex '/static/[a-zA-Z0-9._-]{0,200}/[a-zA-Z0-9._-]{0,200}' RE2 program size of 2013 > max program size of 1000. Increase configured max program size if necessary.

I had hardcoded our ingress controller on the assumption that no one would need more than 1000 🙄

@mattklein123 Would you take a patch to remove the limit? I can see that it makes sense for some operational patterns, but when end users get to specify the regex directly, we don't always have a good feedback mechanism to report the problem.

@mattklein123 Would you take a patch to remove the limit? I can see that it makes sense for some operational patterns, but when end users get to specify the regex directly, we don't always have a good feedback mechanism to report the problem.

Can't you just set the limit to effectively unlimited (close to max int) in your product?

But this doesn't fix the root problem, where product owners don't notice this until a customer complains. As more people move away from envoy.config.route.v3alpha.RouteMatch.hidden_envoy_deprecated_regex, they may NOT realize the new SafeRegex field has a default limit that is fairly low (like me). And they will not see this discussion until after a customer complains.

We can at least make it clear in the documentation that exceeding this size will result in a config validation fail exception and the entire config will be rejected. This will make it more obvious that affect this arbitrary value has.

I can remove the limit or make the documentation clearer if needed.

I can remove the limit or make the documentation clearer if needed.

I will defer to @jmarantz here who originally requested the limit. I do like the idea of a limit of some type as this is an indication that the user may be doing something they probably should not be. Whether the limit becomes a warning, becomes unlimited, etc. I'm not sure.

From the perspective of someone running Envoy as a managed proxy for customers, I want to bound the amount of CPU they can consume. So I want there to be a limit that can be enforced in the engine.

It seems reasonable though to interpret a limit of 0 or -1 or INT_MAX or whatever to be unlimited and just not do the check in that case, and whoever owns the control-plane can default their setting to that. But it would effectively the same thing I think just to have the control-plane default to a very high number and leave the check in. Then no core functional change is needed, and in either case a control-plane change is needed.

Seem reasonable?

Yeh, we previously bumped the limit to 1000 and now bumped it to 1m.

From a control-plane PoV, I'd ideally want to have no effective limit and have envoy report some information about regex cost. I can expose that information in any number of ways.

One of the issues here is that it is not clear to me what program size means in terms of CPU and memory cost. For example,

$ ./re2sz '/static/[a-zA-Z0-9._-]{0,200}/[a-zA-Z0-9._-]{0,200}'
program size is 2013
$ ./re2sz '^/static/[a-zA-Z0-9._-]+/[a-zA-Z0-9._-]+'
program size is 15

Is the first regex really 100x more expensive (in some dimension) than the second?

The former compiles to bytecode containing 400 copies of the [a-zA-Z0-9._-] character class whereas the latter compiles to bytecode containing 2 copies of it. The former has the /static/ prefix attached whereas the latter has it detached. Both of these differences have effects on the number of program instructions and implications for the memory footprint of the DFA.

I'm very open to changes here, but it seems like it's going to be hard to satisfy everyone so maybe we need some more config? What about an option to just warn if a program is above a certain size and/or have a stat which outputs the max loaded regex program size for monitoring?

Assigning this over to @LisaLudique who is going to make relevant changes her w/ runtime config, warn, error, etc.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

justConfused picture justConfused  Â·  3Comments

zanes2016 picture zanes2016  Â·  3Comments

roelfdutoit picture roelfdutoit  Â·  3Comments

rshriram picture rshriram  Â·  3Comments

hzxuzhonghu picture hzxuzhonghu  Â·  3Comments