Envoy: extract subset load balancer metadata matches from request

Created on 23 Jan 2018  ·  44 Comments  ·  Source: envoyproxy/envoy

Per #2334, the idea is to allow metadata matches (used to select subsets via the subset load balancer) to be computed from request data. Specifically, the intent is to match request header values against a regular expression and allow matched groups to be used as metadata values. For example, matching :authority against ([a-z]*)-([a-z]*)-foo-service could yield environment and location metadata values from the two capture groups.

Configuration
One question is how to encode this configuration. Two proposals have been made.

  1. Encode the regexes and metadata mappings in a structure parallel to the existing metadata_match field of RouteAction.
  2. Use the regexes already encoded in the regex field of RouteMatch and introduce an additional field to allow mapping matched groups to metadata.

Option 1 allows headers to be optionally matched and converted into match criteria for the purposes of load balancing. Several optional headers could be defined and the resulting match criteria (if any) applied to routing. This option may cause the same header to be matched against multiple regular expressions (one to match the route, and one to extract metadata values).
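
For illustration only, option 1 might be encoded along these lines; every field under route besides cluster is hypothetical, since nothing like this exists in the API today:

"route" : {
  "cluster" : "foo",
  // hypothetical sibling of the existing metadata_match field
  "dynamic_metadata_match" : [
    {
      "header" : "x-foo-env",
      "regex" : "^([a-z]+)-([a-z]+)$",
      "filter" : "envoy.lb",
      // capture groups become metadata values
      "keys" : { "env" : "$1", "loc" : "$2" }
    }
  ]
}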

Option 2 requires producing a route for each valid combination of headers: N optional, independent headers that produce match criteria would require 2^N routes. For a given route, regexes are only matched once.
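
For example, with two optional headers (say x-env and x-loc), option 2 would need four routes, one per combination, ordered most-specific first. A rough sketch; the header matchers here are only placeholders to show the combinations, not the extraction syntax, which does not exist yet:

"routes" : [
  {
    // both headers present: extract env and loc
    "match" : { "prefix" : "/", "headers" : [ { "name" : "x-env" }, { "name" : "x-loc" } ] },
    "route" : { "cluster" : "foo" }
  },
  {
    // x-env only
    "match" : { "prefix" : "/", "headers" : [ { "name" : "x-env" } ] },
    "route" : { "cluster" : "foo" }
  },
  {
    // x-loc only
    "match" : { "prefix" : "/", "headers" : [ { "name" : "x-loc" } ] },
    "route" : { "cluster" : "foo" }
  },
  {
    // neither header: no derived match criteria
    "match" : { "prefix" : "/" },
    "route" : { "cluster" : "foo" }
  }
]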

In either case, care will have to be taken to make sure the match criteria delivered to the subset load balancer are correctly sorted.

Security Considerations
My analysis is that any configuration using dynamically generated match criteria for the subset load balancer can be written using static match criteria, given a sufficient number of routes. There may be a need to limit the length of header values or extracted match criteria to avoid large memory allocations in the request path (existing header limits may be sufficient). Allowing arbitrary matches might allow an attacker to craft special requests to probe for metadata matches that were not meant to be exposed (e.g., internal-use versions).

Labels: area/load balancing, enhancement, help wanted

All 44 comments

@zuercher Do you have a preference between the two options? Either would work for my use case but option 1 is probably a better fit. This option also feels a little more consistent with options like cluster_header in RouteAction since both are driving upstream node selection based on headers (one to choose the cluster and the other to select a subset within the cluster).

Allowing arbitrary matches might allow an attacker to craft special requests to probe for metadata matches that were not meant to be exposed

In my use case, I'm looking to inject the necessary headers used for subset load balancing via a lua filter so I'd probably just remove any headers from the initial request that I use internally for routing to prevent probing.

I think the choice is down to performance. There are two parts:

  1. Matching. I thought option 2 would avoid re-matching regexes, but in the case where you have to produce 2^N routes for N optional headers, you end up re-evaluating regexes multiple times across routes. For the case where a single route has a required header (one that's not repeated in another route), option 2 does have the advantage of only processing the regex once.

  2. Generating the match criteria. As long as you're able to insert the metadata values as you generate them (as opposed to doing a sorting pass afterwards), I think either is ok. Option 2 might make this slightly easier, but I don't think it's faster per se.

Since the number of regex matches needed is somewhat dependent on the use case, what about a third option: introduce a new HTTP filter that adds new headers based on regex processing of existing headers. The metadata processing in the router then doesn't need to do any regex; it simply pulls the metadata as-is from a configurable set of headers.

Loosely based on the example from #2334, the config might look something like:

"route_config" : {
  "name" : "local_route",
  "virtual_hosts" : [
    {
      "name" : "local_service",
      "domains" : ["*"],
      "routes": [
        {
          "match" : { "prefix" : "/" },
          "route" : {
              "cluster" : "foo",
              "header_metadata_match" : {
                  "filter_metadata" : {
                      "envoy.lb" : {
                          "env" : "x-foo-env-name",
                          "loc" : "x-foo-location",
                          "stage" : "x-foo-stage"
                      }
                  }
              }
          }
        }
      ]
    }
  ]
},
"http_filters" : [
  {
    "name" : "envoy.header_rewrite",
    "config" : {
        "rewrite_rules" : [
            {
                "name" : ":authority",
                "match" : "([a-z]*)-[a-z]*)-foo-service",
                "add" : {
                    "x-foo-env-name" : "$1",
                    "x-foo-location" : "$2"
                }
            },
            {
                "name" : "x-do-something",
                "match" : "stage=([a-z]*)",
                "add" : {
                    "x-foo-stage" : "$1"
                }
            }
        ]
    }
  },
  {
    "name" : "envoy.router"
  }
]

We do the regex processing once in the new filter, and then the metadata step just grabs the header values directly without any need for regex. If header matching is required for route selection, you could eliminate the duplicate regex by having the match section reference a header you processed in the new filter. Potentially, the header match could also be expanded to support matching strictly on the presence of a header rather than on a specific value, to avoid needing to add an extra header simply for matching.
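
For illustration, a route match referencing a header produced by the rewrite filter might look like the sketch below; note that the value-less, presence-only header matcher is part of the proposed expansion, not an existing feature:

"routes" : [
  {
    "match" : {
      "prefix" : "/",
      // match only when envoy.header_rewrite produced this header,
      // regardless of its value
      "headers" : [ { "name" : "x-foo-env-name" } ]
    },
    "route" : { "cluster" : "foo" }
  }
]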

I think this approach potentially gives the flexibility to avoid duplicate regex matches in all use cases.

The new filter isn't even strictly necessary, because this could be handled by the lua filter, although performance would likely be better if handled in a C++ filter.

@ufodone I think this approach makes sense. BTW, when rewriting header values, where do $1 and $2 come from?
Thinking further, this could also be used to select a cluster (not just hosts within the cluster) dynamically. Let's say, for example, we want to route requests based on cluster_header and the cluster_header value needs to be derived from :authority; we could write a rule like the following:

"route_config" : {
  "name" : "local_route",
  "virtual_hosts" : [
    {
      "name" : "local_service",
      "domains" : ["*"],
      "routes": [
        {
          "match" : { "prefix" : "/" },
          "route" : {
              "cluster_header" : "foo_header",
          }
        }
      ]
    }
  ]
},
"http_filters" : [
  {
    "name" : "envoy.header_rewrite",
    "config" : {
        "rewrite_rules" : [
            {
                "name" : ":authority",
                "match" : "([a-z]*)-[a-z]*)-foo-service",
                "add" : {
                    "foo_header" : "$1" // Here foo_header can be derived from :authority header
                }
            }
        ]
    }
  },
  {
    "name" : "envoy.router"
  }
]

So in the above example, a cluster is selected based on the foo_header attribute, and foo_header is computed from the :authority header, added dynamically by this new filter.
What do you think?

$1 and $2 refer to the matched sub-expressions from the regex. I pulled the example directly from the other ticket. It might make more sense to use \1 and \2 to be consistent with typical regex syntax?

I hadn't considered using this for cluster_header as well but it makes sense. There are probably other use cases where the same pattern could apply.

The side-effect of this approach is that these extra headers will be passed to the upstream host. That probably isn't generally an issue, but in many cases this is really an internal detail of Envoy and it might be preferred not to leak that info upstream. I wonder if it would be worth having a way to add "private" headers which are used internally but not propagated upstream? Or alternatively, have a different way of passing info between filters and the router?

Thanks for taking the time to think it through. In general, this looks like a good approach to me.

One odd bit is that the Metadata message in the API contains a map of strings to protobuf Struct messages but in the case of the header_metadata_match field, the values can really only be strings (header names). I don't think there's necessarily any better type to use, though.

@ufodone I think one way to avoid passing these header values to the upstream host might be to name them in a distinct namespace, like envoy.router.foo_header, so that the router filter can remove them after the route decision has been made?

I knew there were namespaces for metadata but wasn't aware that headers could also be namespaced such that the router could remove them. But if that's the case, it seems like it would do the trick.

@zuercher @ufodone what do you think of something like https://github.com/envoyproxy/envoy/pull/3254 as a workaround? This would at least — for now — allow us to do this from a filter. And even with regex support, some more advanced use cases would need to be able to do this from a filter.

So I am curious about your thoughts on using RequestInfo.setDynamicMetadata() to tag a request and then construct MetadataMatchCriteria from that.

@rgs1 I'll have to look into the details a bit more, but at first glance I think I could use it to fulfill my use case. I'd originally wanted to avoid writing a custom filter so that a vanilla envoy executable could be used, but that's probably less important now as the deployment of such instances will be more contained than I'd originally thought.

I'd been planning on using a LUA filter so I suppose that it would be fairly easy to expose RequestInfo.setDynamicMetadata() to the LUA filter and that would eliminate my need to build a C++ filter (although I may eventually find that I want a C++ one for performance reasons anyway).

I am in need of this feature as part of our Envoy migration from the V1 to the V2 API, as in V1 we were routing based on cluster_header:
https://www.envoyproxy.io/docs/envoy/latest/api-v1/route_config/route.html?highlight=cluster_header
Now in the V2 API we need to provide a dynamic value for the metadata match criteria.

Do we have any timeline for when this feature will be available?

@mattklein123
Thanks for the response.
Do we also support dynamic values in the metadata match criteria? In my use case I need to match :path in the route's metadata_match, something like below...

                   "route": {
                          "cluster": "some_service",
                          "metadata_match": {
                            "filter_metadata": {
                              "envoy.lb": {
                                "url":      ":path",
                                "type":   "dev"
                              }
                            }
                          }
                          }

I tried the above configuration but it did not work.

@sahanaha sorry do you just need to match on the path header? Can you describe what you are trying to accomplish exactly?

@mattklein123

Yes, the idea here is to dynamically match the :path value in the route metadata against the host metadata, e.g.:

"route": {
"cluster": "some_service",
"metadata_match": {
"filter_metadata": {
"envoy.lb": {
"url": ":path", //match the dynamic value with below EDS response
"type": "dev"
}
}
}
}

and my host metadata from the EDS response would be something like this:

{
  "version_info": "0",
  "resources": [
    {
      "@type": "type.googleapis.com/envoy.api.v2.ClusterLoadAssignment",
      "cluster_name": "some_service",
      "endpoints": [
        {
          "lb_endpoints": [
            {
              "endpoint": {
                "address": {
                  "socket_address": {
                    "address": "host_ip1",
                    "port_value": 80
                  }
                }
              },
              "metadata": {
                "filter_metadata": {
                  "envoy.lb": {
                    "url": "/v1/heartbeat", // match the URL with :path & route it
                    "type": "dev"
                  }
                }
              }
            },
            {
              "endpoint": {
                "address": {
                  "socket_address": {
                    "address": "host_ip3",
                    "port_value": 80
                  }
                }
              },
              "metadata": {
                "filter_metadata": {
                  "envoy.lb": {
                    "url": "/v1/heartbeat123", // match the URL with :path & route it
                    "type": "test"
                  }
                }
              }
            }
          ]
        }
      ]
    }
  ]
}

That way I just need to define a single metadata_match in the route configuration. As you can see, the metadata value for url is different for each host, but I want to configure a single route match criteria where the url value is dynamic.

@sahanaha sorry, I don't know off the top of my head. @zuercher is this the same issue as the original issue?

@sahanaha we achieve this with a filter, by doing something like:

ProtobufWkt::Struct keyval;
ProtobufWkt::Value val;

// headers.get() returns nullptr when the header is absent, so guard the dereference.
const auto* header_entry = headers.get(header);
if (header_entry != nullptr) {
  val.set_string_value(std::string(header_entry->value().getStringView()));
  (*keyval.mutable_fields())["url"] = val; // "url" is the key from your example

  callbacks.requestInfo().setDynamicMetadata("envoy.lb", keyval);
}

However, it would be nice to be able to express this via config.

@rgs1
Thanks! I will work on adapting the above filter to suit my requirement.

Do we have any ETA for achieving dynamic metadata matching via config?

@rgs1

I have 2 questions:

1. With the above filter example you have given, what does my route metadata_match configuration look like? Should I remove the metadata_match config from the route?

2. Can we achieve setting a dynamic metadata value with the LUA filter?

I was just exploring the LUA filter. I see that we can extract the :path value from the header. Is there any way to set the :path value extracted in the LUA filter into the metadata_match?

"route": {
"cluster": "some_service",
"metadata_match": {
"filter_metadata": {
"envoy.lb": {
"url" : "/v1/heartbeat", // Not sure how to set "url" from the :path value in LUA filter below
"type": "dev"
}
}
}
}

"http_filters": [
{
"name": "envoy.lua",
"config": {
"inline_code": "
function envoy_on_request(request_handle)
headers = request_handle:headers()
request_handle:logTrace(headers:get(\"\:path\")) //this extracts the value which I need for "url"
end"
}
},
{
"name": "envoy.router",
"config": {}
}
],

Since I am able to extract the :path value in the inline function, is there a way I can assign this value back to the metadata match? Please share a config example if you have done it this way.

@sahanaha:

1) that's correct, you don't need metadata_match per route anymore because the dynamic metadata that's set by the filter is matched directly against the endpoints' metadata. If you look at https://github.com/envoyproxy/envoy/commit/2d2506761a3c9d03594619f7c6cc8596ddb59950, you'll see that the request's metadata overrides the route's metadata (if any).

2) I haven't played with LUA filters, but if the requestInfo object is reachable and you can call setDynamicMetadata, you are good to go...

@sahanaha
Unfortunately the LUA filter included with Envoy doesn't expose access to the dynamic metadata feature that was added. Would be great if it did...
I had the same use case, and hacked this together by modifying lua_filter.cc/h (this is a rough outline).

// Expose requestInfo in Decoder/Encoder callbacks
RequestInfo::RequestInfo& getRequestInfo() override { return callbacks_->requestInfo(); }

// Add to the lua exported functions
{"dynamicMetadata", static_luaDynamicMetadata}};

// Implement the lua call with something like
int StreamHandleWrapper::luaDynamicMetadata(lua_State* state) {
  ASSERT(state_ == State::Running);

  const char* key = luaL_checkstring(state, 2);
  const char* value = luaL_checkstring(state, 3);

  ENVOY_LOG(info, "Dynamic metadata key={} value={}", key, value);

  auto keyval = MessageUtil::keyValueStruct(key, value);
  callbacks_.getRequestInfo().setDynamicMetadata("envoy.lb", keyval);

  return 0;
}

You can then just add the following lines to your LUA script:

somevalue = request_handle:headers():get(":path")
request_handle:dynamicMetadata("somekey", somevalue)

Hope that helps. I have dreams of contributing something like that back, but time is lacking, and we haven't decided if we're going this route or not.

@moss94 stealing your idea, I have a WIP in here: https://github.com/dio/envoy/commit/293fcaccf4dbcac62c3fc59bf916c2636ba51454

I put the API as follows:

function envoy_on_request(request_handle)
  request_handle:requestInfo():dynamicMetadata():set("envoy.lb", "foo", "bar")
  request_handle:requestInfo():dynamicMetadata():get("envoy.lb")["foo"]
end

As exhibited in the test file here. Not sure if we should default to the "envoy.lb" key whenever possible.
