Envoy: [UDP][Feature] UDP Content Token Routing

Created on 28 Dec 2019 · 9 Comments · Source: envoyproxy/envoy

(Please consider this document a sacrificial draft. Feedback, corrections, and comments are very much appreciated, and likely warranted, as I am only newly experienced with Envoy.)

Objective

Be able to preemptively route a UDP session to a specific upstream entry in the cluster, based on content available (i.e. a token) within the UDP packet.

This is specifically useful for stateful endpoints in a cluster, such as a Dedicated Game Server for multiplayer games (which is my primary expertise), or (I believe) VOIP/SIP backends.

For this reason, any sort of random/round robin type load balancing is not effective, as we need to be able to send a session to a specific upstream endpoint in the cluster.

Background

Articles

Presentations

Requirements and scale

Requirements:

  • Be able to add tokens to, and remove tokens from, the set of “client tokens” (arbitrary byte[]/string) associated with an upstream cluster endpoint
  • Envoy should have a configurable way to pull the client token from the incoming UDP packet contents. E.g. The token could be the last 1024 bytes of the UDP packet.

    • We have to pass the token this way, as UDP packets don’t have headers, so any extra information must be part of the byte[] payload of the UDP packet.

  • When a UDP packet is received by Envoy, it will:

    • Parse the client token out of the packet, based on the client token configuration above

    • Compare the token to the sets of upstream cluster endpoints, find the one that it matches to

      • If a match is found:

        • Configure a session to the matching upstream cluster endpoint, so that data can be sent back to the sending downstream client.

        • Send the UDP packet to the matched upstream endpoint.

        • If the token matches to a different upstream endpoint than previously, move the current session to the new upstream endpoint.

      • If there is no match, drop the packet, and end processing.
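
The packet handling steps above can be sketched roughly as follows. This is only an illustration in Python, not Envoy code: `TokenRouter`, `extract_token`, and `TOKEN_LENGTH` are hypothetical names, and the sketch assumes the token is a fixed-length suffix of the datagram, which is just one of the possible extraction configurations.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (ip, port)

TOKEN_LENGTH = 5  # assumption: the token is the last 5 bytes of each datagram


def extract_token(datagram: bytes, length: int = TOKEN_LENGTH) -> Optional[bytes]:
    """Return the trailing `length` bytes, or None if the datagram is too short."""
    if length <= 0 or len(datagram) < length:
        return None
    return datagram[-length:]


class TokenRouter:
    """Hypothetical in-memory model of the proposed routing behaviour."""

    def __init__(self) -> None:
        self.token_to_endpoint: Dict[bytes, Address] = {}
        # downstream client address -> upstream endpoint currently in use
        self.sessions: Dict[Address, Address] = {}

    def add_token(self, token: bytes, endpoint: Address) -> None:
        self.token_to_endpoint[token] = endpoint

    def remove_token(self, token: bytes) -> None:
        self.token_to_endpoint.pop(token, None)

    def route(self, downstream: Address, datagram: bytes) -> Optional[Address]:
        """Return the upstream endpoint to forward to, or None to drop the packet."""
        token = extract_token(datagram)
        if token is None:
            return None  # packet too short to carry a token: drop
        endpoint = self.token_to_endpoint.get(token)
        if endpoint is None:
            return None  # no matching token: drop and end processing
        # Creates the session on first match, or moves it if the token now
        # maps to a different upstream endpoint than before.
        self.sessions[downstream] = endpoint
        return endpoint
```

Whether the token is stripped from the payload before forwarding is left open here, since the requirements above do not specify it.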



Use Cases

The specific use case that I want to cover is around Dedicated Game Servers for multiplayer games, but could potentially be applied to any sort of stateful system that uses a UDP stream as a communication protocol.

  • Can run Game Server on a private network, and only expose the Envoy proxy, thus reducing the surface area that is available to potential attackers.
  • Can have fine grained, real time control of who can access GameServers, and which ones, through client token addition and removal

    • This means that bad actors can have their client tokens removed quickly, removing their access to the most sensitive part of multiplayer games infrastructure

  • Dedicated Game Servers are usually a single public IP and port, and can be single points of failure for a single multiplayer game session - and as such, are targeted for DDOS attacks.

    • Clients can distribute their traffic to multiple proxies, which is much harder to DDOS and take down, and provides redundancy.

  • Reduce public facing IP addresses (more of a problem for ipv4 than ipv6)
  • Standard UDP traffic statistic reporting through Envoy’s statistics collection.

Concerns / Questions

  • Is there a way we can do this without sending the token on every request? Since it may not be encrypted, the token can be seen in traffic. Maybe we could only send the token on initial request / network change? (or maybe we need to also have encryption?)

Design ideas

This is a sacrificial draft for a potential configuration for the content token routing:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: UDP
          address: 127.0.0.1
          port_value: 7650
      listener_filters:
        # our new type of udp router
        - name: envoy.filters.udp_listener.udp_router
          typed_config:
            '@type': type.googleapis.com/envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig
            stat_prefix: service
            cluster: gameservers_cluster
  clusters:
    - name: gameservers_cluster
      connect_timeout: 0.25s
      type: STATIC
      # since our listener filter provides the routing
      lb_policy: CLUSTER_PROVIDED
      load_assignment:
        cluster_name: gameservers_cluster
        endpoints:
        # three potential game servers to connect to on localhost
        # but different ports.
        - lb_endpoints:
            - endpoint:
                metadata:
                  # client tokens are stored in the metadata, as struct key values
                  # When `true`, the token has access, when false or non existent, access is denied.
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    x7zs9: true
                    18z9y: true
                    j9zwk: true
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26000
        - lb_endpoints:
            - endpoint:
                metadata:
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    97zx9: true
                    18zyy: false # this client-token no longer has access
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26001
        - lb_endpoints:
            - endpoint:
                metadata:
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    97ix0: true
                    16zyy: true
                    p6z9y: true
                    f6z3y: true
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26002
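
To make the intended semantics of the token metadata concrete, here is a small Python sketch of how a filter might turn the per-endpoint token maps into a reverse index. It is not Envoy code: `ENDPOINTS` simply mirrors the draft config above, and `build_token_index` is a hypothetical helper. It assumes a token is only routable when its value is `true`.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]  # (ip, port)

# Mirrors the draft metadata: for each endpoint, a map of token -> allowed flag.
ENDPOINTS: List[Tuple[Address, Dict[str, bool]]] = [
    (("127.0.0.1", 26000), {"x7zs9": True, "18z9y": True, "j9zwk": True}),
    (("127.0.0.1", 26001), {"97zx9": True, "18zyy": False}),
    (("127.0.0.1", 26002), {"97ix0": True, "16zyy": True, "p6z9y": True, "f6z3y": True}),
]


def build_token_index(
    endpoints: List[Tuple[Address, Dict[str, bool]]]
) -> Dict[str, Address]:
    """Build a token -> endpoint lookup, skipping tokens whose flag is false."""
    index: Dict[str, Address] = {}
    for address, tokens in endpoints:
        for token, allowed in tokens.items():
            if allowed:
                index[token] = address
    return index
```

With an index like this, each packet needs a single hash lookup, so the per-packet cost does not grow with the number of tokens attached to an endpoint.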

Concerns / Questions

  • Game Servers / Upstream client endpoints could potentially be added and removed in a very dynamic way (100’s added and 100’s removed at a time). Can Envoy handle this type of dynamic configuration rate of change?
  • A single game session could have potentially thousands of players per endpoint. That means that every upstream cluster endpoint could have thousands of client tokens associated with it. Can Envoy handle this extra amount of data?
  • Client tokens will be added and removed at a rate much higher than that of Upstream endpoints - Can Envoy handle this type of dynamic configuration rate of change?

Alternatives considered

  • Being able to somehow preemptively create sessions based on sender IP/port information.

    • At initial pass, couldn’t find a way to implement this

    • Also, network changes are far more frequent for games on mobile networks than on PC/Consoles. (Although you may have to re-auth anyway? May be worth discussion)

  • Being able to provide a token that identifies the upstream cluster endpoint specifically

    • This is a potential security concern, as any client with the upstream endpoint token has access, and you can’t revoke it

    • In reality, with the current design, you could do this anyway if you wanted to, by using a single token per endpoint.

Labels: area/quic, area/udp, design proposal, help wanted

All 9 comments

One thing I'd like to see here is to be able to have a sort of fallback routing for when no specific token is configured.

I'd also like to add the explicit consideration that tokens could be any length and envoy shouldn't be opinionated on that.

Thanks for raising this @markmandel. This is actually a more general case of what needs to be done for https://github.com/envoyproxy/envoy/issues/1193 in which we need to route UDP packets based on the QUIC connection ID. I have some thoughts on how we can approach this and will reply back when I have some more time. cc @danzh2010

@mattklein123 glad to hear it has a more general application than the use cases I am thinking of as well.

I didn't think to look at how QUIC implements this! :man_facepalming: There is so much good prior art there for a variety of use cases (sessions, crypto, etc).

https://quicwg.org/base-drafts/draft-ietf-quic-transport.html#name-connections (for those also subscribed who want to read up)

As another data point, IoT protocols also rely on tokens for routing - and other things, like request/response matching, caching and congestion control. CoAP (rfc7252) is based on UDP and one I'm particularly interested in seeing work with Envoy. The way CoAP uses tokens is slightly different than the way Mark/QUIC is describing (it's more a request ID) but hopefully helpful in thinking about a generalized solution.
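
For reference, a CoAP token sits at a fixed place in the message: RFC 7252 defines a 4-byte header (version, type, token length, code, message ID) followed by a token of 0 to 8 bytes, with the token length (TKL) in the low 4 bits of the first byte. A minimal Python sketch of extracting it follows; `coap_token` is a hypothetical helper name, not part of any library.

```python
from typing import Optional

COAP_HEADER_LENGTH = 4  # Ver/T/TKL (1 byte), Code (1 byte), Message ID (2 bytes)


def coap_token(datagram: bytes) -> Optional[bytes]:
    """Return the CoAP token, or None if the datagram is not a valid CoAP message."""
    if len(datagram) < COAP_HEADER_LENGTH:
        return None
    version = datagram[0] >> 6          # top 2 bits
    token_length = datagram[0] & 0x0F   # low 4 bits (TKL)
    # RFC 7252: version must be 1, and TKL values 9-15 are reserved.
    if version != 1 or token_length > 8:
        return None
    end = COAP_HEADER_LENGTH + token_length
    if len(datagram) < end:
        return None  # truncated message
    return datagram[COAP_HEADER_LENGTH:end]
```

A generalized solution would need to handle this kind of length-prefixed token as well as fixed byte ranges.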

I would like to support hash policy in udp proxy.

The udp proxy does not fully support hash based lb algorithms because it does not provide a LoadBalancerContext when choosing a host.
So, the udp proxy with hash based lb algorithms will select a host at random.

I have investigated the tcp case and found that it has a hash policy option.
So, I think that we can simply support it in the udp case as well.

This does not depend on the incoming packet's content.
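
As a rough illustration of the idea (not the draft implementation linked in this comment), hashing the downstream source IP gives a stable host choice per client. The Python sketch below uses a plain modulo over the host list, which is a simplification: Envoy's ring hash and Maglev load balancers are designed to remap fewer clients when hosts change. `pick_host_by_source_ip` is a hypothetical name.

```python
import hashlib
from typing import List, Tuple

Address = Tuple[str, int]  # (ip, port)


def pick_host_by_source_ip(source_ip: str, hosts: List[Address]) -> Address:
    """Deterministically map a downstream source IP to one of the hosts."""
    if not hosts:
        raise ValueError("no hosts available")
    digest = hashlib.sha256(source_ip.encode("utf-8")).digest()
    key = int.from_bytes(digest[:8], "big")  # use the first 64 bits as the hash key
    return hosts[key % len(hosts)]
```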

Here is my draft implementation: chadr123@d95c3f5

Please give your opinions on my idea.
Thanks!!

@chadr123 can you open a PR where we can discuss? I want to make sure we build the API in a way that will allow for byte range hashing. I think this can just be a wrapper message with a oneof inside of it that initially just has the general hash policy, and then later we can add byte range hashing on the datagram. Thank you!
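
To illustrate the shape being suggested, here is a Python sketch of a policy that is "one of" two variants: a general source-IP hash, and a byte range hash over the datagram. All names (`SourceIpHash`, `ByteRangeHash`, `compute_hash`) are hypothetical and do not reflect the actual Envoy API; CRC32 is used only to keep the example short.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class SourceIpHash:
    """Stand-in for the general hash policy: hash the downstream source IP."""


@dataclass(frozen=True)
class ByteRangeHash:
    """Hash `length` bytes of the datagram starting at `offset`."""
    offset: int
    length: int


# Plays the role of the oneof: exactly one variant is configured.
HashPolicy = Union[SourceIpHash, ByteRangeHash]


def compute_hash(policy: HashPolicy, source_ip: str, datagram: bytes) -> Optional[int]:
    """Return a hash key for host selection, or None if the range is unavailable."""
    if isinstance(policy, SourceIpHash):
        return zlib.crc32(source_ip.encode("utf-8"))
    if isinstance(policy, ByteRangeHash):
        end = policy.offset + policy.length
        if policy.offset < 0 or policy.length <= 0 or end > len(datagram):
            return None  # datagram does not contain the configured range
        return zlib.crc32(datagram[policy.offset:end])
    raise TypeError(f"unsupported hash policy: {policy!r}")
```

With the byte range variant, two clients that send the same bytes in the configured range get the same hash key regardless of their source address, which is the property token or connection ID routing needs.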

Ok. I will open a PR soon. :)

In addition to the work that @chadr123 is planning to do, if that is combined with a filter similar to header-to-metadata from http and #12594, token-based routing will work.
