Envoy: [UDP][Feature] UDP Content Token Routing

Created on 28 Dec 2019 · 9 Comments · Source: envoyproxy/envoy

(Please consider this document a sacrificial draft. Feedback, corrections, and comments are very much appreciated, and likely warranted, as I am only newly experienced with Envoy.)

Objective

Be able to preemptively route a UDP session to a specific upstream entry in the cluster, based on content available (i.e. a token) within the UDP packet.

This is specifically useful for stateful endpoints in a cluster, such as a Dedicated Game Server for multiplayer games (which is my primary expertise), or the kind of backends that VoIP/SIP systems use (I believe).

For this reason, random or round-robin load balancing is not effective: we need to be able to send a session to one specific upstream endpoint in the cluster.

Background

Articles

Wikipedia: Game Servers
- The best definition of a dedicated game server in writing.
UDP vs. TCP
- Why UDP is so important for realtime multiplayer games

Presentations

Scaling Multiplayer Game Servers with Kubernetes
- First introductory section of this presentation covers the general architecture for multiplayer game servers.
Denial of Service Mitigation (Valve @ GDC)
- Good discussion of applying proxies to multiplayer, dedicated game servers and the problems they can be applied to.

Requirements and scale

Requirements:

Be able to add and remove entries from the set of “client tokens” (arbitrary byte[]/string) attached to an upstream cluster endpoint.
Envoy should have a configurable way to pull the client token from the incoming UDP packet contents. E.g. the token could be the last 1024 bytes of the UDP packet.
- We have to pass the token this way, as UDP packets don’t have headers, so any extra information must be part of the byte[] payload of the UDP packet.
When a UDP packet is received by Envoy, it will:
- Parse the client token out of the packet, based on the client token configuration above
- Compare the token against the token sets of the upstream cluster endpoints, and find the endpoint it matches
  - If a match is found:
    - Configure a session to the matching upstream cluster endpoint, so that data can be sent back to the sending downstream client,
    - Send the UDP packet to the matched upstream endpoint.
    - If the token matches a different upstream endpoint than before, move the current session to the new upstream endpoint.
  - If there is no match, drop the packet, and end processing.
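
The packet-handling steps above can be sketched in Python as follows. This is only an illustration of the proposed behaviour, not Envoy code: `TokenRouter`, `extract_token`, and the fixed 5-byte `TOKEN_LENGTH` are hypothetical names and values chosen for the example, and the token is assumed to sit in the trailing bytes of the packet, as in the example requirement.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (ip, port)

# Hypothetical: a fixed-size token taken from the end of each packet.
TOKEN_LENGTH = 5


def extract_token(packet: bytes, token_length: int = TOKEN_LENGTH) -> Optional[bytes]:
    """Return the trailing token bytes, or None if the packet cannot hold one."""
    if token_length <= 0 or len(packet) < token_length:
        return None
    return packet[-token_length:]


class TokenRouter:
    """Maps client tokens to upstream endpoints and tracks sessions."""

    def __init__(self, token_to_endpoint: Dict[bytes, Address]) -> None:
        self.token_to_endpoint = token_to_endpoint
        # downstream client address -> upstream endpoint currently in use
        self.sessions: Dict[Address, Address] = {}

    def route(self, downstream: Address, packet: bytes) -> Optional[Address]:
        """Return the upstream endpoint for this packet, or None to drop it."""
        token = extract_token(packet)
        endpoint = self.token_to_endpoint.get(token) if token is not None else None
        if endpoint is None:
            return None  # no match: drop the packet and stop processing
        # Creates the session on first match, or moves it if the token now
        # points at a different endpoint than before.
        self.sessions[downstream] = endpoint
        return endpoint
```

A caller would forward the packet to the returned address and use `sessions` to send upstream replies back to the right downstream client.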

Use Cases

The specific use case that I want to cover is around Dedicated Game Servers for multiplayer games, but could potentially be applied to any sort of stateful system that uses a UDP stream as a communication protocol.

Can run Game Server on a private network, and only expose the Envoy proxy, thus reducing the surface area that is available to potential attackers.
Can have fine-grained, real-time control over who can access Game Servers, and which ones, through client token addition and removal.
- This means that bad actors can have their client tokens removed quickly, removing their access to the most sensitive part of multiplayer games infrastructure
Dedicated Game Servers usually have a single public IP and port, and can be a single point of failure for a multiplayer game session. As such, they are targeted for DDoS attacks.
- Clients can distribute their traffic across multiple proxies, which are much harder to DDoS and take down, and which provide redundancy.
Reduce public-facing IP addresses (more of a problem for IPv4 than IPv6).
Standard UDP traffic statistic reporting through Envoy’s statistics collection.

Concerns / Questions

Is there a way we can do this without sending the token on every request? Since it may not be encrypted, the token can be seen in traffic. Maybe we could only send the token on initial request / network change? (or maybe we need to also have encryption?)

Design ideas

This is a sacrificial draft for a potential configuration for the content token routing:

admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: UDP
          address: 127.0.0.1
          port_value: 7650
      listener_filters:
        # our new type of udp router
        - name: envoy.filters.udp_listener.udp_router
          typed_config:
            '@type': type.googleapis.com/envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig
            stat_prefix: service
            cluster: gameservers_cluster
  clusters:
    - name: gameservers_cluster
      connect_timeout: 0.25s
      type: STATIC
      # since our listener filter provides the routing
      lb_policy: CLUSTER_PROVIDED
      load_assignment:
        cluster_name: gameservers_cluster
        endpoints:
        # three potential game servers to connect to on localhost
        # but different ports.
        - lb_endpoints:
            - endpoint:
                metadata:
                  # client tokens are stored in the metadata, as struct key values
                  # When `true`, the token has access, when false or non existent, access is denied.
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    x7zs9: true
                    18z9y: true
                    j9zwk: true
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26000
        - lb_endpoints:
            - endpoint:
                metadata:
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    97zx9: true
                    18zyy: false # this client-token no longer has access
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26001
        - lb_endpoints:
            - endpoint:
                metadata:
                  "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens":
                    97ix0: true
                    16zyy: true
                    p6z9y: true
                    f6z3y: true
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 26002
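
To make the metadata layout concrete, here is a rough sketch of how a filter might turn it into a token lookup table. It assumes the YAML has already been parsed into plain dictionaries and flattened into one list of `lb_endpoints` entries. `build_token_index` is a hypothetical helper, and letting the last endpoint win when a token appears twice is an arbitrary choice made for the sketch.

```python
from typing import Dict, List, Mapping, Tuple

Address = Tuple[str, int]  # (ip, port)

# Metadata key used in the draft configuration above.
TOKENS_KEY = "envoy.config.filter.udp.udp_router.v2alpha.UdpRouterConfig/tokens"


def build_token_index(lb_endpoints: List[Mapping]) -> Dict[str, Address]:
    """Build a token -> upstream address map from parsed endpoint entries."""
    index: Dict[str, Address] = {}
    for entry in lb_endpoints:
        endpoint = entry["endpoint"]
        sock = endpoint["address"]["socket_address"]
        address = (sock["address"], sock["port_value"])
        # Endpoints without token metadata simply contribute no tokens.
        tokens = endpoint.get("metadata", {}).get(TOKENS_KEY, {})
        for token, allowed in tokens.items():
            # Only an explicit `true` grants access; false or absent denies it.
            if allowed is True:
                index[token] = address
    return index
```

With a table like this, the per-packet lookup is a single hash-map access, and revoking a token amounts to removing one key.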

Concerns / Questions

Game Servers / upstream cluster endpoints could potentially be added and removed in a very dynamic way (hundreds added and hundreds removed at a time). Can Envoy handle this rate of dynamic configuration change?
A single game session could potentially have thousands of players per endpoint. That means that every upstream cluster endpoint could have thousands of client tokens associated with it. Can Envoy handle this extra amount of data?
Client tokens will be added and removed at a rate much higher than that of upstream endpoints. Can Envoy handle this rate of dynamic configuration change?

Alternatives considered

Being able to somehow preemptively create sessions based on sender IP/port information.
- At initial pass, couldn’t find a way to implement this
- Also, network changes are far more frequent for games on mobile networks than on PC/consoles. (Although you may have to re-auth anyway? May be worth discussion.)
Being able to provide a token that identifies the upstream cluster endpoint specifically
- This is a potential security concern, as any client with the upstream endpoint token has access, and you can’t revoke it
- In reality, with the current design, you could do this anyway if you wanted to, by using a single token per endpoint.

Labels: area/quic, area/udp, design proposal, help wanted

markmandel

All 9 comments

One thing I'd like to see here is to be able to have a sort of fallback routing for when no specific token is configured.

luna-duclos on 28 Dec 2019

I'd also like to add the explicit consideration that tokens could be any length and envoy shouldn't be opinionated on that.

luna-duclos on 28 Dec 2019

Thanks for raising this @markmandel. This is actually a more general case of what needs to be done for https://github.com/envoyproxy/envoy/issues/1193 in which we need to route UDP packets based on the QUIC connection ID. I have some thoughts on how we can approach this and will reply back when I have some more time. cc @danzh2010

mattklein123 on 29 Dec 2019
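
For readers unfamiliar with the QUIC case, the sketch below shows roughly what routing on the connection ID involves. It follows the version-independent header layout from RFC 8999: a long header is a flag byte, a 4-byte version, a 1-byte destination connection ID length, and then the ID, while a short header is a flag byte followed directly by the ID. `quic_dcid` and `short_header_dcid_len` are hypothetical names. The short-header ID length is not carried in the packet, so the proxy is assumed to know it out of band.

```python
from typing import Optional


def quic_dcid(datagram: bytes, short_header_dcid_len: int) -> Optional[bytes]:
    """Extract the destination connection ID, or None if the datagram is too short."""
    if not datagram:
        return None
    if datagram[0] & 0x80:
        # Long header: flags (1) + version (4) + DCID length (1) + DCID.
        if len(datagram) < 6:
            return None
        dcid_len = datagram[5]
        end = 6 + dcid_len
        if len(datagram) < end:
            return None
        return datagram[6:end]
    # Short header: flags (1) + DCID of a length agreed on elsewhere.
    end = 1 + short_header_dcid_len
    if len(datagram) < end:
        return None
    return datagram[1:end]
```

The returned bytes play the same role as the client token in the proposal: they become the key used to pick an upstream endpoint.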

@mattklein123 glad to hear it has a more general application than the use cases I am thinking of as well.

I didn't think to look at how QUIC implements this! :man_facepalming: There is so much good prior art there for a variety of use cases (sessions, crypto, etc).

https://quicwg.org/base-drafts/draft-ietf-quic-transport.html#name-connections (for those also subscribed who want to read up)

markmandel on 29 Dec 2019

As another data point, IoT protocols also rely on tokens for routing, and for other things like request/response matching, caching, and congestion control. CoAP (RFC 7252) is based on UDP and is one I'm particularly interested in seeing work with Envoy. The way CoAP uses tokens is slightly different from what Mark/QUIC is describing (it's more of a request ID), but it is hopefully helpful in thinking about a generalized solution.

beriberikix on 22 Jan 2020

I would like to support a hash policy in the UDP proxy.

The UDP proxy does not fully support hash-based LB algorithms, because it does not provide a LoadBalancerContext when choosing a host.
So, the UDP proxy with a hash-based LB algorithm will select a host effectively at random.

I have investigated the TCP case and found that it has a hash policy option.
So, I think we can support it simply in the UDP case as well.

This does not depend on the incoming packet's content.

Here is my draft implementation: chadr123@d95c3f5

Please share your opinions on my idea.
Thanks!!

chadr123 on 20 Aug 2020

@chadr123 can you open a PR where we can discuss? I want to make sure we build the API in a way that will allow for byte range hashing. I think this can just be a wrapper message with a oneof inside of it that initially just has the general hash policy, and then later we can add byte range hashing on the datagram. Thank you!

mattklein123 on 20 Aug 2020
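
To illustrate the byte range hashing idea mentioned above, here is a minimal sketch that hashes a fixed slice of the datagram and uses it to pick a host. The function name `pick_host`, the `offset`/`length` parameters, and the use of CRC32 with a plain modulo are all assumptions made for the example. They are not Envoy's API, and a real hash-based load balancer would typically use a consistent-hashing scheme rather than modulo.

```python
import zlib
from typing import List, Optional


def pick_host(datagram: bytes, offset: int, length: int, hosts: List[str]) -> Optional[str]:
    """Pick a host from the hash of datagram[offset:offset + length]."""
    if not hosts or offset < 0 or length <= 0 or len(datagram) < offset + length:
        return None  # nothing to hash, or nowhere to send it
    key = datagram[offset:offset + length]
    # CRC32 is deterministic across runs, unlike Python's built-in hash().
    return hosts[zlib.crc32(key) % len(hosts)]
```

Datagrams that share the same bytes in the hashed range land on the same host regardless of the rest of the payload. Unlike the explicit token table in the proposal, hashing alone has no notion of an unknown or revoked token: every key maps to some host.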

Ok. I will open a PR soon. :)

chadr123 on 21 Aug 2020

In addition to the work that @chadr123 is planning to do, if that is combined with a filter similar to header-to-metadata from http and #12594, token-based routing will work.