Envoy: Postgres filter: implement Postgres SSL termination and monitoring

Created on 25 Apr 2020 · 8 Comments · Source: envoyproxy/envoy

Previous work

This issue elaborates on the general design of a Postgres filter proposed in #9107.

Background

Encrypting communications with the database is a hard requirement in many environments. While cryptography is fast on modern hardware, it still imposes a cost on whichever host performs it. Establishing SSL connections is particularly expensive.

RDBMSs like Postgres have a primary-replicas architecture, where the primary is the only instance that takes writes. Offloading SSL from the primary instance can reduce its workload and leave more room for vertical scaling. Similarly, some connection poolers frequently used with Postgres, like PgBouncer, are single-threaded and may quickly saturate the CPU if they have to establish SSL connections frequently.

These are good enough reasons to implement SSL termination in the Postgres filter for the Envoy proxy. It is also convenient because certificate management can be offloaded as well, and potentially handled via Envoy APIs (and existing tools that leverage them), without having to change the Postgres configuration.

Moreover, the current version of the Postgres Envoy filter implements several traffic inspection metrics that are useful for monitoring. However, the filter cannot inspect SSL traffic, so as of today these metrics are only exposed for unencrypted connections.

It might be argued that the CPU offloading advantage is not present in scenarios where Envoy is deployed as a sidecar within the same Pod as Postgres. While true, this does not negate the other advantages: monitoring metrics for encrypted traffic, separation of concerns, and API-based management of certificates.

Goals

  • Implement Postgres SSL Termination at the existing Envoy Proxy’s Postgres filter.
  • Adapt the filter to expose the same metrics it exposes now for unencrypted traffic also for the encrypted traffic terminated at Envoy.
  • Provide SSL configuration capabilities both via the config file and the Envoy APIs.
  • Optional (or next phase refinements). Provide (equivalent) support to Postgres’ advanced SSL configuration capabilities, like ssl_ciphers, ssl_ecdh_curve or ssl_min_protocol_version, among several others.

Non-goals

  • Support encrypted communications from Envoy to the upstream Postgres server. It is assumed that this communication will happen unencrypted.
  • Support client authentication via SSL certificates (this would be a goal for a future issue).

Implementation notes

Envoy does support SSL termination at the TCP layer. However, Postgres SSL negotiation does not happen at the TCP level, but rather at the application layer (Frontend/Backend protocol). This raises the question of how much of the existing infrastructure can be reused. In any case, the same SSL library that Envoy uses, BoringSSL, will be used. It is a key requirement that TLS levels and general compatibility with Postgres SSL are appropriately tested.

The following diagram illustrates how the encrypted and unencrypted flows may work. Note that it uses the Postgres Frontend/Backend protocol terminology instead of Envoy's usual Downstream/Upstream:

[Diagram: PGClient_Envoy_PGServer-v1]

Note that the “fake response” happens right after the SSL Request: Envoy “imitates” the backend by sending the byte “S” back to the frontend.
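
As a minimal sketch of this step (not the filter's actual implementation, which lives in Envoy's C++ code), the Python below recognizes an SSLRequest and produces the reply. The wire format comes from the Postgres Frontend/Backend protocol: an 8-byte message holding the length 8 and the request code 80877103, answered with a single byte, “S” to accept SSL or “N” to refuse it. The function names are hypothetical and chosen only for illustration.

```python
import struct

# SSLRequest per the Postgres Frontend/Backend protocol:
# Int32 length (always 8) followed by Int32 request code 80877103.
SSL_REQUEST_CODE = 80877103  # (1234 << 16) | 5679
SSL_REQUEST_LEN = 8


def is_ssl_request(first_bytes: bytes) -> bool:
    """Return True if the first 8 bytes from the frontend are an SSLRequest."""
    if len(first_bytes) < SSL_REQUEST_LEN:
        return False
    length, code = struct.unpack("!II", first_bytes[:SSL_REQUEST_LEN])
    return length == SSL_REQUEST_LEN and code == SSL_REQUEST_CODE


def fake_backend_reply(terminate_ssl: bool) -> bytes:
    """Byte the proxy sends in place of the backend (hypothetical helper)."""
    return b"S" if terminate_ssl else b"N"


# What a frontend sends before any StartupMessage when it wants SSL.
ssl_request = struct.pack("!II", SSL_REQUEST_LEN, SSL_REQUEST_CODE)
```

After sending “S”, the proxy would run the TLS handshake on the same connection and only then read the StartupMessage. In this design the backend never sees the SSLRequest.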

Ideally, we should reuse as much of Envoy's existing SSL infrastructure as possible, and not introduce separate configuration files, keys or APIs. Here is an example of how Envoy TLS is currently configured:

tls_context:
  common_tls_context:
    tls_certificates:
      - certificate_chain:
          filename: "/etc/example-com.crt"
        private_key:
          filename: "/etc/example-com.key"

Limitations

  • Potentially, some SSL versions or encryption mechanisms may not work, and a reduced set of options may be exposed. This should be fine, since Postgres clients negotiate the mechanisms to use with the server. Some specific clients may run into limitations, but this is not expected to be a significant issue.

  • SCRAM authentication with channel binding cannot be used when proxying through Envoy, as the Postgres server will not be running in SSL mode, and the Postgres implementation of channel binding uses tls-server-end-point.
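
To illustrate why channel binding breaks, here is a rough sketch of how the tls-server-end-point value is derived under RFC 5929. It is a hash of the server's DER-encoded certificate, using SHA-256 when the certificate's signature hash is MD5 or SHA-1 and the signature's own hash function otherwise. The frontend would compute it from the certificate Envoy presents, while a backend that is not running in SSL mode has no TLS session from which to derive or verify the same value. The function name and the placeholder certificate bytes are hypothetical.

```python
import hashlib


def tls_server_end_point(cert_der: bytes, signature_hash: str = "sha256") -> bytes:
    """Channel-binding data for tls-server-end-point (RFC 5929, sketch)."""
    # RFC 5929: MD5 and SHA-1 signature hashes are replaced by SHA-256.
    if signature_hash.lower() in ("md5", "sha1"):
        signature_hash = "sha256"
    return hashlib.new(signature_hash, cert_der).digest()


# Placeholder bytes standing in for the certificate Envoy serves.
envoy_cert_der = b"placeholder: DER bytes of the certificate Envoy serves"
frontend_binding = tls_server_end_point(envoy_cert_der)
```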

All 8 comments

It would be nice if this could be used to let workloads behind Envoy talk to an SSL-enforced Postgres RDS instance, so that client code doesn't need to care about TLS. Currently this is impossible because of the "SSL request" and the expected "S".

Hi @royantman. So if I understand correctly, the use case you are suggesting is a client -> Envoy unencrypted connection and then an Envoy -> upstream Postgres encrypted connection? This scenario would certainly not be supported by the proposed design. But I'd like to understand the use case better, as:

  • Normally the Envoy->Postgres connection is equally or more trusted than the client -> Envoy, not the reverse.
  • This prevents the use case for using SSL certificates for authentication.
  • It kind of defeats the validation the client may do of the server's certificate, since the client will be SSL-unaware.

I guess implementing this would require some non-trivial additional effort, so I'd like to understand if there's a strong use case behind it. Thank you!

FYI: https://github.com/envoyproxy/envoy/issues/9577 was filed earlier to properly support STARTTLS, and there is a PoC for terminating STARTTLS.

With utmost respect, I ask you to reconsider the first non-goal of supporting encrypted communication to the PostgreSQL server.

Terminating TLS in the hope that the onward network is free of attackers is pretty similar to not using TLS at all. Is there some way you could, say, make reconnecting with TLS an optional feature with opt-in so that you're not obligating people to choose soft chewy center as the price for using this system?

I see very valid use cases for terminating SSL and having a plain-text upstream connection, @davidfetter. For example, when using Envoy as a sidecar and connecting to the upstream Postgres server via Unix Domain Sockets. Also, offloading SSL certificate management to Envoy (and the management layers and software above) brings significant advantages (for example, avoiding Postgres restarts).

That doesn't prevent other use cases from establishing a new SSL connection to the upstream, however. In that case it is still beneficial to decode the protocol for metrics, which the current version of the extension doesn't support (it only handles plain-text traffic).

Where can I find a POC for this topic?

Where can I find a POC for this topic?

Not sure if it still works, but the POC code is here: https://github.com/cpakulski/envoy/tree/issue/10942

I see very valid use cases for terminating SSL and having a plain-text upstream connection, @davidfetter. For example, when using Envoy as a sidecar and connecting to the upstream Postgres server via Unix Domain Sockets. Also, offloading SSL certificate management to Envoy (and the management layers and software above) brings significant advantages (for example, avoiding Postgres restarts).

That doesn't prevent other use cases from establishing a new SSL connection to the upstream, however. In that case it is still beneficial to decode the protocol for metrics, which the current version of the extension doesn't support (it only handles plain-text traffic).

+1 to these example use cases
