Linkerd2: The proxy should receive TLS support metadata from the Destination service

Created on 17 Feb 2018  路  19Comments  路  Source: linkerd/linkerd2

Along with the destination IP address and port, the destination service should indicate whether the service supports TLS using the conduit CA. I propose that this indicator be an enumeration something like this:

enum TLSSupport {
    // Assume it doesn't support TLS.
    Unknown = 0;

    // Assume it supports TLS using the default mesh settings.
    MeshDefault = 1;
}

Initially the Destination service should use TLSSupport::MeshDefault for any endpoint that has the conduit proxy injected into it (as determined from the EndPoints API), and TLSSupport::Unknown for destinations that aren't discovered by the EndPoints service.

This would make the Destinations service will be part of the TCB for the mesh. Later when routing is added we will revisit this, hopefully by fully moving the TLS decision making to routing.

areproxy aresecurity prioritP0

All 19 comments

Initially the Destination service should use TLSSupport::MeshDefault for any endpoint that has the conduit proxy injected into it (as determined from the EndPoints API)

Note: this should be _any endpoint that has the conduit proxy injected into AND is controlled by the same controller.

Based on discussion yesterday the current plan is to use TLS when the destination's controller namespace is equal to the current pod's controller namespace.

In the immediate term, the plan described yesterday is fine. We should encapsulate this logic in a single place that turns a destination endpoint pair (SocketAddr, labels) into a (SocketAddr, bool) where that boolean flag indicates whether the labels indicate that TLS will work. We should write some tests that take responses from the Destination service and verify that we're setting this flag correctly.

Later we'll need to support pods using different controllers communicating with each other, e.g. to handle the case where the user is trying to upgrade Conduit by creating a new namespace for the new version of the controller and then migrating pods incrementally to the new controller. In that scenerio, we don't want to lose TLS protections when such a migration happens. We will need to change how we handle the distribution of the the Conduit CA bundle and how we determine whether or not to use TLS (and what CA bundle to use, when we do) to account for this. However, we can do that separately, and it should be straightforward to do so if we encapsulate and isolate this logic as described in the preceding paragraph.

in a single place

Note that the flag needs to be available, ultimately, as an argument to Connection::connect().

@briansmith

Note that the flag needs to be available, ultimately, as an argument to Connection::connect().

Thanks, I was about to ask where you'd want this to be wired through to.

Another question:

We should encapsulate this logic in a single place that turns a destination endpoint pair (SocketAddr, labels) into a (SocketAddr, bool) where that boolean flag indicates whether the labels indicate that TLS will work.

In the issue description, you describe an enum with variants Unknown and MeshDefault, this implies that we might eventually want to add more variants describing additional TLS configurations. Is it sufficient for the TLS flag to be a bool, or should we use an enum now so it will be easier to add more variants in the future?

It's fine for it to be a boolean for now. Definitely we're going to change it later to address the longer-term issues, which will require more than a single enum can express anyway. The important thing for now is to to abstract away the labels and to test that our short-term namespace matching works as we expect.

Based on our conversation today, here is my proposal for the changes we intend to make to how we determine whether to use TLS for a given connection:

  1. Add a new field tls_verification_strategy (name not final) to the Destination service's WeightedAddrSet protobuf message:
message WeightedAddr {
  common.TcpAddress addr = 1;
  uint32 weight = 3;
  map<string, string> metric_labels = 4;


  TlsVerificationStrategy tls_verification_strategy = 5;
}

message TlsVerificationStrategy {
   KubernetesPod kubernetes_pod = 1;
   message KubernetesPod {
    string controller_namespace = 1;
    string pod_name = 2;
  }
}
  1. If TLS is enabled for a given pod as determined by the control plane, the Destination service will send the name and controller namespace of that pod in the tls_verification_strategy field. Otherwise, the field will not be set (nil/None).
  2. PR #1008 will be changed so that rather than providing the Connect function with a tls_enabled boolean, it will be provided with an Option<TlsVerificatonStrategy>.
  3. For a given endpoint, the TLS code in the proxy will determine whether to use TLS based on the following logic:

    • if the endpoint's tls_verification_strategy is Some(TlsVerificationStrategy::KubernetesPod), and the endpoint's controller namespace is equal to the CONDUIT_PROXY_CONTROLLER_NAMESPACE env var, then



      • the connection will use TLS


      • the proxy will set a tls="true" label



    • else



      • the connection will not use TLS


      • the proxy will set atls="false" label.



  4. The UI will be changed to use the value of the tls label, rather than the meshed label, to determine if a connection was secured.

Does this sound correct?

/cc @olix0r, @briansmith, @klingerf

Edit: changed to reflect @olix0r's suggestions

if the endpoint's tls_verification_strategy is Some(TlsVerificationStrategy::NameAndNamespace), then the proxy will set the meshed="true" label, ...
else, the proxy will set the meshed="false" and tls="false" labels

This is surprising. What does the meshed flag have to do with TlsVerificationStrategy? I would expect this decision to be entirely unrelated to this field; and only the tls field should be impacted.


Also, let me suggest some minor changes to your proposed protobuf:

message WeightedAddr {
  common.TcpAddress addr = 1;
  uint32 weight = 3;
  map<string, string> metric_labels = 4;
  TlsVerificationStrategy tls_verification_strategy = 5;
}

message TlsVerificationStrategy  {
  NameAndNamespace name_and_namespace = 1;

  message NameAndNamespace {
    string controller_namespace = 1;
    string pod_name = 2;
  }
}

The main benefit of this is that TlsVerificationStrategy messages need not be scoped under WeightedAddr -- there's nothing about the message type that needs to be scoped onto an individual address. Furthermore, oneofs need not be introduced early -- they can be introduced later once there are more variants.

Also, what does pod_name mean in this context? Is it specific to a kubernetes resource? If so, we should change the name to be KubernestesPodValidation or something. If it's not kubernetes-specific, we should avoid using pod_ prefixes

@olix0r

if the endpoint's tls_verification_strategy is Some(TlsVerificationStrategy::NameAndNamespace), then the proxy will set the meshed="true" label, ...
else, the proxy will set the meshed="false" and tls="false" labels

This is surprising. What does the meshed flag have to do with TlsVerificationStrategy? I would expect this decision to be entirely unrelated to this field; and only the tls field should be impacted.

This is a good point. That field should still be set based on the presence of a controller namespace label (if we still care about _having_ a meshed label in addition to the tls label?).

Also, what does pod_name mean in this context? Is it specific to a kubernetes resource? If so, we should change the name to be KubernestesPodValidation or something. If it's not kubernetes-specific, we should avoid usingpod_ prefixes

Also a good point. This strategy would be kubernetes-specific, as it uses the pod namespace. Will change the name to make that clear. 馃憤

Oh, also @olix0r:

Furthermore, oneofs need not be introduced early -- they can be introduced later once there are more variants.

I thought it was worth making this a oneof to begin with so that prost will generate a Rust enum with one variant, in the hopes of minimizing the necessary proxy changes if we want to add more variants later. WDYT?

@hawkw i think it's fine to make it a oneof to start, just noting that protobuf allows you to make a single field a oneof later without breaking wire compat

@briansmith, if possible, I'd like to get your approval on this proposal before going too much farther on this.

@hawkw Thanks for writing this up. I've added issues for the controller work in #1039 and #1040. I also just wanted to verify my understanding of how we're handling metric labels. Specifically:

  • The proxy will not set a meshed label
  • The proxy will only set a tls label if the connection uses TLS; the value of the label will be tls="true"
  • The proxy will not set a tls label if the connection does not use TLS

Is that consistent with your understanding?

It's my expectation that the _meshed_ label is still useful (and distinct from the _tls_ label), but that we may not have a use case for consuming this label yet. Unless it's costly to maintain it, I have a slight preference for leaving the _meshed_ label unchanged and only introducing a new label.

@olix0r

Unless it's costly to maintain it, I have a slight preference for leaving the _meshed _label unchanged and only introducing a new label.

Currently, there is no code for setting a meshed label in the proxy. By "leaving it unchanged", do you mean that we should not add this label, or that we _should_ add code to add that label?

@klingerf

  • The proxy will not set a meshed label
  • The proxy will only set a tls label if the connection uses TLS; the value of the label will be tls="true"
  • The proxy will not set a tls label if the connection does not use TLS

Modulo any decision regarding whether or not we set a meshed label, this is correct. Regardless of whether or not we eventually add such a label, it should _not_ indicate that a connection used TLS.

By "leaving it unchanged", do you mean that we should not add this label, or that we should add code to add that label?

My mistake. It's fine to omit for now and we can introduce this label if we need it later.

@olix0r Okay, thanks for clearing that up! 馃憤

I've renamed this issue, as the actual changes to the Destination service should now be tracked by #1040.

I believe this was completed a fairly long time ago. :)

Was this page helpful?
0 / 5 - 0 ratings