Linkerd2: Support RSA certificates from external PKI

Created on 1 May 2020 · 16 comments · Source: linkerd/linkerd2

Feature Request

We would like to move to an external CA, but unfortunately it supports only RSA, not ECDSA:

Error: failed to read CA from linkerd-identity-issuer: must use ECDSA for public key algorithm, instead RSA was used

What problem are you trying to solve?

Using an external CA, so we don't have to rely on certificates generated within the cluster.

How should the problem be solved?

Preferably by supporting RSA certs in linkerd identity.

How would users interact with this feature?

Same way as they do with external certs: create the TLS secret, and point linkerd at it.

Labels: area/controller, help wanted

All 16 comments

@dwj300 your external CA does not have ECDSA support?

So, I think the main thing we'd need to validate is whether this change can be limited to the identity service without impacting how proxies generate certificates. We really want the proxies to have a very minimal surface of crypto-related config.

For context, ECDSA was chosen to minimize memory & transfer overhead.

@grampelberg, as of now, the Azure internal CA does not. I am pushing them to support it, but it has yet to get funded.

@olix0r Maybe a dumb question, but doesn't the identity service generate certs? Which certs do the proxies generate? Or do you just mean the CSRs?

@dwj300 ahh, makes sense.

@dwj300 Sorry, I mean the Key and CSR. The question is can the ECDSA CSR be signed by the RSA issuer key? If that works, I think we could relax some of the service-side checks. We'd need to validate that the proxies can still verify the certs, obviously; and it would be helpful to understand how the resulting certs differ (especially in terms of size). But I think we'd be open to the change if we can more-or-less isolate it to the identity service. If broader changes are necessary, we'd want to get very clear on what those changes are...
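On the size question, here is a minimal sketch using only Go's standard library. It certifies the same ECDSA P-256 leaf key once with an RSA-2048 issuer and once with an ECDSA P-256 issuer, then compares the DER sizes. The helper names (`newIssuer`, `issueLeaf`, `compareSizes`) and the certificate fields are illustrative assumptions, not linkerd code, and the numbers only reflect what `crypto/x509` emits for these templates.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newIssuer self-signs a throwaway CA certificate for the given key.
// (Hypothetical helper; names and validity period are arbitrary.)
func newIssuer(key crypto.Signer) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example-issuer"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, key.Public(), key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// issueLeaf certifies leafPub with the issuer's private key.
func issueLeaf(issuer *x509.Certificate, issuerKey crypto.Signer, leafPub crypto.PublicKey) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: "example-proxy"},
		DNSNames:     []string{"example-proxy.example.test"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, issuer, leafPub, issuerKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// compareSizes returns the DER size of one ECDSA leaf key certified by an
// RSA-2048 issuer and by an ECDSA P-256 issuer.
func compareSizes() (rsaSigned, ecdsaSigned int, err error) {
	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return 0, 0, err
	}
	rsaKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return 0, 0, err
	}
	ecKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return 0, 0, err
	}
	sizes := make([]int, 0, 2)
	for _, k := range []crypto.Signer{rsaKey, ecKey} {
		issuer, err := newIssuer(k)
		if err != nil {
			return 0, 0, err
		}
		leaf, err := issueLeaf(issuer, k, &leafKey.PublicKey)
		if err != nil {
			return 0, 0, err
		}
		sizes = append(sizes, len(leaf.Raw))
	}
	return sizes[0], sizes[1], nil
}

func main() {
	r, e, err := compareSizes()
	if err != nil {
		panic(err)
	}
	fmt.Printf("leaf signed by RSA-2048 issuer: %d bytes\n", r)
	fmt.Printf("leaf signed by ECDSA P-256 issuer: %d bytes\n", e)
}
```

The leaf's public key is identical in both cases, so the difference comes from the issuer's signature and its algorithm identifier. An RSA-2048 signature is 256 bytes, so the RSA-signed leaf should come out larger.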

good question(s). I can do some digging...

Hey,
Any update on this issue?

@liorhasson no one is actively working on this right now AFAIK.

I'm gonna hop on this. I'll post an update on what I find after some digging. Then, if it seems worthwhile, I can move toward a PR.

@han-so1omon Great! Please be sure to read Oliver's comments from May 1st for what we'd need to understand before moving to a PR.

Yes, I have been thinking about exactly this. My initial thought is that because RSA and ECDSA rest on different math (different mathematical groups and different verification algorithms), any key, public or private, is specific to the type of cryptography used.

I haven't looked enough at linkerd identity service, but I suspect that the library used for the identity issuer supports both. I'll see what that means in terms of the ability to isolate to only changing the identity service.

I have an update on changing signature algorithms at different points in the certification chain as well as some initial investigations into necessary code changes to support RSA.

About certification

From what I've seen, a CA whose key uses one algorithm can sign a certificate whose public key uses a different algorithm, so an RSA issuer can sign a certificate for an ECDSA key. In TLS 1.3, the client advertises the signature algorithms it will accept, and each certificate in a chain can be signed with a different algorithm from that set.

About changes to linkerd

I'm doing my testing in a minikube environment. So far I've only had to change 5-10 lines across the following files to support RSA through the linkerd install command:
pkg/tls/codec.go, pkg/tls/cred.go, pkg/issuercerts/issuercerts.go, pkg/healthcheck/healthcheck.go

Upon kubectl apply, using the RSA keys leaves the linkerd-proxy container unable to start. linkerd check --verbose reports this as DEBUG[0279] Retrying on error: pod/linkerd-controller-... container linkerd-proxy is not ready. Everything works normally if the keys are ECDSA P-256.

As a new contributor and not an expert on the architecture, I would guess that this has something to do with the data-plane's hard-coded, ECDSA-only support in github.com/linkerd2/linkerd2-proxy/linkerd/identity/src/lib.rs. It could also be due to a webhook on the control-plane side.

From the kubectl logs -n linkerd linkerd-controller-... -c linkerd-proxy:

[     0.107075474s]  INFO linkerd2_proxy: Admin interface on 0.0.0.0:4191
[     0.107157548s]  INFO linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[     0.107175806s]  INFO linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[     0.107188262s]  INFO linkerd2_proxy: Tap interface on 0.0.0.0:4190
[     0.107199953s]  INFO linkerd2_proxy: Local identity is linkerd-controller.linkerd.serviceaccount.identity.linkerd.cluster.local
[     0.107229214s]  INFO linkerd2_proxy: Identity verified via linkerd-identity.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.107244395s]  INFO linkerd2_proxy: Destinations resolved via linkerd-dst.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[     0.108681858s]  INFO outbound: linkerd2_app_outbound: Serving addr=127.0.0.1:4140
[     0.109051778s]  INFO inbound: linkerd2_app_inbound: Serving addr=0.0.0.0:4143
[     5.160332955s]  WARN trust_dns_proto::xfer::dns_exchange: io_stream hit an error, shutting down: io error: Connection reset by peer (os error 104)
[    11.764696688s]  WARN trust_dns_proto::xfer::dns_exchange: io_stream hit an error, shutting down: io error: Broken pipe (os error 32)
[    28.593559685s]  WARN trust_dns_proto::xfer::dns_exchange: io_stream hit an error, shutting down: io error: Connection reset by peer (os error 104)

I have three questions that would help me in debugging:
1) What do people think is the issue here?
2) The log level on the linkerd-proxy container is info. How could I change that to debug?
3) What is the fastest way to regenerate linkerd containers? I've been using DOCKER_TRACE=1 bin/mkube bin/docker-build whenever kubernetes complains that the right container versions can't be found, but that is a slow process on my laptop.

You can track my code so far in han-so1omon/linkerd2 branch: feature/RSASupport.

@han-so1omon You can change the proxy log level following the instructions here. Did the identity component and its proxy come up fine with your changes?

As for rebuilding the container images, if memory serves, bin/docker-build spends most of its time building the CLI binaries for different platforms. Since you are working mainly with the control plane, try just bin/docker-build-controller.

@ihcsim It appears that the initial pods stall before their containers complete setup. Based on the error from linkerd check, I can see that the linkerd-proxy container is not set up and running. With my changes so far, everything works as before for all valid cases that use ECDSA keys for the trust anchor and issuer. The proxy container fails to start only if the user-provided PKI resources use RSA keys.

Below I show the state of linkerd and minikube after running the install + apply commands:

$ bin/go-run cli install --identity-trust-anchors-file rsa-trust-anchors.pem --identity-issuer-certificate-file rsa-crt.pem --identity-issuer-key-file rsa-key.pem | kubectl apply -f -

# resources created ...

$ bin/go-run cli check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-api
-----------
× control plane pods are ready
    pod/linkerd-controller-6bfc96c754-jjdw7 container linkerd-proxy is not ready
    see https://linkerd.io/checks/#l5d-api-control-ready for hints

Status check results are ×

$ kubectl get all

NAME                                          READY   STATUS    RESTARTS   AGE
pod/linkerd-controller-6bfc96c754-jjdw7       1/2     Running   0          90s
pod/linkerd-destination-58c68d6b7f-h2qp4      1/2     Running   0          89s
pod/linkerd-grafana-c797bc5bb-6rq4r           1/2     Running   0          84s
pod/linkerd-identity-6b4f678bcb-6xhnj         1/2     Running   0          90s
pod/linkerd-prometheus-645846bf99-8dhpr       1/2     Running   0          82s
pod/linkerd-proxy-injector-75fc77658f-4ztjp   1/2     Running   0          88s
pod/linkerd-sp-validator-5ddbc64594-pfxbk     1/2     Running   0          87s
pod/linkerd-tap-65454c4c79-fk9wc              1/2     Running   0          86s
pod/linkerd-web-8595f846cc-pw4tm              1/2     Running   0          89s

NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/linkerd-controller-api   ClusterIP   10.98.4.202     <none>        8085/TCP            90s
service/linkerd-dst              ClusterIP   10.111.12.180   <none>        8086/TCP            90s
service/linkerd-grafana          ClusterIP   10.97.23.57     <none>        3000/TCP            84s
service/linkerd-identity         ClusterIP   10.109.214.48   <none>        8080/TCP            90s
service/linkerd-prometheus       ClusterIP   10.108.4.124    <none>        9090/TCP            83s
service/linkerd-proxy-injector   ClusterIP   10.96.192.213   <none>        443/TCP             88s
service/linkerd-sp-validator     ClusterIP   10.104.43.247   <none>        443/TCP             87s
service/linkerd-tap              ClusterIP   10.102.86.101   <none>        8088/TCP,443/TCP    87s
service/linkerd-web              ClusterIP   10.103.23.57    <none>        8084/TCP,9994/TCP   89s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/linkerd-controller       0/1     1            0           90s
deployment.apps/linkerd-destination      0/1     1            0           90s
deployment.apps/linkerd-grafana          0/1     1            0           84s
deployment.apps/linkerd-identity         0/1     1            0           90s
deployment.apps/linkerd-prometheus       0/1     1            0           83s
deployment.apps/linkerd-proxy-injector   0/1     1            0           89s
deployment.apps/linkerd-sp-validator     0/1     1            0           87s
deployment.apps/linkerd-tap              0/1     1            0           86s
deployment.apps/linkerd-web              0/1     1            0           89s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/linkerd-controller-6bfc96c754       1         1         0       90s
replicaset.apps/linkerd-destination-58c68d6b7f      1         1         0       90s
replicaset.apps/linkerd-grafana-c797bc5bb           1         1         0       84s
replicaset.apps/linkerd-identity-6b4f678bcb         1         1         0       90s
replicaset.apps/linkerd-prometheus-645846bf99       1         1         0       82s
replicaset.apps/linkerd-proxy-injector-75fc77658f   1         1         0       89s
replicaset.apps/linkerd-sp-validator-5ddbc64594     1         1         0       87s
replicaset.apps/linkerd-tap-65454c4c79              1         1         0       86s
replicaset.apps/linkerd-web-8595f846cc              1         1         0       89s

NAME                              SCHEDULE     SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/linkerd-heartbeat   0 4 * * *    False     0        <none>          89s

I have another update. Everything seems to be working now. The problem had to do with the RSA keyset that I was using, so I regenerated the keys and problem solved. Rookie mistake, but ok.
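The comment above does not say what was wrong with the original keyset. One quick sanity check before installing, offered here only as a sketch, is to confirm that the issuer private key actually belongs to the issuer certificate. Go's `tls.X509KeyPair` returns an error when they do not match. The helper names below are illustrative, and the certificates are generated in memory rather than read from the rsa-crt.pem and rsa-key.pem files used in the install command.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// genRSAPair returns a PEM-encoded self-signed certificate and the matching
// PKCS#1 RSA private key. (Hypothetical helper for the demo only.)
func genRSAPair(cn string) (certPEM, keyPEM []byte, err error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(key),
	})
	return certPEM, keyPEM, nil
}

// keyMatchesCert returns nil only if keyPEM is the private key for certPEM.
func keyMatchesCert(certPEM, keyPEM []byte) error {
	_, err := tls.X509KeyPair(certPEM, keyPEM)
	return err
}

func main() {
	certA, keyA, err := genRSAPair("issuer-a")
	if err != nil {
		panic(err)
	}
	_, keyB, err := genRSAPair("issuer-b")
	if err != nil {
		panic(err)
	}
	fmt.Println("matching pair:", keyMatchesCert(certA, keyA))
	fmt.Println("mismatched pair:", keyMatchesCert(certA, keyB))
}
```

For files on disk, `tls.LoadX509KeyPair(certFile, keyFile)` performs the same check.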

I have gone through the setup and emojivoto demo processes specified in BUILD.md and the test process specified at the top of TEST.md. @olix0r mentioned doing some comparisons, so I plan to compare memory usage and latency of the linkerd-identity and emojivoto-voting services with an external CA using ECDSA vs RSA. Is there a recommended way to stress the network for testing?

I have also noticed that when running bin/go-run cli check, it is necessary for me to wait a minute or so after running bin/go-run cli install ... or else the check fails, showing that linkerd is not able to communicate with prometheus. This is the same whether the external PKI uses RSA or ECDSA. That's a different issue, but I'm bringing it up to make sure that it is documented.

Update: I've done some comparisons and loaded them onto raintank, where they will be viewable until Aug 2. The first set of links uses an external CA with 2048-bit RSA keys and the second set uses an external CA with ECDSA P-256 keys. I ran the tests in a minikube environment on my Surface Book 2 running Manjaro Linux. Each test ran for about two hours with linkerd installed and the emojivoto demo installed with linkerd injection. I tried to keep an even load on my processors during both tests.

Brief breakdown (RSA vs ECDSA):

  • linkerd identity deployment: ~0.004-0.006% CPU and ~90-100 MiB memory for both
  • emojivoto total deployments: ~0.75% CPU for both; ~350-380 MiB vs ~380 MiB memory
  • linkerd identity p95 latency: ~0.95-9 ms, averaging around 3.8 ms, for both
  • emojivoto p95 latency: average ~1 ms for the voting and emoji deployments and ~20 ms for the web deployment, for both

There doesn't seem to be a meaningful difference between RSA and ECDSA keys for the root and issuer CAs in these measurements.

RSA 2048

Kubernetes / Compute Resources / Namespace (Workloads) / linkerd

Kubernetes / Compute Resources / Namespace (Workloads) / emojivoto

Linkerd Namespace / linkerd

Linkerd Namespace / emojivoto

ECDSA P-256

Kubernetes / Compute Resources / Namespace (Workloads) / linkerd

Kubernetes / Compute Resources / Namespace (Workloads) / emojivoto

Linkerd Namespace / linkerd

Linkerd Namespace / emojivoto
