We would like to onboard to using an external CA, but unfortunately they do not support ECDSA, only RSA:
Error: failed to read CA from linkerd-identity-issuer: must use ECDSA for public key algorithm, instead RSA was used
We'd like to use an external CA so that we don't have to rely on certificates generated within the cluster.
Preferably by supporting RSA certs in linkerd identity.
The same way as with external certs: create the TLS secret and point linkerd at it.
@dwj300 your external CA does not have ECDSA support?
So, I think the main thing we'd need to validate is whether this change can be limited to the identity service without impacting how proxies generate certificates. We really want the proxies to have a very minimal surface of crypto-related config.
For context, ECDSA was chosen to minimize memory & transfer overhead.
@grampelberg, as of now, the Azure internal CA does not. I am pushing them to support it, but it has yet to get funded.
@olix0r maybe a dumb question, but doesn't the identity service generate the certs? Which certs do the proxies generate? Or do you just mean the CSRs?
@dwj300 ahh, makes sense.
@dwj300 Sorry, I mean the Key and CSR. The question is can the ECDSA CSR be signed by the RSA issuer key? If that works, I think we could relax some of the service-side checks. We'd need to validate that the proxies can still verify the certs, obviously; and it would be helpful to understand how the resulting certs differ (especially in terms of size). But I think we'd be open to the change if we can more-or-less isolate it to the identity service. If broader changes are necessary, we'd want to get very clear on what those changes are...
good question(s). I can do some digging...
Hey,
Any update on this issue?
@liorhasson no one is actively working on this right now AFAIK.
I'm gonna hop on this. I'll provide an update on what I find after I do some digging. Then if it seems worthwhile, I can move towards a PR.
@han-so1omon Great! Please be sure to read Oliver's comments from May 1st for what we'd need to understand before moving to a PR.
Yes, I have been specifically thinking about this. My initial thought is that the different signature schemes (i.e. different mathematical structures and different verification algorithms) for RSA vs ECDSA would mean that any key, public or private, is specific to the type of cryptography it was generated for.
I haven't looked enough at linkerd identity service, but I suspect that the library used for the identity issuer supports both. I'll see what that means in terms of the ability to isolate to only changing the identity service.
I have an update on changing signature algorithms at different points in the certification chain as well as some initial investigations into necessary code changes to support RSA.
From what I've seen, a CA that uses one signature algorithm is able to sign a certificate whose subject key uses a different algorithm. As of TLS 1.3, the client sending the request specifies the signature algorithms it will accept, and CAs at different levels of the same chain may choose among them independently.
I'm doing my testing in a minikube environment. I've so far only had to change 5-10 lines across the following files to support RSA through the linkerd install command:
pkg/tls/codec.go, pkg/tls/cred.go, pkg/issuercerts/issuercerts.go, pkg/healthcheck/healthcheck.go
Upon kubectl apply, using the RSA keys causes the linkerd-proxy container to fail to start. This can be seen with the linkerd check --verbose command, which reports DEBUG[0279] Retrying on error: pod/linkerd-controller-... container linkerd-proxy is not ready. Everything works normally if the keys are ECDSA P-256.
As a new contributor and not an expert on the architecture, I would guess that this has something to do with the data-plane's hard-coded, ECDSA-only support in github.com/linkerd2/linkerd2-proxy/linkerd/identity/src/lib.rs. It could also be due to a webhook on the control-plane side.
From the kubectl logs -n linkerd linkerd-controller-... -c linkerd-proxy:
[ 0.107075474s] INFO linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.107157548s] INFO linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.107175806s] INFO linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.107188262s] INFO linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.107199953s] INFO linkerd2_proxy: Local identity is linkerd-controller.linkerd.serviceaccount.identity.linkerd.cluster.local
[ 0.107229214s] INFO linkerd2_proxy: Identity verified via linkerd-identity.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.107244395s] INFO linkerd2_proxy: Destinations resolved via linkerd-dst.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.108681858s] INFO outbound: linkerd2_app_outbound: Serving addr=127.0.0.1:4140
[ 0.109051778s] INFO inbound: linkerd2_app_inbound: Serving addr=0.0.0.0:4143
[ 5.160332955s] WARN trust_dns_proto::xfer::dns_exchange: io_stream hit an error, shutting down: io error: Connection reset by peer (os error 104)
[ 11.764696688s] WARN trust_dns_proto::xfer::dns_exchange: io_stream hit an error, shutting down: io error: Broken pipe (os error 32)
[ 28.593559685s] WARN trust_dns_proto::xfer::dns_exchange: io_stream hit an error, shutting down: io error: Connection reset by peer (os error 104)
I have three questions that would help me in debugging:
1) What do people think is the issue here?
2) The log level on the linkerd-proxy container is info. How can I change that to debug?
3) What is the fastest way to regenerate linkerd containers? I've been using DOCKER_TRACE=1 bin/mkube bin/docker-build whenever kubernetes complains that the right container versions can't be found, but that is a slow process on my laptop.
You can track my code so far in han-so1omon/linkerd2 branch: feature/RSASupport.
@han-so1omon You can change the proxy log level following the instructions here. Did the identity component and its proxy come up fine with your changes?
As for rebuilding the container images, if memory serves, bin/docker-build spends most of its time building the CLI binaries for different platforms. Since you are working mainly with the control plane, try just bin/docker-build-controller.
@ihcsim It appears that the initial pods stall before the constituent containers complete their setup processes. Based on the error received during the linkerd check, I can see that the linkerd-proxy container is not set up and running. With my changes thus far, everything works the same for all valid cases that use ECDSA keys for the trust anchor and issuer. The proxy container only fails to start up if the user-provided PKI resources use RSA keys.
Below I show the state of linkerd and minikube after running the install + apply commands:
$ bin/go-run cli install --identity-trust-anchors-file rsa-trust-anchors.pem --identity-issuer-certificate-file rsa-crt.pem --identity-issuer-key-file rsa-key.pem | kubectl apply -f -
# resources created ...
$ bin/go-run cli check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor
linkerd-api
-----------
× control plane pods are ready
pod/linkerd-controller-6bfc96c754-jjdw7 container linkerd-proxy is not ready
see https://linkerd.io/checks/#l5d-api-control-ready for hints
Status check results are ×
$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/linkerd-controller-6bfc96c754-jjdw7 1/2 Running 0 90s
pod/linkerd-destination-58c68d6b7f-h2qp4 1/2 Running 0 89s
pod/linkerd-grafana-c797bc5bb-6rq4r 1/2 Running 0 84s
pod/linkerd-identity-6b4f678bcb-6xhnj 1/2 Running 0 90s
pod/linkerd-prometheus-645846bf99-8dhpr 1/2 Running 0 82s
pod/linkerd-proxy-injector-75fc77658f-4ztjp 1/2 Running 0 88s
pod/linkerd-sp-validator-5ddbc64594-pfxbk 1/2 Running 0 87s
pod/linkerd-tap-65454c4c79-fk9wc 1/2 Running 0 86s
pod/linkerd-web-8595f846cc-pw4tm 1/2 Running 0 89s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/linkerd-controller-api ClusterIP 10.98.4.202 <none> 8085/TCP 90s
service/linkerd-dst ClusterIP 10.111.12.180 <none> 8086/TCP 90s
service/linkerd-grafana ClusterIP 10.97.23.57 <none> 3000/TCP 84s
service/linkerd-identity ClusterIP 10.109.214.48 <none> 8080/TCP 90s
service/linkerd-prometheus ClusterIP 10.108.4.124 <none> 9090/TCP 83s
service/linkerd-proxy-injector ClusterIP 10.96.192.213 <none> 443/TCP 88s
service/linkerd-sp-validator ClusterIP 10.104.43.247 <none> 443/TCP 87s
service/linkerd-tap ClusterIP 10.102.86.101 <none> 8088/TCP,443/TCP 87s
service/linkerd-web ClusterIP 10.103.23.57 <none> 8084/TCP,9994/TCP 89s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/linkerd-controller 0/1 1 0 90s
deployment.apps/linkerd-destination 0/1 1 0 90s
deployment.apps/linkerd-grafana 0/1 1 0 84s
deployment.apps/linkerd-identity 0/1 1 0 90s
deployment.apps/linkerd-prometheus 0/1 1 0 83s
deployment.apps/linkerd-proxy-injector 0/1 1 0 89s
deployment.apps/linkerd-sp-validator 0/1 1 0 87s
deployment.apps/linkerd-tap 0/1 1 0 86s
deployment.apps/linkerd-web 0/1 1 0 89s
NAME DESIRED CURRENT READY AGE
replicaset.apps/linkerd-controller-6bfc96c754 1 1 0 90s
replicaset.apps/linkerd-destination-58c68d6b7f 1 1 0 90s
replicaset.apps/linkerd-grafana-c797bc5bb 1 1 0 84s
replicaset.apps/linkerd-identity-6b4f678bcb 1 1 0 90s
replicaset.apps/linkerd-prometheus-645846bf99 1 1 0 82s
replicaset.apps/linkerd-proxy-injector-75fc77658f 1 1 0 89s
replicaset.apps/linkerd-sp-validator-5ddbc64594 1 1 0 87s
replicaset.apps/linkerd-tap-65454c4c79 1 1 0 86s
replicaset.apps/linkerd-web-8595f846cc 1 1 0 89s
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob.batch/linkerd-heartbeat 0 4 * * * False 0 <none> 89s
I have another update. Everything seems to be working now. The problem was with the RSA keyset I was using; I regenerated the keys and the problem was solved. Rookie mistake, but ok.
I have gone through the setup and emojivoto demo processes specified in BUILD.md and the test process specified at the top of TEST.md. @olix0r mentioned doing some comparisons, so I think I will compare memory usage and latency of the linkerd-identity and emojivoto-voting services with an external CA using ECDSA vs RSA. Is there a recommended way to stress the network for testing?
I have also noticed that when running bin/go-run cli check, it is necessary for me to wait a minute or so after running bin/go-run cli install ... or else the check fails, showing that linkerd is not able to communicate with prometheus. This is the same whether the external PKI uses RSA or ECDSA. That's a different issue, but I'm bringing it up to make sure that it is documented.
Update: I've done some comparisons and loaded them onto raintank, where they will be viewable until Aug 2. The first set of links uses external CA with RSA 2048 bit keys and the second set uses external CA with ECDSA P-256 curve keys. I ran the test in a minikube environment on my Surface Book 2 running Manjaro Linux. The tests ran for about two hours each with linkerd installed and emojivoto demo installed with linkerd injection. I tried to keep an even load on my processors during both tests.
Brief breakdown (RSA vs ECDSA):
Seems that there isn't really a difference between using RSA vs ECDSA for the root and issuer CA key algorithms.
RSA 2048
Kubernetes / Compute Resources / Namespace (Workloads) / linkerd
Kubernetes / Compute Resources / Namespace (Workloads) / emojivoto
Linkerd Namespace / linkerd
Linkerd Namespace / emojivoto
ECDSA P-256
Kubernetes / Compute Resources / Namespace (Workloads) / linkerd
Kubernetes / Compute Resources / Namespace (Workloads) / emojivoto
Linkerd Namespace / linkerd
Linkerd Namespace / emojivoto