Linkerd2: BadCertificate in case of \n at the end of root cert

Created on 2 Jan 2020 · 25 comments · Source: linkerd/linkerd2

Bug Report

What is the issue?

Every day, after cert-manager issues a new certificate, new proxy pods get BadCertificate errors in their logs. I investigated a bit and it looks like my problem is related to the \n symbol at the end of the certificate. There is a \n symbol at the end of ca.crt in the linkerd-identity-issuer secret and at the end of tls.crt in the linkerd-trust-anchor secret, but no \n symbol at the end of identityContext.trustAnchorsPem in the linkerd-config configMap. So it looks like the linkerd data plane decides that we get a new root certificate every day.
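The discrepancy can be inspected directly (a sketch; it assumes the trust anchors live under the "global" key of linkerd-config, matching the identityContext.trustAnchorsPem path above, and it uses jq):

```sh
# Does ca.crt in the issuer secret end with a newline? (0a == \n)
kubectl get secret linkerd-identity-issuer -n linkerd \
  -o jsonpath='{.data.ca\.crt}' | base64 -d | tail -c 1 | xxd

# And the trust anchors recorded in the linkerd-config configMap?
kubectl get configmap linkerd-config -n linkerd -o jsonpath='{.data.global}' \
  | jq -j '.identityContext.trustAnchorsPem' | tail -c 1 | xxd
```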

How can it be reproduced?

Not sure what to add here. I installed linkerd with the helm chart.

Logs, error output, etc

A lot of repetitive logs like these:

WARN [ 60241.999887s] rustls::session Sending fatal alert BadCertificate
ERR! [ 60242.523537s] rustls::session TLS alert received: Message {
    typ: Alert,
    version: TLSv1_2,
    payload: Alert(
        AlertMessagePayload {
            level: Fatal,
            description: HandshakeFailure,
        },
    ),
}

linkerd check output

At the beginning I got this error:

× [prometheus] control plane can talk to Prometheus
    Error calling Prometheus from the control plane: server_error: server error: 503
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Then after upgrading to edge-19.12.3 I got the following:

linkerd-identity
----------------
× certificate config is valid
    IdentityContext.TrustAnchorsPem does not match ca.crt in linkerd-identity-issuer

Environment

  • Kubernetes Version: v1.15.5
  • Cluster Environment: AKS
  • Host OS: Ubuntu 16.04.6 LTS
  • Linkerd version: edge-19.12.3

Possible solution

Handle \n correctly. I believe that in practice there is no difference between the following two certs:

```text
-----BEGIN CERTIFICATE-----\n
...\n
-----END CERTIFICATE-----
```

```text
-----BEGIN CERTIFICATE-----\n
...\n
-----END CERTIFICATE-----\n
```

Additional context

I found out where the problem is, in this check: https://github.com/linkerd/linkerd2/blob/master/pkg/healthcheck/healthcheck.go#L1152
And with a dirty fix like

```go
if data != nil && idctx.TrustAnchorsPem != strings.TrimRight(data.TrustAnchors, "\n") {
...
}
```

the check passes.
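A less dirty alternative (just a sketch, not the actual healthcheck code) would be to compare the decoded PEM blocks instead of the raw strings, which makes the comparison insensitive to trailing whitespace altogether. samePEM here is a hypothetical helper:

```go
package healthcheck

import (
	"bytes"
	"encoding/pem"
)

// samePEM reports whether two PEM bundles contain the same DER-encoded
// blocks, ignoring whitespace differences such as a trailing "\n".
func samePEM(a, b string) bool {
	da, db := []byte(a), []byte(b)
	for {
		var blockA, blockB *pem.Block
		blockA, da = pem.Decode(da)
		blockB, db = pem.Decode(db)
		if blockA == nil || blockB == nil {
			// Equal only if both bundles run out at the same time.
			return blockA == blockB
		}
		if !bytes.Equal(blockA.Bytes, blockB.Bytes) {
			return false
		}
	}
}
```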

Also, after all that, I manually fixed identityContext.trustAnchorsPem in the linkerd-config configMap by adding the \n symbol, and it looks like the problem is gone.

bug priority/P0

Most helpful comment

It seems I managed to reproduce the problem. I will investigate it further

All 25 comments

This is such a weird bug, @zaharidichev can you try replicating this?

I will take a look at this one. @StupidScience the fact that the check passes after the change does not help the original issue that rustls is complaining about, correct?

@zaharidichev yep, I believe there are kinda two bugs: one for the CLI and another for the control plane. But I am not 100% sure about it.

Looks like the problem is not gone after adding \n in the ConfigMap. I just restarted the prometheus/linkerd-web pods and got a lot of logs like this in the proxy:

WARN [   104.035829s] rustls::session Sending fatal alert BadCertificate
WARN [   104.545066s] rustls::session Sending fatal alert BadCertificate
WARN [   105.062436s] rustls::session Sending fatal alert BadCertificate
WARN [   105.570898s] rustls::session Sending fatal alert BadCertificate
WARN [   106.080047s] rustls::session Sending fatal alert BadCertificate
WARN [   106.588864s] rustls::session Sending fatal alert BadCertificate
ERR! [   107.026998s] linkerd2_proxy_identity::certify Failed to certify identity: grpc-status: Unknown, grpc-message: "the request could not be dispatched in a timely fashion"

And check linkerd check results in

linkerd-api
-----------
\ pod/linkerd-prometheus-58455b974c-htp58 container linkerd-proxy is not ready

and then

linkerd-api
-----------
× control plane pods are ready
    pod/linkerd-prometheus-58455b974c-htp58 container linkerd-proxy is not ready
    see https://linkerd.io/checks/#l5d-api-control-ready for hints

So it looks like the issue with \n relates only to linkerd check.

The problem goes away only after kubectl rollout restart deploy -n linkerd, but it returns every day.

Also, here are the related linkerd-identity logs:

time="2020-01-02T02:12:15Z" level=info msg="Updated identity issuer"
time="2020-01-02T04:10:54Z" level=info msg="certifying linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:14 +0000 UTC"
time="2020-01-02T04:11:04Z" level=info msg="certifying linkerd-controller.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:24 +0000 UTC"
time="2020-01-02T04:11:06Z" level=info msg="certifying linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:26 +0000 UTC"
time="2020-01-02T04:11:07Z" level=info msg="certifying linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:27 +0000 UTC"
time="2020-01-02T04:11:10Z" level=info msg="certifying linkerd-sp-validator.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:30 +0000 UTC"
time="2020-01-02T04:11:10Z" level=info msg="certifying linkerd-prometheus.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:30 +0000 UTC"
time="2020-01-02T04:11:11Z" level=info msg="certifying linkerd-web.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:31 +0000 UTC"
time="2020-01-02T04:11:14Z" level=info msg="certifying linkerd-tap.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:34 +0000 UTC"
time="2020-01-02T04:11:15Z" level=info msg="certifying linkerd-grafana.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:35 +0000 UTC"
time="2020-01-02T04:11:16Z" level=info msg="certifying linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:36 +0000 UTC"
time="2020-01-02T04:11:18Z" level=info msg="certifying linkerd-controller.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:38 +0000 UTC"
time="2020-01-02T04:11:38Z" level=info msg="certifying linkerd-tap.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 04:11:58 +0000 UTC"
time="2020-01-02T11:10:11Z" level=info msg="certifying linkerd-prometheus.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 11:10:31 +0000 UTC"
time="2020-01-02T20:59:08Z" level=info msg="certifying linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:28 +0000 UTC"
time="2020-01-02T20:59:18Z" level=info msg="certifying linkerd-controller.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:38 +0000 UTC"
time="2020-01-02T20:59:20Z" level=info msg="certifying linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:40 +0000 UTC"
time="2020-01-02T20:59:21Z" level=info msg="certifying linkerd-proxy-injector.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:41 +0000 UTC"
time="2020-01-02T20:59:24Z" level=info msg="certifying linkerd-sp-validator.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:44 +0000 UTC"
time="2020-01-02T20:59:25Z" level=info msg="certifying linkerd-web.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:45 +0000 UTC"
time="2020-01-02T20:59:28Z" level=info msg="certifying linkerd-tap.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:48 +0000 UTC"
time="2020-01-02T20:59:29Z" level=info msg="certifying linkerd-grafana.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:49 +0000 UTC"
time="2020-01-02T20:59:30Z" level=info msg="certifying linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:50 +0000 UTC"
time="2020-01-02T20:59:32Z" level=info msg="certifying linkerd-controller.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 20:59:52 +0000 UTC"
time="2020-01-02T20:59:52Z" level=info msg="certifying linkerd-tap.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-03 21:00:12 +0000 UTC"
time="2020-01-03T01:11:38Z" level=info msg="Updated identity issuer"
time="2020-01-03T03:58:25Z" level=info msg="certifying linkerd-prometheus.linkerd.serviceaccount.identity.linkerd.cluster.local until 2020-01-04 03:58:45 +0000 UTC"

Still can't figure out how to reproduce the issue.
I tried to delete the linkerd-identity-issuer Secret to force certificate reissuance, but everything seems to work as expected.
Can you give me some suggestions on where to dig when the issue occurs again?

The problem goes away only after kubectl rollout restart deploy -n linkerd, but it returns every day.

@StupidScience I will look into that first thing on Monday. One thing to keep in mind is that the root cert is baked into the proxy config for linkerd components and added via injection to other meshed pods. So if you simply change the value of the root cert in the config map, this will likely not have any effect until you restart the pods.

What is more important for me is to replicate the original cause of the problem. Can you share the cert-manager version that you are using, along with the cert-manager resources used to issue certificates, so I can replicate things? I have not noticed this problem before, despite the fact that I have done quite a bit of testing with a setup consisting of linkerd + cert-manager.

One thing to keep in mind is that the root cert is baked into the proxy config for linkerd components and added via injection to other meshed pods. So if you simply change the value of the root cert in the config map, this will likely not have any effect until you restart the pods.

@zaharidichev I already restarted all control plane pods after changing the value in the ConfigMap.

I use cert-manager version 0.12.0.
The resources are the same as on your site.
Issuer:

---
apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: linkerd-trust-anchor
  namespace: linkerd
spec:
  ca:
    secretName: linkerd-trust-anchor

Certificate:

---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 24h
  renewBefore: 1h
  issuerRef:
    name: linkerd-trust-anchor
    kind: Issuer
  commonName: identity.linkerd.cluster.local
  isCA: true
  keyAlgorithm: ecdsa
  usages:
  - cert sign
  - crl sign
  - server auth
  - client auth

Also, here is my values file for the helm installation, if you need it. I got it mostly from the chart's values-ha.yaml file.

---
installNamespace: false
enablePodAntiAffinity: true
# controller configuration
controllerReplicas: "2"
controllerResources: &controller_resources
  cpu: &controller_resources_cpu
    limit: "1"
    request: 100m
  memory:
    limit: 250Mi
    request: 50Mi
destinationResources: *controller_resources
publicAPIResources: *controller_resources

# identity configuration
identityResources:
  cpu: *controller_resources_cpu
  memory:
    limit: 250Mi
    request: 10Mi

# grafana configuration
grafanaResources:
  cpu: *controller_resources_cpu
  memory:
    limit: 1024Mi
    request: 50Mi

# heartbeat configuration
heartbeatResources: *controller_resources

# prometheus configuration
prometheusResources:
  cpu:
    limit: "4"
    request: 300m
  memory:
    limit: 8192Mi
    request: 300Mi

# proxy configuration
proxy:
  resources:
    cpu:
      limit: "1"
      request: 100m
    memory:
      limit: 250Mi
      request: 20Mi

# proxy injector configuration
proxyInjectorResources: *controller_resources
webhookFailurePolicy: Fail

# service profile validator configuration
spValidatorResources: *controller_resources

# tap configuration
tapResources: *controller_resources

# web configuration
webResources: *controller_resources

identity:
  trustAnchorsPEM: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  issuer:
    scheme: kubernetes.io/tls

@StupidScience

Looks like the problem is not gone after adding \n in the ConfigMap. I just restarted the prometheus/linkerd-web pods and got a lot of logs like this in the proxy

How did you actually restart these?

@zaharidichev

How did you actually restart these?

kubectl delete po linkerd-web-...

@StupidScience @grampelberg

So here is what I know so far:

Firstly, the ca.crt in linkerd-identity-issuer will always have a newline char at the end, because this is how cert-manager formats it. It does not matter whether your linkerd-trust-anchor secret has the ca.crt with a newline or not.

Secondly, apart from the checks, this should not matter all that much. Namely, whenever a new pod is injected with a proxy, the trust anchors that are in the linkerd-identity-issuer secret are used to configure the anchors of its proxy. These are the very same anchors that are used to configure the anchors of the identity service (used to verify the CSR).

I tested all of this by installing linkerd with helm, setting my issuer renewal to 5m (so cert-manager is issuing new certs every 5 minutes) and setting my issuanceLifetime to 2m (so each proxy fires a new CSR every 2 minutes). Despite the fact that the checks do not pass, because there is a different value in the config map than in the ca.crt of the secret, nothing else is broken. Namely, I tested with emojivoto and new pods are being injected properly, certs are rotated automatically, CSRs are served, etc.

Wrt your problem, I do have the suspicion that you might have done something different than what is described here. These bad cert errors that you are seeing, are they being emitted after you edited linkerd-identity-issuer to pass the checks and then possibly restarted some component in the linkerd namespace?

Wrt the check failing, it's failing exactly as it should, because the values are not the same. Now... it's a different question what definition of "same" we want to accept. If we want to strip off the trailing newlines, I am fine with that. @grampelberg WDYT?

One more thing: is it correct to say that if the trustAnchorsPEM contains a \n, it all works well?

So it looks like the linkerd data plane decides that we get a new root certificate every day.

This is strange... We do not support automatic trust root rotation, so I don't see how this can happen.

Every day, after cert-manager issues a new certificate, new proxy pods get BadCertificate errors in their logs.

What new pods are you referring to? At what point do these new pods get created?

@zaharidichev

What new pods are you referring to? At what point do these new pods get created?

Any new pod that gets a new certificate, for example if I delete the linkerd-web or linkerd-prometheus pod. It looks like it happens only when the cert in linkerd-identity-issuer that certified the previous proxy has already expired. So even if I force certificate reissuing, everything works as expected until the previous certificate expires; but once 24h have passed, if I delete a pod so that a new one gets created, it gets the BadCertificate error.

These bad cert errors that you are seeing, are they being emitted after you edited linkerd-identity-issuer to pass the checks and then possibly restarted some component in the linkerd namespace?

No, I got these errors even before the modification. The modification was one of my attempts to fix the issue.

I'm pretty sure that if I delete one of the linkerd-web/linkerd-prometheus pods right now, I will get the same error. I can try to gather more diagnostics if you explain to me where to dig.

Hmm, I think the problem that you are describing does not relate directly to the \n char. Can you, as a start, make sure that you install linkerd with a root cert that has \n at the end and verify that you are experiencing the same problem, just to rule this out?

I have another theory and will try to validate it.

@StupidScience Is cert-manager or something else auto-rotating your trust anchor or linkerd-trust-anchor secret? Since the Linkerd components aren't auto-injected (implying that on restart, these pods always use the install-time trust anchor), I can see how their LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS env var can get out of sync with the one managed by cert-manager.
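One quick way to test that theory (a sketch; the label selector and secret key are assumptions based on the resources shown earlier in this thread) is to hash the anchors baked into a proxy against the ones cert-manager currently manages:

```sh
# Anchors baked into a control-plane proxy at install/inject time
kubectl get pod -n linkerd -l linkerd.io/control-plane-component=web \
  -o jsonpath='{.items[0].spec.containers[?(@.name=="linkerd-proxy")].env[?(@.name=="LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS")].value}' \
  | sha256sum

# Anchors currently in the trust-anchor secret
# (note: a trailing-newline difference alone will also change the hash)
kubectl get secret linkerd-trust-anchor -n linkerd \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | sha256sum
```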

@ihcsim apart from the root being rotated, which is hard to imagine happening, I cannot see any other reason for this. On the other hand, if what you are suggesting is true, linkerd check --proxy would catch that, right? All his checks show up green...

It seems I managed to reproduce the problem. I will investigate it further

So here is how I reproduced it (there may be an easier way):

Create a root CA as per the docs:
step certificate create identity.linkerd.cluster.local ca.crt ca.key --profile root-ca --no-password --insecure

Create the trust-anchor secret:

kubectl create secret tls \
    linkerd-trust-anchor \
    --cert=ca.crt \
    --key=ca.key \
    --namespace=linkerd

Install cert manager and use the following resources to set it up:

issuer.yaml:

apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: linkerd-trust-anchor
  namespace: linkerd
spec:
  ca:
    secretName: linkerd-trust-anchor

certificate.yaml

apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 1h
  renewBefore: 55m
  issuerRef:
    name: linkerd-trust-anchor
    kind: Issuer
  commonName: identity.linkerd.cluster.local
  isCA: true
  keyAlgorithm: ecdsa
  usages:
  - cert sign
  - crl sign
  - server auth
  - client auth

At that point you can install linkerd with:

 bin/linkerd install --identity-external-issuer=true --controller-log-level=debug  | kubectl apply -f -

Wait one hour (unfortunately you cannot configure cert-manager to issue certs that have a lifetime of less than 1h).

Ensure linkerd check --proxy shows all green, and then do kubectl rollout restart deployment linkerd-prometheus -n linkerd.
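To watch the failure as it happens, you can tail the logs of the restarted proxy (same deployment as above):

```sh
kubectl logs -n linkerd deploy/linkerd-prometheus -c linkerd-proxy -f
```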

You will see the proxy of the prometheus pod failing and logging errors such as:

WARN [  1807.957868s] rustls::session Sending fatal alert BadCertificate
WARN [  1808.466786s] rustls::session Sending fatal alert BadCertificate
WARN [  1808.973501s] rustls::session Sending fatal alert BadCertificate
WARN [  1809.481054s] rustls::session Sending fatal alert BadCertificate
WARN [  1809.988382s] rustls::session Sending fatal alert BadCertificate
ERR! [  1810.443891s] linkerd2_proxy_identity::certify Failed to certify identity: grpc-status: Unknown, grpc-message: "the request could not be dispatched in a timely fashion"
WARN [  1820.452851s] rustls::session Sending fatal alert BadCertificate
WARN [  1820.961825s] rustls::session Sending fatal alert BadCertificate
WARN [  1821.470946s] rustls::session Sending fatal alert BadCertificate
WARN [  1821.979640s] rustls::session Sending fatal alert BadCertificate
WARN [  1822.488054s] rustls::session Sending fatal alert BadCertificate
WARN [  1822.996038s] rustls::session Sending fatal alert BadCertificate
ERR! [  1823.446416s] linkerd2_proxy_identity::certify Failed to certify identity: grpc-status: Unknown, grpc-message: "the request could not be dispatched in a timely fashion"
WARN [  1833.453647s] rustls::session Sending fatal alert BadCertificate
WARN [  1833.961598s] rustls::session Sending fatal alert BadCertificate
WARN [  1834.469027s] rustls::session Sending fatal alert BadCertificate
WARN [  1834.976614s] rustls::session Sending fatal alert BadCertificate
WARN [  1835.484578s] rustls::session Sending fatal alert BadCertificate
WARN [  1835.992778s] rustls::session Sending fatal alert BadCertificate
ERR! [  1836.448164s] linkerd2_proxy_identity::certify Failed to certify identity: grpc-status: Unknown, grpc-message: "the request could not be dispatched in a timely fashion"

@zaharidichev did you just forget to mention the trust-anchor secret creation in your comment?

So, as @olix0r correctly pointed out, the problem really stems from the fact that the certificate issued by the identity service has a lifetime that is longer than the issuer certificate's. In order to replicate the problem you can do the following:

  • Create the certs (notice the lifetime of the issuer cert is just 10 mins):
step certificate create identity.linkerd.cluster.local ca.crt ca.key --profile root-ca --no-password --insecure

step certificate create identity.linkerd.cluster.local issuer.crt issuer.key --ca ca.crt --ca-key ca.key --profile intermediate-ca --not-after 10m --no-password --insecure
  • Install linkerd with:
 bin/linkerd install \
           --identity-trust-anchors-file ca.crt \
           --identity-issuer-certificate-file issuer.crt \
           --identity-issuer-key-file issuer.key \
           --controller-log-level=debug \
           --proxy-log-level=warn,linkerd2_proxy=debug \
           | kubectl apply -f -
  • Mesh these resources:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: buoyantio/bb:v0.0.5
        args:
        - terminus
        - "--h1-server-port=8085"
        - "--response-text=backend1"
        ports:
        - containerPort: 8085
---
apiVersion: v1
kind: Service
metadata:
  name: backend-svc
spec:
  selector:
    app: backend
  ports:
  - name: http
    port: 8085
    targetPort: 8085
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slow-cooker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: slow-cooker
  template:
    metadata:
      labels:
        app: slow-cooker
    spec:
      containers:
      - name: slow-cooker
        image: buoyantio/slow_cooker:1.1.1
        command:
        - "/bin/sh"
        args:
        - "-c"
        - |
          sleep 600 # wait for pods to start
          slow_cooker http://backend-svc:8085
        ports:
        - containerPort: 9999
---
apiVersion: v1
kind: Service
metadata:
  name: slow-cooker
spec:
  selector:
    app: slow-cooker
  ports:
  - name: metrics
    port: 9999
    targetPort: 9999
---
  • At that point the meshed pods will get a certificate from the identity service. The end identity cert will have a NotAfter of now + 24h (the default), but the issuer cert that is in the chain will have a lifetime of just 10 mins.

  • After 10 minutes, if you tail the logs of the slow-cooker proxy, you will see errors similar to:

DBUG [   600.524189s] linkerd2_proxy_transport::tls::accept skipping TLS reason=loopback
DBUG [   600.526103s] linkerd2_proxy_http::canonicalize refined: backend-svc.slow.svc.cluster.local:8085
DBUG [   600.533630s] linkerd2_proxy_transport::connect connecting to 10.100.188.33:8086
DBUG [   600.534050s] linkerd2_proxy_transport::connect connection established to 10.100.188.33:8086
WARN [   600.534811s] rustls::session Sending fatal alert BadCertificate
DBUG [   600.655586s] linkerd2_proxy_transport::connect connecting to 10.100.188.33:8086
DBUG [   600.656116s] linkerd2_proxy_transport::connect connection established to 10.100.188.33:8086
WARN [   600.657169s] rustls::session Sending fatal alert BadCertificate
DBUG [   600.868550s] linkerd2_proxy_transport::connect connecting to 10.100.188.33:8086
DBUG [   600.868796s] linkerd2_proxy_transport::connect connection established to 10.100.188.33:8086

This proxy is holding a trust chain that has an expired certificate in it (the issuer cert). And since cert refresh on the proxy is configured to happen at 70% of the lifetime of the end cert, such a refresh has not happened yet.
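To make that window concrete, here is the arithmetic for this repro as a small sketch (the 70% refresh factor is taken from the comment above, not read out of the proxy source):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	leafLifetime := 24 * time.Hour     // default issuance lifetime
	issuerLifetime := 10 * time.Minute // --not-after 10m in the repro above

	// The proxy schedules its cert refresh at ~70% of the leaf's lifetime.
	refreshAt := time.Duration(float64(leafLifetime) * 0.7)

	fmt.Printf("proxy refreshes its cert after:    %v\n", refreshAt)                // 16h48m0s
	fmt.Printf("issuer in the chain expires after: %v\n", issuerLifetime)           // 10m0s
	fmt.Printf("window with an expired issuer:     %v\n", refreshAt-issuerLifetime) // 16h38m0s
}
```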

I guess there are two possible fixes:

  • As originally suggested, ensure the identity service cannot issue an end cert that has a lifetime greater than the issuer cert's. Or, if we have to generalize it, greater than that of any of the certs in the trust chain.
  • Configure the proxy to schedule the refresh to happen at 70% of the lifetime of the certificate in the chain that expires first.

To me the first fix sounds more logical.
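A sketch of what the first fix could look like on the identity-service side (capNotAfter is a hypothetical helper, not the code that actually landed):

```go
package identity

import (
	"crypto/x509"
	"time"
)

// capNotAfter clamps a requested leaf expiry to the issuer certificate's
// own NotAfter, so an issued cert can never outlive its issuer.
func capNotAfter(requested time.Time, issuer *x509.Certificate) time.Time {
	if requested.After(issuer.NotAfter) {
		return issuer.NotAfter
	}
	return requested
}
```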

@zaharidichev so if I set identity.issuer.issuanceLifeTime to, for example, 12h, it should resolve this issue, right?

@StupidScience that is correct in theory. In practice, it will not work because of this bug that I submitted a PR for. You can instead make your issuer cert have a larger validity period (i.e. 48 hours...).

@zaharidichev thanks for the clarification, I'll test it out.

Looks like that was the wrong theory. I changed the certificate duration to 48h and once it expired the issue occurred again. @zaharidichev

@StupidScience I think it's a little bit more complicated than just having the issuance lifetime longer than the issuer lifetime.

Let me give an example:

Suppose you have an issuer certificate that is valid for 48 hours. 6 hours before the issuer certificate is due to expire, a new pod is created and requests an identity from the identity service. It is issued an identity certificate with a lifetime of 24 hours. The problem is that in 6 hours, the issuer certificate will expire before the leaf identity certificate and the Linkerd proxy will have an invalid certificate chain, but won't request a new certificate because the leaf has not yet expired.

The long and short of it is that the identity service should only issue certificates that expire at or before its own certificate's expiry. This is what #3893 fixes.

You could also work around this by always rotating the issuer certificate at least X hours before it expires, where X is the issuance lifetime.
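In cert-manager terms, reusing the Certificate from earlier in this thread and assuming the default 24h issuance lifetime, that workaround would look something like:

```yaml
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h
  renewBefore: 25h   # > 24h issuance lifetime, with a margin
  issuerRef:
    name: linkerd-trust-anchor
    kind: Issuer
  commonName: identity.linkerd.cluster.local
  isCA: true
```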

Thanks for the clarification @adleong.
That was definitely the thing I was wondering about.
As I can see, a new edge release is out now, so I'll test it.

@StupidScience, @adleong described this pretty well. Also, for your testing, in order to ensure that things are working correctly and to see results sooner, you can configure the following in cert-manager:
Issuer Cert Lifetime: 1h
Rotate issuer cert: 10 mins before expiry
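Expressed as the cert-manager fields used earlier in the thread, that would be:

```yaml
spec:
  duration: 1h       # issuer cert lifetime
  renewBefore: 10m   # rotate 10 minutes before expiry
```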
