I don't understand all the details, but I periodically see this error in different places, even though Linkerd generally works. The error appears at random. Restarting the pods solves it, but I don't think that's a good workaround.
$ linkerd top deployment/application --namespace default
Error: HTTP error, status Code [503] (unexpected API response: Error: 'x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "linkerd-tap.linkerd.svc")'
Trying to reach: 'https://10.56.79.145:8089/apis/tap.linkerd.io/v1alpha1/watch/namespaces/default/deployments/application/tap')
Usage:
linkerd top [flags] (RESOURCE)
#...
kubectl rollout restart -n linkerd deployment/linkerd-tap
# ...
linkerd top deployment/application --namespace default
# now it works, but after a while the problem returns
[linkerd-tap-86c9f7cc98-p49b5 tap] 2019/09/30 14:58:25 http: TLS handshake error from 127.0.0.1:33188: remote error: tls: bad certificate
[linkerd-tap-86c9f7cc98-psztb tap] 2019/09/30 14:58:25 http: TLS handshake error from 127.0.0.1:37118: remote error: tls: bad certificate
[linkerd-tap-86c9f7cc98-psztb tap] 2019/09/30 14:58:26 http: TLS handshake error from 127.0.0.1:37198: remote error: tls: bad certificate
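For reference, logs like these can be tailed from all tap replicas at once with a label selector (a sketch; the control-plane-component label value is an assumption about how Linkerd labels its pods):

```
# tail recent tap-container logs from every tap replica;
# the label selector is an assumption and may differ per version
kubectl -n linkerd logs -l linkerd.io/control-plane-component=tap -c tap --tail=20 --timestamps
```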
I didn't find any other errors in the other Linkerd pods.
NAME READY STATUS RESTARTS AGE
linkerd-controller-784c8ddfbd-6l7zv 2/2 Running 0 8h
linkerd-controller-784c8ddfbd-b67s2 2/2 Running 0 47m
linkerd-controller-784c8ddfbd-m95ll 2/2 Running 0 8h
linkerd-destination-7655c8bc7c-4zcxm 2/2 Running 0 8h
linkerd-destination-7655c8bc7c-q4jwz 2/2 Running 0 8h
linkerd-destination-7655c8bc7c-xlx9g 2/2 Running 0 8h
linkerd-grafana-86df8766f8-xlxld 2/2 Running 0 8h
linkerd-identity-59f8fbf6fc-ll597 2/2 Running 0 47m
linkerd-identity-59f8fbf6fc-wgcpj 2/2 Running 0 8h
linkerd-identity-59f8fbf6fc-z66p7 2/2 Running 0 8h
linkerd-prometheus-98c96c5d5-jc2lz 2/2 Running 0 8h
linkerd-proxy-injector-67f7db5566-9wdls 2/2 Running 0 8h
linkerd-proxy-injector-67f7db5566-hc2kv 2/2 Running 0 8h
linkerd-proxy-injector-67f7db5566-t225x 2/2 Running 0 8h
linkerd-sp-validator-c4c598c49-djhv7 2/2 Running 0 47m
linkerd-sp-validator-c4c598c49-ktmdw 2/2 Running 0 8h
linkerd-sp-validator-c4c598c49-lb7jv 2/2 Running 0 30m
linkerd-tap-86c9f7cc98-h8c2d 2/2 Running 0 7h31m
linkerd-tap-86c9f7cc98-p49b5 2/2 Running 0 7h31m
linkerd-tap-86c9f7cc98-psztb 2/2 Running 0 7h30m
linkerd-web-549f59496c-sm6p9 2/2 Running 0 47m
linkerd check output:

$ linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles
linkerd-version
---------------
√ can determine the latest version
‼ cli is up-to-date
    is running version 19.9.3 but the latest edge version is 19.9.4
    see https://linkerd.io/checks/#l5d-version-cli for hints
control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 19.9.3 but the latest edge version is 19.9.4
    see https://linkerd.io/checks/#l5d-version-control for hints
√ control plane and cli versions match

Status check results are √
@KIVagant IIRC, you installed Linkerd using the Helm templates, right? Did you override the Tap TLS cert and key in your values.yaml? If yes, how was the cert generated?
I also wonder if this verification error is caused by clock skew on your servers. Can you confirm? Do you have ntpd etc. installed on your server?
There are still the same certs that were generated in https://github.com/linkerd/linkerd2/issues/3414#issuecomment-530268377 taking into account this issue https://github.com/linkerd/website/issues/516
Once the problem with extra newlines was solved, Linkerd worked well, but then we randomly started getting the TLS handshake error.
if this verification error is caused by clock skew on your servers
That's a nice point. I will try to check this tomorrow (UTC+3).
This is referring to an APIService which uses a certificate that is part of the resource configuration (caBundle) on the server side and a configmap from kube-system on the client side (extension-apiserver-authentication). Are you doing any kind of certificate rotation?
The fact that restarting the pod fixes it leads me to believe that extension-apiserver-authentication is being updated.
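One way to see both sides of that trust relationship is to dump the caBundle from the APIService and the request-header client CA from the kube-system configmap (a sketch; assumes openssl is available and the APIService/configmap names match the ones discussed in this issue):

```
# server side: the CA advertised in the tap APIService resource
kubectl get apiservice v1alpha1.tap.linkerd.io -o jsonpath='{.spec.caBundle}' \
  | base64 -d | openssl x509 -noout -subject -dates

# client side: the request-header CA the kube-apiserver uses toward extension servers
kubectl -n kube-system get configmap extension-apiserver-authentication \
  -o jsonpath='{.data.requestheader-client-ca-file}' \
  | openssl x509 -noout -subject -dates
```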
Are you doing any kind of certificate rotation?
No. I created the cert only once, added it to secret storage, and that's it. I update the Helm chart periodically from upstream, so that may cause secret regeneration, but the content of the secret stays the same.
$ kgsecn linkerd
NAME TYPE DATA AGE
default-token-w7k9t kubernetes.io/service-account-token 3 18d
linkerd-controller-token-vj8p4 kubernetes.io/service-account-token 3 18d
linkerd-destination-token-9hm89 kubernetes.io/service-account-token 3 7d11h
linkerd-grafana-token-8j8mq kubernetes.io/service-account-token 3 18d
linkerd-heartbeat-token-kv9hc kubernetes.io/service-account-token 3 18d
linkerd-identity-issuer Opaque 2 18d
linkerd-identity-token-fwj7c kubernetes.io/service-account-token 3 18d
linkerd-prometheus-token-khz72 kubernetes.io/service-account-token 3 18d
linkerd-proxy-injector-tls Opaque 2 18d
linkerd-proxy-injector-token-5sdmb kubernetes.io/service-account-token 3 18d
linkerd-sp-validator-tls Opaque 2 18d
linkerd-sp-validator-token-s9b79 kubernetes.io/service-account-token 3 18d
linkerd-tap-tls Opaque 2 18d
linkerd-tap-token-x4skt kubernetes.io/service-account-token 3 18d
linkerd-web-token-x24x6 kubernetes.io/service-account-token 3 18d
I will try to detect if there's a clock skew when the problem appears, as @ihcsim suggested.
@KIVagant the certificate in question isn't part of the trust chain at all. ca.pem in linkerd-tap-tls should match caBundle in apiservice/v1alpha1.tap.linkerd.io. I still think extension-apiserver-authentication is being rotated by the api-server though.
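A quick way to check that claim is to compare fingerprints of the two certificates (a sketch; the ca.pem key name inside the secret is taken from the comment above and may differ in your chart version):

```
# CA stored in the Helm-managed secret (key name is an assumption)
kubectl -n linkerd get secret linkerd-tap-tls -o jsonpath='{.data.ca\.pem}' \
  | base64 -d | openssl x509 -noout -fingerprint -sha256

# CA registered in the APIService; the two fingerprints should match
kubectl get apiservice v1alpha1.tap.linkerd.io -o jsonpath='{.spec.caBundle}' \
  | base64 -d | openssl x509 -noout -fingerprint -sha256
```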
@KIVagant any new details?
@grampelberg, sorry, not yet. I am busy with other tickets, but I still see the error (I upgraded Linkerd to 2.6.0 stable). I'll report back when I find more. Please don't close this, if that's okay with you.
My findings:
The ntpd service is running, and ntpstat responds with something close to this:

synchronised to NTP server (185.144.157.134) at stratum 3
time correct to within 143 ms
polling server every 1024 s
linkerd tap:

$ linkerd tap pod/tools-web-796dc6fb95-5gcv7 --namespace devops --verbose
DEBU[0001] Response from [https://3......A.sk1.us-east-1.eks.amazonaws.com/apis/tap.linkerd.io/v1alpha1/watch/namespaces/devops/pods/tools-web-796dc6fb95-5gcv7/tap] had headers: map[Audit-Id:[35d5289d-a532-431a-97f9-b89cc7112de9] Content-Length:[327] Content-Type:[text/plain; charset=utf-8] Date:[Wed, 16 Oct 2019 13:11:59 GMT]]
Error: HTTP error, status Code [503] (unexpected API response: Error: 'x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "linkerd-tap.linkerd.svc")'
Trying to reach: 'https://10.56.73.221:8089/apis/tap.linkerd.io/v1alpha1/watch/namespaces/devops/pods/tools-web-796dc6fb95-5gcv7/tap')
# (the cert for https://10.56.73.221:8089/apis/tap.linkerd.io)
# echo | openssl s_client -showcerts -servername 10.56.73.221 -connect 10.56.73.221:8089 2>/dev/null | openssl x509 -inform pem -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
75:0f:b6:09:3e:ed:94:49:e0:8e:be:65:6b:35:c6:01
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = linkerd-tap.linkerd.svc
Validity
Not Before: Oct 15 11:54:18 2019 GMT
Not After : Oct 14 11:54:18 2020 GMT
Subject: CN = linkerd-tap.linkerd.svc
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public-Key: (2048 bit)
Modulus:
00:d1:9e:71:48:02:88:eb:78:8a:eb:d5:7c:31:d7:
...
b7:0f
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment, Certificate Sign
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Basic Constraints: critical
CA:TRUE
Signature Algorithm: sha256WithRSAEncryption
69:05:9c:fb:c5:bf:f9:2a:2e:1e:f9:ea:d8:87:28:d4:42:fa:
...
b8:c3:13:e1
I cannot confirm that this is correct. From what I see (if I understand it right), the cert was created a long time ago.
$ k get configmaps -n kube-system | grep extension-apiserver-authentication
extension-apiserver-authentication 5 212d
$ k get configmaps -n kube-system extension-apiserver-authentication -o json | jq -r '.data["requestheader-client-ca-file"]' | openssl x509 -inform pem -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 0 (0x0)
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN=kubernetes
Validity
Not Before: Mar 18 11:30:28 2019 GMT
Not After : Mar 15 11:30:28 2029 GMT
Subject: CN=kubernetes
curl --insecure https://10.56.73.221:8089/apis/tap.linkerd.io/v1alpha1/watch/namespaces/devops/pods/tools-web-796dc6fb95-5gcv7/tap
{"error":"no valid CN found. allowed names: [front-proxy-client], client names: []"}
After kubectl rollout restart -n linkerd deployment/linkerd-tap:
# (the cert for https://10.56.47.53:8089/apis/tap.linkerd.io)
# echo | openssl s_client -showcerts -servername 10.56.47.53 -connect 10.56.47.53:8089 2>/dev/null | openssl x509 -inform pem -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
3e:27:68:b7:1f:e3:c4:c0:ef:16:0c:fe:c6:13:93:5e
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = linkerd-tap.linkerd.svc
Validity
Not Before: Oct 15 12:28:58 2019 GMT
Not After : Oct 14 12:28:58 2020 GMT
Subject: CN = linkerd-tap.linkerd.svc
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
RSA Public-Key: (2048 bit)
Modulus:
00:d3:b6:8e:77:9e:59:8e:84:c5:64:62:5d:dc:f3:
...
78:b1
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment, Certificate Sign
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Basic Constraints: critical
CA:TRUE
Signature Algorithm: sha256WithRSAEncryption
31:cb:56:40:f6:04:ff:7d:f9:05:0a:be:94:0a:22:c1:98:11:
...
d4:d5:21:df
So, this is the difference:
Not Before: Oct 15 11:54:18 2019 GMT
Not After : Oct 14 11:54:18 2020 GMT
Not Before: Oct 15 12:28:58 2019 GMT
Not After : Oct 14 12:28:58 2020 GMT
And the new date is equal to the last chart update (LAST DEPLOYED: Tue Oct 15 12:28:58 2019), which makes me think that helm upgrade does not restart everything that must be restarted:
# _Identity.TrustAnchorsPEM.tmp and all others are loaded from a secret storage, the internal content is always the same
helm upgrade --install --namespace=linkerd --values ./values/linkerd2/values.yaml --set-file=Identity.TrustAnchorsPEM=_Identity.TrustAnchorsPEM.tmp --set-file=Identity.Issuer.TLS.CrtPEM=_Identity.Issuer.TLS.CrtPEM.tmp --set-file=Identity.Issuer.TLS.KeyPEM=_Identity.Issuer.TLS.KeyPEM.tmp linkerd2 ./linkerd2-2.6.0-f90805b8.tgz
Release "linkerd2" has been upgraded.
LAST DEPLOYED: Tue Oct 15 12:28:58 2019
NAMESPACE: linkerd
STATUS: DEPLOYED
RESOURCES:
==> v1/APIService
NAME AGE
v1alpha1.tap.linkerd.io 33d
==> v1/ClusterRole
NAME AGE
linkerd-linkerd-controller 33d
linkerd-linkerd-destination 21d
linkerd-linkerd-identity 33d
linkerd-linkerd-prometheus 33d
linkerd-linkerd-proxy-injector 33d
linkerd-linkerd-sp-validator 33d
linkerd-linkerd-tap 33d
linkerd-linkerd-tap-admin 33d
==> v1/ClusterRoleBinding
NAME AGE
linkerd-linkerd-controller 33d
linkerd-linkerd-destination 21d
linkerd-linkerd-identity 33d
linkerd-linkerd-prometheus 33d
linkerd-linkerd-proxy-injector 33d
linkerd-linkerd-sp-validator 33d
linkerd-linkerd-tap 33d
linkerd-linkerd-tap-auth-delegator 33d
linkerd-linkerd-web-admin 33d
==> v1/ConfigMap
NAME DATA AGE
linkerd-config 3 33d
linkerd-grafana-config 3 33d
linkerd-prometheus-config 1 33d
==> v1/Deployment
NAME READY UP-TO-DATE AVAILABLE AGE
linkerd-controller 3/3 3 3 33d
linkerd-destination 3/3 3 3 21d
linkerd-grafana 1/1 1 1 33d
linkerd-identity 3/3 3 3 33d
linkerd-prometheus 1/1 1 1 33d
linkerd-proxy-injector 3/3 3 3 33d
linkerd-sp-validator 3/3 3 3 33d
linkerd-tap 3/3 3 3 33d
linkerd-web 1/1 1 1 33d
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
linkerd-controller-6dbb9f99c7-8zq9p 3/3 Running 0 34m
linkerd-controller-6dbb9f99c7-cf47c 3/3 Running 0 34m
linkerd-controller-6dbb9f99c7-z8dc8 3/3 Running 0 34m
linkerd-destination-5f85657cdf-fbfgh 2/2 Running 0 34m
linkerd-destination-5f85657cdf-jmq9h 2/2 Running 0 34m
linkerd-destination-5f85657cdf-p49mg 2/2 Running 0 34m
linkerd-grafana-9fd8b57cf-hw28q 2/2 Running 0 34m
linkerd-identity-54789dd4dd-ngt8f 2/2 Running 0 34m
linkerd-identity-54789dd4dd-r9dmt 2/2 Running 0 34m
linkerd-identity-54789dd4dd-sz94l 2/2 Running 0 34m
linkerd-prometheus-7947675d6d-kpkht 2/2 Running 0 34m
linkerd-proxy-injector-5847d54cbc-cpwnj 2/2 Running 0 34m
linkerd-proxy-injector-5847d54cbc-dgpt8 2/2 Running 0 34m
linkerd-proxy-injector-5847d54cbc-p6cnv 2/2 Running 0 34m
linkerd-sp-validator-57c89c6dd4-6d29w 2/2 Running 0 34m
linkerd-sp-validator-57c89c6dd4-nzwmf 2/2 Running 0 34m
linkerd-sp-validator-57c89c6dd4-w2dns 2/2 Running 0 34m
linkerd-tap-5d4454b48b-d6x7j 2/2 Running 0 34m
linkerd-tap-5d4454b48b-hx7h7 2/2 Running 0 34m
linkerd-tap-5d4454b48b-qpntm 2/2 Running 0 34m
linkerd-web-77b64597d8-qdxxs 2/2 Running 0 34m
==> v1/Role
NAME AGE
linkerd-heartbeat 33d
linkerd-psp 33d
==> v1/RoleBinding
NAME AGE
linkerd-heartbeat 33d
linkerd-linkerd-tap-auth-reader 33d
linkerd-psp 33d
==> v1/Secret
NAME TYPE DATA AGE
linkerd-identity-issuer Opaque 2 33d
linkerd-proxy-injector-tls Opaque 2 33d
linkerd-sp-validator-tls Opaque 2 33d
linkerd-tap-tls Opaque 2 33d
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
linkerd-controller-api ClusterIP 172.20.234.99 <none> 8085/TCP 33d
linkerd-destination ClusterIP 172.20.118.59 <none> 8086/TCP 33d
linkerd-dst ClusterIP 172.20.199.84 <none> 8086/TCP 34m
linkerd-grafana ClusterIP 172.20.17.98 <none> 3000/TCP 33d
linkerd-identity ClusterIP 172.20.187.42 <none> 8080/TCP 33d
linkerd-prometheus ClusterIP 172.20.199.230 <none> 9090/TCP 33d
linkerd-proxy-injector ClusterIP 172.20.199.105 <none> 443/TCP 33d
linkerd-sp-validator ClusterIP 172.20.98.94 <none> 443/TCP 33d
linkerd-tap ClusterIP 172.20.246.46 <none> 8088/TCP,443/TCP 33d
linkerd-web ClusterIP 172.20.42.162 <none> 8084/TCP,9994/TCP 33d
==> v1/ServiceAccount
NAME SECRETS AGE
linkerd-controller 1 33d
linkerd-destination 1 21d
linkerd-grafana 1 33d
linkerd-heartbeat 1 33d
linkerd-identity 1 33d
linkerd-prometheus 1 33d
linkerd-proxy-injector 1 33d
linkerd-sp-validator 1 33d
linkerd-tap 1 33d
linkerd-web 1 33d
==> v1beta1/CronJob
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
linkerd-heartbeat 0 0 * * * False 0 12h 33d
==> v1beta1/CustomResourceDefinition
NAME AGE
serviceprofiles.linkerd.io 33d
trafficsplits.split.smi-spec.io 33d
==> v1beta1/MutatingWebhookConfiguration
NAME AGE
linkerd-proxy-injector-webhook-config 33d
==> v1beta1/PodSecurityPolicy
NAME PRIV CAPS SELINUX RUNASUSER FSGROUP SUPGROUP READONLYROOTFS VOLUMES
linkerd-linkerd-control-plane false NET_ADMIN,NET_RAW RunAsAny RunAsAny MustRunAs MustRunAs true configMap,emptyDir,secret,projected,downwardAPI,persistentVolumeClaim
==> v1beta1/ValidatingWebhookConfiguration
NAME AGE
linkerd-sp-validator-webhook-config 33d
NOTES:
...
At this moment I can't find any recently changed secrets:
$ k get secret,configmap -n linkerd
NAME TYPE DATA AGE
secret/default-token-w7k9t kubernetes.io/service-account-token 3 34d
secret/linkerd-controller-token-vj8p4 kubernetes.io/service-account-token 3 34d
secret/linkerd-destination-token-9hm89 kubernetes.io/service-account-token 3 22d
secret/linkerd-grafana-token-8j8mq kubernetes.io/service-account-token 3 34d
secret/linkerd-heartbeat-token-kv9hc kubernetes.io/service-account-token 3 34d
secret/linkerd-identity-issuer Opaque 2 34d
secret/linkerd-identity-token-fwj7c kubernetes.io/service-account-token 3 34d
secret/linkerd-prometheus-token-khz72 kubernetes.io/service-account-token 3 34d
secret/linkerd-proxy-injector-tls Opaque 2 34d
secret/linkerd-proxy-injector-token-5sdmb kubernetes.io/service-account-token 3 34d
secret/linkerd-sp-validator-tls Opaque 2 34d
secret/linkerd-sp-validator-token-s9b79 kubernetes.io/service-account-token 3 34d
secret/linkerd-tap-tls Opaque 2 34d
secret/linkerd-tap-token-x4skt kubernetes.io/service-account-token 3 34d
secret/linkerd-web-token-x24x6 kubernetes.io/service-account-token 3 34d
NAME DATA AGE
configmap/linkerd-config 3 34d
configmap/linkerd-grafana-config 3 34d
configmap/linkerd-prometheus-config 1 34d
So I see the correlation between the last deploy and the certificate issue date (Not Before) but I don't see why the linkerd-tap.linkerd.svc certificate was changed.
I guess this can be fixed if Helm always restarts tap pods (and maybe others).
Ooooh, you're totally right! That's it. This'd be a really simple PR using helm's shasum.
It should happen for at least tap, proxy-injector, and sp-validator, as those all use certificates associated with the API server.
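The mechanism can be sketched outside Helm: the annotation value is just a hash of the rendered manifest, so a freshly generated cert changes the hash, which changes the pod template and triggers a rolling update (file names below are illustrative):

```shell
# Two "renders" of the same template that embed different freshly generated certs
printf 'caBundle: cert-from-install-1\n' > rendered-a.yaml
printf 'caBundle: cert-from-install-2\n' > rendered-b.yaml

hash_a=$(sha256sum rendered-a.yaml | cut -d' ' -f1)
hash_b=$(sha256sum rendered-b.yaml | cut -d' ' -f1)

echo "checksum/config: $hash_a"
echo "checksum/config: $hash_b"

# A changed annotation value means a changed pod template spec,
# so Kubernetes performs a rolling update of the deployment.
if [ "$hash_a" != "$hash_b" ]; then echo "annotation changed -> pods restart"; fi
```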
@grampelberg I took a look at that but have a few questions. To me it looks like we need to hash (in the case of linkerd-tap) against the certs defined in the v1alpha1.tap.linkerd.io resource. But in order to do that, we need to define these certs as partials so we are able to refer to them from tap.yaml. And since the certs are generated on the fly, every time we include this partial the certs will be different (i.e. both in tap-rbac.yaml and tap.yaml). Maybe I am not very well versed in Helm syntax, but I cannot really see a good way to handle that, as we are generating these on the fly. Can we refer to the value of caBundle defined in tap-rbac.yaml from tap.yaml as the template is being rendered?
If the former proves to be cumbersome, can't we use --recreate-pods to solve that issue?
@zaharidichev , --recreate-pods will immediately kill all of them. https://github.com/helm/helm/issues/5218
I'm not sure if this changed in Helm v3, but in my experience it is not the best idea to use that. kubectl rollout restart does the job better, but it must be called explicitly in CI. Another option is an environment variable like RESTART_ME, or an annotation for the same purpose. None of these solutions looks mature enough. The simplest solution would be to add random characters somewhere (randAlphaNum) so Kubernetes performs a RollingUpdate after any Helm chart update. But some people may not be happy with that in production.
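For completeness, the randAlphaNum approach mentioned above would look roughly like this in a pod template (a hypothetical sketch, not part of the Linkerd chart; the annotation name is made up):

```yaml
# hypothetical: forces a new pod template on every `helm upgrade`
template:
  metadata:
    annotations:
      rollme: {{ randAlphaNum 8 | quote }}
```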
@zaharidichev I'm reasonably sure this'll do it:
{{ include (print $.Template.BasePath "/tap-rbac.yaml") . | sha256sum }}
Now, while that is the "correct" way, it might be easier to just do something like:
{{ .Release.Time }}
As we're creating new certs every time, that'll just roll it every time.
@grampelberg Yes, I tried that exact thing, and it seems that we always render tap-rbac.yaml as an empty string here (e3b0c442… is the SHA-256 of empty input). So if you add this annotation to the spec and render twice, you get the same hash every time. That does not seem to be what we want:
$ bin/linkerd install --ignore-cluster | grep checksum
checksum/config: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
$ bin/linkerd install --ignore-cluster | grep checksum
checksum/config: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Am I missing something here? Alternatively, yes, we can simply use a timestamp.
Hmmm, that's not how I would have expected it to work. Let's just use the timestamp.
Yes, I will give it a shot, although I have the uneasy feeling that it will bring a different set of problems with respect to testing and the .golden outputs. But let's see.
So it turns out we need to make sure include is using the right scope (i.e. $ instead of .). The . refers to the current scope, which has been changed to {{.Values}} at the start of the template. Using $, we make sure the tap-rbac.yaml template is included with the global scope, so all the variables are rendered correctly.
Also, we need to make sure the annotation is added to the pod template, not the deployment. This diff works for me:
diff --git a/charts/linkerd2/templates/tap.yaml b/charts/linkerd2/templates/tap.yaml
index 42d6cd71..d6ed4256 100644
--- a/charts/linkerd2/templates/tap.yaml
+++ b/charts/linkerd2/templates/tap.yaml
@@ -49,6 +49,7 @@ spec:
template:
metadata:
annotations:
+ linkerd.io/config-checksum: {{ include (print $.Template.BasePath "/tap-rbac.yaml") $ | sha256sum }}
{{.CreatedByAnnotation}}: {{default (printf "linkerd/helm %s" .LinkerdVersion) .CliVersion}}
{{- include "partials.proxy.annotations" .Proxy| nindent 8}}
labels:
To reproduce this problem, run:
$ helm upgrade --install linkerd2 charts/linkerd2 --set-file Identity.TrustAnchorsPEM=<crt.pem> --set-file Identity.Issuer.TLS.KeyPEM=<key.pem> --set-file Identity.Issuer.TLS.CrtPEM=<crt.pem> --set Identity.Issuer.CrtExpiry=<crt-expiry-date>
# when control plane is ready, repeat the same command
$ helm upgrade --install linkerd2 charts/linkerd2 --set-file Identity.TrustAnchorsPEM=<crt.pem> --set-file Identity.Issuer.TLS.KeyPEM=<key.pem> --set-file Identity.Issuer.TLS.CrtPEM=<crt.pem> --set Identity.Issuer.CrtExpiry=<crt-expiry-date>
$ linkerd -n linkerd tap deploy
Error: HTTP error, status Code [503] (unexpected API response: Error: 'x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "linkerd-tap.linkerd.svc")'
Trying to reach: 'https://10.106.41.71:443/apis/tap.linkerd.io/v1alpha1/watch/namespaces/linkerd/pods//tap')
With the new annotation, the tap pod will get restarted after the second upgrade --install command.