When Linkerd injects its proxy into a pod where service account automount is disabled, the pod fails to start up and goes into a restart loop. The root cause is that linkerd-proxy looks for the mounted service account token but cannot find it.
Try to instrument a pod that has the following attribute set in its spec:
automountServiceAccountToken: false
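For example, a minimal manifest that should reproduce it (the name and image are placeholders, and this assumes auto-injection is enabled for the pod):
apiVersion: v1
kind: Pod
metadata:
  name: automount-repro            # hypothetical name
  annotations:
    linkerd.io/inject: enabled
spec:
  automountServiceAccountToken: false
  containers:
  - name: app
    image: nginx:1.17              # placeholder; any workload image reproduces it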
I encountered this issue with the official argocd manifest: https://argoproj.github.io/argo-cd/getting_started/#1-install-argo-cd
On pod startup the linkerd-proxy container outputs this:
time="2019-05-23T15:47:08Z" level=info msg="running version dev-undefined"
time="2019-05-23T15:47:08Z" level=info msg="Using with pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
time="2019-05-23T15:47:08Z" level=info msg="Using with pre-existing CSR: /var/run/linkerd/identity/end-entity/key.p8"
ERR! [ 0.000310s] linkerd2_proxy::app::config Could not read LINKERD2_PROXY_IDENTITY_TOKEN_FILE: No such file or directory (os error 2)
ERR! [ 0.000354s] linkerd2_proxy::app::config LINKERD2_PROXY_IDENTITY_TOKEN_FILE="/var/run/secrets/kubernetes.io/serviceaccount/token" is not valid: InvalidTokenSource
configuration error: InvalidEnvVar
linkerd check output:
❯ linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
Status check results are √
❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
❯ linkerd version
Client version: stable-2.3.0
Server version: stable-2.3.0
Allow mTLS to work without a service account.
@csreegn The mTLS feature uses the service account to verify identity. If you feel that it's absolutely necessary for the argocd-repo-server workload to have automountServiceAccountToken: false, then you can use linkerd inject --disable-identity to disable mTLS for that particular workload. That way, you will still get all the other good stuff that comes with Linkerd, minus the mTLS identity feature.
Note that the disable-identity option isn't supported with auto-injection in stable-2.3.0. If you are using an auto-inject set-up, you can try one of the newer edge versions.
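For reference, a sketch of what the per-workload opt-out could look like on a newer edge release. The config.linkerd.io/disable-identity annotation and all names below are assumptions for illustration, not taken from this thread; with manual injection, the equivalent is running linkerd inject --disable-identity on the workload YAML before applying it.
# Sketch: opting a single workload out of mTLS identity while keeping
# automountServiceAccountToken: false. Assumes the config.linkerd.io/disable-identity
# annotation from newer edge releases; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: repo-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: repo-server
  template:
    metadata:
      labels:
        app: repo-server
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/disable-identity: "true"
    spec:
      automountServiceAccountToken: false
      containers:
      - name: app
        image: nginx:1.17   # placeholder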
I honestly don't know whether automountServiceAccountToken: false is needed or not, but I imagine that from a security perspective there will always be more cautious users who don't want to mount the SA token if they don't use it. (I am not one of them, but it makes sense to limit the attack surface of a pod.)
Also, if I deploy a service that has automountServiceAccountToken: false set, I think it would be acceptable for the pod injector to flip it to true if Linkerd definitely needs the service account. Otherwise using Linkerd is not transparent (i.e. abstraction leakage). Right now, if I don't explicitly set this flag to true, my pod won't even start, and I think that violates Linkerd's contract of being transparent. Maybe a graceful degradation scenario would be acceptable? (Don't provide mTLS for the pod if no SA token is present, but don't fail the startup.)
My temporary workaround is to set the flag to true, but my worry is for off-the-shelf (COTS) applications that are installed with Helm, or whose manifests are taken verbatim and applied (with kubectl apply -f): you can never be sure the app will start, because you don't know whether the flag is set (and with Helm it's also not certain whether it's configurable). One way to apply the workaround repeatably to a verbatim manifest is the kustomize overlay sketched below.
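This is a sketch, assuming the upstream manifest is saved locally as install.yaml and that the affected workload is argocd-repo-server, as mentioned above; adjust the target to whatever workload has the flag disabled.
# kustomization.yaml (sketch)
resources:
- install.yaml
patchesStrategicMerge:
- automount-patch.yaml

# automount-patch.yaml (re-enables the token mount for one workload)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
spec:
  template:
    spec:
      automountServiceAccountToken: true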
btw, if the Linkerd proxy is outputting errors in its logs, the linkerd check --proxy command should fail. What does that command output on your end?
I think there is a trade-off here between the amount of auto-mutation we want to introduce into the proxy injection process and the user experience. Like you said, there are people who explicitly don't want to mount the SA token for security reasons. Personally, I would have a problem with some intermediary flipping on a security flag (in this case, automountServiceAccountToken) that I explicitly disabled. The same goes for automatically disabling Linkerd's mTLS. My preference: if a new workload I'm about to deploy doesn't comply with my cluster's security settings, I want the deployment to fail loudly.
The linkerd check --proxy command does not finish; it idles waiting for the proxies to be ready:
❯ linkerd check --proxy
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
linkerd-data-plane
------------------
√ data plane namespace exists
\ data plane proxies are ready -- waiting for check to complete
That doesn't really communicate what is wrong, or how to resolve it.
Yeah, with the service account automount being a crucial precondition for mTLS to work, we can be a bit smarter with this check. @olix0r proposed that we update the proxy injector to fail the proxy injection if mTLS is enabled, but the pod spec has automountServiceAccountToken: false.
Would you be interested in submitting a PR? :wink: I'll be happy to show you where the webhook code is. It should be an easy fix, but no pressure.
Sure, I'll try to squeeze it in within a few weeks. Never coded in golang before, but there's a first time for everything!
What does the linkerd sidecar do with the service account token?
@mikedanese https://linkerd.io/2/features/automatic-mtls/#how-does-it-work
The Linkerd sidecar injector should inject an audience-bound token into the linkerd-proxy container. This would provide better security (a token bound to the identity service would not be replayable against the Kubernetes API, and vice versa) and would also remove Linkerd's dependency on automountServiceAccountToken. Doc here:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection
And more details on the differences to legacy tokens here:
https://github.com/kubernetes/kubernetes/issues/70679#issue-377670415
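For the record, a sketch of what such a projected, audience-bound token volume looks like per the doc above. The audience value, paths, and names here are illustrative assumptions, not something the injector emits today:
apiVersion: v1
kind: Pod
metadata:
  name: projected-token-demo         # hypothetical
spec:
  automountServiceAccountToken: false
  containers:
  - name: app
    image: nginx:1.17                # stand-in container
    volumeMounts:
    - name: identity-token
      mountPath: /var/run/linkerd/identity/token
      readOnly: true
  volumes:
  - name: identity-token
    projected:
      sources:
      - serviceAccountToken:
          audience: identity.l5d.io      # hypothetical audience for the identity service
          expirationSeconds: 86400       # token is rotated by the kubelet before expiry
          path: token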
That would be awesome!
In fact, the docs that @grampelberg linked to specifically call out time- and audience-bound SA tokens as future work once they're available "in future Kubernetes releases". @mikedanese I'm assuming this means they are ready for us to use in modern K8s versions?
trying to mesh: elasticsearch:7.4.1
โ time="2019-10-29T06:38:02Z" level=info msg="running version stable-2.6.0"
โ time="2019-10-29T06:38:02Z" level=info msg="Using with pre-existing key: /var/run/linkerd/identity/end-entity/key.p8"
โ time="2019-10-29T06:38:02Z" level=info msg="Using with pre-existing CSR: /var/run/linkerd/identity/end-entity/key.p8"
โ configuration error: InvalidEnvVar
โ ERR! [ 0.002000s] linkerd2_proxy::app::config Could not read LINKERD2_PROXY_IDENTITY_TOKEN_FILE: No such file or directory (os error 2)
โ ERR! [ 0.002021s] linkerd2_proxy::app::config LINKERD2_PROXY_IDENTITY_TOKEN_FILE="/var/run/secrets/kubernetes.io/serviceaccount/token" is not valid: InvalidTokenSource
โ
@masterkain are you using an operator? It wouldn't be related to the image you're using, but rather to how you're deploying it on the cluster.
@masterkain @grampelberg
For the Elastic Cloud on Kubernetes (cloud-on-k8s) installation, at least, the issue was also an explicit automountServiceAccountToken: false setting on the pod. Fixed in https://github.com/elastic/cloud-on-k8s/issues/1151
I just posted this in the linkerd slack channel, but here is a config that will allow linkerd 2.6 to inject with elastic-cloud-on-k8s, in case this helps anyone else seeing this issue.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.5.2
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      metadata:
        annotations:
          linkerd.io/inject: "enabled"
      spec:
        automountServiceAccountToken: true