All components in Linkerd seem to work perfectly when CNI is disabled, but recently we considered enabling CNI in order to establish restrictive pod security policies. mTLS works fine, as services can seamlessly connect to each other, but the tap functionality does not work. We tried it via the CLI, the tap tab in the dashboard, and the live calls view on the deployment's page.
An error is shown in linkerd-tap's logs, which I copied below.
We use Helm to install Linkerd, so setting global.cniEnabled is enough to start using it. Then I restart one of the deployments with kubectl rollout restart deployment <name>, and once it's up I run linkerd tap deployment/<name> to try to see its connections.
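The steps above can be sketched roughly as follows. This is not the exact set of commands used: the release name `linkerd`, the chart reference `linkerd/linkerd2`, the use of `--reuse-values`, and the wrapper function `reproduce_tap_failure` are all assumptions made for illustration.

```shell
# Hypothetical wrapper around the reproduction steps; pass the deployment name.
reproduce_tap_failure() {
  name="$1"

  # Assumed release and chart names; adjust to your installation.
  helm upgrade linkerd linkerd/linkerd2 --reuse-values \
    --set global.cniEnabled=true

  # Recreate the workload's pods and wait for the rollout to finish.
  kubectl rollout restart deployment "$name"
  kubectl rollout status deployment "$name"

  # Attempt to tap the restarted deployment.
  linkerd tap "deployment/$name"
}
```

Calling `reproduce_tap_failure <name>` against a cluster runs the three steps in order; nothing is executed until the function is invoked.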
Log from linkerd-tap:
time="2020-07-06T07:43:18Z" level=info msg="Tapping 1 pods for target: {Namespace:default Type:deployment Name:<name> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}"
time="2020-07-06T07:43:18Z" level=info msg="Establishing tap on 10.1.4.20:4190"
time="2020-07-06T07:43:18Z" level=error msg="[10.1.4.20] encountered an error: rpc error: code = Unauthenticated desc = not_provided_by_remote"
linkerd check output:
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-cni-plugin
------------------
√ cni plugin ConfigMap exists
√ cni plugin PodSecurityPolicy exists
√ cni plugin ClusterRole exists
√ cni plugin ClusterRoleBinding exists
√ cni plugin Role exists
√ cni plugin RoleBinding exists
√ cni plugin ServiceAccount exists
√ cni plugin DaemonSet exists
√ cni plugin pod is running on all nodes
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
linkerd-addons
--------------
√ 'linkerd-config-addons' config map exists
linkerd-grafana
---------------
√ grafana add-on service account exists
√ grafana add-on config map exists
√ grafana pod is running
Status check results are √
@gmaiztegi Can you attach the linkerd-tap proxy logs? The Unauthenticated error probably means that the target pod doesn't have the identity that tap expected. Did you enable the Linkerd CNI during a fresh installation, or via an upgrade using Helm?
@ihcsim thanks for the answer.
Neither of the proxies (tap and the tapped pod) shows any log entry when doing the tap. I haven't touched the log levels of the proxies. Should I set them to a specific value?
I installed it by upgrading with Helm. Does it make a difference?
What's the output of linkerd check --pre --linkerd-cni-enabled? Can you also share the Helm commands you used to install the CNI plugin and the control plane?
"I installed it by upgrading with Helm. Does it make a difference?"
Assuming you did a fresh installation of the CNI plugin and control plane, not enabling CNI on an existing non-CNI control plane, this shouldn't make a difference.
@ihcsim I think I have found the issue.
I ran linkerd install-cni | kubectl diff -f - to see if there was any difference between my installation from Helm and the one rendered by the CLI, and this showed up:
- "inbound-ports-to-ignore": [],
+ "inbound-ports-to-ignore": ["4190","4191"],
So, I added ignoreInboundPorts: "4190,4191" to my values, performed helm upgrade on the cni, restarted linkerd-tap and one of my deployments, and now the tap works perfectly.
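A minimal sketch of how this kind of mismatch could be checked for, assuming the ignore list is available as a comma-separated string without spaces (the same form as the ignoreInboundPorts value above). `missing_ignored_ports` is a hypothetical helper, not part of the Linkerd CLI; ports 4190 and 4191 are the ones from the diff.

```shell
# Hypothetical helper: prints which of ports 4190 and 4191 are absent
# from a comma-separated ignore list (no spaces between entries).
missing_ignored_ports() {
  ignore_list="$1"
  missing=""
  for port in 4190 4191; do
    # Wrap the list in commas so each entry can be matched as ",port,".
    case ",${ignore_list}," in
      *",${port},"*) ;;  # port is already ignored
      *) missing="${missing}${missing:+ }${port}" ;;
    esac
  done
  printf '%s\n' "$missing"
}
```

With an empty list, as in the Helm-rendered config, `missing_ignored_ports ""` reports both ports; with `"4190,4191"` it prints an empty line.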
I guess that these port numbers should already be the default values in the chart. If you think so, I don't mind taking a few minutes to submit a PR.
@zaharidichev is this issue similar to https://github.com/linkerd/linkerd2/issues/4679?
@gmaiztegi if you are able to go back and check whether the - --inbound-ports-to-ignore arg of your proxy init container was empty (which it probably was), then yes, this is the same issue. You can also go ahead and try out this PR to further verify that the problem is fixed: https://github.com/linkerd/linkerd2/pull/4688
@gmaiztegi The fix in https://github.com/linkerd/linkerd2/pull/4688 has landed. Try out the latest edge.