Linkerd2: CNI plugin creates iptables rules before the linkerd proxy is started, blocking networking in initContainers

Created on 23 Jul 2020  ·  6 comments  ·  Source: linkerd/linkerd2

Bug Report

What is the issue?

If one uses the CNI plugin instead of the linkerd proxy-init initContainer, then a Pod's initContainers won't be able to create outbound TCP connections.
In such a setup, the Pod's initContainers run before the linkerd proxy starts, but after the CNI plugin has created the iptables rules.

Timeline:

  • a Pod is created
  • the CNI plugin creates iptables rules to redirect all outgoing traffic to the port used by the linkerd proxy to intercept traffic.
  • an initContainer is started. The linkerd proxy, being a regular container, is not started yet.
  • the initContainer fails: the iptables rules leave it with no egress connectivity

How can it be reproduced?

Install linkerd in a Kubernetes cluster and activate automatic proxy injection with the CNI plugin enabled (linkerd install --linkerd-cni-enabled).
Install the CNI plugin, following the instructions at https://linkerd.io/2/features/cni/.
Create the following file:

$ cat <<EOF >pod-with-init-container.yaml
apiVersion: v1
kind: Pod
metadata:
    name: pod-with-init-container
spec:
    initContainers:
        - name: init
          image: busybox
          command:
              - /bin/sh
              - -c
              - wget www.google.com
    containers:
        - name: web
          image: nginx
          ports:
              - name: web
                containerPort: 80
                protocol: TCP
EOF

Then create the Pod:

$ kubectl apply -f pod-with-init-container.yaml

Observe that the Pod's initContainer fails:

$ kubectl get pod pod-with-init-container
NAME                      READY   STATUS                  RESTARTS   AGE
pod-with-init-container   0/2     Init:CrashLoopBackOff   2          50s

and that the culprit is the refused outbound connection:

$ kubectl logs pod-with-init-container -c init
Connecting to www.google.com (172.217.16.196:80)
wget: can't connect to remote host (172.217.16.196): Connection refused

If you disable proxy injection and recreate the Pod, it starts fine.
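(For reference, injection can be disabled per-Pod with the standard linkerd.io/inject annotation; a minimal sketch, with the rest of the manifest identical to the reproduction above:)

```yaml
apiVersion: v1
kind: Pod
metadata:
    name: pod-with-init-container
    annotations:
        linkerd.io/inject: disabled   # skip proxy injection for this Pod
spec:
    # ... same initContainers/containers as in the reproduction above ...
```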

Logs, error output, etc

See above. You can also see that the CNI plugin did create the iptables rules (this can be confirmed by running iptables -L -t nat inside the Pod's network namespace).
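The kubelet log below shows the plugin doing exactly that. The same check can be run manually on the node hosting the Pod; a sketch, where $POD_PID is a hypothetical placeholder for the PID of any container process in the Pod (find it with your runtime's inspect command):

```
# On the node: dump the NAT table inside the Pod's network namespace.
sudo nsenter --net=/proc/$POD_PID/ns/net iptables -t nat -vnL
```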

kubelet[4915]: 2020/07/23 16:48:19 > iptables -t nat -vnL
kubelet[4915]: 2020/07/23 16:48:19 >> nsenter [--net=/proc/10384/ns/net iptables -t nat -vnL]
kubelet[4915]: 2020/07/23 16:48:19 < Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
kubelet[4915]: pkts bytes target     prot opt in     out     source               destination
kubelet[4915]: 0     0 PROXY_INIT_REDIRECT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* proxy-init/install-proxy-init-prerouting/1595522898 */
kubelet[4915]: Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
kubelet[4915]: pkts bytes target     prot opt in     out     source               destination
kubelet[4915]: Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
kubelet[4915]: pkts bytes target     prot opt in     out     source               destination
kubelet[4915]: 0     0 PROXY_INIT_OUTPUT  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* proxy-init/install-proxy-init-output/1595522898 */
kubelet[4915]: Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
kubelet[4915]: pkts bytes target     prot opt in     out     source               destination
kubelet[4915]: Chain PROXY_INIT_OUTPUT (1 references)
kubelet[4915]: pkts bytes target     prot opt in     out     source               destination
kubelet[4915]: 0     0 PROXY_INIT_REDIRECT  all  --  *      lo      0.0.0.0/0           !127.0.0.1            owner UID match 2102 /* proxy-init/redirect-non-loopback-local-traffic/1595522898 */
kubelet[4915]: 0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 2102 /* proxy-init/ignore-proxy-user-id/1595522898 */
kubelet[4915]: 0     0 RETURN     all  --  *      lo      0.0.0.0/0            0.0.0.0/0            /* proxy-init/ignore-loopback/1595522898 */
kubelet[4915]: 0     0 REDIRECT   tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* proxy-init/redirect-all-outgoing-to-proxy-port/1595522898 */ redir ports 4140
kubelet[4915]: Chain PROXY_INIT_REDIRECT (2 references)
kubelet[4915]: pkts bytes target     prot opt in     out     source               destination
kubelet[4915]: 0     0 RETURN     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 4190,4191 /* proxy-init/ignore-port-4190,4191/1595522898 */
kubelet[4915]: 0     0 REDIRECT   tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* proxy-init/redirect-all-incoming-to-proxy-port/1595522898 */ redir ports 4143

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ tap api service is running

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-addons
--------------
√ 'linkerd-config-addons' config map exists

linkerd-grafana
---------------
√ grafana add-on service account exists
√ grafana add-on config map exists
√ grafana pod is running

Status check results are √

Environment

  • Kubernetes Version: 1.14
  • Cluster Environment: (GKE, AKS, kops, ...) EKS
  • Host OS: Amazon Linux 2
  • Linkerd version: 2.8.1

Possible solution

The CNI plugin should not inject iptables rules until all initContainers have completed - but I have no idea whether this is possible within the CNI API.

Additional context

I suspect that this is the same issue as https://github.com/linkerd/linkerd2/issues/3812. I'm creating a new issue with more context in the hope that the discussion will start again :smile:

Labels: area/cni, wontfix

Most helpful comment

One workaround is to skip the outbound port of the init container, by annotating the pod template with config.linkerd.io/skip-outbound-ports. See https://linkerd.io/2/reference/proxy-configuration/.

All 6 comments

One workaround is to skip the outbound port of the init container, by annotating the pod template with config.linkerd.io/skip-outbound-ports. See https://linkerd.io/2/reference/proxy-configuration/.
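Applied to the reproduction manifest above, the annotation might look like this. The port list is an assumption for this example (the wget in the init container needs port 80; 443 is included in case it follows an HTTPS redirect); adjust it to whatever egress ports your init container actually uses:

```yaml
apiVersion: v1
kind: Pod
metadata:
    name: pod-with-init-container
    annotations:
        # Assumed ports for the init container's egress (hypothetical).
        config.linkerd.io/skip-outbound-ports: "80,443"
spec:
    # ... same initContainers/containers as in the reproduction above ...
```

Note that this exempts those outbound ports for all containers in the Pod, not just the init container, as discussed below.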

Is there a way to have the annotation only apply to init containers? In our use case, monitoring outgoing HTTP traffic to non-meshed services in non-init containers has value.

@Neki I don't think that's possible. The init containers share the same network namespace as all the other containers in the same pod.

As we are hitting the same issue with some of our workloads (init container that needs network connectivity) while using the CNI plugin:

Is there a possibility to exclude a Pod from the CNI plugin (so it needs the Linkerd init container again)?

That way we could use the CNI method as the default, and only for the few special workloads that require this behavior we'd need to create a separate Pod Security Policy.

Otherwise, we'd either need to give all meshed workloads more capabilities than we'd like, or not mesh workloads with init containers at all; both are suboptimal.

@marratj I feel supporting both the CNI plugin and proxy-init will make things more complicated than simply skipping the ports. We have to account for the _timing_ of when k8s CNI plugins are run: the Linkerd CNI plugin will have been called before the Linkerd init container is live. And whatever changes we make, the plugin has to satisfy the API spec, where each plugin has to return something sane to subsequent plugins in the chain. So I don't know if we can just "skip" the plugin.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

