Ingress-nginx: Validating webhook service has no endpoints with hostNetwork

Created on 4 May 2020 · 11 comments · Source: kubernetes/ingress-nginx

NGINX Ingress controller version: 0.32.0

Kubernetes version (use kubectl version): 1.18.2

Environment:

  • Cloud provider or hardware configuration: Bare metal (4 cores, 16 GB)
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.4 LTS
  • Kernel (e.g. uname -a): 4.15.0-96-generic
  • Install tools: kubeadm

What happened:

Validating webhook service has no endpoints when ingress-nginx-controller is deployed as a DaemonSet with the hostNetwork: true option, because hostNetwork pods are not tracked by Kubernetes services:

$ kubectl -n ingress-nginx get endpoints ingress-nginx-controller-admission
NAME                                 ENDPOINTS   AGE
ingress-nginx-controller-admission   <none>      11s

What you expected to happen:

I think a working ingress-nginx-controller-admission service is required when the ValidatingWebhook is registered.

How to reproduce it:

Install the ingress controller

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-0.32.0/deploy/static/provider/baremetal/deploy.yaml

using a DaemonSet instead of a Deployment and adding the hostNetwork: true option to the pod template spec, according to https://kubernetes.github.io/ingress-nginx/deploy/baremetal/#via-the-host-network
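
For illustration, a minimal sketch of the pod template change that triggers this (trimmed to the relevant fields; labels are assumed, and the image, args, RBAC and the rest stay as in the upstream 0.32.0 manifest):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      hostNetwork: true                    # pods use the node's network namespace
      dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS working with hostNetwork
      containers:
        - name: controller
          ports:
            - name: webhook
              containerPort: 8443
              protocol: TCP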

Anything else we need to know:

Since hostNetwork is the best way to deploy ingress-nginx on bare metal in many cases (without it, it is very hard to preserve the client source IP address for logs, since externalTrafficPolicy: Local for NodePort and LoadBalancer services has many drawbacks), please consider options to cleanly install ingress-nginx without the ValidatingWebhook, or to run the validation in a dedicated Deployment.
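
As a stop-gap (not a fix), the admission webhook can simply be removed after installation, so Ingress objects are no longer blocked by the unreachable service. Assuming the resource name used by the 0.32.0 static manifest:

# workaround sketch: drop the admission webhook registration
# (resource name assumed from the 0.32.0 deploy.yaml)
kubectl delete validatingwebhookconfiguration ingress-nginx-admission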

/kind bug

Labels: kind/bug, lifecycle/rotten

All 11 comments

This problem can also be solved by using the "portmap" CNI plugin and disabling hostNetwork mode.
When the ports of the nginx-ingress-controller container in the nginx-ingress-controller DaemonSet look like

          ports:
            - name: http
              containerPort: 80
              hostPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              hostPort: 443
              protocol: TCP
            - name: webhook
              containerPort: 8443
              protocol: TCP

then the ingress controller works normally and nginx sees the correct client (source) IPs, while at the same time Services can still match the nginx-ingress-controller pods as endpoints.

So, let's put it all together. If you need to deploy ingress-nginx on a bare metal cluster and keep the client's source IP, here is the checklist:

  • nginx-ingress-controller must be deployed as a DaemonSet, not a Deployment, to prevent port conflicts.
  • hostNetwork must be false (the default).
  • For the exposed ports (80 and 443), the same hostPort must be set (usually there is no reason to expose any other ports, such as webhook and metrics).
  • These ports must not be in use on the nodes where nginx-ingress-controller pods are allowed to run.
  • The portmap plugin must be supported and enabled in the CNI configuration. When used in conjunction with the Flannel plugin (for example), its CNI config (usually /etc/cni/net.d/10-flannel.conflist) should look like
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

NB: the portmap plugin can also be used with other CNI plugins, such as Calico or WeaveNet.
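
To verify that a hostPort mapping is actually in place on a node, you can inspect the NAT rules created by the portmap plugin (the same chain is used for diagnostics later in this thread):

# run on a worker node that hosts an nginx-ingress-controller pod;
# it should list DNAT rules for ports 80 and 443
iptables -t nat -vnL CNI-HOSTPORT-DNAT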

/assign

I tried with the "portmap" CNI plugin and Calico, and observed no problems.

hi @nnz1024, @kundan2707
I've set

hostNetwork: false

hostPort:
  enabled: true
  ports:
    http: 80
    https: 443

kind: DaemonSet

controller.service.type: LoadBalancer

After deploying ingress-nginx in my 'test' namespace and before creating my test ingress, I found that ports 80/443 are not open on any of the worker nodes.
Any guidance on an example chart values.yaml is appreciated.

@wantdrink Can you please

  1. Provide the contents of all config files from /etc/cni/net.d/ on any worker node?
  2. Check whether the binary /opt/cni/bin/portmap is present on the worker nodes?

Hi @nnz1024, here is the /etc/cni/net.d/10-flannel.conflist from all of the worker nodes (my local k8s cluster is installed with kubeadm, with 3 control-plane nodes and 3 worker nodes; the VMware network type is NAT).
OS: CentOS 7.7
Kernel: 3.10.0-1127.10.1.el7.x86_64

{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

also the binary exists:

-rwxr-xr-x 1 root root 3072360 Jun 23 20:10 /opt/cni/bin/portmap

Thanks.

@wantdrink All looks OK. What does iptables -t nat -vnL CNI-HOSTPORT-DNAT show? And, if there is a rule with a target named like CNI-DN-xxxxxxxxxxxxxxx, what does iptables -t nat -vnL <target_name> show?

Hi @nnz1024,

iptables -t nat -vnL CNI-HOSTPORT-DNAT
iptables: No chain/target/match by that name.

No records found.

@wantdrink Looks like there are no established hostPort mappings on this node. Can you show the spec section of your ingress-nginx-controller DaemonSet? The command would be kubectl -n ingress-nginx get daemonset ingress-nginx-controller -o yaml. There will be a lot of data, but we are only interested in the spec section.
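
If you want to narrow the output to just the port mappings, a jsonpath query along these lines should also work (the field path assumes the controller is the first container in the standard DaemonSet layout):

kubectl -n ingress-nginx get daemonset ingress-nginx-controller \
  -o jsonpath='{.spec.template.spec.containers[0].ports}'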

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
