Ingress-nginx: Ingress-wide 503 after adding a service with externalName

Created on 29 Apr 2020 · 13 comments · Source: kubernetes/ingress-nginx

NGINX Ingress controller version: ingress-nginx-2.0.1

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"clean", BuildDate:"2020-01-18T23:33:14Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-gke.9", GitCommit:"e1af17fd873e15a48769e2c7b9851405f89e3d0d", GitTreeState:"clean", BuildDate:"2020-04-06T20:56:54Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a):
    Linux cs-6000-devshell-vm-41123126-8492-4b6e-9a11-4ba16f98eb09 4.19.112+ #1 SMP Sat Apr 4 00:12:45 PDT 2020 x86_64 GNU/Linux
  • Install tools:
  • Others:

What happened:
We added a new host to our Ingress along with a Service of type ExternalName, and this caused a 503 error for all hosts defined in the Ingress.

What you expected to happen:
For our proxy to keep working!

How to reproduce it:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    certmanager.k8s.io/acme-http01-edit-in-place: "true"
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/custom-http-errors: 501,502,503,504
    nginx.ingress.kubernetes.io/default-backend: custom-default-backend
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
  name: apps-ingress
  namespace: default
spec:
  rules:
  - host: proxy.example.com
    http:
      paths:
      - backend:
          serviceName: proxy-google
          servicePort: 80
  tls:
  - hosts:
    - proxy.example.com
---
apiVersion: v1
kind: Service
metadata:
  name: proxy-google
  namespace: default
spec:
  externalName: google.com
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  sessionAffinity: None
  type: ExternalName
status:
  loadBalancer: {}
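
A minimal sketch of exercising the repro, assuming the two manifests above are saved as repro.yaml and <ingress-ip> is the controller's external IP:

kubectl apply -f repro.yaml

# Every host served by this Ingress now returns 503, not only the new proxy.example.com entry
curl -v -H 'Host: proxy.example.com' http://<ingress-ip>/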

Anything else we need to know:
I am not quite sure how to troubleshoot this; any advice is welcome! Thanks

/kind bug


All 13 comments

We added a new host to our Ingress along with a Service of type ExternalName, and this caused a 503 error for all hosts defined in the Ingress.

Can you be more specific? What are you doing exactly?

Create a local cluster with kind https://kind.sigs.k8s.io/docs/user/ingress/#ingress-nginx
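
For reference, a rough sketch of that setup (taken from the kind ingress docs; the exact config and deploy manifest URL may differ between versions):

cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
EOF

# Install the ingress-nginx controller manifests for kind
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/kind/deploy.yaml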

echo "
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    certmanager.k8s.io/acme-http01-edit-in-place: 'true'
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/custom-http-errors: 501,502,503,504
    nginx.ingress.kubernetes.io/default-backend: custom-default-backend
    nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
  name: apps-ingress
spec:
  rules:
  - host: proxy.example.com
    http:
      paths:
      - backend:
          serviceName: proxy-google
          servicePort: 80

---

apiVersion: v1
kind: Service
metadata:
  name: proxy-google
  namespace: default
spec:
  externalName: google.com
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  sessionAffinity: None
  type: ExternalName
" | kubectl apply -f -  

this works fine

Hi @aledbf, thanks for your quick answer. Yes, this is exactly what we have as a config. Can you give me more details on where to look so I can provide you the best troubleshooting information?

Thanks,
Samuel

Can you give me more details on where to look so I can provide you the best troubleshooting information?

Actually, you should provide the steps you followed to get "a 503 error for all hosts defined in the Ingress".

@aledbf as I mentioned earlier, these are exactly the steps we took.

With the config above, the result we have is a 503 error for all hosts defined in the ingress.

If we change the Ingress serviceName to a service other than the ExternalName one, everything starts working again.

How do we inspect the config generated for NGINX?
Where would the relevant logs that could indicate the cause of the 503 be located?

Edit

Found your troubleshooting guide https://kubernetes.github.io/ingress-nginx/troubleshooting/

I'll update you with more details later today
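
In the meantime, here is a minimal sketch of the inspection steps from that guide (the ingress-nginx namespace and the controller pod name are assumptions about our install):

# Find the controller pod
kubectl get pods -n ingress-nginx

# Dump the NGINX config the controller generated
kubectl exec -n ingress-nginx <controller-pod> -- cat /etc/nginx/nginx.conf

# Controller / NGINX error logs
kubectl logs -n ingress-nginx <controller-pod>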

Edit 2

Found this in the logs:

E 2020-04-29T05:27:04.081489818Z 2020/04/29 05:27:03 [error] 1949#1949: init_worker_by_lua error: /usr/local/share/lua/5.1/resty/dns/resolver.lua:121: API disabled in the context of init_worker_by_lua*
E 2020-04-29T05:27:04.081559177Z stack traceback:
E 2020-04-29T05:27:04.081565518Z    [C]: in function 'udp'
E 2020-04-29T05:27:04.081570712Z    /usr/local/share/lua/5.1/resty/dns/resolver.lua:121: in function 'new'
E 2020-04-29T05:27:04.081575738Z    /etc/nginx/lua/util/dns.lua:97: in function 'dns_lookup'
E 2020-04-29T05:27:04.081595409Z    /etc/nginx/lua/balancer.lua:74: in function 'resolve_external_names'
E 2020-04-29T05:27:04.081600632Z    /etc/nginx/lua/balancer.lua:123: in function 'sync_backend'
E 2020-04-29T05:27:04.081605268Z    /etc/nginx/lua/balancer.lua:146: in function 'sync_backends'
E 2020-04-29T05:27:04.081610096Z    /etc/nginx/lua/balancer.lua:251: in function 'init_worker'
E 2020-04-29T05:27:04.081614534Z    init_worker_by_lua:3: in main chunk
E 2020-04-29T05:27:04.087808593Z 2020/04/29 05:27:03 [error] 1950#1950: init_worker_by_lua error: /usr/local/share/lua/5.1/resty/dns/resolver.lua:121: API disabled in the context of init_worker_by_lua*
E 2020-04-29T05:27:04.087857443Z stack traceback:

After looking into the docs, I think the issue might be a misconfigured resolver address.

Thanks,
Samuel

@ElvinEfendi ping. Please check the DNS error.

@sadortun please use the "Check the Nginx Configuration" example and post the resolver line
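
For example, something along these lines (namespace and pod name are assumptions):

kubectl exec -n ingress-nginx <controller-pod> -- grep resolver /etc/nginx/nginx.conf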

@aledbf resolver 10.35.240.10 valid=30s ipv6=off;

FYI, currently the ExternalName service is NOT configured. I'll be able to perform more tests later tonight.

@aledbf can you build a dev image out of https://github.com/kubernetes/ingress-nginx/pull/5481 for @sadortun to try?

@sadortun please use the image quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev
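
For anyone following along, switching the controller to that image can look something like this (the deployment and container names are assumptions about a typical install):

kubectl set image deployment/ingress-nginx-controller -n ingress-nginx \
  controller=quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev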

@aledbf Awesome! It works!

This is very hard to reproduce; I'm curious about your setup. How many Ingresses do you have in the cluster? How many NGINX workers do you have per ingress-nginx pod?

For this bug to be triggered, the backend data has to be pushed to shared memory before all NGINX workers are up, and the remaining workers have to boot before the other workers finish resolving the external name. Having a slow DNS server can increase the likelihood of this.

Nothing special: a few Ingresses in different namespaces, 2 NGINX workers.

If you want to investigate further, just send me an email; we can schedule a meeting and I'll show you.

Thanks for the offer @sadortun, I'll pass on it this time unless it resurfaces again.
