NGINX Ingress controller version: ingress-nginx-2.0.1
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"clean", BuildDate:"2020-01-18T23:33:14Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-gke.9", GitCommit:"e1af17fd873e15a48769e2c7b9851405f89e3d0d", GitTreeState:"clean", BuildDate:"2020-04-06T20:56:54Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Kernel (e.g. uname -a):
What happened:
We added a new host to the ingress, backed by a Service of type ExternalName, and it caused a 503 error for all hosts defined in the ingress.
What you expected to happen:
Having our proxy working!
How to reproduce it:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    certmanager.k8s.io/acme-http01-edit-in-place: "true"
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/custom-http-errors: 501,502,503,504
    nginx.ingress.kubernetes.io/default-backend: custom-default-backend
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
  name: apps-ingress
  namespace: default
spec:
  rules:
  - host: proxy.example.com
    http:
      paths:
      - backend:
          serviceName: proxy-google
          servicePort: 80
  tls:
  - hosts:
    - proxy.example.com
---
apiVersion: v1
kind: Service
metadata:
  name: proxy-google
  namespace: default
spec:
  externalName: google.com
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  sessionAffinity: None
  type: ExternalName
status:
  loadBalancer: {}
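With these applied, every host on the ingress starts returning 503. A sketch of how we observe it (the ingress load balancer IP and the second host name are placeholders):
# request the new ExternalName-backed host
curl -s -o /dev/null -w '%{http_code}\n' -H 'Host: proxy.example.com' http://<ingress-lb-ip>/
# request a host that was already defined and working before this change
curl -s -o /dev/null -w '%{http_code}\n' -H 'Host: other.example.com' http://<ingress-lb-ip>/
# both now print 503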
Anything else we need to know:
I am not quite sure how to troubleshoot this; any advice is welcome! Thanks
/kind bug
We added a new host to the ingress, backed by a Service of type ExternalName, and it caused a 503 error for all hosts defined in the ingress.
Can you be more specific? What are you doing exactly?
Create a local cluster with kind https://kind.sigs.k8s.io/docs/user/ingress/#ingress-nginx
echo "
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
annotations:
certmanager.k8s.io/acme-http01-edit-in-place: 'true'
certmanager.k8s.io/cluster-issuer: letsencrypt-prod
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/custom-http-errors: 501,502,503,504
nginx.ingress.kubernetes.io/default-backend: custom-default-backend
nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
name: apps-ingress
spec:
rules:
- host: proxy.example.com
http:
paths:
- backend:
serviceName: proxy-google
servicePort: 80
---
apiVersion: v1
kind: Service
metadata:
name: proxy-google
namespace: default
spec:
externalName: google.com
ports:
- port: 80
protocol: TCP
targetPort: 80
sessionAffinity: None
type: ExternalName
" | kubectl apply -f -
This works fine.
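One way to check (a sketch; assumes the kind node maps port 80 to localhost as in the linked guide):
# expect a non-503 status once the ExternalName backend is synced
curl -s -o /dev/null -w '%{http_code}\n' -H 'Host: proxy.example.com' http://localhost/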
Hi @aledbf Thanks for your quick answer. Yes, this is exactly what we have as a config. Can you give me more details on where to look to provide you the best troubleshooting information?
Thanks,
Samuel
Can you give me more details on where to look to provide you the best troubleshooting information?
Actually, you should provide the steps you followed to get "caused a 503 error for all hosts defined in the ingress"
@aledbf as I mentioned earlier, these are exactly the steps we took.
With the config above, the result we have is a 503 error for all hosts defined in the ingress.
If we change the ingress serviceName to a different service than the ExternalName one, everything starts working again.
How do we inspect the config generated for nginx?
Where would the relevant logs be located that could indicate the cause of the 503?
Found your troubleshooting guide https://kubernetes.github.io/ingress-nginx/troubleshooting/
I'll update you with more details later today
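For reference, a sketch of the checks from that guide (namespace and pod name are placeholders for our setup):
# find the ingress-nginx controller pod
kubectl get pods -n ingress-nginx
# dump the generated nginx configuration
kubectl exec -n ingress-nginx <controller-pod> -- cat /etc/nginx/nginx.conf
# check the controller logs for errors around the 503s
kubectl logs -n ingress-nginx <controller-pod>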
Found this in the logs:
E 2020-04-29T05:27:04.081489818Z 2020/04/29 05:27:03 [error] 1949#1949: init_worker_by_lua error: /usr/local/share/lua/5.1/resty/dns/resolver.lua:121: API disabled in the context of init_worker_by_lua*
E 2020-04-29T05:27:04.081559177Z stack traceback:
E 2020-04-29T05:27:04.081565518Z [C]: in function 'udp'
E 2020-04-29T05:27:04.081570712Z /usr/local/share/lua/5.1/resty/dns/resolver.lua:121: in function 'new'
E 2020-04-29T05:27:04.081575738Z /etc/nginx/lua/util/dns.lua:97: in function 'dns_lookup'
E 2020-04-29T05:27:04.081595409Z /etc/nginx/lua/balancer.lua:74: in function 'resolve_external_names'
E 2020-04-29T05:27:04.081600632Z /etc/nginx/lua/balancer.lua:123: in function 'sync_backend'
E 2020-04-29T05:27:04.081605268Z /etc/nginx/lua/balancer.lua:146: in function 'sync_backends'
E 2020-04-29T05:27:04.081610096Z /etc/nginx/lua/balancer.lua:251: in function 'init_worker'
E 2020-04-29T05:27:04.081614534Z init_worker_by_lua:3: in main chunk
E 2020-04-29T05:27:04.087808593Z 2020/04/29 05:27:03 [error] 1950#1950: init_worker_by_lua error: /usr/local/share/lua/5.1/resty/dns/resolver.lua:121: API disabled in the context of init_worker_by_lua*
E 2020-04-29T05:27:04.087857443Z stack traceback:
After looking into the docs, I think the issue might be a misconfigured resolver address.
Thanks,
Samuel
@ElvinEfendi ping. Please check the DNS error.
@sadortun please use the "Check the Nginx Configuration" example and post the resolver line
@aledbf resolver 10.35.240.10 valid=30s ipv6=off;
FYI, currently the ExternalName service is NOT configured. I'll be able to perform more tests later tonight.
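To rule out the resolver itself, a quick sanity check that it can resolve the external name (busybox is just a convenient image with nslookup; the pod name is arbitrary):
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup google.com 10.35.240.10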
@aledbf can you build a dev image out of https://github.com/kubernetes/ingress-nginx/pull/5481 for @sadortun to try?
@sadortun please use the image quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev
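A sketch of how to switch the controller to that image (the deployment and container names depend on how ingress-nginx was installed and are assumptions here):
kubectl set image -n ingress-nginx deployment/ingress-nginx-controller controller=quay.io/kubernetes-ingress-controller/nginx-ingress-controller-amd64:dev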
@aledbf Awesome! It works!
This is very hard to reproduce, I'm curious about your setup. How many ingresses do you have in the cluster? How many Nginx workers do you have per ingress-nginx pod?
For this bug to be triggered, the backend data has to be pushed to shared memory before all Nginx workers are up, and the remaining workers have to boot before the other workers resolve the external name. Having a slow DNS server increases the likelihood of this.
Nothing special, a few ingresses in different namespaces, 2 Nginx workers.
If you want to investigate further, just send me an email; we can schedule a meeting and I'll show you.
Thanks for the offer @sadortun, I'll skip it this time unless it resurfaces again.