During periods with no traffic and no Ingress changes in the cluster, the controller starts to emit Lua socket errors:
2020/08/22 06:23:06 [error] 39#39: *790583 lua udp socket read timed out, context: ngx.timer
I'm wondering if this is caused by frequent DNS lookups for upstreams with a low TTL.
NGINX Ingress controller version:
NGINX Ingress controller
Release: v0.34.1
Build: v20200715-ingress-nginx-2.11.0-8-gda5fa45e2
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.1
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.2", GitCommit:"fb7add51f767aae42655d39972210dc1c5dbd4b3", GitTreeState:"clean", BuildDate:"2020-06-01T22:20:10Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Cloud provider or hardware configuration:
GKE - 1.15.12-gke.2 (zonal master cluster)
OS (e.g. from /etc/os-release):
GKE COS
Kernel (e.g. uname -a):
Install tools:
helm
Others:
What happened:
During periods with no traffic and no Ingress changes, the controller emits the following errors:
2020/08/22 06:21:40 [error] 36#36: *789345 lua udp socket read timed out, context: ngx.timer
2020/08/22 06:23:06 [error] 39#39: *790583 lua udp socket read timed out, context: ngx.timer
X - - [22/Aug/2020:06:26:07 +0000] "GET /ping HTTP/1.1" 200 2 "-" "curl/7.64.1" 423 0.003 [X] [] 10.40.7.4:8888 2 0.003 200 X X
It does not seem to affect traffic; requests still pass through the controller.
This has also been reported in #5780.
I have a hunch that this is related to one Ingress using an ExternalName Service whose DNS record has a low TTL, which forces the controller to perform frequent DNS queries.
Relevant Service and Ingress definitions:
apiVersion: v1
kind: Service
metadata:
  name: s3-eu-central-1
  namespace: nginx-cf
spec:
  externalName: s3.eu-central-1.amazonaws.com
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  sessionAffinity: None
  type: ExternalName
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx-cf
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/upstream-vhost: X.s3.amazonaws.com
    nginx.ingress.kubernetes.io/configuration-snippet: |
      rewrite ^/$ /index.html break;
    nginx.ingress.kubernetes.io/rewrite-target: /$1
  name: assets
  namespace: nginx-cf
spec:
  rules:
  - host: "X"
    http:
      paths:
      - backend:
          serviceName: s3-eu-central-1
          servicePort: 443
        path: /(.*)
Running the controller in debug mode (v: 5), I noticed a stream of events related to the low TTL:
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.47.99] and ttl of 2., context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.47.99] and ttl of 2., context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.47.99] and ttl of 2., context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.us-east-1.amazonaws.com' with value of [52.216.113.69] and ttl of 4., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend utils-static-assets-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.us-east-1.amazonaws.com' with value of [52.216.104.13] and ttl of 4., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend utils-static-assets-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.47.99] and ttl of 2., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend nginx-cf-s3-eu-central-1-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.us-east-1.amazonaws.com' with value of [52.216.113.69] and ttl of 3., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend utils-static-assets-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.us-east-1.amazonaws.com' with value of [52.216.104.13] and ttl of 3., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend utils-static-assets-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.74.115] and ttl of 2., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend nginx-cf-s3-eu-central-1-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.74.123] and ttl of 4., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend nginx-cf-s3-eu-central-1-443, context: ngx.timer
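To make sense of these lines, here is my rough mental model as a sketch; this is assumed and simplified, not the actual dns.lua/resty.lua code, and `resolve` below is a hypothetical stub. A recurring ngx.timer re-resolves each ExternalName host whenever its cached entry expires, caches the new answer for the TTL returned by DNS, and the balancer re-syncs when the address set changes. With a 2-4 second TTL this loop effectively never pauses:

-- Assumed, simplified sketch of the TTL-driven resync (not the real code).
local cache = ngx.shared.dns_cache -- assumes: lua_shared_dict dns_cache 1m;

local resolve = function(host)     -- hypothetical stub; the real code would
  return nil, nil                  -- query DNS and return (addresses, ttl)
end

local function check(premature, host)
  if premature then
    return
  end
  if not cache:get(host) then
    -- cached entry expired (its TTL elapsed): resolve again
    local addresses, ttl = resolve(host)
    if addresses then
      -- this is the moment a cache_set() line is logged; the entry
      -- expires after `ttl` seconds, so a 2s TTL re-triggers constantly
      cache:set(host, table.concat(addresses, ","), ttl)
      -- the balancer's sync() then compares old and new addresses and
      -- logs "nodes have changed" whenever they differ
    end
  end
  ngx.timer.at(1, check, host) -- check again on the next tick
end

ngx.timer.at(0, check, "s3.eu-central-1.amazonaws.com")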
What you expected to happen:
Some indication of the cause of the error
How to reproduce it:
I am unable to reproduce it on demand; it comes and goes. Sometimes it is very noisy, even when there is _no_ traffic and no Ingress changes:
$ kubectl --namespace nginx-cf logs nginx-cf-ingress-nginx-controller-59645b499b-jsnmq | grep "020/08/22 05:22" | wc -l
567
Anything else we need to know:
controller:
  admissionWebhooks:
    enabled: false # not supported on k8s <1.16
  ingressClass: nginx-cf
  extraArgs:
    v: 2
    default-ssl-certificate: "nginx-cf/nginx-cf"
  config:
    keep-alive: "900"
    use-forwarded-headers: "true"
    # $http_cf_ray is added for debugging
    log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id $http_cf_ray'
    proxy-real-ip-cidr: "X"
    server-snippet: |
      # If X-Forwarded-Proto is http, the request came in
      # over http and not https;
      # instruct the client to redirect to the https endpoint
      if ($http_x_forwarded_proto = "http") {
        return 301 https://$host$request_uri;
      }
  resources:
    limits:
      memory: 3Gi
    requests:
      cpu: 100m
      memory: 256Mi
  service:
    enableHttp: false
    loadBalancerSourceRanges:
    - X
    type: "LoadBalancer"
    externalTrafficPolicy: "Local"
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 3
    targetCPUUtilizationPercentage: 500
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/instance
              operator: In
              values:
              - nginx-cf # Must match releaseName in helm
          topologyKey: "failure-domain.beta.kubernetes.io/zone"
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/instance
            operator: In
            values:
            - nginx-cf # Must match releaseName in helm
        topologyKey: "kubernetes.io/hostname"
defaultBackend:
  enabled: true
  replicaCount: 1
maxmindLicenseKey: X
/kind support
2020/08/22 06:21:40 [error] 36#36: *789345 lua udp socket read timed out, context: ngx.timer
This is not an ingress-nginx error per se, but an indication that you have a networking issue with DNS requests.
During no-traffic and no ingress changes, the controller emits the following errors:
When a Service of type=ExternalName is used, ingress-nginx needs to check for changes periodically, using the TTL from the DNS response.
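Roughly, a minimal sketch of such a TTL-driven check using lua-resty-dns (the resolver library the controller's Lua DNS code builds on); this is a sketch only, with placeholder nameserver IP and hostname, not the actual controller code:

-- Minimal sketch with lua-resty-dns; not the actual controller code.
local resolver = require("resty.dns.resolver")

local r, err = resolver:new({
  nameservers = { "10.0.0.10" }, -- placeholder: the cluster DNS Service IP
  timeout = 2000,                -- ms to wait for a UDP reply
  retrans = 5,                   -- resend attempts before giving up
})
if not r then
  ngx.log(ngx.ERR, "failed to create resolver: ", err)
  return
end

local answers, query_err = r:query("s3.eu-central-1.amazonaws.com",
                                   { qtype = r.TYPE_A })
if not answers then
  -- if the nameserver does not reply within `timeout`, the UDP read
  -- times out: that is when a "lua udp socket read timed out" line
  -- shows up in the error log, alongside a timeout error here
  ngx.log(ngx.ERR, "DNS query failed: ", query_err)
  return
end

for _, ans in ipairs(answers) do
  -- each answer carries the TTL that decides when to re-resolve;
  -- a 2-second TTL means a fresh lookup roughly every 2 seconds
  ngx.log(ngx.NOTICE, ans.name, " -> ", ans.address, " (ttl ", ans.ttl, ")")
end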
Thanks for the information. Could I ask where in the Lua code the udp socket error originates? I tried looking around but wasn't able to find it.
As you said, DNS issues. I enabled NodeLocal DNSCache in GKE (https://cloud.google.com/kubernetes-engine/docs/how-to/nodelocal-dns-cache) and the udp errors are gone; with a node-local cache, most lookups are answered on the node instead of traversing the network to kube-dns.
@toredash glad you found the reason