During periods with no traffic and no Ingress changes in the cluster, the controller starts to emit Lua socket errors:
2020/08/22 06:23:06 [error] 39#39: *790583 lua udp socket read timed out, context: ngx.timer
I'm wondering if this is caused by frequent DNS lookups for upstreams with a low TTL.
NGINX Ingress controller version:
NGINX Ingress controller
Release: v0.34.1
Build: v20200715-ingress-nginx-2.11.0-8-gda5fa45e2
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.1
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.2", GitCommit:"fb7add51f767aae42655d39972210dc1c5dbd4b3", GitTreeState:"clean", BuildDate:"2020-06-01T22:20:10Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Cloud provider or hardware configuration:
GKE - 1.15.12-gke.2 (zonal master cluster)
OS (e.g. from /etc/os-release):
GKE COS
Kernel (e.g. uname -a):
Install tools:
helm
Others:
What happened:
During periods with no traffic and no Ingress changes, the controller emits the following errors:
2020/08/22 06:21:40 [error] 36#36: *789345 lua udp socket read timed out, context: ngx.timer
2020/08/22 06:23:06 [error] 39#39: *790583 lua udp socket read timed out, context: ngx.timer
X - - [22/Aug/2020:06:26:07 +0000] "GET /ping HTTP/1.1" 200 2 "-" "curl/7.64.1" 423 0.003 [X] [] 10.40.7.4:8888 2 0.003 200 X X
It does not seem to affect traffic; requests still pass through the controller.
This has also been reported in #5780.
I have a hunch that this is related to one Ingress using an ExternalName Service whose DNS record has a low TTL, which forces the controller to perform frequent DNS queries.
Relevant Service and Ingress definitions:
apiVersion: v1
kind: Service
metadata:
  name: s3-eu-central-1
  namespace: nginx-cf
spec:
  externalName: s3.eu-central-1.amazonaws.com
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  sessionAffinity: None
  type: ExternalName
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx-cf
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/upstream-vhost: X.s3.amazonaws.com
    nginx.ingress.kubernetes.io/configuration-snippet: |
      rewrite ^/$ /index.html break;
    nginx.ingress.kubernetes.io/rewrite-target: /$1
  name: assets
  namespace: nginx-cf
spec:
  rules:
  - host: "X"
    http:
      paths:
      - backend:
          serviceName: s3-eu-central-1
          servicePort: 443
        path: /(.*)
Running the controller in debug mode (v: 5), I noticed a stream of events related to the low TTL:
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.47.99] and ttl of 2., context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.47.99] and ttl of 2., context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.47.99] and ttl of 2., context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.us-east-1.amazonaws.com' with value of [52.216.113.69] and ttl of 4., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend utils-static-assets-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.us-east-1.amazonaws.com' with value of [52.216.104.13] and ttl of 4., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend utils-static-assets-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.47.99] and ttl of 2., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend nginx-cf-s3-eu-central-1-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.us-east-1.amazonaws.com' with value of [52.216.113.69] and ttl of 3., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend utils-static-assets-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.us-east-1.amazonaws.com' with value of [52.216.104.13] and ttl of 3., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend utils-static-assets-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.74.115] and ttl of 2., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend nginx-cf-s3-eu-central-1-443, context: ngx.timer
[lua] dns.lua:32: cache_set(): cache set for 's3.eu-central-1.amazonaws.com' with value of [52.219.74.123] and ttl of 4., context: ngx.timer
[lua] resty.lua:27: sync(): [round_robin] nodes have changed for backend nginx-cf-s3-eu-central-1-443, context: ngx.timer
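To make sense of these lines, here is my rough mental model as a sketch; this is assumed and simplified, not the actual dns.lua/resty.lua code, and `resolve` below is a hypothetical stub. A recurring ngx.timer re-resolves each ExternalName host whenever its cached entry expires, caches the new answer for the TTL returned by DNS, and the balancer re-syncs when the address set changes. With a 2-4 second TTL this loop effectively never pauses:

-- Assumed, simplified sketch of the TTL-driven resync (not the real code).
local cache = ngx.shared.dns_cache -- assumes: lua_shared_dict dns_cache 1m;

local resolve = function(host)     -- hypothetical stub; the real code would
  return nil, nil                  -- query DNS and return (addresses, ttl)
end

local function check(premature, host)
  if premature then
    return
  end
  if not cache:get(host) then
    -- cached entry expired (its TTL elapsed): resolve again
    local addresses, ttl = resolve(host)
    if addresses then
      -- this is the moment a cache_set() line is logged; the entry
      -- expires after `ttl` seconds, so a 2s TTL re-triggers constantly
      cache:set(host, table.concat(addresses, ","), ttl)
      -- the balancer's sync() then compares old and new addresses and
      -- logs "nodes have changed" whenever they differ
    end
  end
  ngx.timer.at(1, check, host) -- check again on the next tick
end

ngx.timer.at(0, check, "s3.eu-central-1.amazonaws.com")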
What you expected to happen:
Some indication of the cause of the error
How to reproduce it:
I am unable to reproduce it on demand; it comes and goes. Sometimes it is very noisy, even when there is _no_ traffic and no Ingress changes:
$ kubectl --namespace nginx-cf logs nginx-cf-ingress-nginx-controller-59645b499b-jsnmq | grep "020/08/22 05:22" | wc -l
567
Anything else we need to know:
controller:
  admissionWebhooks:
    enabled: false # not supported on k8s <1.16
  ingressClass: nginx-cf
  extraArgs:
    v: 2
    default-ssl-certificate: "nginx-cf/nginx-cf"
  config:
    keep-alive: "900"
    use-forwarded-headers: "true"
    # $http_cf_ray is added for debugging
    log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id $http_cf_ray'
    proxy-real-ip-cidr: "X"
    server-snippet: |
      # If X-Forwarded-Proto is http, the request came in
      # over http and not https;
      # instruct the client to redirect to the https endpoint
      if ($http_x_forwarded_proto = "http") {
        return 301 https://$host$request_uri;
      }
  resources:
    limits:
      memory: 3Gi
    requests:
      cpu: 100m
      memory: 256Mi
  service:
    enableHttp: false
    loadBalancerSourceRanges:
    - X
    type: "LoadBalancer"
    externalTrafficPolicy: "Local"
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 3
    targetCPUUtilizationPercentage: 500
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/instance
              operator: In
              values:
              - nginx-cf # Must match releaseName in helm
          topologyKey: "failure-domain.beta.kubernetes.io/zone"
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/instance
            operator: In
            values:
            - nginx-cf # Must match releaseName in helm
        topologyKey: "kubernetes.io/hostname"
defaultBackend:
  enabled: true
  replicaCount: 1
maxmindLicenseKey: X
/kind support
2020/08/22 06:21:40 [error] 36#36: *789345 lua udp socket read timed out, context: ngx.timer
This is not an ingress-nginx error per se, but an indication that you have a networking issue with DNS requests.
During no-traffic and no ingress changes, the controller emits the following errors:
When a Service of type=ExternalName is used, ingress-nginx needs to check for changes periodically, using the TTL from the DNS response.
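Roughly, a minimal sketch of such a TTL-driven check using lua-resty-dns (the resolver library the controller's Lua DNS code builds on); this is a sketch only, with placeholder nameserver IP and hostname, not the actual controller code:

-- Minimal sketch with lua-resty-dns; not the actual controller code.
local resolver = require("resty.dns.resolver")

local r, err = resolver:new({
  nameservers = { "10.0.0.10" }, -- placeholder: the cluster DNS Service IP
  timeout = 2000,                -- ms to wait for a UDP reply
  retrans = 5,                   -- resend attempts before giving up
})
if not r then
  ngx.log(ngx.ERR, "failed to create resolver: ", err)
  return
end

local answers, query_err = r:query("s3.eu-central-1.amazonaws.com",
                                   { qtype = r.TYPE_A })
if not answers then
  -- if the nameserver does not reply within `timeout`, the UDP read
  -- times out: that is when a "lua udp socket read timed out" line
  -- shows up in the error log, alongside a timeout error here
  ngx.log(ngx.ERR, "DNS query failed: ", query_err)
  return
end

for _, ans in ipairs(answers) do
  -- each answer carries the TTL that decides when to re-resolve;
  -- a 2-second TTL means a fresh lookup roughly every 2 seconds
  ngx.log(ngx.NOTICE, ans.name, " -> ", ans.address, " (ttl ", ans.ttl, ")")
end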
Thanks for the information. Could I ask where in the Lua code the udp socket error originates? I tried looking around but wasn't able to find it.
As you said, DNS issues. I enabled NodeLocal DNSCache in GKE (https://cloud.google.com/kubernetes-engine/docs/how-to/nodelocal-dns-cache) and the udp errors are gone; with a node-local cache, most lookups are answered on the node instead of traversing the network to kube-dns.
@toredash glad you found the reason