Containers-roadmap: [EKS] [request]: Nodelocal DNS Cache

Created on 22 May 2019 · 24 Comments · Source: aws/containers-roadmap

Tell us about your request
I would like an officially documented and supported method for installing the Kubernetes Node Local DNS Cache Addon.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Kubernetes clusters with a high request rate often experience high rates of failed DNS lookups. For example, this affects us when using the AWS SDKs, particularly with Alpine / musl-libc containers.

The _Nodelocal DNS Cache_ aims to resolve this (together with kernel patches in 5.1 to fix a conntrack race condition).

Nodelocal DNS Addon

Kubeadm is aiming to support the NodeLocal DNS Cache in 1.15 (k/k #70707).

Are you currently working around this issue?
Retrying requests at the application level which fail due to DNS errors.

Additional context
Kubernetes DNS Issues include:

  • Linux kernel bug in netfilter conntrack (fixed in kernel 5.1) [1][2]
  • Exacerbated by musl-libc's behaviour of issuing parallel queries; musl-libc, which is widely used in Alpine Docker containers, does not respect the resolv.conf single-request option, and it appears this will not be changed [2][3]
  • The EC2 limit of 1024 packets per second per ENI for traffic to the Amazon-provided DNS server [5]

Attachments
[0] https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#retryDelayOptions-property
[1] https://lkml.org/lkml/2019/2/28/707
[2] https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/
[3] https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
[4] https://www.openwall.com/lists/musl/2015/10/22/15
[5] https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-limits

Labels: EKS, Proposed

Most helpful comment

After way too much time spent on this I think I got this working. I'd love a review from somebody who knows what they're doing though 😅 I promise I'll create a pretty blog post or a PR to the AWS EKS docs on GitHub.

So, the yaml in k/k/cluster/addons/dns/nodelocaldns cannot just be applied as-is. It needs a couple of values replaced first:

  • __PILLAR__DNS__DOMAIN__
  • __PILLAR__LOCAL__DNS__
  • __PILLAR__DNS__SERVER__

Not replacing those will lead to the lovely CrashLoopBackOff and much confusion.

Now the question comes up: what to replace those values with? After way too much wasted time I found out that the amazing eksctl already supports Node-Local DNS caches! They have a very nice PR with a description showing what to replace those values with: https://github.com/weaveworks/eksctl/pull/550. TL;DR:

  • __PILLAR__DNS__DOMAIN__ with cluster.local as per amazon-eks-ami's kubelet-config.json
  • __PILLAR__DNS__SERVER__ with 10.100.0.10 or 172.20.0.10 depending on your VPC CIDR (yes, really -- check out this awesome if in amazon-eks-ami). Or you could just run kubectl -n kube-system get service kube-dns and check the cluster IP there
  • __PILLAR__LOCAL__DNS__ with 169.254.20.10, the default link-local address that the nodelocal DNS cache binds to on each node

Applying the yaml will work then!

Buuut running netshoot with kubectl run tmp-shell-no-host-net --generator=run-pod/v1 --rm -i --tty --image nicolaka/netshoot -- /bin/bash and then a dig example.com will show that the nodelocal cache is not used.

Running kubectl run tmp-shell-host --generator=run-pod/v1 --rm -i --tty --overrides='{"spec": {"hostNetwork": true}}' --image nicolaka/netshoot -- /bin/bash and a netstat -lntp showed 169.254.20.10:53 correctly bound.

The cluster also needs to be changed so that the kubelet points to the nodelocal DNS -- the _Add clusterDNS: 169.254.20.10 to your nodegroup in the cluster config_ step from the eksctl PR linked above.

Unfortunately I was using the Terraform community EKS module, so this was not as straightforward. After some research it actually is pretty simple: just add --cluster-dns=169.254.20.10 to kubelet_extra_args, which for me resulted in kubelet_extra_args = "--node-labels=kubernetes.io/lifecycle=spot,nodegroup=mygroup --cluster-dns=169.254.20.10".

Changes got applied, all existing nodes were manually terminated, and new nodes came up. Redoing the above checks shows nodelocal is indeed used! 🎉


Now, all that said, I don't know much about networking. Does the above look sane? Can this be run in production? This comment confirms it to be safe even in 1.12 (I highly recommend reading the whole discussion there).

All 24 comments

After way too much time spent on this I think I got this working. I'd love a review from somebody who knows what they're doing though 😅 I promise I'll create a pretty blog post or a PR to the AWS EKS docs on GitHub.

So, the yaml in k/k/cluster/addons/dns/nodelocaldns cannot just be applied as-is. It needs a couple of values replaced first:

  • __PILLAR__DNS__DOMAIN__
  • __PILLAR__LOCAL__DNS__
  • __PILLAR__DNS__SERVER__

Not replacing those will lead to the lovely CrashLoopBackOff and much confusion.

Now the question comes up: what to replace those values with? After way too much wasted time I found out that the amazing eksctl already supports Node-Local DNS caches! They have a very nice PR with a description showing what to replace those values with: https://github.com/weaveworks/eksctl/pull/550. TL;DR:

  • __PILLAR__DNS__DOMAIN__ with cluster.local as per amazon-eks-ami's kubelet-config.json
  • __PILLAR__DNS__SERVER__ with 10.100.0.10 or 172.20.0.10 depending on your VPC CIDR (yes, really -- check out this awesome if in amazon-eks-ami). Or you could just run kubectl -n kube-system get service kube-dns and check the cluster IP there
  • __PILLAR__LOCAL__DNS__ with 169.254.20.10, the default link-local address that the nodelocal DNS cache binds to on each node
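For what it's worth, here is a minimal sketch of how those substitutions could be scripted (assuming the manifest was saved locally as nodelocaldns.yaml; the service IP is read from the live cluster rather than hardcoded):

    # look up the kube-dns/CoreDNS service ClusterIP from the cluster
    kubedns=$(kubectl -n kube-system get service kube-dns -o jsonpath='{.spec.clusterIP}')
    domain="cluster.local"
    localdns="169.254.20.10"

    # fill in the placeholders in place
    sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml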

Applying the yaml will work then!

Buuut running netshoot with kubectl run tmp-shell-no-host-net --generator=run-pod/v1 --rm -i --tty --image nicolaka/netshoot -- /bin/bash and then a dig example.com will show that the nodelocal cache is not used.

Running kubectl run tmp-shell-host --generator=run-pod/v1 --rm -i --tty --overrides='{"spec": {"hostNetwork": true}}' --image nicolaka/netshoot -- /bin/bash and a netstat -lntp showed 169.254.20.10:53 correctly bound.

The cluster also needs to be changed so that the kubelet points to the nodelocal DNS -- the _Add clusterDNS: 169.254.20.10 to your nodegroup in the cluster config_ step from the eksctl PR linked above.

Unfortunately I was using the Terraform community EKS module, so this was not as straightforward. After some research it actually is pretty simple: just add --cluster-dns=169.254.20.10 to kubelet_extra_args, which for me resulted in kubelet_extra_args = "--node-labels=kubernetes.io/lifecycle=spot,nodegroup=mygroup --cluster-dns=169.254.20.10".

Changes got applied, all existing nodes were manually terminated, and new nodes came up. Redoing the above checks shows nodelocal is indeed used! 🎉
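(Concretely, redoing the checks on a replaced node looks something like this -- a sketch assuming a standard EKS AMI where kubelet runs as a host process, and using the same netshoot image as above:)

    # on the node: confirm kubelet was actually started with the new flag
    ps aux | grep '[k]ubelet' | tr ' ' '\n' | grep -- '--cluster-dns'
    # expected: --cluster-dns=169.254.20.10

    # from a pod without hostNetwork: dig should now be answered by the local cache
    kubectl run tmp-shell-no-host-net --generator=run-pod/v1 --rm -i --tty \
      --image nicolaka/netshoot -- /bin/bash
    dig example.com | grep SERVER
    # ;; SERVER: 169.254.20.10#53(169.254.20.10)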


Now, all that said, I don't know much about networking. Does the above look sane? Can this be run in production? This comment confirms it to be safe even in 1.12 (I highly recommend reading the whole discussion there).

I would also be interested in this feature being fleshed out/supported with EKS

The nodelocaldns.yaml file actually has 5 variables, and I'm confused by the difference between DNS__SERVER and CLUSTER__DNS:

__PILLAR__DNS__DOMAIN__ == cluster.local
__PILLAR__DNS__SERVER__ == ??
__PILLAR__LOCAL__DNS__ == 169.254.20.10
__PILLAR__CLUSTER__DNS__ == <ClusterIP of the kube-dns/CoreDNS service, e.g. 172.20.0.10>
__PILLAR__UPSTREAM__SERVERS__ == /etc/resolv.conf

@ghostsquad that's my bad, as I linked to the master version of _k/k/cluster/addons/dns/nodelocaldns_. I have now edited the link to use the 1.16 version.

In master there is currently work happening on a new and improved version of NodeLocal DNS, hence the new variables. As far as I know (and there is a high chance I am wrong) that is a work in progress and not yet ready/released.

Thank you for the response!

I just got this set up myself and it's "working" great -- meaning, DNS requests are going to 169.254.20.10 and getting answers. But I'm not sure I'm seeing a resolution to the conntrack problems... I still see insert_failed being incremented, and latency doesn't seem to be lower for repeat requests.
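(For anyone comparing numbers: one way to watch that counter on a node, assuming the conntrack CLI is installed there, is:)

    # dump conntrack stats; a steadily growing insert_failed is the
    # symptom of the race the kernel fix / node-local cache aim to avoid
    sudo conntrack -S | grep -o 'insert_failed=[0-9]*'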

After way too much time spent on this I think I got this working. I'd love a review from somebody who knows what they're doing though 😅 I promise I'll create a pretty blog post or a PR to the AWS EKS docs on GitHub.

As promised, a blog post about this is up on the AWS Containers Blog: EKS DNS at scale and spikeiness!

It's basically my first comment here, with more details and helpful debugging hints.

EKS DNS at scale and spikeiness!

Was this blog post removed?

Anyone still have this?

EKS DNS at scale and spikeiness!

Was this blog post removed?

Anyone still have this?

Yes, the blog post is not viewable anymore for me either.

Hi everyone - you can find instructions for installing the node-local DNS cache here: https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/

Anyone still have this?

A copy of the blog can be found at: https://www.vladionescu.me/posts/eks-dns.html

Hi everyone - you can find instructions for installing the node-local DNS cache here: https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/

@otterley would you use the instructions you pointed to, where the link for setting up the NodeLocal DNS cache resources leads to this file in the master branch: https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml

Or from the eks-dns post:

Anyone still have this?

A copy of the blog can be found at: https://www.vladionescu.me/posts/eks-dns.html

which links to a different branch for setting up the NodeLocal DNS Cache resources:
https://github.com/kubernetes/kubernetes/blob/release-1.15/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml

We are currently running EKS version 1.14 (upgrading to 1.15 is not a problem if needed).

Addons are not that tied to the k8s version.

IIRC for NodeLocal DNS there is a pre-1.16 version (which requires kubelet changes) and a post-1.16 version (which requires no kubelet changes). There is a very high chance I am wrong on this, as I haven't kept up to date with the changes.

Hi, is there a way to use the NodeLocal DNS cache with EKS managed node groups? In that case it seems impossible to set clusterDNS for a node group.

After way too much time spent on this I think I got this working. I'd love a review from somebody who knows what they're doing though 😅 I promise I'll create a pretty blog post or a PR to the AWS EKS docs on GitHub.

So, the yaml in k/k/cluster/addons/dns/nodelocaldns cannot just be applied as-is. It needs a couple of values replaced first:

* `__PILLAR__DNS__DOMAIN__`

* `__PILLAR__LOCAL__DNS__`

* `__PILLAR__DNS__SERVER__`

Not replacing those will lead to the lovely CrashLoopBackOff and much confusion.

Now the question comes up: what to replace those values with? After way too much wasted time I found out that the amazing eksctl already supports Node-Local DNS caches! They have a very nice PR with a description showing what to replace those values with: weaveworks/eksctl#550. TL;DR:

* `__PILLAR__DNS__DOMAIN__` with `cluster.local` as per [amazon-eks-ami's kubelet-config.json](https://github.com/awslabs/amazon-eks-ami/blob/28845f97c05dacaf699a102faa690a4238b79f02/files/kubelet-config.json#L24)

* `__PILLAR__DNS__SERVER__` with `10.100.0.10` or `172.20.0.10` depending on your VPC CIDR (yes, really -- check out this awesome `if` in [amazon-eks-ami](https://github.com/awslabs/amazon-eks-ami/blob/ca61cc2bb6ef6fe982cc71ede7552a4a2c6b93e9/files/bootstrap.sh#L167-L170)). Or you could just run `kubectl -n kube-system get service kube-dns` and check the cluster IP there

* `__PILLAR__LOCAL__DNS__` with `169.254.20.10`, the default link-local address that the nodelocal DNS cache binds to on each node

Applying the yaml will work then!

Buuut running netshoot with kubectl run tmp-shell-no-host-net --generator=run-pod/v1 --rm -i --tty --image nicolaka/netshoot -- /bin/bash and then a dig example.com will show that the nodelocal cache is not used.

Running kubectl run tmp-shell-host --generator=run-pod/v1 --rm -i --tty --overrides='{"spec": {"hostNetwork": true}}' --image nicolaka/netshoot -- /bin/bash and a netstat -lntp showed 169.254.20.10:53 correctly bound.

The cluster also needs to be changed so that the kubelet points to the nodelocal DNS -- the _Add clusterDNS: 169.254.20.10 to your nodegroup in the cluster config_ step from the eksctl PR linked above.

Unfortunately I was using the Terraform community EKS module, so this was not as straightforward. After some research it actually is pretty simple: just add --cluster-dns=169.254.20.10 to kubelet_extra_args, which for me resulted in kubelet_extra_args = "--node-labels=kubernetes.io/lifecycle=spot,nodegroup=mygroup --cluster-dns=169.254.20.10".

Changes got applied, all existing nodes were manually terminated, and new nodes came up. Redoing the above checks shows nodelocal is indeed used! 🎉

Now, all that said, I don't know much about networking. Does the above look sane? Can this be run in production? This comment confirms it to be safe even in 1.12 (I highly recommend reading the whole discussion there).

I followed the instructions closely, but running netshoot and a dig example.com I still see 172.20.0.10. The nodelocaldns pods are running without crashing and the logs are not showing any errors:

    2020/08/05 13:33:12 2020-08-05T13:33:12.734Z [INFO] Setting up networking for node cache
    cluster.local.:53 on 169.254.20.10
    in-addr.arpa.:53 on 169.254.20.10
    ip6.arpa.:53 on 169.254.20.10
    .:53 on 169.254.20.10
    2020-08-05T13:33:12.762Z [INFO] CoreDNS-1.2.6
    2020-08-05T13:33:12.762Z [INFO] linux/amd64, go1.11.10,
    CoreDNS-1.2.6
    linux/amd64, go1.11.10

I am running EKS 1.14 and using Terraform to control this cluster. I am using kubelet_extra_args = "--cluster-dns=169.254.20.10" in my worker_groups_launch_template_mixed.

Any advice would be appreciated.

@aimanparvaiz hm... That's odd. Let's try to debug it.

Since the pod is still using 172.20.0.10, that means that the NodeLocalDNS "override" in the normal flow is not there. That makes me think that the kubelet is telling the pod to use 172.20.0.10 instead of 169.254.20.10.

  1. What version of NodeLocalDNS are you using, both container image + the version of yaml you applied?

    I know the master branch on k/k has a newer version. I haven't played with that (yet) and there may be differences in setup. The above instructions are for the yamls in the release-1.15 and release-1.16 branches (they're identical).

  2. Is that a new node? If you had a pre-existing EC2 instance and then ran a Terraform apply setting the kubelet_extra_args = "--cluster-dns=169.254.20.10", the settings may apply only to new EC2 instances --- but that depends a lot on how you manage your nodes.

  3. Is NodeLocalDNS bound on that host node? As per my blog post, if you run a netstat -lntp do you see a line with 169.254.20.10:53?

    kubectl run tmp-shell-host --generator=run-pod/v1 \
    --rm -it \
    --overrides='{"spec": {"hostNetwork": true}}' \
    --image nicolaka/netshoot -- /bin/bash
    
    # and then the expected output:
    
    netstat -lntp
    
      ...
      tcp   0   0   169.254.20.10:53   0.0.0.0:*   LISTEN   -
      ...
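(For question 1, the image version can be read straight off the DaemonSet -- a sketch assuming the upstream manifest's default name node-local-dns in kube-system:)

    kubectl -n kube-system get daemonset node-local-dns \
      -o jsonpath='{.spec.template.spec.containers[0].image}'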
    

@aimanparvaiz hm... That's odd. Let's try to debug it.

Since the pod is still using 172.20.0.10, that means that the NodeLocalDNS "override" in the normal flow is not there. That makes me think that the kubelet is telling the pod to use 172.20.0.10 instead of 169.254.20.10.

1. What version of NodeLocalDNS are you using, both container image + the version of yaml you applied?
   I know the `master` branch on `k/k` has a newer version. I haven't played with that (yet) and there may be differences in setup. The above instructions are for the `yaml`s in the `release-1.15` and `release-1.16` branches (they're identical).

2. Is that a new node? If you had a pre-existing EC2 instance and then ran a Terraform apply setting the `kubelet_extra_args = "--cluster-dns=169.254.20.10"`, the settings may apply only to new EC2 instances --- but that depends a lot on how you manage your nodes.

3. Is NodeLocalDNS bound on that host node? As per [my blog post](https://www.vladionescu.me/posts/eks-dns.html), if you run a `netstat -lntp` do you see a line with `169.254.20.10:53`?
kubectl run tmp-shell-host --generator=run-pod/v1 \
  --rm -it \
  --overrides='{"spec": {"hostNetwork": true}}' \
  --image nicolaka/netshoot -- /bin/bash

# and then the expected output:

netstat -lntp

    ...
    tcp   0   0   169.254.20.10:53   0.0.0.0:*   LISTEN   -
    ...

@Vlaaaaaaad thanks for responding. I am using image: k8s.gcr.io/k8s-dns-node-cache:1.15.3 and I got the yaml from the master branch (this might be the issue).

This is a new node; I updated Terraform and manually removed the older nodes. I am using this to target the new node:

kubectl run --overrides='{"apiVersion": "v1", "spec": {"nodeSelector": { "kubernetes.io/hostname": "ip-A-B-C-D.region.compute.internal" }}}' tmp-shell-no-host-net --generator=run-pod/v1 \
        --rm -it \
        --image nicolaka/netshoot -- /bin/bash

On this same node the local DNS is bound correctly. I used the same override flag to specify the host. I do see:
tcp 0 0 169.254.20.10:53 0.0.0.0:* LISTEN -
on that node.

@aimanparvaiz hm... That's odd. Let's try to debug it.
Since the pod is still using 172.20.0.10, that means that the NodeLocalDNS "override" in the normal flow is not there. That makes me think that the kubelet is telling the pod to use 172.20.0.10 instead of 169.254.20.10.

1. What version of NodeLocalDNS are you using, both container image + the version of yaml you applied?
   I know the `master` branch on `k/k` has a newer version. I haven't played with that (yet) and there may be differences in setup. The above instructions are for the `yaml`s in the `release-1.15` and `release-1.16` branches (they're identical).

2. Is that a new node? If you had a pre-existing EC2 instance and then ran a Terraform apply setting the `kubelet_extra_args = "--cluster-dns=169.254.20.10"`, the settings may apply only to new EC2 instances --- but that depends a lot on how you manage your nodes.

3. Is NodeLocalDNS bound on that host node? As per [my blog post](https://www.vladionescu.me/posts/eks-dns.html), if you run a `netstat -lntp` do you see a line with `169.254.20.10:53`?
kubectl run tmp-shell-host --generator=run-pod/v1 \
  --rm -it \
  --overrides='{"spec": {"hostNetwork": true}}' \
  --image nicolaka/netshoot -- /bin/bash

# and then the expected output:

netstat -lntp

    ...
    tcp   0   0   169.254.20.10:53   0.0.0.0:*   LISTEN   -
    ...

@Vlaaaaaaad thanks for responding. I am using image: k8s.gcr.io/k8s-dns-node-cache:1.15.3 and I got the yaml from the master branch (this might be the issue).

This is a new node; I updated Terraform and manually removed the older nodes. I am using this to target the new node:

kubectl run --overrides='{"apiVersion": "v1", "spec": {"nodeSelector": { "kubernetes.io/hostname": "ip-A-B-C-D.region.compute.internal" }}}' tmp-shell-no-host-net --generator=run-pod/v1 \
        --rm -it \
        --image nicolaka/netshoot -- /bin/bash

On this same node the local DNS is bound correctly. I used the same override flag to specify the host. I do see:
tcp 0 0 169.254.20.10:53 0.0.0.0:* LISTEN -
on that node.

I grabbed the yaml from release-1.15; unless I need to refresh the nodes again, I am still seeing the same behavior.

@aimanparvaiz did you find the root cause after all? I remember this moving to Slack, but no conclusion. Maybe your solution will help other people too 🙂

@aimanparvaiz did you find the root cause after all? I remember this moving to Slack, but no conclusion. Maybe your solution will help other people too 🙂

@Vlaaaaaaad I am not sure I can safely say that I found the root cause. I deployed the latest version of NodeLocal DNS Cache, swapped out the EKS nodes with newer ones, and the errors stopped. Thanks for all your help, along with Chance Zibolski. Here is the link to the complete Slack conversation if anyone is interested: https://kubernetes.slack.com/archives/C8SH2GSL9/p1596646078276000.

I'm using EKS's Kubernetes 1.17, and I don't quite understand whether I can use the nodelocaldns yaml file from the master branch, or whether I have to take the one from the release-1.17 branch.
This is the diff between the two:

    $ diff nodelocaldns-1.17.yaml nodelocaldns-master.yaml 
    100,102c100
    <         forward . __PILLAR__UPSTREAM__SERVERS__ {
    <                 force_tcp
    <         }
    ---
    >         forward . __PILLAR__UPSTREAM__SERVERS__
    124,125c122,126
    <        labels:
    <           k8s-app: node-local-dns
    ---
    >       labels:
    >         k8s-app: node-local-dns
    >       annotations:
    >         prometheus.io/port: "9253"
    >         prometheus.io/scrape: "true"
    133a135,138
    >       - effect: "NoExecute"
    >         operator: "Exists"
    >       - effect: "NoSchedule"
    >         operator: "Exists"
    136c141
    <         image: k8s.gcr.io/k8s-dns-node-cache:1.15.7
    ---
    >         image: k8s.gcr.io/dns/k8s-dns-node-cache:1.15.14

@Vlaaaaaaad any chance you know this ^^ ?

Hey @dorongutman! Apologies, I am rather busy with some personal projects and I forgot to answer this 😞

My blog post is in desperate need of an update, and right now I lack the bandwidth for that. I hope I'll get to it before the end of the year, but we'll see.

Hm... based on the updated NodeLocalDNS docs there are only a couple of variables that need changing; the other variables are replaced by NodeLocalDNS itself when it starts. Not at all confusing 😄
The ones that need changing seem to be the same ones as in my first comment on this issue:

  • __PILLAR__DNS__DOMAIN__ with cluster.local
  • __PILLAR__DNS__SERVER__ with 10.100.0.10 or 172.20.0.10, i.e. the cluster IP shown by kubectl -n kube-system get service kube-dns
  • __PILLAR__LOCAL__DNS__ with 169.254.20.10

There also seems to be no need to set --cluster-dns anymore as NodeLocalDNS discovers the address dynamically and changes the node DNS config.
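If anyone wants to sanity-check a deployment done this way, something along these lines should be enough (a sketch, assuming the upstream manifest's default names: DaemonSet node-local-dns with label k8s-app=node-local-dns in kube-system):

    # every node should have a ready node-local-dns pod
    kubectl -n kube-system get daemonset node-local-dns
    # the startup logs should show it listening on 169.254.20.10
    kubectl -n kube-system logs -l k8s-app=node-local-dns --tail=20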

As I said, I've got no bandwidth to actually test the latest NodeLocalDNS --- this comment is just a bunch of assumptions from my side. If any of y'all has the time to test it and blog about it, I can help review!

So I got NodeLocalDNS working but I needed coredns to serve as a backup.
I added this to the eksctl config to get multiple nameserver entries in /etc/resolv.conf:

    kubeletExtraConfig:
      clusterDNS: ["169.254.20.10","10.100.0.10"]

169.254.20.10 is NodeLocalDNS
10.100.0.10 is coredns svc IP

That works, but when I tested failover by spinning down the NodeLocalDNS pods, nothing gets resolved. I expected that it would fall back to 10.100.0.10, but nothing is showing up.
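(A quick way to see what new pods actually get -- a sketch using the same netshoot image as earlier in this thread:)

    # check the nameserver list a freshly scheduled pod receives
    kubectl run tmp-resolv --rm -i --restart=Never \
      --image nicolaka/netshoot -- cat /etc/resolv.conf
    # both 169.254.20.10 and 10.100.0.10 should be listed; note that the libc
    # resolver only falls back to the second nameserver after a timeout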

