Eksctl: Node-local DNS cache support

Created on 14 Feb 2019  ·  14 comments  ·  Source: weaveworks/eksctl

TL;DR: I'd like to add a new configuration key for nodegroups to specify the value of the kubelet --cluster-dns flag.

Why do you want this feature?

Currently, both internal and external DNS lookups are handled by the CoreDNS service deployed by EKS. This isn't ideal from a reliability perspective.

An internal lookup resolves cluster-internal names of Kubernetes services. It may fail under various conditions, such as:

  • kube-proxy failed on either (1) the node on which the pod that sent the DNS query is running, or (2) a node on which a CoreDNS pod is running
  • A kube-dns (CoreDNS) pod failed
  • The Kubernetes endpoint controller failed
  • Some other node-to-node communication failure (an EC2 or VPC issue)

An external lookup resolves names managed outside the cluster, such as RDS DB clusters/instances, SQS endpoints, S3, or your own services served via Route 53. It may fail under conditions like:

  • Any of the above conditions
  • The Amazon DNS server in your VPC failed
  • Route 53 failed (applies when the name is served by it)

What feature/behavior/change do you want?

The only thing I want to propose for eksctl is the ability to override the value of the --cluster-dns flag passed to the kubelet.

It allows us to deploy a node-local DNS cache, which was proposed to resolve this issue. You can find the upstream proposal at https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md. It is implemented in https://github.com/kubernetes/kubernetes/pull/70555.

Looking at how it works and at the implementation, the only thing we can't do outside of eksctl is pass kubelet's --cluster-dns a "magic" IP address that forwards all DNS lookups to the node-local cache; see https://github.com/kubernetes/kubernetes/blob/a3877b1776cc55f5a32103d7a072a73e18c3d939/hack/local-up-cluster.sh#L704.

As we're leaning towards exposing new configuration only via config files, I would be glad if eksctl added a new field named e.g. clusterDNS. For example, it would be specified like:

nodeGroups:
- name: nodegroup1
  clusterDNS: 169.254.20.10
  # snip
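For illustration, here is a minimal sketch of what the proposed key would boil down to at node bootstrap time. This is not eksctl's actual code; build_kubelet_args and the --max-pods flag are purely illustrative stand-ins:

```shell
#!/bin/sh
# Hypothetical sketch, NOT eksctl's real bootstrap logic: when clusterDNS is
# set for a nodegroup, the node bootstrap appends --cluster-dns to the
# kubelet arguments; otherwise the kubelet default is left alone.
build_kubelet_args() {
  cluster_dns="$1"
  args="--max-pods=29"   # stands in for whatever flags are already passed
  if [ -n "$cluster_dns" ]; then
    args="$args --cluster-dns=$cluster_dns"
  fi
  printf '%s\n' "$args"
}

build_kubelet_args 169.254.20.10
```

Leaving the flag unset when clusterDNS is absent keeps today's behavior (the kube-dns service IP) as the default.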
Labels: area/add-ons, area/config-file, area/nodegroup

All 14 comments

Btw, I can vouch for the effectiveness of the node-local DNS cache, as I've been using it for more than a year in kube-aws: https://github.com/kubernetes-incubator/kube-aws/pull/792.

Implementation-wise, kube-aws uses a dnsmasq daemonset as the node-local cache, while upstream uses a CoreDNS daemonset instead, but for the same purpose.

@mumoshu What are your thoughts on the nodelocaldns addon? https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/nodelocaldns

Ah, I am sorry, I missed that you already did this. Quoting your commit message below:

Add a new field named clusterDNS that accepts the IP address of the DNS server used for all internal/external DNS lookups, a.k.a. the --cluster-dns flag of kubelet.

nodeGroups:
- name: nodegroup1
  clusterDNS: 169.254.20.10
  # snip

This, in combination with k8s-dns-node-cache deployed as a daemonset on your cluster, allows all DNS lookups from your pods to be routed first to the node-local DNS server, which adds more reliability.

The configuration key clusterDNS is intentionally per-nodegroup, not per-cluster, so that you can use the node-local DNS selectively. In combination with the proper use of node labels/taints, this allows you to test the node-local DNS on only a subset of your workload.
It would also be nice to add clusterDNS as a cluster-level config key later, but I believe it isn't a must-have in this change.

See the cluster/addons/dns/nodelocaldns in the upstream repository for more details.

Concrete steps to enable node-local DNS would look like the following:

  • Decide which IP address the node-local DNS should bind to. Typically this is 169.254.20.10
  • Add clusterDNS: 169.254.20.10 to your nodegroup in the cluster config
  • Deploy nodelocaldns.yaml, replacing:
    __PILLAR__LOCAL__DNS__ with 169.254.20.10, __PILLAR__DNS__DOMAIN__ with cluster.local, and __PILLAR__DNS__SERVER__ with 10.100.0.10 or 172.20.0.10, depending on your VPC CIDR.
    See local-up-cluster.sh:
  sed -i -e "s/__PILLAR__DNS__DOMAIN__/${KUBE_DNS_NAME:-cluster.local}/g" nodelocaldns.yaml
  sed -i -e "s/__PILLAR__DNS__SERVER__/${KUBE_DNS_SERVER_IP:-10.0.0.10}/g" nodelocaldns.yaml
  sed -i -e "s/__PILLAR__LOCAL__DNS__/${KUBE_LOCAL_DNS_IP:-169.254.20.10}/g" nodelocaldns.yaml
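The substitution steps above can be sketched as a runnable script. The three-line heredoc stands in for the real nodelocaldns.yaml, and the IPs are assumed EKS-typical values (10.100.0.10 is kube-dns on a 10.100.0.0/16 service CIDR; use 172.20.0.10 for 172.20.0.0/16):

```shell
#!/bin/sh
# Sketch of the pillar substitution; the template below is a stand-in for
# the real nodelocaldns.yaml manifest, not the actual upstream file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
zone __PILLAR__DNS__DOMAIN__
bind __PILLAR__LOCAL__DNS__
forward . __PILLAR__DNS__SERVER__
EOF

KUBE_DNS_NAME=cluster.local
KUBE_DNS_SERVER_IP=10.100.0.10   # assumed; 172.20.0.10 on a 172.20/16 CIDR
KUBE_LOCAL_DNS_IP=169.254.20.10

sed -i -e "s/__PILLAR__DNS__DOMAIN__/${KUBE_DNS_NAME}/g" "$tmp"
sed -i -e "s/__PILLAR__DNS__SERVER__/${KUBE_DNS_SERVER_IP}/g" "$tmp"
sed -i -e "s/__PILLAR__LOCAL__DNS__/${KUBE_LOCAL_DNS_IP}/g" "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"   # the fully substituted manifest
rm -f "$tmp"
```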

I apologize for resurrecting this thread, but the nodelocaldns.yaml file actually has 5 variables, and I'm confused by the difference between __PILLAR__DNS__SERVER__ and __PILLAR__CLUSTER__DNS__

__PILLAR__DNS__DOMAIN__ == cluster.local
__PILLAR__DNS__SERVER__ == ??
__PILLAR__LOCAL__DNS__ == 169.254.20.10
__PILLAR__CLUSTER__DNS__ == <ClusterIP of Kube/CoreDNS service, e.g 172.20.0.10>
__PILLAR__UPSTREAM__SERVERS__ == /etc/resolv.conf

@ghostsquad, according to https://github.com/kubernetes/kubernetes/pull/84383:

We have the following variables in the yaml:
__PILLAR__DNS__SERVER__ - set to the kube-dns service IP.
__PILLAR__LOCAL__DNS__ - set to the link-local IP (169.254.20.10 by default).
__PILLAR__DNS__DOMAIN__ - set to the cluster domain (cluster.local by default).

The following variables will be set by the node-cache image (k8s.gcr.io/k8s-dns-node-cache:1.15.6 or later). The values will be determined by reading the kube-dns ConfigMap for custom upstream server configuration.
__PILLAR__CLUSTER__DNS__ - Upstream server for in-cluster queries.
__PILLAR__UPSTREAM__SERVERS__ - Upstream servers for external queries.

Also, we are making the listen IP address for the node-local DNS cache both the kube-dns service IP and the link-local IP, so sending requests to either IP will get a response from the cache instance. If we used only the kube-dns service IP as the listen IP for the cache, we would need a different IP for the cache to talk to kube-dns/CoreDNS in case of cache misses. That is why we introduced __PILLAR__CLUSTER__DNS__: a new service will be created with the same selectors as kube-dns, and the clusterIP of this service will be filled in as __PILLAR__CLUSTER__DNS__.

Using the kube-dns service IP as the listen IP for the cache will not work in IPVS clusters, because IPVS creates its own interface and binds all the service IPs there; the node-local-dns interface will not be able to bind that IP again.
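To make the dual-bind behavior concrete, here is a rough sketch of how those variables could land in the cache's Corefile after substitution. The shape is assumed, not the exact upstream manifest, and 172.20.123.45 stands in for the clusterIP of the extra service created for __PILLAR__CLUSTER__DNS__:

```
cluster.local:53 {
    # binds both the link-local IP and the kube-dns service IP (non-IPVS mode)
    bind 169.254.20.10 172.20.0.10
    # upstream for in-cluster cache misses; filled in by the node-cache image
    # from __PILLAR__CLUSTER__DNS__ (hypothetical clusterIP shown)
    forward . 172.20.123.45
    cache 30
}
.:53 {
    bind 169.254.20.10 172.20.0.10
    # upstream for external queries; filled in from __PILLAR__UPSTREAM__SERVERS__
    forward . /etc/resolv.conf
    cache 30
}
```

In IPVS mode the kube-dns service IP is dropped from the bind lines, leaving only the link-local address.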

@prameshj

Also, we are making the listen IP address for the node-local DNS cache both the kube-dns service IP and the link-local IP, so sending requests to either IP will get a response from the cache instance. If we used only the kube-dns service IP as the listen IP for the cache, we would need a different IP for the cache to talk to kube-dns/CoreDNS in case of cache misses. That is why we introduced __PILLAR__CLUSTER__DNS__: a new service will be created with the same selectors as kube-dns, and the clusterIP of this service will be filled in as __PILLAR__CLUSTER__DNS__.

Could you elaborate more on this? Is it safe to set both of them (__PILLAR__CLUSTER__DNS__, __PILLAR__UPSTREAM__SERVERS__) to 172.20.0.10?

__PILLAR__CLUSTER__DNS__ and __PILLAR__UPSTREAM__SERVERS__ will be set by the node-cache image. It uses this ConfigMap yaml, substitutes these 2 variables, and generates the Corefile to use. We don't want both of these set to the kube-dns service IP. Is there a reason you want to set them to this value?

Thanks. No, I wasn't sure whether it could be left as-is for my use case (AWS EKS), because the existing documentation is out of date and uses an older version of the nodelocaldns manifest, without the stub implementation.

@prameshj

__PILLAR__CLUSTER__DNS__ and __PILLAR__UPSTREAM__SERVERS__ will be set by the node-cache image. It uses this ConfigMap yaml, substitutes these 2 variables, and generates the Corefile to use. We don't want both of these set to the kube-dns service IP. Is there a reason you want to set them to this value?

Unless using IPVS, correct? The docs say that if you use IPVS you need to set __PILLAR__CLUSTER__DNS__?
Is that still the case? (Also using EKS, but the pods are crashing with: [FATAL] Error parsing flags - Invalid localip specified - "", Exiting.) Thanks.

Could you reference the docs that say this? __PILLAR__CLUSTER__DNS__ should be set to the kube-dns service IP in IPVS mode.
https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/#configuration does say to do this:

If kube-proxy is running in IPVS mode:

 sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__//g; s/__PILLAR__CLUSTER__DNS__/$kubedns/g" nodelocaldns.yaml
In this mode, node-local-dns pods listen only on <node-local-address>. 
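That IPVS-mode substitution can be sketched end to end. The two-line template and the kube-dns service IP (172.20.0.10, typical for EKS) are assumptions; note that __PILLAR__DNS__SERVER__ is replaced with an empty string, so the cache binds only the link-local address:

```shell
#!/bin/sh
# Sketch of the IPVS-mode sed invocation quoted above, applied to a
# stand-in template rather than the real nodelocaldns.yaml.
localdns=169.254.20.10
domain=cluster.local
kubedns=172.20.0.10   # assumed kube-dns service IP on an EKS cluster

tmp=$(mktemp)
printf '%s\n' \
  'bind __PILLAR__LOCAL__DNS__ __PILLAR__DNS__SERVER__' \
  'forward . __PILLAR__CLUSTER__DNS__' > "$tmp"

# __PILLAR__DNS__SERVER__ -> "" leaves only the link-local bind address
sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__//g; s/__PILLAR__CLUSTER__DNS__/$kubedns/g" "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```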