TL;DR: I'd like to add a new configuration key for nodegroups to specify the value of the kubelet `--cluster-dns` flag.
Why do you want this feature?
Currently both internal and external DNS lookups are processed by the CoreDNS service deployed by EKS. This isn't ideal from a reliability perspective.
An internal lookup resolves cluster-internal names of Kubernetes services. It may fail under various conditions.
An external lookup resolves names managed outside the cluster, like RDS DB clusters/instances, SQS endpoints, S3, or your own services served via Route 53. It may also fail under various conditions.
What feature/behavior/change do you want?
The only thing I want to propose for eksctl is the ability to override the value of the `--cluster-dns` flag passed to the kubelet.
It would allow us to deploy a node-local DNS cache, which is proposed upstream to resolve this issue. You can find the upstream proposal at https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md. It is implemented in https://github.com/kubernetes/kubernetes/pull/70555.
Looking at how it works, and at the implementation, the only thing we can't do outside of eksctl is pass the "magic" IP address that forwards all DNS lookups to the node-local cache to kubelet's `--cluster-dns`, as done in https://github.com/kubernetes/kubernetes/blob/a3877b1776cc55f5a32103d7a072a73e18c3d939/hack/local-up-cluster.sh#L704.
As we're leaning towards exposing new configuration only via config files, I'd be glad if this added a new field named e.g. `clusterDNS`. For example, it would be specified like:
```yaml
nodeGroups:
  - name: nodegroup1
    clusterDNS: 169.254.20.10
    # snip
```
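For context, on EKS-optimized AMIs kubelet flags are typically threaded through the bootstrap script's `--kubelet-extra-args`. A minimal sketch of how the new field could map onto that; the variable names here are hypothetical, not eksctl's actual implementation:

```sh
# Hypothetical sketch: mapping a nodegroup's clusterDNS value onto a kubelet flag.
# CLUSTER_DNS stands in for the value eksctl would read from the config file.
CLUSTER_DNS="169.254.20.10"
KUBELET_EXTRA_ARGS="--cluster-dns=${CLUSTER_DNS}"

# On an EKS-optimized AMI this would then be handed to the bootstrap script, e.g.:
#   /etc/eks/bootstrap.sh my-cluster --kubelet-extra-args "${KUBELET_EXTRA_ARGS}"
echo "${KUBELET_EXTRA_ARGS}"
```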
Btw, I can vouch for the effectiveness of the node-local DNS cache, as I've been using it for more than a year in kube-aws: https://github.com/kubernetes-incubator/kube-aws/pull/792.
Implementation-wise, kube-aws uses a dnsmasq daemonset as the node-local cache while the upstream uses a CoreDNS daemonset, but they serve the same purpose.
FYI: even AWS recommends a DNS cache: https://aws.amazon.com/premiumsupport/knowledge-center/dns-resolution-failures-ec2-linux/?nc1=h_ls
@mumoshu What are your thoughts on the nodelocaldns addon? https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/nodelocaldns
Ah, I am sorry, I missed that you already did this. Quoting your commit message below:
Add a new field named `clusterDNS` that accepts the IP address of the DNS server used for all internal/external DNS lookups, a.k.a. the `--cluster-dns` flag of kubelet.
```yaml
nodeGroups:
  - name: nodegroup1
    clusterDNS: 169.254.20.10
    # snip
```
This, in combination with k8s-dns-node-cache deployed as a daemonset on your cluster, allows all the DNS lookups from your pods to first be routed to the node-local DNS server, which adds more reliability.
The configuration key `clusterDNS` is intentionally made per-nodegroup, not per-cluster, so that you can selectively use the node-local DNS. This, in combination with proper use of node labels/taints, allows you to test the node-local DNS on only a subset of your workload.
It would also be nice to add clusterDNS as a cluster-level config key later. But I believe it isn't a must-have in this change.
See the cluster/addons/dns/nodelocaldns in the upstream repository for more details.
Concrete steps to enable node-local DNS would look like the below:

1. Add `clusterDNS: 169.254.20.10` to your nodegroup in the cluster config.
2. In `nodelocaldns.yaml`, replace `__PILLAR__LOCAL__DNS__` with `169.254.20.10`, `__PILLAR__DNS__DOMAIN__` with `cluster.local`, and `__PILLAR__DNS__SERVER__` with `10.100.0.10` or `172.20.0.10` according to your VPC CIDR:

```sh
sed -i -e "s/__PILLAR__DNS__DOMAIN__/${KUBE_DNS_NAME:-cluster.local}/g" nodelocaldns.yaml
sed -i -e "s/__PILLAR__DNS__SERVER__/${KUBE_DNS_SERVER_IP:-10.0.0.10}/g" nodelocaldns.yaml
sed -i -e "s/__PILLAR__LOCAL__DNS__/${KUBE_LOCAL_DNS_IP:-169.254.20.10}/g" nodelocaldns.yaml
```
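To sanity-check those substitutions before applying the manifest, you can run the same seds against a throwaway snippet containing the placeholders. The snippet below is an illustrative stand-in, not the real manifest:

```sh
# Minimal stand-in for nodelocaldns.yaml, containing only the placeholders
cat > /tmp/nodelocaldns-test.yaml <<'EOF'
localip: __PILLAR__LOCAL__DNS__
domain: __PILLAR__DNS__DOMAIN__
upstream: __PILLAR__DNS__SERVER__
EOF

# Same substitutions as above, with EKS-flavoured defaults
sed -i -e "s/__PILLAR__DNS__DOMAIN__/${KUBE_DNS_NAME:-cluster.local}/g" /tmp/nodelocaldns-test.yaml
sed -i -e "s/__PILLAR__DNS__SERVER__/${KUBE_DNS_SERVER_IP:-10.100.0.10}/g" /tmp/nodelocaldns-test.yaml
sed -i -e "s/__PILLAR__LOCAL__DNS__/${KUBE_LOCAL_DNS_IP:-169.254.20.10}/g" /tmp/nodelocaldns-test.yaml

cat /tmp/nodelocaldns-test.yaml   # no __PILLAR__ placeholders should remain
```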
Ref: https://github.com/weaveworks/eksctl/pull/550#issuecomment-467859114
I apologize for resurrecting this thread, but the nodelocaldns.yaml file actually has 5 variables, and I'm confused by the difference between `__PILLAR__DNS__SERVER__` and `__PILLAR__CLUSTER__DNS__`:
__PILLAR__DNS__DOMAIN__ == cluster.local
__PILLAR__DNS__SERVER__ == ??
__PILLAR__LOCAL__DNS__ == 169.254.20.10
__PILLAR__CLUSTER__DNS__ == <ClusterIP of Kube/CoreDNS service, e.g 172.20.0.10>
__PILLAR__UPSTREAM__SERVERS__ == /etc/resolv.conf
@ghostsquad according to https://github.com/kubernetes/kubernetes/pull/84383
We have the following variables in the yaml:
`__PILLAR__DNS__SERVER__` - set to the kube-dns service IP.
`__PILLAR__LOCAL__DNS__` - set to the link-local IP (169.254.20.10 by default).
`__PILLAR__DNS__DOMAIN__` - set to the cluster domain (cluster.local by default).
The following variables will be set by the node-cache images - k8s.gcr.io/k8s-dns-node-cache:1.15.6 or later. The values will be determined by reading the kube-dns configMap for custom upstream server configuration.
`__PILLAR__CLUSTER__DNS__` - upstream server for in-cluster queries.
`__PILLAR__UPSTREAM__SERVERS__` - upstream servers for external queries.
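Pieced together, those variables land in the generated Corefile roughly as follows. This is an illustrative fragment only, not the authoritative template (see the nodelocaldns addon manifest for the real one):

```
cluster.local:53 {
    cache 30
    bind __PILLAR__LOCAL__DNS__ __PILLAR__DNS__SERVER__
    forward . __PILLAR__CLUSTER__DNS__          # in-cluster upstream, filled in by node-cache
}
.:53 {
    cache 30
    bind __PILLAR__LOCAL__DNS__ __PILLAR__DNS__SERVER__
    forward . __PILLAR__UPSTREAM__SERVERS__     # external upstream, filled in by node-cache
}
```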
Also, we are making the listen IP addresses for the nodelocaldns cache be both the kube-dns service IP and the link-local IP, so sending requests to either IP will get a response from the cache instance. If we used only the kube-dns service IP as the listen IP for the cache, we would need a different IP for the cache to talk to kube-dns/CoreDNS in case of cache misses. That is why we introduced `__PILLAR__CLUSTER__DNS__`: a new service will be created with the same selectors as kube-dns, and the clusterIP of this service will be filled in as `__PILLAR__CLUSTER__DNS__`.
Using the kube-dns service IP as the listen IP for the cache will not work in IPVS clusters. This is because IPVS creates its own interface and binds all the service IPs there; the node-local-dns interface will not be able to bind that IP again.
@prameshj
Also, we are making the listen ip address for the nodelocaldns cache to be both the kube-dns service ip and the link-local ip. So sending requests on either ip will get a response from the cache instance. If we use kube-dns service ip as the listen ip for the cache, we need a different ip for the cache to talk to kube-dns/coreDNS in case of cache misses. That is why we introduced `__PILLAR__CLUSTER__DNS__`. A new service will be created with the same selectors as kube-dns. The clusterIP of this service will be filled in as `__PILLAR__CLUSTER__DNS__`.
Could you elaborate more on this? Is it safe to set both of them (`__PILLAR__CLUSTER__DNS__`, `__PILLAR__UPSTREAM__SERVERS__`) to 172.20.0.10?
`__PILLAR__CLUSTER__DNS__` and `__PILLAR__UPSTREAM__SERVERS__` will be set by the node-cache image. It uses this ConfigMap YAML, substitutes these 2 variables, and generates the Corefile to use. We don't want both of these set to the kube-dns service IP. Is there a reason you want to set them to this value?
Thanks. No, I wasn't sure whether it can be left as-is for my use case (AWS EKS), because the existing documentation is out of date and uses an older version of the nodelocaldns manifest, without the stub implementation.
@prameshj
`__PILLAR__CLUSTER__DNS__` and `__PILLAR__UPSTREAM__SERVERS__` will be set by the node-cache image. It uses this ConfigMap YAML, substitutes these 2 variables, and generates the Corefile to use. We don't want both of these set to the kube-dns service IP. Is there a reason you want to set them to this value?
Unless you're using IPVS, correct? The docs say that if you use IPVS you need to set `__PILLAR__CLUSTER__DNS__`?
Is that still the case? (I'm also using EKS, but getting a crash: `[FATAL] Error parsing flags - Invalid localip specified - "", Exiting`.) Thanks.
Could you reference the docs that say this? `__PILLAR__CLUSTER__DNS__` should be set to the kube-dns service IP in IPVS mode.
https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/#configuration does say to do this:
If kube-proxy is running in IPVS mode:

```sh
sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__//g; s/__PILLAR__CLUSTER__DNS__/$kubedns/g" nodelocaldns.yaml
```

In this mode, node-local-dns pods listen only on `<node-local-address>`.
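The effect of the IPVS-mode command can likewise be checked on a throwaway snippet; note how `__PILLAR__DNS__SERVER__` is replaced with an empty string, so the cache binds only the link-local address. The snippet is an illustrative stand-in, not the real manifest:

```sh
# Example values: link-local cache IP, cluster domain, and the kube-dns clusterIP
localdns="169.254.20.10"; domain="cluster.local"; kubedns="172.20.0.10"

# Minimal stand-in containing the placeholders touched by the IPVS-mode command
cat > /tmp/nld-ipvs-test.yaml <<'EOF'
bind: __PILLAR__LOCAL__DNS__ __PILLAR__DNS__SERVER__
domain: __PILLAR__DNS__DOMAIN__
cluster-upstream: __PILLAR__CLUSTER__DNS__
EOF

# Same substitution as the docs' IPVS-mode command
sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__//g; s/__PILLAR__CLUSTER__DNS__/$kubedns/g" /tmp/nld-ipvs-test.yaml

cat /tmp/nld-ipvs-test.yaml   # bind line keeps only the link-local IP
```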