Edit: see https://github.com/kubernetes/kops/pull/3563#issuecomment-335039130

Due to https://security.googleblog.com/2017/10/behind-masq-yet-more-dns-and-dhcp.html, we have an upgraded version of kube-dns that needs to be PR'ed and tested. If you do not want to wait for a kops release, update your kubernetes cluster accordingly:

```
kubectl set image deployment/kube-dns -n kube-system \
  kubedns=gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5 \
  dnsmasq=gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.5 \
  sidecar=gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.5
kubectl -n kube-system get po
```
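If you want to confirm the new images rolled out cleanly before poking at pods, a standard check (no assumptions beyond the deployment name used above):

```
kubectl rollout status deployment/kube-dns -n kube-system
```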
EDIT: see headline for complete commands.
~~Hotfix command should be `kubectl set image deployment/kube-dns -n kube-system kubedns=gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.5 dnsmasq=gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.5`~~
Note: @justinsb's original command missed the sidecar; it is included in the command above. https://github.com/kubernetes/kops/issues/3512#issue-262128556
The PR is merged, and if testing goes well it will be included in our next 1.8.0 alpha release.
Awesome. Thank you guys!
Tested https://github.com/kubernetes/kops/pull/3511 on a fresh AWS cluster after jumping through some hoops. Spinning up a 1.7.7 cluster now for kicks.
Tests above were done on current stable (1.7.2). Results are the same as below, minus the obvious version numbers.
Also verified, looks good on 1.7.7 (gotta update my kubectl):
```
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-15T08:51:09Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.7", GitCommit:"8e1552342355496b62754e61ad5f802a0f3f1fa7", GitTreeState:"clean", BuildDate:"2017-09-28T23:56:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

$ kops version
Version 1.8.0-alpha.2+6ea6e3aa3 (git-6ea6e3aa3)

$ kubectl get deployment -n kube-system kube-dns -o jsonpath='{.spec.template.spec.containers[?(@.name == "dnsmasq")].image}'
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.5

$ kubectl get deployment -n kube-system kube-dns -o jsonpath='{.spec.template.spec.containers[?(@.name == "sidecar")].image}'
gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.5
```
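As an aside, a single jsonpath range prints every container name and image at once, which is quicker than querying per container (plain kubectl, nothing cluster-specific assumed):

```
kubectl get deployment -n kube-system kube-dns \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
```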
```
$ kops validate cluster
Using cluster from kubectl context: kops-dev-splain.example.com

Validating cluster kops-dev-splain.example.com

INSTANCE GROUPS
NAME               ROLE    MACHINETYPE  MIN  MAX  SUBNETS
master-us-east-1a  Master  m4.large     1    1    us-east-1a
master-us-east-1b  Master  m4.large     1    1    us-east-1b
master-us-east-1c  Master  m4.large     1    1    us-east-1c
nodes              Node    m4.large     2    2    us-east-1a,us-east-1b,us-east-1c

NODE STATUS
NAME                          ROLE    READY
ip-10-25-122-41.ec2.internal  master  True
ip-10-25-43-61.ec2.internal   master  True
ip-10-25-53-205.ec2.internal  node    True
ip-10-25-93-4.ec2.internal    master  True
ip-10-25-95-195.ec2.internal  node    True

Your cluster kops-dev-splain.example.com is ready
```
Also verified on v1.8.0. Same results as above.
My cluster, which is v1.5.7, does not have the same kube-dns images listed above. Getting the deployment and dumping/grepping for image shows:
"image": "gcr.io/google_containers/kubedns-amd64:1.9",
"image": "gcr.io/google_containers/kube-dnsmasq-amd64:1.4",
"image": "gcr.io/google_containers/dnsmasq-metrics-amd64:1.0",
"image": "gcr.io/google_containers/exechealthz-amd64:1.2",
What is the upgrade path for a cluster that is < 1.6.x?
I have asked on the dev list for a compatibility matrix and have not gotten an answer. I will need to look in kubernetes/kubernetes to see. With those other container names, are you able to upgrade to 1.14.5?
@snoby according to the k8s 1.5.8 release (https://github.com/kubernetes/kubernetes/pull/53149/files) you need to update just the dnsmasq container in the kube-dns pod:

```
kubectl set image deployment/kube-dns -n kube-system dnsmasq=gcr.io/google_containers/k8s-dns-dnsmasq-amd64:1.14.5
```

After that you will see the updated dnsmasq version:

```
/ # dnsmasq --version
Dnsmasq version 2.78-security-prerelease Copyright (c) 2000-2017 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
This software comes with ABSOLUTELY NO WARRANTY.
Dnsmasq is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License, version 2 or 3.
```
I have checked the other containers in the kube-dns pod; they do not have the dnsmasq binary.
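If you want to reproduce that check, a small loop over the container names (taken from the 1.5 pod description later in this thread) works; the pod name here is the one from this thread, so substitute your own:

```
# Illustrative pod name from this thread; find yours with: kubectl -n kube-system get po
POD=kube-dns-4101612645-nmm04
for c in kubedns dnsmasq dnsmasq-metrics healthz; do
  echo "--- $c"
  # exec fails with an error if the container has no dnsmasq binary
  kubectl -n kube-system exec "$POD" -c "$c" -- dnsmasq --version || echo "no dnsmasq here"
done
```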
Trying to collate this information into a file: https://github.com/kubernetes/kops/pull/3534
From Aaron on the dev list, for k8s 1.5 and below:

```
kubectl set image deployment/kube-dns dnsmasq=gcr.io/google_containers/k8s-dns-dnsmasq-amd64:1.14.5 --namespace=kube-system
```
I followed this for version 1.5.5:

```
kubectl set image deployment/kube-dns dnsmasq=gcr.io/google_containers/k8s-dns-dnsmasq-amd64:1.14.5 --namespace=kube-system
```

I am seeing that the kube-dns pod keeps restarting. The kube events show the liveness check is failing:
```
4m  4m  1  kube-dns-4101612645-bs7cp  Pod  spec.containers{dnsmasq-metrics}  Normal   Killing    {kubelet }  Killing container with docker id 513ace4768cd: Need to kill pod.
4m  4m  3  kube-dns-4101612645-bs7cp  Pod  spec.containers{dnsmasq}          Warning  Unhealthy  {kubelet }  Liveness probe failed: Get http://10.244.1.4:8080/healthz-dnsmasq: dial tcp 10.244.1.4:8080: getsockopt: connection refused
4m  4m  3  kube-dns-4101612645-bs7cp  Pod  spec.containers{kubedns}          Warning  Unhealthy  {kubelet }  Liveness probe failed: Get http://10.244.1.4:8080/healthz-kubedns: dial tcp 10.244.1.4:8080: getsockopt: connection refused
3m  3m  1  kube-dns-4101612645-bs7cp  Pod  spec.containers{dnsmasq}          Normal   Killing    {kubelet }  Killing container with docker id 4e112b5de586: Need to kill pod.
3m  3m  1  kube-dns-4101612645-bs7cp  Pod  spec.containers{kubedns}          Normal   Killing    {kubelet }  Killing container with docker id f7bc7265e2ad: Need to kill pod.
9m  9m  1  kube-dns-4101612645-pk02k  Pod  spec.containers{dnsmasq-metrics}  Normal   Killing    {kubelet }  Killing container with docker id 5a00dc4306f2: Need to kill pod.
9m  9m  1  kube-dns-4101612645-pk02k  Pod  spec.containers{healthz}          Normal   Killing    {kubelet }  Killing container with docker id aa65abcf7fb0: Need to kill pod.
9m  9m  3  kube-dns-4101612645-pk02k  Pod  spec.containers{dnsmasq}          Warning  Unhealthy  {kubelet }  Liveness probe failed: Get http://10.244.1.2:8080/healthz-dnsmasq: dial tcp 10.244.1.2:8080: getsockopt: connection refused
```
And once the pod stabilizes, if I check the dnsmasq version, it shows the old one:

```
$ kubectl --kubeconfig=./kubecfg -n kube-system exec -it kube-dns-4101612645-nmm04 -c dnsmasq /bin/sh
/ # dnsmasq -v
Dnsmasq version 2.76 Copyright (c) 2000-2016 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
This software comes with ABSOLUTELY NO WARRANTY.
Dnsmasq is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License, version 2 or 3.
/ # exit
```
@varsharaja is it looking for the new config map?
@varsharaja I edited your comment so I could read it better. Can we get the previous logs from the container?
@varsharaja I just set up a 1.5.7 cluster, ran your command, and things look fine. What version of kops are you using? It may be useful to get the rest of your kube-dns deployment, if you don't mind.
```
$ kubectl exec -it --namespace kube-system kube-dns-4146767324-2clz9 -c dnsmasq sh
/ # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /usr/sbin/dnsmasq --keep-in-foreground --cache-size=1000 --no-resolv --server=127.0.0.1#10053 --log-facility=-
    5 root       0:00 sh
   11 root       0:00 ps aux
/ # /usr/sbin/dnsmasq -v
Dnsmasq version 2.78-security-prerelease Copyright (c) 2000-2017 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
This software comes with ABSOLUTELY NO WARRANTY.
Dnsmasq is free software, and you are welcome to redistribute it
under the terms of the GNU General Public License, version 2 or 3.
```
I made a mistake on one of the clusters and conflated the 1.5 and the 1.6/1.7 instructions. If you apply the change, make double sure that `kubectl get pods -n kube-system | grep kube-dns` shows all the containers running, and not 2/3 or 3/4.
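For reference, on a healthy 1.5 cluster that check should look roughly like this; the pod name is the one from this thread and the ages are illustrative, the point is the 4/4 READY column:

```
$ kubectl get pods -n kube-system | grep kube-dns
kube-dns-4101612645-nmm04   4/4       Running   0          10m
```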
@mikesplain: I haven't used kops for the cluster upgrade. I tried changing only the dnsmasq container image with the kubectl command. My kubernetes is 1.5.5.
Here is the kube-dns pod config:
```
10:40 $ kubectl --kubeconfig=./kubecfg describe pod kube-dns-4101612645-nmm04 -n kube-system
Name:           kube-dns-4101612645-nmm04
Namespace:      kube-system
Node:
Start Time:     Wed, 04 Oct 2017 23:30:36 +0530
Labels:         k8s-app=kube-dns
                pod-template-hash=4101612645
Status:         Running
IP:             10.244.1.6
Controllers:    ReplicaSet/kube-dns-4101612645
Containers:
  kubedns:
    Container ID:       docker://0dd8c3e235fd7d44dd982dbf2d92834c8bc58eafd367b620d6de19ce5934c16a
    Image:              gcr.io/google_containers/kubedns-amd64:1.9
    Image ID:           docker://sha256:26cf1ed9b14486b93acd70c060a17fea13620393d3aa8e76036b773197c47a05
    Ports:              10053/UDP, 10053/TCP, 10055/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
      --config-map=kube-dns
      --v=0
    Limits:
      memory:   170Mi
    Requests:
      cpu:      100m
      memory:   70Mi
    State:              Running
      Started:          Wed, 04 Oct 2017 23:30:37 +0530
    Ready:              True
    Restart Count:      0
    Liveness:           http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:          http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-1d8jv (ro)
    Environment Variables:
      PROMETHEUS_PORT:  10055
  dnsmasq:
    Container ID:       docker://28deda2c3a30e609a1de31e336840b63c668bcc3d0119a11ebddadbc8aeb0b38
    Image:              gcr.io/google_containers/kube-dnsmasq-amd64:1.4
    Image ID:           docker://sha256:3ec65756a89b70b4095e43a340a6e2d5696cac7a93a29619ff5c4b6be9af2773
    Ports:              53/UDP, 53/TCP
    Args:
      --cache-size=1000
      --no-resolv
      --server=127.0.0.1#10053
      --log-facility=-
    Requests:
      cpu:      150m
      memory:   10Mi
    State:              Running
      Started:          Wed, 04 Oct 2017 23:30:37 +0530
    Ready:              True
    Restart Count:      0
    Liveness:           http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-1d8jv (ro)
    Environment Variables:
  dnsmasq-metrics:
    Container ID:       docker://ddea9c4843dbb32301fade9b02c0ba7ee228cc28bb173d6dab1e8ad039428a4e
    Image:              gcr.io/google_containers/dnsmasq-metrics-amd64:1.0
    Image ID:           docker://sha256:5271aabced07deae353277e2b8bd5b2e30ddb0b4a5884a5940115881ea8753ef
    Port:               10054/TCP
    Args:
      --v=2
      --logtostderr
    Requests:
      memory:   10Mi
    State:              Running
      Started:          Wed, 04 Oct 2017 23:30:38 +0530
    Ready:              True
    Restart Count:      0
    Liveness:           http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-1d8jv (ro)
    Environment Variables:
  healthz:
    Container ID:       docker://14c88a91acac4064facd603981733204a9d9c7e19e40345d48ca69ac13755bb7
    Image:              gcr.io/google_containers/exechealthz-amd64:1.2
    Image ID:           docker://sha256:93a43bfb39bfe9795e76ccd75d7a0e6d40e2ae8563456a2a77c1b4cfc3bbd967
    Port:               8080/TCP
    Args:
      --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
      --url=/healthz-dnsmasq
      --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
      --url=/healthz-kubedns
      --port=8080
      --quiet
    Limits:
      memory:   50Mi
    Requests:
      cpu:      10m
      memory:   50Mi
    State:              Running
      Started:          Wed, 04 Oct 2017 23:30:38 +0530
    Ready:              True
    Restart Count:      0
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-1d8jv (ro)
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-1d8jv:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-1d8jv
QoS Class:      Burstable
Tolerations:    CriticalAddonsOnly=:Exists
```
This is the default DNS deployment in 1.5.5. Even after I issue the following command: `kubectl set image deployment/kube-dns dnsmasq=gcr.io/google_containers/k8s-dns-dnsmasq-amd64:1.14.5 --namespace=kube-system`
It says the deployment image was updated, and I see the image getting pulled in the events and the pod being created, but then the pod gets killed and falls back to the old image.
@chrislovecnm
The spin-up is within seconds. Let me try to get the logs from the syslog, and let me check on the ConfigMap part as well.
@varsharaja container logs via kubectl please
Here it is:

```
$ kubectl --kubeconfig=./kubecfg -n kube-system logs kube-dns-4101612645-8375g -c kubedns
I1005 07:24:37.775741 1 dns.go:42] version: v1.6.0-alpha.0.680+3872cb93abf948-dirty
I1005 07:24:37.776007 1 server.go:107] Using https://10.0.0.1:443 for kubernetes master, kubernetes API:
I1005 07:24:37.777923 1 server.go:68] Using configuration read from ConfigMap: kube-system:kube-dns
I1005 07:24:37.778018 1 server.go:113] FLAG: --alsologtostderr="false"
I1005 07:24:37.778062 1 server.go:113] FLAG: --config-map="kube-dns"
I1005 07:24:37.778076 1 server.go:113] FLAG: --config-map-namespace="kube-system"
I1005 07:24:37.778080 1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I1005 07:24:37.778083 1 server.go:113] FLAG: --dns-port="10053"
I1005 07:24:37.778089 1 server.go:113] FLAG: --domain="cluster.local."
I1005 07:24:37.778095 1 server.go:113] FLAG: --federations=""
I1005 07:24:37.778099 1 server.go:113] FLAG: --healthz-port="8081"
I1005 07:24:37.778102 1 server.go:113] FLAG: --kube-master-url=""
I1005 07:24:37.778106 1 server.go:113] FLAG: --kubecfg-file=""
I1005 07:24:37.778109 1 server.go:113] FLAG: --log-backtrace-at=":0"
I1005 07:24:37.778113 1 server.go:113] FLAG: --log-dir=""
I1005 07:24:37.778117 1 server.go:113] FLAG: --log-flush-frequency="5s"
I1005 07:24:37.778121 1 server.go:113] FLAG: --logtostderr="true"
I1005 07:24:37.778125 1 server.go:113] FLAG: --stderrthreshold="2"
I1005 07:24:37.778128 1 server.go:113] FLAG: --v="0"
I1005 07:24:37.778131 1 server.go:113] FLAG: --version="false"
I1005 07:24:37.778136 1 server.go:113] FLAG: --vmodule=""
I1005 07:24:37.778196 1 server.go:155] Starting SkyDNS server (0.0.0.0:10053)
I1005 07:24:37.778417 1 server.go:165] Skydns metrics enabled (/metrics:10055)
I1005 07:24:37.778607 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I1005 07:24:37.778661 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
E1005 07:24:38.805492 1 sync.go:105] Error getting ConfigMap kube-system:kube-dns err: configmaps "kube-dns" not found
E1005 07:24:38.805514 1 dns.go:190] Error getting initial ConfigMap: configmaps "kube-dns" not found, starting with default values
I1005 07:24:38.807492 1 server.go:126] Setting up Healthz Handler (/readiness)
I1005 07:24:38.807572 1 server.go:131] Setting up cache handler (/cache)
I1005 07:24:38.807585 1 server.go:120] Status HTTP port 8081
```
Looks like it is looking for a ConfigMap, but I don't have any ConfigMaps in kube-system.
You need to create the ConfigMap by hand. This was an issue when upgrading to the newest version of kube-dns.
@varsharaja see https://github.com/kubernetes/kops/issues/2827#issuecomment-315323072 for instructions
Specifically, check whether the ConfigMap exists:

```
$ kubectl -n kube-system get configmap kube-dns
```

If not, then create an empty one:

```
$ kubectl create configmap -n kube-system kube-dns
```
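If the pods were already crash-looping before the ConfigMap existed, deleting them forces fresh replacements that should now find it (the `k8s-app=kube-dns` label comes from the pod description earlier in this thread):

```
kubectl -n kube-system delete pod -l k8s-app=kube-dns
```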
@chrislovecnm: Thanks for the pointer; that fixed the ConfigMap issue. However, the pod failed the liveness health check the first time, and I had to edit the skydns-rc.yaml file to set the image name. Things worked fine after this.
Now I have the latest dnsmasq running.
@varsharaja did the kubectl command not work for you?
@chrislovecnm: When I ran the kubectl command, I saw the already-running kube-dns pod getting killed and a new one starting. This new pod was killed within a minute due to the liveness check failure, and because skydns-rc.yaml still pointed to the old dnsmasq image, the replacement pod came up with the old image.
Once I changed the RC file, the new image was loaded when I killed the kube-dns pod.
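That behavior is consistent with the manifest being the source of truth: until skydns-rc.yaml points at the new image, replacement pods come back with the old one. A minimal sketch of updating the manifest in place on the master; the path is an assumption, so locate your copy of skydns-rc.yaml first:

```
# Hypothetical path; find yours with: sudo find /etc/kubernetes -name 'skydns*'
sudo sed -i \
  's|gcr.io/google_containers/kube-dnsmasq-amd64:1.4|gcr.io/google_containers/k8s-dns-dnsmasq-amd64:1.14.5|' \
  /etc/kubernetes/addons/skydns-rc.yaml
```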
There is probably a typo in the command to update the image for 1.6.
The command sets `gcr.io/google_containers/k8s-dns-dnsmasq-amd64:1.14.5` as the image. When trying that, the dnsmasq container ends up in a crash loop.
The image that was previously used was `gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1` (note that it includes `-nanny`).
I'm not aware of the differences between the nanny and non-nanny images, but the nanny version (`k8s-dns-dnsmasq-nanny-amd64:1.14.5`) starts up and DNS seems to work within the cluster.
@meese can you PR the update to the docs or give me CLI examples so I can update it, please?
@chrislovecnm The correct command is:

```
kubectl set image deployment/kube-dns -n kube-system \
  dnsmasq=gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.5
```
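After the rollout, a quick in-cluster smoke test mirrors what the healthz container does; `busybox:1.28` is an assumption here (its nslookup works, while some newer busybox tags ship a broken one):

```
kubectl run dns-smoke-test --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup kubernetes.default.svc.cluster.local
```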
Sorry, my employer doesn't have the CLA accepted yet. I'll poke them again, so I can create a PR next time.
Closing as we have notes and a new release
Thanks @chrislovecnm. I saw this issue too. One of my colleagues ran `kubectl create configmap -n kube-system kube-dns` and it resolved the following error, due to which the kube-dns pod was in CrashLoopBackOff state:

```
$ kubectl describe pod <kube-dns pod's name> -n=kube-system
Warning  Unhealthy  4m (x106 over 1h)  kubelet, k8master  Liveness probe failed: Get http://10.0.0.2:10054/metrics: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
```