Kubespray: CoreDNS pods left in CrashLoopBackOff state after running upgrade

Created on 27 Aug 2020 · 7 comments · Source: kubernetes-sigs/kubespray

Environment:

Cloud provider or hardware configuration: bare-metal
OS: Fedora CoreOS v31.20200517.3.0
Version of Ansible: 2.9.6
Version of Python: 3.7.7

Kubespray version (commit): 39fa9503d93115ead8c9369dcaf45c839f44cc8b

Network plugin used: calico and flannel

Full inventory with variables:
Relevant bits from k8s-cluster.yml group vars:
kube_version: v1.16.10
kube_network_plugin: calico
dns_mode: coredns

Command used to invoke ansible:
ansible-playbook -b -i $inventory kubespray/upgrade-cluster.yml -vv

Output of ansible run:
See gist
The interesting bit is:

stderr: |-
    W0826 09:01:43.472706  129030 defaults.go:199] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
    W0826 09:01:43.493193  129030 defaults.go:199] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
            [WARNING CoreDNSUnsupportedPlugins]: start version '1.6.5' not supported
            [WARNING CoreDNSMigration]: CoreDNS will not be upgraded: start version '1.6.5' not supported
    W0826 09:01:48.741707  129030 dns.go:245] the CoreDNS Configuration was not migrated: unable to migrate CoreDNS ConfigMap: start version '1.6.5' not supported. The existing CoreDNS Corefile configuration has been retained.
  stderr_lines: <omitted>

Anything else do we need to know:
CoreDNS version deployed is 1.6.5
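
The migration failure is visible in the kubeadm output quoted above, so one way to notice it automatically is to scan stderr for the CoreDNSMigration warning. This is only a rough sketch: the function name is hypothetical, and the pattern is copied from the warning text in this report, so other kubeadm releases may word it differently.

```python
import re

# Pattern copied from the warning text in this report (assumption: other
# kubeadm releases may phrase the warning differently).
_MIGRATION_WARNING = re.compile(
    r"\[WARNING CoreDNSMigration\]: CoreDNS will not be upgraded: "
    r"start version '([^']+)' not supported"
)


def unsupported_coredns_version(stderr: str):
    """Return the CoreDNS version kubeadm refused to migrate, or None."""
    match = _MIGRATION_WARNING.search(stderr)
    return match.group(1) if match else None


sample = (
    "        [WARNING CoreDNSUnsupportedPlugins]: start version '1.6.5' not supported\n"
    "        [WARNING CoreDNSMigration]: CoreDNS will not be upgraded: "
    "start version '1.6.5' not supported\n"
)
print(unsupported_coredns_version(sample))  # 1.6.5
```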

Analysis
It seems that running kubeadm upgrade causes it to attempt a CoreDNS migration, whether or not one is needed.
A given kubeadm version seems to support migrating only certain CoreDNS versions (which would make sense).
In some scenarios, kubeadm is unable to migrate CoreDNS but will still:

edit the config map
edit the deployment

This results in the following invalid deployment:

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  [...]
  template:
  [...]
    spec:
      [...]
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: docker.io/coredns/coredns:1.6.5
      [...]
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile-backup
            path: Corefile-backup

The config file path passed to the container command (/etc/coredns/Corefile) differs from the path where the config is actually mounted (Corefile-backup).
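
To illustrate the mismatch, here is a minimal sketch that checks a Deployment (already loaded as a Python dict) for it. The function name is hypothetical. The mount directory /etc/coredns is an assumption inferred from the -conf argument, since the volumeMounts section is elided in the manifest above, and the check assumes the ConfigMap volume uses an explicit items list as shown.

```python
import posixpath


def corefile_mismatch(deployment: dict, mount_path: str = "/etc/coredns") -> bool:
    """Return True if the file given to -conf is not projected by any ConfigMap item.

    mount_path is an assumption: the volumeMounts are elided in the report.
    """
    pod_spec = deployment["spec"]["template"]["spec"]
    args = pod_spec["containers"][0].get("args", [])
    if "-conf" not in args:
        return False
    idx = args.index("-conf")
    if idx + 1 >= len(args):
        return False
    conf_path = args[idx + 1]
    # Collect the full paths of every file projected from ConfigMap items.
    mounted = {
        posixpath.join(mount_path, item["path"])
        for volume in pod_spec.get("volumes", [])
        for item in volume.get("configMap", {}).get("items", [])
    }
    return conf_path not in mounted


broken = {"spec": {"template": {"spec": {
    "containers": [{"args": ["-conf", "/etc/coredns/Corefile"]}],
    "volumes": [{"configMap": {"items": [
        {"key": "Corefile-backup", "path": "Corefile-backup"},
    ]}}],
}}}}
print(corefile_mismatch(broken))  # True
```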

I'm not sure what the solution is, as upgrading kubeadm/Kubernetes is not always easy in production environments.
The best I can think of is a workaround that detects this migration failure after kubeadm upgrade has run and then fixes the deployment.

I did create such a patch and it is working. But perhaps someone has a more elegant solution in mind?
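
The actual patch is not included in this thread. Purely as an illustration of the idea (not the author's patch), the sketch below rewrites the volume items of a Deployment dict so that they point at the Corefile key again. It assumes the ConfigMap still contains a Corefile key, which the kubeadm message "The existing CoreDNS Corefile configuration has been retained" suggests but does not guarantee. The function name is hypothetical, and applying the result to the cluster is left out.

```python
import copy


def restore_corefile_items(deployment: dict) -> dict:
    """Return a copy of the Deployment whose ConfigMap items reference Corefile.

    Assumption: the ConfigMap still has a 'Corefile' key to project.
    """
    fixed = copy.deepcopy(deployment)
    pod_spec = fixed["spec"]["template"]["spec"]
    for volume in pod_spec.get("volumes", []):
        for item in volume.get("configMap", {}).get("items", []):
            if item.get("key") == "Corefile-backup":
                item["key"] = "Corefile"
                item["path"] = "Corefile"
    return fixed


broken_deployment = {"spec": {"template": {"spec": {
    "containers": [{"args": ["-conf", "/etc/coredns/Corefile"]}],
    "volumes": [{"configMap": {"defaultMode": 420, "items": [
        {"key": "Corefile-backup", "path": "Corefile-backup"},
    ]}}],
}}}}
fixed_deployment = restore_corefile_items(broken_deployment)
print(fixed_deployment["spec"]["template"]["spec"]["volumes"][0]["configMap"]["items"])
```

The input dict is left untouched, so the original and the repaired manifest can be compared before anything is applied.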

Cheers,

kinbug

thegreenbear

All 7 comments

Yes, we had a lot of issues with that, which is why we now check (when PRs are created) that the CoreDNS version is supported by the corefile-migration library bundled with kubeadm.

floryut on 27 Aug 2020

Also, the bug that left the ConfigMap in an erroneous state was fixed on the Kubernetes side: https://github.com/kubernetes/kubernetes/pull/88811

I suggest closing this issue, as it should not happen with recent versions of either Kubespray or Kubernetes.

floryut on 27 Aug 2020

Sounds good.
I am happy to see there is a fix to the root cause.

Do you know in which Kubernetes version this is fixed?

Should I still bother creating a PR with my proposed workaround, with a TODO to remove it once older versions are no longer supported, or do you think it's not worth it?
I'm just wondering whether we're the only ones bothered by this issue :-)

Cheers,

thegreenbear on 28 Aug 2020

Looks like the fix in Kubernetes landed in 1.19, so pretty recent.

It's nice that you have a patch, and it might be useful for some people (if they land on this issue while searching), so feel free to paste it here.
But I don't think we would merge it into master, as it would be deprecated really soon (and we have pinned the CoreDNS version since 2.13 to be sure not to hit this error) 😄

floryut on 28 Aug 2020

/close
@thegreenbear feel free to post your patch here, if anyone happens to need it.
Otherwise, tl;dr: the CoreDNS version should be supported by the corefile-migration lib bundled with Kubernetes (since k8s 1.15); if it is not, you will end up with weird behavior during upgrade/deploy.
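
For anyone wanting to guard against this before upgrading, a pre-flight check could look roughly like the sketch below. The function name is hypothetical, and the supported set shown is a placeholder for illustration only, not the real list; the real values have to come from the corefile-migration library bundled with the kubeadm release being used.

```python
def migration_supported(coredns_version: str, supported_versions: set) -> bool:
    """Return True if the deployed CoreDNS version is in the supported set."""
    return coredns_version.lstrip("v") in supported_versions


# PLACEHOLDER values, not taken from any kubeadm release: fill this in from
# the corefile-migration library bundled with your kubeadm version.
hypothetical_supported = {"1.6.2", "1.6.3"}

print(migration_supported("1.6.5", hypothetical_supported))  # False
```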

floryut on 31 Aug 2020

@floryut: Closing this issue.
