Kubespray: Failed to upgrade kubernetes from version 1.13.3 to 1.14

Created on 20 Apr 2019 · 11 comments · Source: kubernetes-sigs/kubespray

Environment: On-Prem, ESXi

  • Cloud provider or hardware configuration: Virtual machine (4 core, 8GiB)
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
```
root@node20:~# uname -srm
Linux 4.4.0-138-generic x86_64
root@node20:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
```



Version of Ansible (`ansible --version`):



```
ansible 2.7.10
  config file = /root/satchpx/kubespray/ansible.cfg
  configured module search path = [u'/root/satchpx/kubespray/library']
  ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609]
```






```ini
[all]
node20   ansible_host=70.0.95.22 ip=70.0.95.22
node21   ansible_host=70.0.95.26 ip=70.0.95.26
node22   ansible_host=70.0.95.18 ip=70.0.95.18
node23   ansible_host=70.0.95.19 ip=70.0.95.19
node24   ansible_host=70.0.95.14 ip=70.0.95.14

[kube-master]
node20

[kube-node]
node20
node21
node22
node23
node24

[etcd]
node20
node21
node22

[k8s-cluster:children]
kube-node
kube-master
```

Command used to invoke ansible:

```
ansible-playbook upgrade-cluster.yml -b -i inventory/mycluster/hosts.ini
```

Output of ansible run:
https://gist.github.com/satchpx/cb86eac9badfb588f02cb8264a642946

Anything else we need to know:

Labels: kind/bug, lifecycle/rotten

Most helpful comment

I had a similar problem: I couldn't upgrade a cluster from 1.13.3 to 1.14.1, and it got stuck on the 2nd master.
But I found a solution; apparently there were some problems with my inventory files.

  • downloaded the latest release of kubespray, v2.10.0
  • created a new inventory: `cp -r inventory/sample inventory/myproject`
  • diffed the old inventory against the new one with `diff -r old-inventory/myproject/ new-inventory/myproject/`, saw a lot of changes, and carried over what I wanted (addons such as ingress and helm, the cluster name, etc.)
  • applied the upgrade:
    `ansible-playbook upgrade-cluster.yml -i inventory/myproject/inventory.ini -e kube_version=v1.14.1`

All went smoothly :)
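The diff step above is the one that does the work: a kubespray release upgrade usually changes inventory defaults, and the upgrade play depends on those changes. A self-contained sketch of the idea, using throwaway files in place of the real inventory trees (the paths and file contents here are illustrative only, not a real kubespray inventory):

```shell
set -eu
# Throwaway stand-ins for old-inventory/myproject/ and new-inventory/myproject/
old=$(mktemp -d); new=$(mktemp -d)
printf 'kube_version: v1.13.3\nhelm_enabled: false\n' > "$old/k8s-cluster.yml"
printf 'kube_version: v1.14.1\nhelm_enabled: false\n' > "$new/k8s-cluster.yml"
# diff -r walks both trees; it exits 1 whenever differences exist, hence || true
changes=$(diff -r "$old" "$new" || true)
echo "$changes"
rm -rf "$old" "$new"
```

In the real workflow you would review each hunk by hand and copy the settings you want into the fresh inventory before running the upgrade play.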

All 11 comments

I see the same issue even doing a fresh install of Kubernetes 1.14.0, with the same error signature:

```
TASK [kubernetes/master : set kubeadm certificate key] *************************
task path: /root/satchpx/kubespray/roles/kubernetes/master/tasks/kubeadm-setup.yml:114
Monday 22 April 2019 17:42:49 +0000 (0:00:00.056)       0:03:13.542 **********
fatal: [node20]: FAILED! => {
    "msg": "\"hostvars['kube-master']\" is undefined"
}
```

Any help or pointers would be appreciated.
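The `"hostvars['kube-master']" is undefined` message is the telling part: in Ansible, `hostvars` is keyed by individual host names, while `kube-master` is a group, so a lookup has to go through `groups` first. A minimal sketch of the pattern (the task and variable names here are hypothetical, not kubespray's actual code):

```yaml
# Illustrative only: shows why hostvars['kube-master'] is undefined.
# 'kube-master' is a group name; hostvars is indexed by host, so pick a
# concrete host out of groups['kube-master'] before indexing hostvars.
- name: set kubeadm certificate key (sketch)
  set_fact:
    kubeadm_certificate_key: "{{ hostvars[groups['kube-master'][0]].kubeadm_certificate_key | default('') }}"
```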

I'm doing an upgrade as well and getting the exact same error. This is on master; on v2.8.5 I hit another bug and it never gets to that point.

Same issue, on master:

```
TASK [kubernetes/master : set kubeadm certificate key] *************************
fatal: [master1]: FAILED! => {"msg": "\"hostvars['kube-master']\" is undefined"}
```

Failed to upgrade Kubernetes from version 1.13.4 to 1.14 while trying to run upgrade-cluster.yml.

I tried the v2.9.0 branch (`git pull v2.9.0`), and it did end up upgrading one of my masters to 1.14; I think it got past this error. The rest of the nodes are still on 1.13, though. The failing task is now `TASK [kubernetes/master : kubeadm | Upgrade first master]`, but I think that's another bug.

I was able to successfully upgrade my cluster from 1.13.5 to 1.14.0 using the latest commit on master.
`git rev-parse --short HEAD` reports `15eb7db`.
If someone else can confirm this, we can go ahead and close this issue.

Failed on second master this time

```
TASK [kubernetes/master : kubeadm | Upgrade other masters] *********************
Tuesday 30 April 2019  02:16:20 +0000 (0:00:00.045)       0:27:39.499 *********
fatal: [master3]: FAILED! => {"changed": true, "cmd": ["timeout", "-k", "600s", "600s", "/usr/bin/kubeadm", "upgrade", "apply", "-y", "v1.14.1", "--config=/etc/kubernetes/kubeadm-config.yaml", "--ignore-preflight-errors=all", "--allow-experimental-upgrades", "--allow-release-candidate-upgrades", "--etcd-upgrade=false"], "delta": "0:00:00.130465", "end": "2019-04-30 02:16:20.892965", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2019-04-30 02:16:20.762500", "stderr": "\t[WARNING APIServerHealth]: the API Server is unhealthy; /healthz didn't return \"ok\"\n\t[WARNING ControlPlaneNodesReady]: couldn't list control-planes in cluster: Get https://XX.XX.XXX.XX:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D: dial tcp 10.40.214.17:6443: connect: connection refused\n[upgrade/version] FATAL: the --version argument is invalid due to these fatal errors:\n\n\t- Unable to fetch cluster version: Couldn't fetch cluster version from the API Server: Get https://XX.XX.XXX.XX7:6443/version?timeout=32s: dial tcp XX.XX.XXX.XX:6443: connect: connection refused\n\nPlease fix the misalignments highlighted above and try upgrading again", "stderr_lines": ["\t[WARNING APIServerHealth]: the API Server is unhealthy; /healthz didn't return \"ok\"", "\t[WARNING ControlPlaneNodesReady]: couldn't list control-planes in cluster: Get https://XX.XX.XXX.XX:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D: dial tcp 10.40.214.17:6443: connect: connection refused", "[upgrade/version] FATAL: the --version argument is invalid due to these fatal errors:", "", "\t- Unable to fetch cluster version: Couldn't fetch cluster version from the API Server: Get https://10.XX.XXX.XX:6443/version?timeout=32s: dial tcp XX.XX.XXX.XX:6443: connect: connection refused", "", "Please fix the misalignments highlighted above and try upgrading again"], "stdout": "[preflight] Running pre-flight checks.\n[upgrade] Making sure the cluster is healthy:\n[upgrade/config] Making sure the configuration is correct:\n[upgrade/version] You have chosen to change the cluster version to \"v1.14.1\"", "stdout_lines": ["[preflight] Running pre-flight checks.", "[upgrade] Making sure the cluster is healthy:", "[upgrade/config] Making sure the configuration is correct:", "[upgrade/version] You have chosen to change the cluster version to \"v1.14.1\""]}
```

The 1st master upgraded to 1.14.1, then the play failed on the 2nd master:

```
ubuntu@master1:~$ kubectl get nodes
NAME      STATUS                     ROLES    AGE     VERSION
master1   Ready                      master   4d18h   v1.14.1
master2   Ready                      master   4d18h   v1.13.4
master3   Ready,SchedulingDisabled   master   4d18h   v1.14.1
node1     Ready                      node     4d18h   v1.13.4
node2     Ready                      node     4d18h   v1.13.4
node3     Ready                      node     4d18h   v1.13.4
```
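The `connection refused` lines in the failure above mean kubeadm's preflight could not reach the apiserver on that master at all, so a cheap first check before re-running the play is to probe `/healthz` by hand. A sketch (the address is a placeholder for your master's endpoint; `-k` is used because the apiserver's serving cert is normally not in the system trust store):

```shell
# Probe the apiserver health endpoint; print "unreachable" instead of failing hard.
status=$(curl -sk --max-time 2 https://127.0.0.1:6443/healthz || echo "unreachable")
echo "apiserver healthz: $status"
```

If it prints anything other than `ok`, fix the apiserver first (check the static pod manifest and the kubelet logs on that node) before retrying `kubeadm upgrade apply`.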


Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
