Kubespray: Link etcd certificates for calico-node error

Created on 6 Oct 2018 · 28 Comments · Source: kubernetes-sigs/kubespray

I've got an error here.

Help me, please.

failed: [node5] (item={u's': u'node-node5.pem', u'd': u'cert.crt'}) => {"changed": false, "item": {"d": "cert.crt", "s": "node-node5.pem"}, "msg": "Error while linking: [Errno 2] No such file or directory", "path": "/etc/calico/certs/cert.crt", "state": "absent"}

My host.ini file is:

[k8s-cluster:children]
kube-master
kube-node

[all]
node1 ansible_host=209.XXX.188.XX ip=209.XXX.188.XX
node2 ansible_host=209.XXX.188.XXX ip=209.XXX.188.XX
node3 ansible_host=209.XXX.188.XXX ip=209.XXX.188.XX
node4 ansible_host=209.XXX.188.XXX ip=209.XXX.188.XX
node5 ansible_host=209.XXX.188.XXX ip=209.XXX.188.XX

[kube-master]
node1
node2
node3

[kube-node]
node4
node5

[etcd]
node1
node2
node3

[calico-rr]

[vault]
node1
node2
node3


forkballpitch

All 28 comments

@forkballpitch Could you provide the information listed in the issue template (OS, distribution, ..., command line) and the name of the task that raises the error?

mirwan on 7 Oct 2018

@mirwan I just cloned this source, "https://github.com/kubernetes-incubator/kubespray.git",
and added more worker servers.
If you need more information, please tell me.
Thank you!

OS: Ubuntu 16.04.4
kubespray version: latest
command line: ansible-playbook -b -v -i inventory/prod/hosts.ini cluster.yml

forkballpitch on 8 Oct 2018

I'm having the same problem. Ubuntu 16.04 clean installs on both kubespray host and kube nodes, kubespray pulled from git, command line:

ansible-playbook -i inventory/kube-cluster-01/hosts.ini cluster.yml

bartlaarhoven on 8 Oct 2018

First can you confirm that:

the (last) failed task reported is "Calico | Link etcd certificates for calico-node"
cert_management is set to "vault"
ansible-playbook has been executed with "-b"
the source file for the link does not actually exist (e.g. /etc/ssl/etcd/ssl/node-node5.pem)

If so, could you check whether any task failed earlier (on the etcd servers during cert generation, memory checks, ...)?

mirwan on 8 Oct 2018
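
The last item in the checklist above (whether the source file for the link exists) can be checked on a node with a short script. This is a minimal sketch and not something posted in the thread: the helper name `missing_etcd_node_certs` is hypothetical, and it assumes that the inventory name (for example node5) is the name used in the file names. The directory /etc/ssl/etcd/ssl and the node-<name>.pem / node-<name>-key.pem naming are taken from the messages quoted in this thread.

```python
from pathlib import Path


def missing_etcd_node_certs(inventory_name, cert_dir="/etc/ssl/etcd/ssl"):
    """Return the expected cert and key paths for a node that do not exist.

    Assumption: the files are named node-<inventory_name>.pem and
    node-<inventory_name>-key.pem, as seen in the logs quoted in this thread.
    """
    base = Path(cert_dir)
    expected = [
        base / f"node-{inventory_name}.pem",
        base / f"node-{inventory_name}-key.pem",
    ]
    # Keep only the paths that are not regular files on this machine.
    return [str(path) for path in expected if not path.is_file()]
```

Running `missing_etcd_node_certs("node5")` on node5 (with enough privileges to read the directory) returns an empty list when both files are present; any path it returns is a file that the linking task would not find.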

For me:

I attached the output of the last failed task: kubespray-failed-last-task.txt
cert_management was unset (commented out in inventory/kube-cluster-01/group_vars/all/all.yml)
the command was ansible-playbook -i inventory/kube-cluster-01/hosts.ini cluster.yml, so no -b
on the failed nodes, the source files for the failed links indeed do not exist

Other possibly related errors or warnings are:

TASK [kubernetes/secrets : Check_certs | Set 'sync_certs' to true on nodes] ***********************************************************************************************************************
Monday 08 October 2018  17:03:51 +0200 (0:00:04.885)       0:05:34.603 ********
 [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: inventory_hostname in groups['kube-node'] and inventory_hostname != groups['kube-master'][0] and (not item in kubecert_node.files | map(attribute='path') | map("basename") | list or kubecert_node.files | selectattr("path", "equalto", "{{ kube_cert_dir }}/{{ item }}") | map(attribute="checksum")|first|default('') != kubecert_master.files | selectattr("path", "equalto", "{{ kube_cert_dir }}/{{ item }}") | map(attribute="checksum")|first|default(''))

but also in the same task:

ok: [node5] => (item=node-node5-key.pem)

I didn't find any failed tasks.

Does this help?

bartlaarhoven on 8 Oct 2018

Additional notes:

node1, node2 and node3 have vault and etcd labels
the node-node5.pem file does exist on node1, node2 and node3 in /etc/ssl/etcd/ssl/ (and so do the other missing files)
on the other nodes like node5, the /etc/ssl/etcd/ssl directory contains ca.pem, node-node1-key.pem and node-node1.pem. That's it.

I'm completely new to ansible and trying kubespray for the first time, so I'd love to help out but I'm still figuring out how it works.

bartlaarhoven on 8 Oct 2018

First, I think you must use the -b flag (the documentation is being updated to say so).
Then, if cert_management is not set in group_vars, there is no need to populate the vault group, as cert management defaults to "script".
Anyway, if the node5 cert and key do not exist, it certainly means that they were either not generated or not synced to node5. Can you look at the whole playbook output and see whether the "Gen_certs | run cert generation script", "Gen_certs | Gather etcd node certs" and "Gen_certs | Write etcd node certs" tasks ran properly?

mirwan on 8 Oct 2018

There is something I don't understand. The first ini file below produces the error and the second one does not.
The error is "can not find /etc/calico/certs/cert.crt".
I have kubespray pulled from git; command line:

ansible-playbook -b -v -i inventory/prod/hosts.ini cluster.yml

host.ini (error on node4)

[k8s-cluster:children]
kube-master
kube-node

[all]
node1 ansible_host=~ ip=~
node2 ansible_host=~ ip=~
node3 ansible_host=~ ip=~
node4 ansible_host=~ ip=~

[kube-master]
node1
node2

[kube-node]
node1
node2
node3
node4

[etcd]
node1
node2
node3

[calico-rr]

host.ini (no error; I removed node1 to node3 from the [kube-node] section)

[k8s-cluster:children]
kube-master
kube-node

[all]
node1 ansible_host=~ ip=~
node2 ansible_host=~ ip=~
node3 ansible_host=~ ip=~
node4 ansible_host=~ ip=~

[kube-master]
node1
node2

[kube-node]
node4

[etcd]
node1
node2
node3

[calico-rr]

And it works!
root@k-01:~/kubespray# kubectl get nodes
NAME    STATUS   ROLES         AGE   VERSION
node1   Ready    master,node   14m   v1.12.1
node2   Ready    master,node   14m   v1.12.1
node3   Ready    node          14m   v1.12.1
node4   Ready    node          14m   v1.12.1

forkballpitch on 9 Oct 2018

@forkballpitch I didn't think a server could be in kube-node and in etcd/kube-master at the same time. The doc says it can, so I will look into it.
@bartlaarhoven Maybe it is the same for you?

mirwan on 9 Oct 2018

Actually, mixing masters/etcd and workload (i.e. nodes) is not a best practice in production.
As long as you have enough servers, you should have nodes on one hand and masters and/or etcd on the other.
Our current CI only handles mixing master/etcd with nodes when deploying a cluster of 3 nodes or fewer.

mirwan on 9 Oct 2018

@forkballpitch Btw have you reset your servers (with reset.yml playbook) between your deployments with the 2 inventories? kubectl get nodes should not report node1 and node2 as nodes

mirwan on 9 Oct 2018

I've played around with Ansible and kubespray and opened #3486 as that is what fixed it for me.

bartlaarhoven on 9 Oct 2018

I have reproduced this issue with ansible==2.7.0
As workaround you can install ansible==2.6.3

dkozlov on 10 Oct 2018


@bartlaarhoven Regarding @dkozlov's comment, what version of ansible are you using?

mirwan on 11 Oct 2018

> @bartlaarhoven Regarding @dkozlov's comment, what version of ansible are you using?

@mirwan, I'm having the same problem and I can confirm that Kubespray revision 3b750cafc12f8289f23cfcf7e2780cdaee7385e9 returns this error when using Ansible 2.7.0.

It works with Ansible 2.6.3 as dkozlov said.
It also works with Ansible 2.6.5.

tadeugr on 11 Oct 2018


@dkozlov @mirwan I've used the most recent version of Ansible (fresh install)

ansible-playbook 2.7.0
  config file = None
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
  executable location = /usr/local/bin/ansible-playbook
  python version = 2.7.12 (default, Dec  4 2017, 14:50:18) [GCC 5.4.0 20160609]

bartlaarhoven on 12 Oct 2018

I was able to reproduce the issue with ansible 2.7.
It seems that ansible gets messed up at the task "etcd : Gen_certs | Write etcd node certs" (the cert from one node is written both on that node and on another).
Btw, the task "etcd : Gen_certs | Get etcd certificate serials" wrongly succeeds for the node with the wrong cert.
I'm looking into it.

mirwan on 12 Oct 2018

I think we currently hit that issue: https://github.com/ansible/ansible/issues/46600
Maybe there is a fix that consists of using another ansible module...

mirwan on 12 Oct 2018

I have issues signing the collaboration document (as it should be from my company etc.) but I'd like to point again to my PR #3486 as that fixed it for me in Ansible 2.7 and it uses the same way of distributing certificates as in other parts of kubespray.

bartlaarhoven on 12 Oct 2018

@bartlaarhoven I'm currently testing your branch ;-)

mirwan on 12 Oct 2018

hey @mirwan any news on this topic? This is a show stopper for me...

caruccio on 15 Oct 2018

@caruccio There's only one step left before merging PR #3486 (and I guess you know what's left to be done and certainly why this step cannot be skipped). In the meantime, downgrading to ansible 2.6 could do the trick.

mirwan on 15 Oct 2018


I see... I live in Brazil and I really know what bureaucracy means for life on earth.

caruccio on 15 Oct 2018

I'm still facing this problem on v2.7 and master.
Any updates on this?

thiguetta on 22 Oct 2018

@mirwan Do you have a contact point for me at TLF to get me another agreement?

bartlaarhoven on 22 Oct 2018

@thiguetta as said, it's a bug in ansible 2.7, it's not something we can fix in kubespray.
The only update we have is to use ansible 2.6.x until the ansible team fixes the issue.
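
The workaround can be guarded with a small pre-flight check before running the playbook. This is a minimal sketch, not something posted in the thread: the function names are hypothetical, and treating every 2.7.x release as affected is an assumption. The thread itself only reports 2.7.0 as failing and 2.6.3 and 2.6.5 as working. The input is the text printed by `ansible-playbook --version`, whose first line looks like "ansible-playbook 2.7.0" in the output quoted above.

```python
import re


def parse_ansible_version(version_output):
    """Extract (major, minor) from the first line of `--version` output."""
    lines = version_output.strip().splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError("no version number found in: %r" % first_line)
    return int(match.group(1)), int(match.group(2))


def is_reported_affected(version):
    """True for 2.7.x. Assumption: only 2.7.0 is confirmed in the thread."""
    return version == (2, 7)
```

The output can be captured with `subprocess.run(["ansible-playbook", "--version"], capture_output=True, text=True).stdout` and passed to `parse_ansible_version`; if `is_reported_affected` returns True, installing a 2.6.x release such as ansible==2.6.3 is the workaround suggested above.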