Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Environment:
printf "$(uname -srm)\n$(cat /etc/os-release)\n"):CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
[root@node2 ~]# exit
logout
Connection to 192.168.122.128 closed.
[root@localhost kubespray]# ssh 192.168.122.5
Last login: Thu Jun 14 18:54:43 2018 from 192.168.122.245
[root@node3 ~]#
[root@node3 ~]# printf "$(uname -srm)\n$(cat /etc/os-release)\n"
Linux 3.10.0-862.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
[root@node3 ~]#
Version of Ansible (ansible --version):
Kubespray version (commit) (git rev-parse --short HEAD):
[root@localhost kubespray]# git rev-parse --short HEAD
0686b84
[root@localhost kubespray]#
Network plugin used:
No.
Copy of your inventory file:
[root@localhost kubespray]# cat inventory/mycluster/hosts.ini
[all]
node1 ansible_host=192.168.122.64 ip=192.168.122.64
node2 ansible_host=192.168.122.128 ip=192.168.122.128
node3 ansible_host=192.168.122.5 ip=192.168.122.5
[kube-master]
node1
node2
[kube-node]
node1
node2
node3
[etcd]
node1
node2
node3
[k8s-cluster:children]
kube-node
kube-master
[calico-rr]
[vault]
node1
node2
node3
[root@localhost kubespray]#
Command used to invoke ansible:
ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml
Output of ansible run:
TASK [etcd : include_tasks] *****************************************************
Thursday 14 June 2018 18:55:38 +0530 (0:00:01.421) 0:06:02.708 **
included: /opt/kubespray/roles/etcd/tasks/upd_ca_trust.yml for node2, node3
TASK [etcd : Gen_certs | target ca-certificate store file] ******************************************
Thursday 14 June 2018 18:55:39 +0530 (0:00:00.505) 0:06:03.214 **
ok: [node2]
ok: [node3]
TASK [etcd : Gen_certs | add CA to trusted CA dir] *********************************************
Thursday 14 June 2018 18:55:39 +0530 (0:00:00.833) 0:06:04.048 ***
fatal: [node2]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
fatal: [node3]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
NO MORE HOSTS LEFT *********************************************************
to retry, use: --limit @/opt/kubespray/cluster.retry
PLAY RECAP ***********************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0
node1 : ok=15 changed=1 unreachable=0 failed=1
node2 : ok=162 changed=10 unreachable=0 failed=1
node3 : ok=158 changed=13 unreachable=0 failed=1
...
I'm encountering the same problem. My setup is basically the same, but on Ubuntu; it looks like the certs are not being generated.
I encountered the same issue. In my case the master node was low on resources (memory), but the play didn't stop on that task (the check for enough available memory). So basically no tasks ran on the master node, including the cert generation. Could this be your case?
I'm also using Ubuntu (16.04) on Azure.
the same case =(
@naumvd95 have you checked that all of the previous tasks on the master nodes ran before the play failed on the certificate tasks?
I kind of solved the problem by changing "bootstrap_os" to "none" in all.yml. I don't know why that fixed it, but with "ubuntu" there it does not work...
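For reference, a minimal sketch of that change (the file typically lives under inventory/mycluster/group_vars/; the exact path and the accepted values depend on your Kubespray version):
# inventory/mycluster/group_vars/all.yml (path assumed)
# "none" skips the OS bootstrap tasks entirely; distro names such as
# "ubuntu" or "centos" enable distro-specific host preparation.
bootstrap_os: none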
If this is a memory problem (that was my case too), you can see these logs before the end:
TASK [kubernetes/preinstall : Stop if memory is too small for masters] ******************************
Wednesday 08 August 2018 09:25:57 +0200 (0:00:00.592) 0:06:30.520 ******
fatal: [infra-k8s-master-01]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1500",
"changed": false,
"evaluated_to": false
}
fatal: [infra-k8s-master-02]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1500",
"changed": false,
"evaluated_to": false
}
TASK [kubernetes/preinstall : Stop if memory is too small for nodes] ********************************
Wednesday 08 August 2018 09:25:57 +0200 (0:00:00.419) 0:06:30.939 ******
fatal: [infra-k8s-worker-01]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1024",
"changed": false,
"evaluated_to": false
}
fatal: [infra-k8s-worker-02]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1024",
"changed": false,
"evaluated_to": false
}
fatal: [infra-k8s-worker-03]: FAILED! => {
"assertion": "ansible_memtotal_mb >= 1024",
"changed": false,
"evaluated_to": false
}
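A quick way to check what Ansible sees before re-running the whole play (a sketch, assuming the inventory path used earlier in this thread):
# Print the memory fact the assertions above test against; masters
# need >= 1500 MB and nodes >= 1024 MB per the log output.
ansible -i inventory/mycluster/hosts.ini all -m setup -a 'filter=ansible_memtotal_mb'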
@ltupin interesting, so your play did fail on the check memory task...
Not exactly; it continued after this fatal error, and at the end I get the certificate error:
TASK [kubernetes/secrets : Gen_certs | add CA to trusted CA dir] ************************************
Wednesday 08 August 2018 09:33:56 +0200 (0:00:00.458) 0:14:30.345 ******
fatal: [infra-k8s-etcd-01]: FAILED! => {"changed": false, "msg": "Source /etc/kubernetes/ssl/ca.pem not found"}
fatal: [infra-k8s-etcd-03]: FAILED! => {"changed": false, "msg": "Source /etc/kubernetes/ssl/ca.pem not found"}
fatal: [infra-k8s-etcd-02]: FAILED! => {"changed": false, "msg": "Source /etc/kubernetes/ssl/ca.pem not found"}
Exactly same as you.
Hi,
any results on this? Have you had the chance to fix it?
My logs are as follows:
TASK [etcd : Gen_certs | add CA to trusted CA dir] *************
fatal: [aadigital2]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
fatal: [aadigital3]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
Any ideas to look at? The nodes should have enough memory and disk space. It seems that the certs are not generated at all.
Thanks!
@mbecker what was the result of your task "Stop if memory is too small for masters"?
Hi @lgg42,
the logs for that task are as follows:
TASK [kubernetes/preinstall : Stop if memory is too small for masters] *******
task path: /home/mbecker/***/kubespray/roles/kubernetes/preinstall/tasks/verify-settings.yml:52
skipping: [aadigital2] => {
"changed": false,
"skip_reason": "Conditional result was False"
}
ok: [aadigital1] => {
"changed": false,
"msg": "All assertions passed"
}
skipping: [aadigital3] => {
"changed": false,
"skip_reason": "Conditional result was False"
}
The nodes have 32 GB RAM, so I do not think that this is the problem ;-) Thanks for your help! Is there any way to skip the generation/copying of certificates, and maybe generate our own certs and update the configuration of each component manually?
@mbecker sorry that I can't be of more help here, I'm pretty far from a PC :relieved:
What about going through all the conditions (within all tasks) that must be met to generate the certs? Chances are one or several of them failed.
I'm guessing you already tried to run the playbook several times, right? It's not the right way, but just to rule out some other issues.
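A sketch of how to surface those skipped conditions (the log file name is just an example; the skip reason is printed with enough verbosity):
# Re-run with extra verbosity and capture the output, then look for
# tasks that were skipped because their condition evaluated to false.
ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml -vv 2>&1 | tee kubespray-run.log
grep -B 3 'Conditional result was False' kubespray-run.log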
I have the same problem here:
On Debian Stretch with tag/branch v2.8.0 or master.
TASK [etcd : Gen_certs | add CA to trusted CA dir] ********************************************************************************************************************************************
Wednesday 05 December 2018 13:55:16 +0100 (0:00:00.054) 0:02:54.371 ****
fatal: [node4]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
fatal: [node5]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
node:
2 x86-64 cores
2 GB memory
master:
4 x86-64 cores
4 GB memory
Same problem on 2.8.0 on Ubuntu 18.04.
@henres 2 GB for your nodes could be the reason (that was mine). Can you try with more memory, and check the logs for the trace I mentioned earlier in this thread?
I'm seeing the same:
TASK [etcd : Gen_certs | add CA to trusted CA dir] *****************************************************************************************************************************************************************************************************************************
Thursday 13 December 2018 16:06:06 -0500 (0:00:00.157) 0:02:29.444 *****
fatal: [w2]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
fatal: [w1]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
fatal: [w3]: FAILED! => {"changed": false, "msg": "Source /etc/ssl/etcd/ssl/ca.pem not found"}
Worth noting, the nodes that are failing are not in the etcd group. Should this task be run on these nodes?
[kube-master]
m1
m2
m3
[etcd]
m1
m2
m3
[kube-node]
w1
w2
w3
[k8s-cluster:children]
kube-master
kube-node
I was able to resolve this. In my case, the deployment for the nodes in the etcd group was failing at the assertion stage (in particular, on the assertion "Stop if RBAC and anonymous-auth are not enabled when insecure port is disabled"; resolved by setting kube_api_anonymous_auth=true). Once the failing assertion was fixed, the original error went away.
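For anyone hitting the same assertion, a minimal sketch of applying that override (the group_vars path is an assumption; adjust to your layout):
# Pass it as an extra-var on the command line ...
ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml -e kube_api_anonymous_auth=true
# ... or persist it in group vars, e.g. in
# inventory/mycluster/group_vars/k8s-cluster.yml:
#   kube_api_anonymous_auth: true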
Having the same issue, though my etcd nodes are separate from masters and workers (dedicated hosts for each type). The etcd nodes fail the "Stop if RBAC and anonymous-auth are not enabled when insecure port is disabled" assertion; I just fail to see how that could possibly have any effect, since the etcd nodes are entirely out of the cluster (without kubelet or anything, I would presume).
I can confirm the 'solution' for version 2.8.1: kube_api_anonymous_auth=true
Thanks @macarpen :)
I also discovered that the bug isn't present at the current commit of the master branch (39d7503). It works fine without kube_api_anonymous_auth=true.
Same problem on 2.8.1 on Ubuntu 16.04.
Also on 2.8.2 on Ubuntu 16.04
I solved it for my case! My problem was that an unrelated error had stopped the playbook on the master node. Since that node is the owner of the CA file, the CA was never generated, and thus the other nodes could not obtain a copy of it.
Looking at the output people have posted here, it looks like this could actually be the case for more than just myself. And if so, @lgg42's intuition behind the question "have you checked that all of the previous tasks on the master nodes ran before the play failed on the certificate tasks?" was right.
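A quick way to confirm this failure mode before re-running the full play is to check whether the CA ever landed on the first etcd host (a sketch; adjust the inventory path to yours):
# The CA is generated on the first host in the [etcd] group and then
# distributed to the others; if it is missing there, every copy fails.
ansible -i inventory/mycluster/hosts.ini 'etcd[0]' -m stat -a 'path=/etc/ssl/etcd/ssl/ca.pem'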
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Same problem here.
For all my hosts in the kube-node group, /etc/ssl/etcd/ssl/ca.pem is missing at this step.
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.