Kubespray: Error with Calico wait for etcd

Created on 20 Jul 2017  ·  4 comments  ·  Source: kubernetes-sigs/kubespray

Environment:

  • AWS
  • Ubuntu 16.04
  • Ansible version 2.3.10

Kubespray version: b5d3d47

Network plugin used: Calico

Copy of your inventory file:

[kube-master]
node1       
node2       

[all]
node1       ansible_ssh_host=10.206.46.27
node2       ansible_ssh_host=10.206.47.78
node3       ansible_ssh_host=10.206.45.101
node4       ansible_ssh_host=10.206.45.105

[k8s-cluster:children]
kube-node       
kube-master     

[kube-node]
node1       
node2       
node3       
node4       

[etcd]
node1       
node2       
node3       

Command used to invoke ansible:
ansible-playbook -i inventory/inventory.cfg cluster.yml -b -v -u ubuntu

Output of ansible run:

https://gist.github.com/mattdornfeld/a8ecf35e2fc18eb9fecb00c6441ca22d

Anything else we need to know:

https://paste.pound-python.org/show/d6z4x2KVHY4z9pASpTrW/

Update:

I think I traced the issue to the following: the IP addresses specified in the etcd.env file on the nodes are different from the ones specified in the inventory.cfg file. Here is the etcd.env file:

ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://10.206.47.130:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://10.206.47.130:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_LISTEN_CLIENT_URLS=https://10.206.47.130:2379,https://127.0.0.1:2379
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://10.206.47.130:2380
ETCD_NAME=etcd2
ETCD_PROXY=off
ETCD_INITIAL_CLUSTER=etcd1=https://10.206.44.59:2380,etcd2=https://10.206.47.130:2380,etcd3=https://10.206.44.229:2380
ETCD_AUTO_COMPACTION_RETENTION=0

# TLS settings
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node2.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node2-key.pem
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node2.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node2-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=true

You can see that the IP addresses differ between the two files. I have no idea where the IPs in etcd.env came from. Any idea what their source is?
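The mismatch can be checked mechanically. The snippet below is a self-contained sketch using the values copied from this report (the inventory IPs for the three etcd hosts, and the ETCD_INITIAL_CLUSTER line from etcd.env); on a live node you would compare inventory.cfg against the deployed etcd.env instead.

```shell
# IPs of the [etcd] hosts (node1-node3) as given in inventory.cfg above:
cat > /tmp/inventory_ips.txt <<'EOF'
10.206.46.27
10.206.47.78
10.206.45.101
EOF

# The peer list etcd was actually configured with (from etcd.env above):
ETCD_INITIAL_CLUSTER="etcd1=https://10.206.44.59:2380,etcd2=https://10.206.47.130:2380,etcd3=https://10.206.44.229:2380"

# Extract the host part of each peer URL; none of these appear in the
# inventory, which is the bug being reported:
echo "$ETCD_INITIAL_CLUSTER" | tr ',' '\n' | sed 's|.*https://||; s|:2380||'
```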

Most helpful comment

I ran into the same issue.

I originally traced it by running ansible-playbook with -vvv to enable verbose output, at which point I realized that the etcd_access_addresses variable contained stale IPs (from a previous Terraform run; I had since destroyed and recreated my stack).

As a newbie to Ansible I don't yet know its internals, but this definitely smelled like a caching issue, so after a bit of googling I came upon an article that explains how caching of facts is enabled.

Crucially, at the very end, it also mentions how to flush the cache:

Run your ansible-playbook with --flush-cache ;)
Problem solved.
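As a concrete sketch, the invocation below is copied from this issue with only the --flush-cache flag added; the guards are just so the snippet degrades gracefully where Ansible or the inventory is absent.

```shell
# Same playbook run as reported above, plus --flush-cache so Ansible
# discards its cached facts and re-gathers them from the live hosts
# before templating etcd.env.
FLAGS="-b -v -u ubuntu --flush-cache"
if command -v ansible-playbook >/dev/null 2>&1 && [ -f inventory/inventory.cfg ]; then
  ansible-playbook -i inventory/inventory.cfg cluster.yml $FLAGS
else
  echo "ansible-playbook or inventory not found; would run cluster.yml with: $FLAGS" >&2
fi
```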

All 4 comments

I have the same issue, on bare-metal infrastructure with CentOS.

@GrigorievNick I actually found the solution to this issue. Kubespray was caching facts about the EC2 instances in my /tmp directory. The way variable precedence works in Ansible, variables obtained from gathered facts overwrite variables set in your inventory file. What had happened was this: I created a Kubernetes cluster, Kubespray cached the facts for it, and then I deleted the cluster. When I tried to create a new cluster, Kubespray used the IP addresses from the old one when installing etcd. The solution was to delete the cache. I think this might qualify as a bug.
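As a sketch of what "delete the cache" means here: the directory below is a hypothetical stand-in (the real location is whatever fact_caching_connection in ansible.cfg points to, under /tmp in this report), and the JSON is a mock-up of one stale cached fact.

```shell
# Hypothetical stand-in for the fact cache directory; the real path is
# set by fact_caching_connection in ansible.cfg.
CACHE_DIR=/tmp/kubespray_fact_cache_demo

# Simulate a stale cached fact left over from a destroyed cluster:
mkdir -p "$CACHE_DIR"
printf '{"ansible_default_ipv4": {"address": "10.206.47.130"}}\n' \
  > "$CACHE_DIR/node2"

# Deleting the cache forces Ansible to re-gather facts (and thus the
# correct IPs) on the next run:
rm -rf "$CACHE_DIR"
```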

_Sent from my Google Nexus 6 using FastHub_

@mattdornfeld Thanks, it works for me.

