Following this guide https://github.com/coreos/etcd/blob/master/Documentation/docker_guide.md
I created a 3-node cluster.
Initially it was working fine. However, upon restarting one of the nodes, I now get the following error.
/usr/bin/docker run -ti -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.8 -name etcd0 -advertise-client-urls http://10.10.10.22:2379,http://10.10.10.22:4001 -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 -initial-advertise-peer-urls http://10.10.10.22:2380 -listen-peer-urls http://0.0.0.0:2380 -initial-cluster-token etcd-cluster-1 -initial-cluster etcd0=http://10.10.10.22:2380,etcd1=http://10.10.10.23:2380,etcd2=http://10.10.10.44:2380 -initial-cluster-state new
2015/05/01 03:59:15 etcd: no data-dir provided, using default data-dir ./etcd0.etcd
2015/05/01 03:59:15 etcd: listening for peers on http://0.0.0.0:2380
2015/05/01 03:59:15 etcd: listening for client requests on http://0.0.0.0:2379
2015/05/01 03:59:15 etcd: listening for client requests on http://0.0.0.0:4001
2015/05/01 03:59:15 etcdserver: datadir is valid for the 2.0.1 format
2015/05/01 03:59:16 etcd: stopping listening for client requests on http://0.0.0.0:4001
2015/05/01 03:59:16 etcd: stopping listening for client requests on http://0.0.0.0:2379
2015/05/01 03:59:16 etcd: stopping listening for peers on http://0.0.0.0:2380
2015/05/01 03:59:16 etcd: member 9b3523b532ddb797 has already been bootstrapped
What does "has already been bootstrapped" mean? I've wrapped this up in a service file, etcd.service, and placed it in /etc/systemd/system. It overrides the original etcd service.
[Unit]
Description=etcd container
After=docker.service
[Service]
Restart=always
RestartSec=10s
LimitNOFILE=40000
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill etcd
ExecStartPre=-/usr/bin/docker rm etcd
#ExecStartPre=/usr/bin/docker pull quay.io/coreos/etcd:v2.0.8
ExecStart=/usr/bin/docker run \
-v /usr/share/ca-certificates/:/etc/ssl/certs \
-p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.8 \
-name etcd0 \
-advertise-client-urls http://${COREOS_PRIVATE_IPV4}:2379,http://${COREOS_PRIVATE_IPV4}:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
-initial-advertise-peer-urls http://${COREOS_PRIVATE_IPV4}:2380 \
-listen-peer-urls http://0.0.0.0:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://10.10.10.22:2380,etcd1=http://10.10.10.23:2380,etcd2=http://10.10.10.44:2380 \
-initial-cluster-state new
ExecStop=/usr/bin/docker stop etcd
[Install]
WantedBy=multi-user.target
And I'm using the following boot script. I should also point out that even though COREOS_PRIVATE_IPV4 is being set in /etc/environment, the etcd service isn't always starting on all nodes. That's another issue, and I'm wondering why.
#!/bin/sh
workdir=$(mktemp --directory)
trap "rm --force --recursive ${workdir}" SIGINT SIGTERM EXIT
cat > "${workdir}/cloud-config.yml" <<EOF
#cloud-config
hostname: service2-2
ssh_authorized_keys:
  - ssh-rsa [rsa key]
coreos:
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
EOF
get_ipv4() {
  IFACE="${1}"
  PREFIX="${2}"
  local ip
  while [ -z "${ip}" ]; do
    ip=$(ip -4 -o addr show dev "${IFACE}" scope global | gawk '{split ($4, out, "/"); print out[1]}' | grep "^$PREFIX")
    sleep .1
  done
  echo "${ip}"
}
export COREOS_PRIVATE_IPV4=$(get_ipv4 ib0 "10")
export COREOS_PUBLIC_IPV4=$(get_ipv4 ib0 "72")
coreos-cloudinit --from-file="${workdir}/cloud-config.yml"
@hookenz It means exactly what it says: the member has already been bootstrapped. I think you lost the data-dir of that member somehow. You cannot restart a member that was in the cluster without its previous data-dir.
Ahh, I thought it'd just re-kick off the process. So do need to bind-mount to another directory on the host?
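One way to avoid losing the member state across container restarts is to bind-mount a host directory into the container and point etcd at it with -data-dir. A sketch, assuming the unit file above; the host path /var/lib/etcd0 and mount point /etcd-data are illustrative, not from the original:

```shell
# Persist the member's data-dir on the host so that restarting the
# container keeps the raft state and member identity.
/usr/bin/docker run \
  -v /var/lib/etcd0:/etcd-data \
  -v /usr/share/ca-certificates/:/etc/ssl/certs \
  -p 4001:4001 -p 2380:2380 -p 2379:2379 \
  --name etcd quay.io/coreos/etcd:v2.0.8 \
  -name etcd0 \
  -data-dir /etcd-data \
  # ...remaining -advertise/-listen/-initial-cluster flags as above
```

With the data-dir persisted, the "already been bootstrapped" check passes on restart because the member finds its existing state instead of trying to bootstrap again.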
@hookenz
If you lost your data dir, you cannot simply use the previous configuration to restart the etcd process without removing the previous member.
Remember that if you lose a data-dir, then you lose that member.
For your information, if you lost your data dir and the data is not too large, you can also:
remove the old member via the dynamic configuration API
add the new member via the dynamic configuration API
remove all residual data: `rm -rf /var/lib/etcd2/*`
start etcd with initial-cluster=existing
The new member will be added to the cluster and will download a copy of the data.
We do this to re-provision our etcd cluster members.
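The steps above can be sketched against the etcd v2 members API with curl. The member ID 9b3523b532ddb797 is taken from the error log earlier in the thread; the endpoint 10.10.10.23:2379 (a surviving member) is an assumption:

```shell
# 1. Remove the dead member from the cluster (ID from the error log;
#    you can also list members with GET /v2/members).
curl -XDELETE http://10.10.10.23:2379/v2/members/9b3523b532ddb797

# 2. Add it back as a fresh member with its peer URL.
curl -XPOST http://10.10.10.23:2379/v2/members \
  -H "Content-Type: application/json" \
  -d '{"peerURLs":["http://10.10.10.22:2380"]}'

# 3. On the node being re-added, wipe any residual data.
rm -rf /var/lib/etcd2/*

# 4. Restart etcd on that node with the cluster state set to existing,
#    so it joins the running cluster instead of bootstrapping a new one.
```

These requests must be issued against a healthy member while the failed node's etcd is stopped; the new member then syncs its data from the cluster on startup.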
@jmcollin78 Great! It worked for me.
@jmcollin78 Did you mean start etcd with initial-cluster-state=existing ?
@tariq1890 sorry this thread is too old. I can't remember
Thanks for your response :). The documents suggest that it's initial-cluster-state=existing. Could you make that edit to your comment? I can delete this message after that.