Etcd: "member 9b3523b532ddb797 has already been bootstrapped" error, then exits.

Created on 1 May 2015 · 8 comments · Source: etcd-io/etcd

Following this guide, https://github.com/coreos/etcd/blob/master/Documentation/docker_guide.md, I created a 3-node cluster.

Initially it was working fine. However, after restarting one of the nodes, I now get the following error.

/usr/bin/docker run -ti -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.8 -name etcd0 -advertise-client-urls http://10.10.10.22:2379,http://10.10.10.22:4001 -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 -initial-advertise-peer-urls http://10.10.10.22:2380 -listen-peer-urls http://0.0.0.0:2380 -initial-cluster-token etcd-cluster-1 -initial-cluster etcd0=http://10.10.10.22:2380,etcd1=http://10.10.10.23:2380,etcd2=http://10.10.10.44:2380 -initial-cluster-state new
2015/05/01 03:59:15 etcd: no data-dir provided, using default data-dir ./etcd0.etcd
2015/05/01 03:59:15 etcd: listening for peers on http://0.0.0.0:2380
2015/05/01 03:59:15 etcd: listening for client requests on http://0.0.0.0:2379
2015/05/01 03:59:15 etcd: listening for client requests on http://0.0.0.0:4001
2015/05/01 03:59:15 etcdserver: datadir is valid for the 2.0.1 format
2015/05/01 03:59:16 etcd: stopping listening for client requests on http://0.0.0.0:4001
2015/05/01 03:59:16 etcd: stopping listening for client requests on http://0.0.0.0:2379
2015/05/01 03:59:16 etcd: stopping listening for peers on http://0.0.0.0:2380
2015/05/01 03:59:16 etcd: member 9b3523b532ddb797 has already been bootstrapped

What does "has already been bootstrapped mean?". I've wrapped this up in a service file etcd.service and placed it in /etc/systemd/system. It overrides the original etcd.

[Unit]
Description=etcd container
After=docker.service

[Service]
Restart=always
RestartSec=10s
LimitNOFILE=40000
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill etcd
ExecStartPre=-/usr/bin/docker rm etcd
#ExecStartPre=/usr/bin/docker pull quay.io/coreos/etcd:v2.0.8
ExecStart=/usr/bin/docker run \
  -v /usr/share/ca-certificates/:/etc/ssl/certs \
  -p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.8 \
  -name etcd0 \
  -advertise-client-urls http://${COREOS_PRIVATE_IPV4}:2379,http://${COREOS_PRIVATE_IPV4}:4001 \
  -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
  -initial-advertise-peer-urls http://${COREOS_PRIVATE_IPV4}:2380 \
  -listen-peer-urls http://0.0.0.0:2380 \
  -initial-cluster-token etcd-cluster-1 \
  -initial-cluster etcd0=http://10.10.10.22:2380,etcd1=http://10.10.10.23:2380,etcd2=http://10.10.10.44:2380 \
  -initial-cluster-state new
ExecStop=/usr/bin/docker stop etcd

[Install]
WantedBy=multi-user.target

And I'm using the following boot script. I should point out that although COREOS_PRIVATE_IPV4 is being set in /etc/environment, the etcd service isn't always starting on all nodes... that's a separate issue and I'm wondering why.

#!/bin/sh

workdir=$(mktemp --directory)
trap "rm --force --recursive ${workdir}" SIGINT SIGTERM EXIT

cat > "${workdir}/cloud-config.yml" <<EOF
#cloud-config

hostname: service2-2
ssh_authorized_keys:
  - ssh-rsa     [rsa key]
coreos:
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
EOF

# Poll until the interface has a global IPv4 address matching the given
# prefix, then print it (e.g. get_ipv4 ib0 "10" for the 10.x private address).
get_ipv4() {
    IFACE="${1}"
    PREFIX="${2}"
    local ip
    while [ -z "${ip}" ]; do
        # Take the address part of "inet a.b.c.d/nn" and keep prefix matches.
        ip=$(ip -4 -o addr show dev "${IFACE}" scope global | gawk '{split ($4, out, "/"); print out[1]}' | grep "^$PREFIX")
        sleep .1
    done

    echo "${ip}"
}

export COREOS_PRIVATE_IPV4=$(get_ipv4 ib0 "10")
export COREOS_PUBLIC_IPV4=$(get_ipv4 ib0 "72")

coreos-cloudinit --from-file="${workdir}/cloud-config.yml"

All 8 comments

@hookenz It means exactly that: the member has already been bootstrapped. I think you lost the data-dir of that member somehow. You cannot restart a member that was in the cluster without its previous data-dir.

Ahh, I thought it'd just re-kick off the bootstrap process. So do I need to bind-mount the data dir to a directory on the host?
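
Something like this, maybe? Just a sketch: the host path /var/lib/etcd and the container mount point /etcd-data are placeholders I picked, not from the guide.

# sketch: persist the data dir on the host so "docker rm" can't destroy it
/usr/bin/docker run \
  -v /var/lib/etcd:/etcd-data \
  -v /usr/share/ca-certificates/:/etc/ssl/certs \
  -p 4001:4001 -p 2380:2380 -p 2379:2379 --name etcd quay.io/coreos/etcd:v2.0.8 \
  -name etcd0 \
  -data-dir /etcd-data \
  -advertise-client-urls http://10.10.10.22:2379,http://10.10.10.22:4001 \
  -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
  -initial-advertise-peer-urls http://10.10.10.22:2380 \
  -listen-peer-urls http://0.0.0.0:2380 \
  -initial-cluster-token etcd-cluster-1 \
  -initial-cluster etcd0=http://10.10.10.22:2380,etcd1=http://10.10.10.23:2380,etcd2=http://10.10.10.44:2380 \
  -initial-cluster-state new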

@hookenz

If you lost your data dir, you have to:

  1. remove that member via the dynamic configuration API
  2. start etcd with the previous configuration

You cannot simply reuse the previous configuration to restart the etcd process without first removing the old member.

Remember that if you lose a data-dir, then you lose that member.
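
For step 1, a sketch with etcdctl run from one of the surviving nodes (the member ID is the one from the error output above; `member list` shows the IDs if you need to look one up):

etcdctl member list
etcdctl member remove 9b3523b532ddb797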

For your information, if you lost your data dir and the data is not too large, you can also:

  1. remove the old member via the dynamic configuration API
  2. add the new member via the dynamic configuration API
  3. remove all residual data: `rm -rf /var/lib/etcd2/*`
  4. start etcd with initial-cluster=existing

The new member will be added to the cluster and will sync a copy of the data.

This is how we re-provision our etcd cluster members.
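
A sketch of that sequence using this thread's names and addresses (the /var/lib/etcd2 path is from the list above, and, as the follow-up comments note, the restart flag is spelled -initial-cluster-state):

# steps 1 and 2, from any healthy member
etcdctl member remove 9b3523b532ddb797
etcdctl member add etcd0 http://10.10.10.22:2380
# step 3, on the node being re-provisioned
rm -rf /var/lib/etcd2/*
# step 4: start etcd with the same flags as before, but with
# -initial-cluster-state existing in place of -initial-cluster-state new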

@jmcollin78 Great! It's worked for me.

@jmcollin78 Did you mean start etcd with initial-cluster-state=existing?

@tariq1890 Sorry, this thread is too old; I can't remember.

Thanks for your response :). The documentation suggests that it's initial-cluster-state=existing. Could you make that edit to your comment? I can delete this message after that.
