Flannel: Flannel not picking up etcd_endpoints flag

Created on 27 Apr 2015 · 20 Comments · Source: coreos/flannel

I am installing flannel on a coreos machine on aws using a cloud-config file.
I provide the etcd_endpoints flag and point it to my etcd cluster. Yet journalctl shows flannel trying to retrieve its config from http://localhost:4001, and the flannel install fails.

kind/bug

tanmaybinaykiya

All 20 comments

The documentation regarding command-line flags (including the built-in help) is incorrect. I got this to work:

flanneld -ip-masq -etcd-endpoints https://127.0.0.1:2379 -etcd-cafile /etc/etcd/ca.crt -etcd-certfile /etc/etcd/host.crt -etcd-keyfile /etc/etcd/host.key

Note the _lack_ of equal signs. A lot of CoreOS' tools have similar problems. A lot of it has to do with the fact that people are used to GNU getopt-style command line parsing where "--opt=value" is equivalent to "--opt value". The Go "flag" library claims to do the same thing but clearly doesn't.

jcollie on 27 Apr 2015

The problem is that flanneld.service doesn't use the /run/flannel/options.env file for env-vars before it starts the flanneld container.

/opt/bin/flanneld --ip-masq=true (missing additional flags here)

mnemotiv on 10 Jun 2015

@tanmaybinaykiya @jcollie we tested ./flanneld -etcd-endpoints="http://192.168.198.130:2379" with go1.4.2 and it worked fine. Which version of go are you guys using?

MohdAhmad on 10 Jun 2015

@mnemotiv is this the flanneld.service that is shipped with coreos?

MohdAhmad on 10 Jun 2015

@MohdAhmad yes, with latest alpha 695

mnemotiv on 11 Jun 2015

This problem exists if I deploy the server with cloud-config:

...
  flannel:
    etcd-endpoints: https://127.0.0.1:2379
    etcd-cafile: /var/run/...../ca.crt
    etcd-certfile: /var/run/...../master.crt
    etcd-keyfile: /var/run/...../master.key
...

flanneld.service from journalctl

Jun 10 13:44:01 k8s-master-001 systemd[1]: Starting Network fabric for containers...
Jun 10 13:44:02 k8s-master-001 etcdctl[644]: Error:  cannot sync with the cluster using endpoints http://127.0.0.1:4001, http://127.0.0.1:2379
Jun 10 13:44:02 k8s-master-001 systemd[1]: flanneld.service: control process exited, code=exited status=2
Jun 10 13:44:02 k8s-master-001 systemd[1]: Failed to start Network fabric for containers.
Jun 10 13:44:02 k8s-master-001 systemd[1]: Unit flanneld.service entered failed state.
Jun 10 13:44:02 k8s-master-001 systemd[1]: flanneld.service failed.
Jun 10 13:44:07 k8s-master-001 systemd[1]: flanneld.service holdoff time over, scheduling restart.

systemctl status flanneld

● flanneld.service - Network fabric for containers
   Loaded: loaded (/usr/lib64/systemd/system/flanneld.service; static; vendor preset: disabled)
  Drop-In: /etc/systemd/system/flanneld.service.d
           └─50-certs-config.conf, 51-network-config.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2015-06-11 06:30:30 UTC; 1s ago
     Docs: https://github.com/coreos/flannel
  Process: 31186 ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config {"Network":"10.244.0.0/16", "Backend": {"Type": "vxlan"}} (code=exited, status=4)
  Process: 31182 ExecStartPre=/usr/bin/touch /run/flannel/options.env (code=exited, status=0/SUCCESS)
  Process: 31180 ExecStartPre=/usr/bin/mkdir -p ${ETCD_SSL_DIR} (code=exited, status=0/SUCCESS)
  Process: 31176 ExecStartPre=/usr/bin/mkdir -p /run/flannel (code=exited, status=0/SUCCESS)
  Process: 31175 ExecStartPre=/sbin/modprobe ip_tables (code=exited, status=0/SUCCESS)

Jun 11 06:30:30 k8s-master-001 systemd[1]: Unit flanneld.service entered failed state.
Jun 11 06:30:30 k8s-master-001 systemd[1]: flanneld.service failed.

P.S. None of the flanneld drop-ins in my cloud-config use any env-vars that could override flanneld's native env-vars. The correct env-vars are generated in flanneld's env-file, but they are not used as argument flags for its container.

mnemotiv on 11 Jun 2015

I'm starting to think that's more etcdctl problem than flanneld...

mnemotiv on 11 Jun 2015

etcd2 works fine, fleet - fine, flannel with etcdctl - not

mnemotiv on 11 Jun 2015

@mnemotiv Sorry to have dropped the ball on this. cloud-config values get put into the /run/flannel/options.env file, which gets passed to docker via --env-file. flannel knows to use FLANNELD_* env-vars together with cmd line args (basically merges them). Unfortunately, any env-vars set in drop-ins will not propagate into the flannel daemon. They can only be used to influence the flanneld.service file itself (e.g. FLANNEL_VER and ETCD_SSL_DIR).

eyakubovich on 26 Jun 2015
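
The mapping described in the comment above can be sketched in shell. This is only an illustration: the helper names flannel_env_name and flannel_env_line are hypothetical, and the naming rule (FLANNELD_ prefix, upper case, dashes turned into underscores) is inferred from the examples in this thread rather than taken from the CoreOS tooling itself.

```shell
#!/bin/sh
# Hypothetical helper: turn a flannel option name from cloud-config into the
# FLANNELD_* variable name used in this thread,
# e.g. etcd_endpoints -> FLANNELD_ETCD_ENDPOINTS.
flannel_env_name() {
  printf 'FLANNELD_%s' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Hypothetical helper: emit one KEY=VALUE line, the format that
# docker run --env-file reads.
flannel_env_line() {
  printf '%s=%s\n' "$(flannel_env_name "$1")" "$2"
}

# Prints: FLANNELD_ETCD_ENDPOINTS=http://1.2.3.4:2379
flannel_env_line etcd_endpoints http://1.2.3.4:2379
```

A line of this shape in /run/flannel/options.env is what reaches the flanneld process inside the container through --env-file.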

@eyakubovich ok cool. as I saw on coreos documentation - flannel is being packed as docker container to save disk space for OS "for those who won't use flannel" - who won't use flannel? :)

mnemotiv on 29 Jun 2015

@mnemotiv it's partially to save space (only pay for what you use) but also to demonstrate that you can run infrastructure pieces in a container (although currently with Docker, it does require some acrobatics). So if somebody wants to run an overlay other than flannel, they can swap it out.

eyakubovich on 29 Jun 2015

@eyakubovich so how do I tell flannel that my etcd endpoint is on https://localhost:2379 ?

vaijab on 30 Jul 2015

@vaijab

FLANNELD_ETCD_ENDPOINTS=http://127.0.0.1:2379

Just source this env-var from a file, or set it as a variable in flanneld.service / docker run...

https://github.com/nodetemple/nodetemple-deprecated/blob/master/daemon/environment-setup
https://github.com/nodetemple/nodetemple-deprecated/blob/master/fleet/flanneld.service

mnemotiv on 30 Jul 2015

@mnemotiv I am not sure that actually works. You probably got the impression that it works because flannel defaults to http://127.0.0.1:2379 anyway.

core@ip-10-50-0-221 ~ $ systemctl cat flanneld
# /usr/lib64/systemd/system/flanneld.service
[Unit]
Description=Network fabric for containers
Documentation=https://github.com/coreos/flannel
Requires=early-docker.service
After=etcd.service etcd2.service early-docker.service
Before=early-docker.target

[Service]
Type=notify
Restart=always
RestartSec=5
Environment="TMPDIR=/var/tmp/"
Environment="DOCKER_HOST=unix:///var/run/early-docker.sock"
Environment="FLANNEL_VER=0.5.1"
Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
LimitNOFILE=40000
LimitNPROC=1048576
ExecStartPre=/sbin/modprobe ip_tables
ExecStartPre=/usr/bin/mkdir -p /run/flannel
ExecStartPre=/usr/bin/mkdir -p ${ETCD_SSL_DIR}
ExecStartPre=/usr/bin/touch /run/flannel/options.env

ExecStart=/usr/libexec/sdnotify-proxy /run/flannel/sd.sock \
  /usr/bin/docker run --net=host --privileged=true --rm \
  --volume=/run/flannel:/run/flannel \
  --env=NOTIFY_SOCKET=/run/flannel/sd.sock \
  --env=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  --env=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  --env-file=/run/flannel/options.env \
  --volume=/usr/share/ca-certificates:/etc/ssl/certs:ro \
  --volume=${ETCD_SSL_DIR}:/etc/ssl/etcd:ro \
  quay.io/coreos/flannel:${FLANNEL_VER} /opt/bin/flanneld --ip-masq=true

# Update docker options
ExecStartPost=/usr/bin/docker run --net=host --rm -v /run:/run \
  quay.io/coreos/flannel:${FLANNEL_VER} \
  /opt/bin/mk-docker-opts.sh -d /run/flannel_docker_opts.env -i

# /etc/systemd/system/flanneld.service.d/10-configuration.conf
[Service]
Environment="FLANNELD_INTERFACE=10.50.0.221"
Environment="FLANNELD_ETCD_ENDPOINTS=https://127.0.0.1:2379"
# /etc/systemd/system/flanneld.service.d/50-network-config.conf
[Service]
ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.10.0.0/16","Backend":{"Type":"vxlan"}}'


core@ip-10-50-0-221 ~ $ journalctl -fn -u flanneld.service
-- Logs begin at Thu 2015-07-30 13:53:22 UTC. --
Jul 30 13:54:36 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Unit entered failed state.
Jul 30 13:54:36 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Failed with result 'exit-code'.
Jul 30 13:54:36 ip-10-50-0-221.eu-west-1.compute.internal etcdctl[3843]: Error:  cannot sync with the cluster using endpoints http://127.0.0.1:4001, http://127.0.0.1:2379
Jul 30 13:54:41 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Service hold-off time over, scheduling restart.
Jul 30 13:54:41 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: Starting Network fabric for containers...
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Control process exited, code=exited status=2
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: Failed to start Network fabric for containers.
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Unit entered failed state.
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Failed with result 'exit-code'.
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal etcdctl[3887]: Error:  cannot sync with the cluster using endpoints http://127.0.0.1:4001, http://127.0.0.1:2379

vaijab on 30 Jul 2015

My files don't include env-vars for the flannel docker container, so you should add the env-vars to the container runtime. You can also run flannel outside a container; then it will pick up the env-vars directly.

mnemotiv on 30 Jul 2015
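
The point made in the comment above, that a variable set for the unit does not reach the containerized daemon unless it is passed on explicitly, can be imitated without Docker. In this sketch `env -i` stands in for the container boundary (an analogy, not what flanneld.service actually runs), and the endpoint value is the one from the drop-in shown earlier in the thread.

```shell
#!/bin/sh
# Set in the parent environment, as an Environment= line in a drop-in would.
FLANNELD_ETCD_ENDPOINTS=https://127.0.0.1:2379
export FLANNELD_ETCD_ENDPOINTS

# Small probe that reports what the child process can see.
probe='printf "%s" "${FLANNELD_ETCD_ENDPOINTS:-unset}"'

# env -i starts the child with an empty environment; it plays the role of the
# container, which does not inherit the unit's variables either.
not_passed=$(env -i /bin/sh -c "$probe")

# Handing the variable over explicitly is the analogue of
# docker run --env or --env-file.
passed=$(env -i FLANNELD_ETCD_ENDPOINTS="$FLANNELD_ETCD_ENDPOINTS" /bin/sh -c "$probe")

printf 'not passed: %s\npassed: %s\n' "$not_passed" "$passed"
```

The first probe reports `unset` and the second reports the endpoint, which mirrors why the drop-in alone had no effect on flanneld inside the container.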

@mnemotiv gotcha! Thanks, I have it working now. Flannel config and startup on CoreOS seems convoluted and rather complicated. But flannel is still awesome anyway.

vaijab on 30 Jul 2015

@vaijab If you're using CoreOS, specifying etcd-endpoints is easy with cloud-config. See flannel section under https://coreos.com/os/docs/latest/cloud-config.html#coreos

But it's like this:

#cloud-config

coreos:
  flannel:
      etcd_endpoints: http://1.2.3.4:2379

eyakubovich on 31 Jul 2015

@eyakubovich that's right, but I also had to add a drop-in to make this all work. Maybe the documentation needs to be updated, or flanneld.service changed to add EnvironmentFile=/run/flannel/options.env

- name: 10-env-config.conf
  content: |
    [Service]
    EnvironmentFile=/run/flannel/options.env

vaijab on 31 Jul 2015

I am not quite sure if this is applicable, but I had the same issue on a CentOS Atomic system. What ultimately resolved it was setting these parameters in /etc/etcd/etcd.conf:

ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"

After restarting the etcd service, this seemed to fix my problem. Hopefully this is helpful to someone.