I am installing flannel on a CoreOS machine on AWS using a cloud-config file.
I provide the etcd_endpoints flag and point it at my etcd cluster, yet journalctl shows flannel still trying to retrieve from http://localhost:4001, and the flannel install fails.
The documentation regarding command-line flags (including the built-in help) is incorrect. I got this to work:
flanneld -ip-masq -etcd-endpoints https://127.0.0.1:2379 -etcd-cafile /etc/etcd/ca.crt -etcd-certfile /etc/etcd/host.crt -etcd-keyfile /etc/etcd/host.key
Note the _lack_ of equal signs. A lot of CoreOS' tools have similar problems. A lot of it has to do with the fact that people are used to GNU getopt-style command line parsing where "--opt=value" is equivalent to "--opt value". The Go "flag" library claims to do the same thing but clearly doesn't.
The problem is that flanneld.service doesn't read env-vars from the /run/flannel/options.env file before it starts the flanneld container.
/opt/bin/flanneld --ip-masq=true (missing additional flags here)
@tanmaybinaykiya @jcollie we tested ./flanneld -etcd-endpoints="http://192.168.198.130:2379" with go1.4.2 and it worked fine. Which version of go are you guys using?
@mnemotiv is this the flanneld.service that is shipped with coreos?
@MohdAhmad yes, with latest alpha 695
This problem exists if I deploy the server with cloud-config:
...
  flannel:
    etcd-endpoints: https://127.0.0.1:2379
    etcd-cafile: /var/run/...../ca.crt
    etcd-certfile: /var/run/...../master.crt
    etcd-keyfile: /var/run/...../master.key
...
flanneld.service output from journalctl:
Jun 10 13:44:01 k8s-master-001 systemd[1]: Starting Network fabric for containers...
Jun 10 13:44:02 k8s-master-001 etcdctl[644]: Error: cannot sync with the cluster using endpoints http://127.0.0.1:4001, http://127.0.0.1:2379
Jun 10 13:44:02 k8s-master-001 systemd[1]: flanneld.service: control process exited, code=exited status=2
Jun 10 13:44:02 k8s-master-001 systemd[1]: Failed to start Network fabric for containers.
Jun 10 13:44:02 k8s-master-001 systemd[1]: Unit flanneld.service entered failed state.
Jun 10 13:44:02 k8s-master-001 systemd[1]: flanneld.service failed.
Jun 10 13:44:07 k8s-master-001 systemd[1]: flanneld.service holdoff time over, scheduling restart.
systemctl status flanneld
● flanneld.service - Network fabric for containers
Loaded: loaded (/usr/lib64/systemd/system/flanneld.service; static; vendor preset: disabled)
Drop-In: /etc/systemd/system/flanneld.service.d
└─50-certs-config.conf, 51-network-config.conf
Active: activating (auto-restart) (Result: exit-code) since Thu 2015-06-11 06:30:30 UTC; 1s ago
Docs: https://github.com/coreos/flannel
Process: 31186 ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config {"Network":"10.244.0.0/16", "Backend": {"Type": "vxlan"}} (code=exited, status=4)
Process: 31182 ExecStartPre=/usr/bin/touch /run/flannel/options.env (code=exited, status=0/SUCCESS)
Process: 31180 ExecStartPre=/usr/bin/mkdir -p ${ETCD_SSL_DIR} (code=exited, status=0/SUCCESS)
Process: 31176 ExecStartPre=/usr/bin/mkdir -p /run/flannel (code=exited, status=0/SUCCESS)
Process: 31175 ExecStartPre=/sbin/modprobe ip_tables (code=exited, status=0/SUCCESS)
Jun 11 06:30:30 k8s-master-001 systemd[1]: Unit flanneld.service entered failed state.
Jun 11 06:30:30 k8s-master-001 systemd[1]: flanneld.service failed.
P.S. None of the flanneld drop-ins in cloud-config set any env-vars that could override flanneld's native env-vars. The correct env-vars are generated in flanneld's env-file, but they aren't used as argument flags for its container.
I'm starting to think this is more of an etcdctl problem than a flanneld one...
etcd2 works fine, fleet works fine, flannel with etcdctl does not.
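The journal lines above are emitted by etcdctl, and they name only http://127.0.0.1:4001 and http://127.0.0.1:2379, which look like built-in defaults rather than the configured https endpoint. Below is a rough sketch of a helper that prints an etcdctl command line with the endpoint and certificates spelled out. It assumes the etcdctl v2 global options --peers, --ca-file, --cert-file and --key-file (check etcdctl --help on your version), the certificate file names are assumptions modelled on the flanneld command earlier in the thread, and etcdctl_tls_cmd is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an etcdctl command that targets a
# TLS-enabled etcd. Option names assume etcdctl v2; file names are assumed.
etcdctl_tls_cmd() {
    endpoint=$1   # e.g. https://127.0.0.1:2379
    ssl_dir=$2    # directory assumed to hold ca.crt, host.crt and host.key
    shift 2
    printf '/usr/bin/etcdctl --peers %s' "$endpoint"
    printf ' --ca-file %s/ca.crt' "$ssl_dir"
    printf ' --cert-file %s/host.crt' "$ssl_dir"
    printf ' --key-file %s/host.key' "$ssl_dir"
    # Append the remaining words (subcommand, key, ...) if any were given.
    if [ "$#" -gt 0 ]; then
        printf ' %s' "$@"
    fi
    printf '\n'
}

# Example: read back the network config key used in this thread.
etcdctl_tls_cmd https://127.0.0.1:2379 /etc/ssl/etcd get /coreos.com/network/config
```

Running the printed command by hand is a quick way to tell whether the failure sits in the etcdctl pre-start step or in flanneld itself.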
@mnemotiv Sorry to have dropped the ball on this. cloud-config values get put into the /run/flannel/options.env file, which gets passed to docker via --env-file. flannel knows to use FLANNELD_* env-vars together with cmd line args (basically merges them). Any env-vars set in drop-ins unfortunately will not propagate into the flannel daemon. They can only be used to influence the flanneld.service file itself (e.g. FLANNEL_VER and ETCD_SSL_DIR).
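As a rough illustration of the FLANNELD_* convention described here, the shell sketch below turns flag-style names into FLANNELD_* variables and writes them as NAME=VALUE lines, the format docker's --env-file reads. The mapping rule (upper-case, dashes to underscores, FLANNELD_ prefix) is inferred from the variable names that appear in this thread, and flag_to_env and write_options_env are hypothetical helpers; on CoreOS the real file is generated from cloud-config, not by a script like this.

```shell
#!/bin/sh
# Hypothetical helpers, not part of flannel or CoreOS.

# Map a flag name such as "etcd-endpoints" to "FLANNELD_ETCD_ENDPOINTS".
# The rule is an assumption inferred from names seen in this thread.
flag_to_env() {
    printf 'FLANNELD_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Write one NAME=VALUE line per argument.
# Usage: write_options_env FILE flag=value [flag=value ...]
write_options_env() {
    file=$1
    shift
    : > "$file"                      # truncate or create the file
    for pair in "$@"; do
        name=${pair%%=*}             # text before the first '='
        value=${pair#*=}             # text after the first '='
        printf '%s=%s\n' "$(flag_to_env "$name")" "$value" >> "$file"
    done
}

# Example: write to a scratch file rather than /run/flannel/options.env.
tmp=$(mktemp)
write_options_env "$tmp" etcd-endpoints=https://127.0.0.1:2379 interface=10.50.0.221
cat "$tmp"
rm -f "$tmp"
```

Comparing output like this with the contents of /run/flannel/options.env on the host shows whether the cloud-config values actually made it into the file.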
@eyakubovich ok cool. as I saw on coreos documentation - flannel is being packed as docker container to save disk space for OS "for those who won't use flannel" - who won't use flannel? :)
@mnemotiv it's partially to save space (only pay for what you use) but also to demonstrate that you can run infrastructure pieces in a container (although currently with Docker, it does require some acrobatics). So if somebody wants to run an overlay other than flannel, they can swap it out.
@eyakubovich so how do I tell flannel that my etcd endpoint is on https://localhost:2379?
@vaijab
FLANNELD_ETCD_ENDPOINTS=http://127.0.0.1:2379
just source this env-var as file or just as var in flanneld.service / docker run...
https://github.com/nodetemple/nodetemple-deprecated/blob/master/daemon/environment-setup
https://github.com/nodetemple/nodetemple-deprecated/blob/master/fleet/flanneld.service
@mnemotiv I am not sure that actually works. You probably got the impression that it works because flannel defaults to http://127.0.0.1:2379 anyway.
core@ip-10-50-0-221 ~ $ systemctl cat flanneld
# /usr/lib64/systemd/system/flanneld.service
[Unit]
Description=Network fabric for containers
Documentation=https://github.com/coreos/flannel
Requires=early-docker.service
After=etcd.service etcd2.service early-docker.service
Before=early-docker.target
[Service]
Type=notify
Restart=always
RestartSec=5
Environment="TMPDIR=/var/tmp/"
Environment="DOCKER_HOST=unix:///var/run/early-docker.sock"
Environment="FLANNEL_VER=0.5.1"
Environment="ETCD_SSL_DIR=/etc/ssl/etcd"
LimitNOFILE=40000
LimitNPROC=1048576
ExecStartPre=/sbin/modprobe ip_tables
ExecStartPre=/usr/bin/mkdir -p /run/flannel
ExecStartPre=/usr/bin/mkdir -p ${ETCD_SSL_DIR}
ExecStartPre=/usr/bin/touch /run/flannel/options.env
ExecStart=/usr/libexec/sdnotify-proxy /run/flannel/sd.sock \
/usr/bin/docker run --net=host --privileged=true --rm \
--volume=/run/flannel:/run/flannel \
--env=NOTIFY_SOCKET=/run/flannel/sd.sock \
--env=AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
--env=AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
--env-file=/run/flannel/options.env \
--volume=/usr/share/ca-certificates:/etc/ssl/certs:ro \
--volume=${ETCD_SSL_DIR}:/etc/ssl/etcd:ro \
quay.io/coreos/flannel:${FLANNEL_VER} /opt/bin/flanneld --ip-masq=true
# Update docker options
ExecStartPost=/usr/bin/docker run --net=host --rm -v /run:/run \
quay.io/coreos/flannel:${FLANNEL_VER} \
/opt/bin/mk-docker-opts.sh -d /run/flannel_docker_opts.env -i
# /etc/systemd/system/flanneld.service.d/10-configuration.conf
[Service]
Environment="FLANNELD_INTERFACE=10.50.0.221"
Environment="FLANNELD_ETCD_ENDPOINTS=https://127.0.0.1:2379"
# /etc/systemd/system/flanneld.service.d/50-network-config.conf
[Service]
ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{"Network":"10.10.0.0/16","Backend":{"Type":"vxlan"}}'
core@ip-10-50-0-221 ~ $ journalctl -fn -u flanneld.service
-- Logs begin at Thu 2015-07-30 13:53:22 UTC. --
Jul 30 13:54:36 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Unit entered failed state.
Jul 30 13:54:36 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Failed with result 'exit-code'.
Jul 30 13:54:36 ip-10-50-0-221.eu-west-1.compute.internal etcdctl[3843]: Error: cannot sync with the cluster using endpoints http://127.0.0.1:4001, http://127.0.0.1:2379
Jul 30 13:54:41 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Service hold-off time over, scheduling restart.
Jul 30 13:54:41 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: Starting Network fabric for containers...
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Control process exited, code=exited status=2
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: Failed to start Network fabric for containers.
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Unit entered failed state.
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal systemd[1]: flanneld.service: Failed with result 'exit-code'.
Jul 30 13:54:42 ip-10-50-0-221.eu-west-1.compute.internal etcdctl[3887]: Error: cannot sync with the cluster using endpoints http://127.0.0.1:4001, http://127.0.0.1:2379
My files don't include env-vars for the flannel docker container, so you need to add the env-vars to the container runtime yourself. You can also run flannel outside a container; then it will read the env-vars directly.
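One way to add env-vars to the container runtime is to forward every exported FLANNELD_* variable as a docker --env argument. The sketch below only prints the resulting arguments and does not call docker; flannel_env_args is a hypothetical helper name, and values containing spaces or newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: emit one --env=NAME=VALUE argument per exported
# FLANNELD_* variable. Names such as FLANNEL_VER do not match the prefix.
flannel_env_args() {
    env | grep '^FLANNELD_' | sort | sed 's/^/--env=/'
}

# Example: the values from the drop-in shown earlier in this thread.
export FLANNELD_ETCD_ENDPOINTS=https://127.0.0.1:2379
export FLANNELD_INTERFACE=10.50.0.221
flannel_env_args
```

Inside a unit file there is no shell to expand a helper like this, so the equivalent is to list the --env= flags explicitly on the docker run line or to rely on --env-file, as the shipped flanneld.service does.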
@mnemotiv gotcha! Thanks, I have it working now. Flannel config and startup on CoreOS seems convoluted and rather complicated. But flannel is still awesome anyway.
@vaijab If you're using CoreOS, specifying etcd-endpoints is easy with cloud-config. See the flannel section under https://coreos.com/os/docs/latest/cloud-config.html#coreos
But it's like this:
#cloud-config
coreos:
  flannel:
    etcd_endpoints: http://1.2.3.4:2379
@eyakubovich that's right, but I also had to add an extra drop-in to make this all work. Maybe the documentation needs to be updated, or flanneld.service changed to add EnvironmentFile=/run/flannel/options.env
- name: 10-env-config.conf
  content: |
    [Service]
    EnvironmentFile=/run/flannel/options.env
I am not quite sure if this is applicable, but I had the same issue on a CentOS Atomic system. What ultimately resolved it was setting the following parameters in /etc/etcd/etcd.conf:
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
After restarting the etcd service, this seemed to fix my problem. Hopefully this is helpful to someone.
I believe this is fixed now and can be closed.