Configure the local node.
There will probably be some caveats, such as certs/tokens management.
For the pull mode, what do you think about the following?
1 - Create the etcd cluster in a way that it can be scaled; we need to discuss the proper way to do it: static configuration or discovery.
How can we scale up in the case of autoscaling?
2 - Use the etcd cluster to store the inventory and use a dynamic inventory.
Example: https://gist.github.com/justenwalker/09698cfd6c3a6a49075b
3 - A big issue is secrets management: where do we store the certs/tokens? How do we sync them between the nodes? Do we have to create a cert per node? ...
4 - Use ansible pull-mode
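For point 2, a minimal sketch of what an etcd-backed dynamic inventory script could look like (the key layout and the `build_inventory()` helper are hypothetical; in a real setup the group/host mapping would be fetched from etcd rather than hard-coded):

```python
#!/usr/bin/env python3
# Hypothetical sketch of an Ansible dynamic inventory backed by etcd.
# Assumes node names are stored under a key prefix such as
# /v2/keys/inventory/<group>/<host>; that layout is illustrative only.
import json


def build_inventory(etcd_groups):
    """Turn a flat {group: [hosts]} mapping (as read from etcd) into
    the JSON structure that `ansible-playbook -i inventory.py` expects."""
    inventory = {"_meta": {"hostvars": {}}}
    for group, hosts in etcd_groups.items():
        inventory[group] = {"hosts": hosts}
    return inventory


if __name__ == "__main__":
    # In a real setup this dict would come from an etcd client call,
    # e.g. a GET against http://<etcd>:2379/v2/keys/inventory?recursive=true
    sample = {"kube-master": ["node1"], "kube-node": ["node1", "node2"]}
    print(json.dumps(build_inventory(sample)))
```

Ansible only requires that the script print this JSON shape on `--list`, so the etcd client part can be swapped without touching the playbooks.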
For public cloud consumers, etcd discovery is probably optimal, since it almost never results in a broken cluster. Anyone deploying in-house might be reluctant to use discovery; an initial cluster array is adequate already.
Dynamic inventory plus deploying etcd via ansible creates a chicken-and-egg problem: you can't use an inventory from etcd until etcd is up. Also, you need a way to populate etcd in the first place. I would vote against adding complexity just for the sake of finding an innovative way to consume etcd.
Secrets management is a topic I've dealt with in previous projects. We currently have one master host which knows all the information. If you want to move to client-pull mode, all clients need to know where the host(s) that hold the secrets are located. Secret file storage should be replicated and transmitted using an encrypted method (ansible's SSH/rsync transport is totally fine).
I think you should add a new role for secrets: the first alphabetical node generates the secrets, while the others in that role take a full copy; all other nodes only fetch the secrets as needed. It's important to ensure that scale-up/scale-down scenarios are covered.
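The "first alphabetical node generates, the others copy" idea above amounts to a deterministic, coordination-free election. A minimal sketch (the function name and host names are made up for illustration):

```python
# Hypothetical sketch: deterministically pick the node that generates
# secrets, so every host agrees without needing a coordination service.
def elect_secret_generator(hosts):
    """Return the node responsible for generating secrets: the first
    host in alphabetical order. All other hosts copy from it."""
    if not hosts:
        raise ValueError("no hosts in inventory")
    return sorted(hosts)[0]


# Every node runs the same function over the same inventory and reaches
# the same answer; the choice also survives scale-up/scale-down as long
# as the elected node is still present.
print(elect_secret_generator(["node2", "node3", "node1"]))  # node1
```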
Thank you mattymo for your answer.
We can let the user choose how they want to deploy the etcd cluster.
The pull mode would just be an option.
I understand that etcd would become a strong dependency, but when, for instance, a new node is added, it needs to know about the cluster topology (where the API is, where etcd is, ...).
If you think about another option, we can evaluate it too.
Regarding the secrets:
all clients need to know where the host(s) that hold the secrets are located.
This is the reason why we need an inventory.
What you describe is exactly kargo's current behaviour, and it works just fine.
We can probably keep it if we have an inventory somewhere (e.g. etcd).
i'm probably missing something, but why not consider DNS discovery with SRV records vs. etcd discovery?
This is one of the discovery options that etcd offers, and i'm actually considering it @v1k0d3n
@rustyrobot , i need your input here too :)
it's always been the easiest for me when building and tearing down etcd clusters for kubernetes during testing (granted, i've been pulled away from doing this in recent months so some of the syntax may have changed with etcd2/3).
i just created srv records on my dns server:
; Kubernetes ETCD Server Cluster Information
_etcd-server._tcp.domain.com. 300 IN SRV 0 0 2380 kubetcd01.domain.com.
_etcd-server._tcp.domain.com. 300 IN SRV 0 0 2380 kubetcd02.domain.com.
_etcd-server._tcp.domain.com. 300 IN SRV 0 0 2380 kubetcd03.domain.com.
_etcd-server._tcp.domain.com. 300 IN SRV 0 0 2380 kubetcd04.domain.com.
_etcd-server._tcp.domain.com. 300 IN SRV 0 0 2380 kubetcd05.domain.com.
; Kubernetes ETCD Client Cluster Information
_etcd-client._tcp.domain.com. 300 IN SRV 0 0 2379 kubetcd01.domain.com.
_etcd-client._tcp.domain.com. 300 IN SRV 0 0 2379 kubetcd02.domain.com.
_etcd-client._tcp.domain.com. 300 IN SRV 0 0 2379 kubetcd03.domain.com.
_etcd-client._tcp.domain.com. 300 IN SRV 0 0 2379 kubetcd04.domain.com.
_etcd-client._tcp.domain.com. 300 IN SRV 0 0 2379 kubetcd05.domain.com.
; 10.1.1.0/24 - A Records: Kubernetes/Etcd Members
kubetcd01 IN A 10.1.1.21
kubetcd02 IN A 10.1.1.22
kubetcd03 IN A 10.1.1.23
kubetcd04 IN A 10.1.1.24
kubetcd05 IN A 10.1.1.25
and then configure the etcd cluster for dns discovery (example for kubetcd01)...
# [member]
ETCD_NAME=kubetcd01
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_SNAPSHOT_COUNTER="1000"
ETCD_ELECTION_TIMEOUT="1000"
ETCD_LISTEN_CLIENT_URLS="http://127.0.0.1:2379,http://127.0.0.1:4001,http://kubetcd01.domain.com:2379,http://kubetcd01.domain.com:4001"
ETCD_LISTEN_PEER_URLS="http://kubetcd01.domain.com:2380"
#[cluster]
ETCD_DISCOVERY_SRV="domain.com"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://kubetcd01.domain.com:2380"
ETCD_INITIAL_CLUSTER_TOKEN="domain-etcd"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_ADVERTISE_CLIENT_URLS="http://kubetcd01.domain.com:2379,http://kubetcd02.domain.com:2379,http://kubetcd03.domain.com:2379,http://kubetcd04.domain.com:2379,http://kubetcd05.domain.com:2379"
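With `ETCD_DISCOVERY_SRV` set, etcd itself looks up the `_etcd-server._tcp.<domain>` SRV records and derives its initial cluster from them. A rough sketch of that derivation, mirroring the zone file above (illustrative only, not etcd's actual code; real code would resolve the SRV records via DNS):

```python
# Illustrative sketch: turn _etcd-server._tcp SRV lookup results into
# the comma-separated name=peer-url list etcd uses as its initial cluster.
def initial_cluster_from_srv(srv_targets, scheme="http"):
    """srv_targets: list of (target_fqdn, port) tuples from the
    _etcd-server._tcp SRV lookup."""
    entries = []
    for target, port in srv_targets:
        name = target.split(".")[0]  # e.g. kubetcd01
        entries.append(f"{name}={scheme}://{target}:{port}")
    return ",".join(entries)


# Same five records as the zone file above:
records = [(f"kubetcd{i:02d}.domain.com", 2380) for i in range(1, 6)]
print(initial_cluster_from_srv(records))
# kubetcd01=http://kubetcd01.domain.com:2380,...,kubetcd05=http://kubetcd05.domain.com:2380
```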
I would not use ansible-pull if possible. Also, about the all-in-one image:
there are 2 images:
one with the deployment scripts, and one with all the tools.
I'll detail more later.
@ant31 yes, please do :)
@v1k0d3n how would you delete or add members?
i would let the users control that on the DNS side, and use a proxy on the etcd members: destroy and/or rebuild, and add via dns. i mean, it's raft, so 3 or 5 members is ideal; how many members do you really want over that? my biggest stumbling block right now with kargo is that i have this great srv/dns framework in place that i can't use to bring up etcd. :(
@Smana @v1k0d3n What I know about the current state of etcd is that there is no way to manage membership other than keeping a static list of etcd members and synchronizing it with the cluster by explicitly calling etcdctl member add <node> and etcdctl member remove <node>. The documentation also explicitly states that discovery should be used only for cluster bootstrapping; after the cluster is created, discovery becomes kind of useless. Also, public discovery is not always an option when we are talking about data centers with a firewall in front (which may block some or all traffic for security reasons); in that case, deploying your own HA discovery system is yet another issue.
There is a good video on lifecycle management of etcd from CoreOS Fest, which took place in Berlin. Basically, the presenter had to invent a new tool on top of etcd in order to do proper cluster management; it's not a trivial task, and I would suggest going with a static list as the simplest and most straightforward solution, until something like that is supported by etcd natively.
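For reference, the explicit membership calls mentioned above look roughly like this with the etcd v2 `etcdctl` (endpoints and node names are illustrative; this is a fragment to run against a live cluster, not a standalone script):

```shell
# Scale up: register the new member with the running cluster first.
# The new node must then start etcd with ETCD_INITIAL_CLUSTER_STATE="existing".
etcdctl --endpoint http://kubetcd01.domain.com:2379 \
  member add kubetcd06 http://kubetcd06.domain.com:2380

# Scale down: look up the member ID, then remove it.
etcdctl --endpoint http://kubetcd01.domain.com:2379 member list
etcdctl --endpoint http://kubetcd01.domain.com:2379 member remove <member-id>
```

This is exactly the bookkeeping that a static-list approach has to automate, which is why it needs to be driven from the inventory rather than from discovery.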
-> the all-in-one is optional and that's another subject
The only requirement on the host would be docker.
The base idea is to run:
docker run -e options=... -v /:/rootfs/ --rm kargo-deploy -- init
The kargo-deploy image contains ansible + the kargo scripts; we mount the host volume into the container, and with privileged access we can configure it.
To configure the container_engine, I propose to keep and use the current playbooks. Maybe later we can switch to a shell script instead of ansible, to remove the 'python' dependency from the hosts.
i think i'm losing track of what's being discussed in this thread, which is why i started #324 @Smana. giving users an option for how _they_ want to bootstrap etcd distances us from arguing over _which method_ is better and why. in my use case, i'm very specifically looking for a DNS SRV bootstrap discovery method for etcd, and i like the approach of "bring your own [xyz component]" for the project.
if users are tied to hard dependencies like ansible-pull, kpm built-in, etc., or if the project becomes less democratic and more opinionated about the etcd bootstrap method, i feel like the target audience will become narrower over time.
The discussion has deviated to etcd; maybe we should open a new issue to settle the etcd question.
This issue is about how to switch kargo from push to pull.
The idea is this:
if users are tied to hard dependencies like ansible-pull, kpm built-in, etc,
That's the opposite of what we are trying to solve with this issue.
We want the only requirement on the hosts to be a 'container engine' (docker/rkt).
This is why using ansible-pull is out! I don't want to install ansible on every host.
We have to find something other than ansible-pull (a shell script?) to deploy the kargo image.
@ant31 i agree with you about the docker image which deploys the node where it resides.
Actually, that's why i opened the issue https://github.com/kubespray/kargo/issues/321
That said, how would you configure the local node without using the pull mode (inside the container, of course)?
The main issue is not running inside a container, which is easy to do, but how to configure the local node.
The pull is not mandatory in the case of a docker image (the ansible playbooks are inside the container), but we need to get the inventory from somewhere, automatically (when the node starts).
Maybe I'm missing something, but why not stand up an etcd cluster with discovery and store secrets in etcd?
@v1k0d3n yes, good idea to use the etcd cluster to store the secrets and configuration shared by nodes/masters
Please refer to https://github.com/kubespray/kargome
private repo?
@v1k0d3n Sorry, i've changed my mind and i closed the repo, i'll try to do a PR instead.
hello! i'm curious if this is still in the works? i'd be interested in contributing :)
@billyoung we're thinking about something, but no real work has been done as far as i know.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.