Kubespray has accumulated many options over time. On one hand, composability and choice are core to Kubespray and a key differentiator; on the other, the effort to maintain them all impacts overall quality, since it's not realistic to test so many flavors.
OS:
Network plugins:
dns_mode:
resolvconf_mode:
binaries_deployment_type:
etcd_certs management:
etcd:
container_engine:
My thoughts:

OS:
All current OSes are OK and in general low-maintenance or community-contributed.

Network plugins:
Most network plugins are OK, but we should add an OWNERS file to each and start gathering some data: usage (survey?), issues opened...

dns_mode:
Too many choices with low impact for the user. I propose to be more opinionated here. I would limit it to:

resolvconf_mode:
I don't know which one, but I think we could pick a single choice.

binaries_deployment_type:
I'd keep only host, and drop docker/rkt (and in any case remove rkt).
It's a bit of a regression and more trouble for upgrades / OS-dependent handling, but in the end there are not that many binaries: etcd, etcdctl, kubeadm, kubelet, kubectl, and they are already extracted from a container.
With the increasing number of container engines (docker, crio, containerd...), not all of them provide a great way to manage containers outside Kubernetes, and we'd have to handle too many different tools (docker CLI, crictl...).
As an alternative, maybe a standalone kubelet (disconnected from the master, e.g. for etcd nodes) with static pods could do the job (but that would be a bit experimental)? See the sketch after this list.

etcd_certs management:
Too many issues with Vault! Looking at what other projects are doing, and the direction taken (etcdadm), keeping Vault is too much maintenance and should be out of scope.

etcd:
Both modes make sense; it mostly depends on the cluster size.

container_engine:
proposed deprecation list:
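For the standalone-kubelet alternative above, here's a minimal sketch of what an etcd static pod could look like, dropped into the kubelet's --pod-manifest-path directory; the image tag, flags, and host paths are illustrative assumptions, not something Kubespray ships today:

# Hypothetical etcd static pod for a standalone kubelet (no API server needed);
# the kubelet would be started with --pod-manifest-path=/etc/kubernetes/manifests.
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
    - name: etcd
      image: quay.io/coreos/etcd:v3.2.24
      command:
        - etcd
        - --config-file=/etc/etcd/etcd-config.yaml
      volumeMounts:
        - name: etcd-config
          mountPath: /etc/etcd
          readOnly: true
        - name: etcd-data
          mountPath: /var/lib/etcd
  volumes:
    - name: etcd-config
      hostPath:
        path: /etc/etcd
    - name: etcd-data
      hostPath:
        path: /var/lib/etcd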
I agree on all of these, except resolvconf_mode docker_dns. It's really the best option for quick testing environments, because hosts don't strictly need to resolve cluster DNS, and there might be a complicated resolv.conf already in place that is hard to manage. That's why this mode was added.
With regards to dropping docker-based etcd/kubelet, I think we should strongly consider moving to package-based or binary downloads instead of copying from the container to the host.
@riverzhang @bradbeam @rsmitty You folks are opinionated about deployment, so you should weigh in on this proposed deprecation list.
Do we need another section for Kubernetes certs management, or are we planning to transition exclusively to kubeadm's bootstrap token/join-based cert provisioning (my personal preference)?
I am still in favor of docker-based installation of etcd. We use containerized installations of all applications/software in our datacenter to prevent headaches if/when we change operating systems.
@woopstar the compiled binaries are not operating-system dependent, and the containers don't provide any benefit compared to them.
Having containerized installations as the number of container engines grows will become impossible, or will force installing, say, docker on top of containerd/cri-o, etc. An alternative to this mess would be static pods + a standalone kubelet, but that adds complexity for not much gain.
Also, specifically for etcd, installation and management via Kubespray would become more of an 'option' than something forced.
wget -O /opt/bin/etcd https://.........../etcd-v$version
/opt/bin/etcd --config-file /etc/etcd/etcd-config.yaml
vs
docker pull quay.io/coreos/etcd:v$version
docker run -d -v /etc/etcd:/etc/etcd quay.io/coreos/etcd:v$version etcd --config-file /etc/etcd/etcd-config.yaml
Is there a lot of difference? Plus, both are managed by systemd, so in the end start/stop/restart are exactly the same in both cases.
systemctl start etcd-node-1
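To make the "both end up under systemd" point concrete, a hypothetical Ansible task could install the unit; the unit content below is a sketch (not Kubespray's actual template), and only the ExecStart line would differ between the host-binary and docker variants:

# Sketch: install the etcd unit for the host-binary variant.
# For the docker variant, ExecStart would become the docker run command above.
- name: Install etcd systemd unit (host binary variant)
  copy:
    dest: /etc/systemd/system/etcd.service
    content: |
      [Unit]
      Description=etcd key-value store
      After=network.target

      [Service]
      ExecStart=/opt/bin/etcd --config-file /etc/etcd/etcd-config.yaml
      Restart=always

      [Install]
      WantedBy=multi-user.target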
Sure. Running the binary on the host changes the kernel/namespace environment the binary executes in (CentOS, Ubuntu, different kernel versions, etc.).
Using Docker containers, I can guarantee that etcd runs with the same libs, since it uses the same "FROM" base image everywhere.
Then you move the problem to docker itself, which is much more of a pain to maintain than a statically compiled binary.
Just search how many issues we've had just about upgrading docker: PPAs, RPMs, incompatible versions, or whatever other issues with managing docker.
cc @kubernetes-sigs/kubespray-maintainers
how many issues we had just about upgrading docker, ppa, rpm, incompatible version
I see what you mean. We just mitigate that problem entirely by running CoreOS, where all versions of all libs, the kernel, applications, etc. are locked to a specific version per release. This gives us zero headaches.
I think the current network plugins should be updated to include kube-router and multus
Indeed.
Another thing:
Do we want to support much backwards compatibility? Right now we have 3 templates for kubeadm deployment: <1.10 (alpha1), 1.10-1.11 (alpha2) and >=1.12 (alpha3). I would suggest only supporting one release back in each of our releases, thereby deleting the alpha1 deployments.
We spend too much time keeping templates in sync and backwards compatible.
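For context, the template split is driven by the kubeadm config apiVersion each Kubernetes release accepts. From memory (so treat the exact kinds and fields as assumptions), a v1alpha3-era config starts like this, while older releases used v1alpha1/v1alpha2 with kind: MasterConfiguration:

# Assumed header of a kubeadm v1alpha3 (Kubernetes 1.12) config file.
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: v1.12.0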
CoreDNS should be "coredns_dual" according to this: https://github.com/kubernetes-sigs/kubespray/pull/2462#issuecomment-373638998 - we could rename it to "coredns", though.
@woopstar yes, we need to delete the alpha1 deployments when Kubernetes 1.13 is released,
and then support the kubeadm config beta1.
About resolvconf_mode, I think maybe we should keep docker_dns until some changes are made.
Maybe I hit a corner case, but setting resolvconf_mode to host_resolvconf completely removes the original content of resolv.conf on my bare-metal, non-DHCP environment.
So after a reset (or if the cluster.yml playbook fails at some point after the modification of /etc/resolv.conf), I just can't do anything until I restore /etc/resolv.conf manually...
@mirwan yes, we'll keep both.
We should add prechecks that trigger when an option on a deprecation path is used. Those should print warnings in deprecation releases, and fail once the options are fully removed.
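As a rough sketch of such a precheck (the option names and messages below are illustrative examples, not an actual Kubespray variable contract):

# Hypothetical deprecation prechecks as Ansible tasks.
- name: Warn when a deprecated option is still in use
  debug:
    msg: "DEPRECATION WARNING: etcd_deployment_type=docker is deprecated and will be removed in a future release"
  when: etcd_deployment_type == 'docker'

- name: Fail when a removed option is set
  fail:
    msg: "etcd_deployment_type=rkt has been removed, use 'host' instead"
  when: etcd_deployment_type == 'rkt'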
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Container-engine-wise, there's also containerd next to docker and cri-o (and arguably rkt; maybe also things like KubeVirt or Kata Containers, but those are more of a special case and not something that could be expected to run initially in most cases).
https://kubernetes.io/docs/setup/cri/ also lists frakti, but that one seems to be slowly dying out.
One basic "option" that's missing so far is running kubespray behind a (corporate?) proxy server versus running with direct internet access. In many cases this will already work, but it doesn't seem to be tested systematically yet.
@MarkusTeufelberger there are already proxy options
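For reference, the existing knobs look roughly like this in the inventory group vars (http_proxy/https_proxy are real settings; the values and the exact no_proxy handling here are assumptions, since part of no_proxy is generated by Kubespray):

# Sketch of proxy settings in group_vars; values are examples only.
http_proxy: "http://proxy.example.com:3128"
https_proxy: "http://proxy.example.com:3128"
no_proxy: "localhost,127.0.0.1,.example.com"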
Yes, I've listed what we have 'now', and not upcoming things like Kata Containers...
Also, KubeVirt isn't an engine for pods.
Is there anyone opposing the rkt deprecation? It would be good to announce it in 2.10 so we can remove it in 2.11.
@ant31 - yeah, the proxy options are there, but they are not mentioned in this ticket and are also not really tested as far as I can tell.
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.