Kubespray has accumulated many options over time. On one hand, composability and choice are core to Kubespray and a key differentiator; on the other, the effort to maintain them all impacts overall quality, since it's not realistic to test so many flavors.
OS:
Network plugins:
dns_mode:
resolvconf_mode:
binaries_deployment_type:
etcd_certs management:
etcd:
container_engine:
My thoughts:

OS:
All current OSes are OK and in general low-maintenance or community-contributed.

Network plugins:
Most network plugins are OK, but we should add an OWNERS file to each and start gathering some data: usage (survey?), issues opened...

dns_mode:
Too many choices with low impact for the user. I propose to be more opinionated here. I would limit it to:

resolvconf_mode:
I don't know which one, but I think we could pick a single choice.

binaries_deployment_type:
I'd keep only host, and drop docker/rkt (and in any case remove rkt).
It's a bit of a regression and more trouble for upgrades / OS-dependent handling, but in the end there are not that many binaries: etcd, etcdctl, kubeadm, kubelet, kubectl, and they are already extracted from a container.
With the increasing number of container engines (docker, crio, containerd...), not all of them provide a great way to manage containers outside Kubernetes, and we'd have to handle too many different tools (docker CLI, crictl...).
As an alternative, maybe a standalone kubelet (disconnected from the master, e.g. for etcd nodes) with static pods could do the job (but that would be a bit experimental)? See the sketch after this list.

etcd_certs management:
Too many issues with Vault! Looking at what other projects are doing, and the direction taken (etcdadm), keeping Vault is too much maintenance and should be out of scope.

etcd:
Both modes make sense; it mostly depends on the cluster size.

container_engine:
proposed deprecation list:
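For the standalone-kubelet alternative above, here's a minimal sketch of what an etcd static pod could look like, dropped into the kubelet's --pod-manifest-path directory; the image tag, flags, and host paths are illustrative assumptions, not something Kubespray ships today:

# Hypothetical etcd static pod for a standalone kubelet (no API server needed);
# the kubelet would be started with --pod-manifest-path=/etc/kubernetes/manifests.
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
    - name: etcd
      image: quay.io/coreos/etcd:v3.2.24
      command:
        - etcd
        - --config-file=/etc/etcd/etcd-config.yaml
      volumeMounts:
        - name: etcd-config
          mountPath: /etc/etcd
          readOnly: true
        - name: etcd-data
          mountPath: /var/lib/etcd
  volumes:
    - name: etcd-config
      hostPath:
        path: /etc/etcd
    - name: etcd-data
      hostPath:
        path: /var/lib/etcd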
I agree on all of these, except resolvconf_mode docker_dns. It's really the best option for quick testing environments, because hosts don't strictly need to resolve cluster DNS, and there might be a complicated resolv.conf already in place that is hard to manage. That's why this mode was added.
With regards to dropping docker-based etcd/kubelet, I think we should strongly consider moving to package-based or binary downloads instead of copying from the container to the host.
@riverzhang @bradbeam @rsmitty You folks are opinionated about deployment, so you should weigh in on this proposed deprecation list.
Do we need another section for Kubernetes certs management, or are we planning to transition exclusively to kubeadm's bootstrap token/join-based cert provisioning (my personal preference)?
I am still in favor of docker-based installation of etcd. We use containerized installations of all applications/software in our datacenter to prevent headaches if/when we change operating systems.
@woopstar the compiled binaries are not operating-system dependent, and the containers don't provide any benefit compared to them.
Having containerized installations as the number of container engines grows will become impossible, or will force installing, say, docker on top of containerd/cri-o, etc. An alternative to this mess would be static pods + a standalone kubelet, but that adds complexity for not much gain.
Also, specifically for etcd, installation and management via Kubespray would become more of an 'option' than something forced.
wget -O /opt/bin/etcd https://.........../etcd-v$version
/opt/bin/etcd --config-file /etc/etcd/etcd-config.yaml
vs
docker pull quay.io/coreos/etcd:v$version
docker run -d -v /etc/etcd:/etc/etcd quay.io/coreos/etcd:v$version etcd --config-file /etc/etcd/etcd-config.yaml
Is there a lot of difference? Plus, both are managed by systemd, so in the end start/stop/restart are exactly the same in both cases.
systemctl start etcd-node-1
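To make the "both end up under systemd" point concrete, a hypothetical Ansible task could install the unit; the unit content below is a sketch (not Kubespray's actual template), and only the ExecStart line would differ between the host-binary and docker variants:

# Sketch: install the etcd unit for the host-binary variant.
# For the docker variant, ExecStart would become the docker run command above.
- name: Install etcd systemd unit (host binary variant)
  copy:
    dest: /etc/systemd/system/etcd.service
    content: |
      [Unit]
      Description=etcd key-value store
      After=network.target

      [Service]
      ExecStart=/opt/bin/etcd --config-file /etc/etcd/etcd-config.yaml
      Restart=always

      [Install]
      WantedBy=multi-user.target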
Sure. Running the binary on the host changes the kernel/namespace environment the binary executes in (CentOS, Ubuntu, different kernel versions, etc.).
Using Docker containers, I can guarantee that etcd runs with the same libs, since it uses the same "FROM" base image everywhere.
Then you move the problem to docker itself, which is much more of a pain to maintain than a statically compiled binary.
Just search how many issues we've had just about upgrading docker: PPAs, RPMs, incompatible versions, or whatever other issues with managing docker.
cc @kubernetes-sigs/kubespray-maintainers
how many issues we had just about upgrading docker, ppa, rpm, incompatible version
I see what you mean. We just mitigate that problem entirely by running CoreOS, where all versions of all libs, the kernel, applications, etc. are locked to a specific version per release. This gives us zero headaches.
I think the current network plugins should be updated to include kube-router and multus
Indeed.
Another thing:
Do we want to support much backwards compatibility? Right now we have 3 templates for kubeadm deployment: <1.10 (alpha1), 1.10-1.11 (alpha2) and >=1.12 (alpha3). I would suggest only supporting one release back in each of our releases, thereby deleting the alpha1 deployments.
We spend too much time keeping templates in sync and backwards compatible.
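For context, the template split is driven by the kubeadm config apiVersion each Kubernetes release accepts. From memory (so treat the exact kinds and fields as assumptions), a v1alpha3-era config starts like this, while older releases used v1alpha1/v1alpha2 with kind: MasterConfiguration:

# Assumed header of a kubeadm v1alpha3 (Kubernetes 1.12) config file.
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: v1.12.0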
CoreDNS should be "coredns_dual" according to this: https://github.com/kubernetes-sigs/kubespray/pull/2462#issuecomment-373638998 - we could rename it to "coredns", though.
@woopstar yes, we need to delete the alpha1 deployments when Kubernetes 1.13 is released,
and then support the kubeadm config beta1.
About resolvconf_mode, I think maybe we should keep docker_dns until some changes are made.
Maybe I hit a corner case, but setting resolvconf_mode to host_resolvconf completely removes the original content of resolv.conf on my bare-metal, non-DHCP environment.
So after a reset (or if the cluster.yml playbook fails at some point after the modification of /etc/resolv.conf), I just can't do anything until I restore /etc/resolv.conf manually...
@mirwan yes, we'll keep both.
We should add prechecks that trigger when an option on a deprecation path is used. Those should print warnings in deprecation releases, and fail once the options are fully removed.
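As a rough sketch of such a precheck (the option names and messages below are illustrative examples, not an actual Kubespray variable contract):

# Hypothetical deprecation prechecks as Ansible tasks.
- name: Warn when a deprecated option is still in use
  debug:
    msg: "DEPRECATION WARNING: etcd_deployment_type=docker is deprecated and will be removed in a future release"
  when: etcd_deployment_type == 'docker'

- name: Fail when a removed option is set
  fail:
    msg: "etcd_deployment_type=rkt has been removed, use 'host' instead"
  when: etcd_deployment_type == 'rkt'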
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Container-engine-wise, there's also containerd next to docker and cri-o (and arguably rkt; maybe also things like KubeVirt or Kata Containers, but those are more of a special case and not something that could be expected to run initially in most cases).
https://kubernetes.io/docs/setup/cri/ also lists frakti, but that one seems to be slowly dying out.
One basic "option" that's missing so far is running kubespray behind a (corporate?) proxy server versus running with direct internet access. In many cases this will already work, but it doesn't seem to be tested systematically yet.
@MarkusTeufelberger there are already proxy options
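For reference, the existing knobs look roughly like this in the inventory group vars (http_proxy/https_proxy are real settings; the values and the exact no_proxy handling here are assumptions, since part of no_proxy is generated by Kubespray):

# Sketch of proxy settings in group_vars; values are examples only.
http_proxy: "http://proxy.example.com:3128"
https_proxy: "http://proxy.example.com:3128"
no_proxy: "localhost,127.0.0.1,.example.com"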
Yes, I've listed what we have 'now', and not upcoming things like Kata Containers...
Also, KubeVirt isn't an engine for pods.
Is there anyone opposing the rkt deprecation? It would be good to announce it in 2.10 so we can remove it in 2.11.
@ant31 - yeah, the proxy options are there, but they are not mentioned in this ticket and are also not really tested as far as I can tell.
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.