What would you like to be added:
kubeadm join is the recommended way for non-first control plane nodes and worker nodes. We should set `kubeadm_control_plane` to true by default. I'm not sure it makes sense to keep the legacy "kubeadm init everywhere" use case around. Are there any edge cases with the control plane mode?
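As a rough sketch of what that default flip means in practice (the variable name exists in kubespray today; the task body and the `kubeadm_*` join parameters below are illustrative assumptions, not the actual role code):

```yaml
# group_vars/k8s-cluster/k8s-cluster.yml
# Proposed new default: secondary control-plane nodes join the first node
# instead of running their own "kubeadm init".
kubeadm_control_plane: true

# Hypothetical task sketch for a non-first control-plane node:
- name: Join secondary control plane node via kubeadm join
  command: >-
    kubeadm join {{ kubeadm_discovery_address }}
    --token {{ kubeadm_token }}
    --control-plane
    --certificate-key {{ kubeadm_certificate_key }}
  when: inventory_hostname != groups['kube-master'][0]
```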
There is a valid use case for having an "external" etcd cluster not managed by kubeadm, especially when etcd is not deployed on the control plane nodes. Currently, etcd setup is fairly manual, fragile (like during upgrades), and hard to debug. https://github.com/kubernetes-sigs/etcdadm is supposed to make etcd management easier. In the long run, kubeadm will eventually use etcdadm under the hood. It would be a good idea to implement it for the "external" etcd use case as well. Moreover, adding support for a BYO etcd cluster (#6398) should be fairly easy if we go down that path.
kubespray officially supports only systemd-based linux distros. We should not have two cgroup managers (see https://github.com/kubernetes/kubeadm/issues/1394#issuecomment-462878219 for technical details).
This is a backward-incompatible change, so maybe change the default for new installs but keep the current setting for upgrades?
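A minimal sketch of such an upgrade-safe default, assuming a hypothetical `is_upgrade` fact (not an existing kubespray variable) that the playbooks would set when running against an existing cluster:

```yaml
# Sketch only: new installs get the systemd cgroup driver, upgraded clusters
# keep cgroupfs until the operator opts in explicitly.
kubelet_cgroup_driver: "{{ 'cgroupfs' if is_upgrade | default(false) else 'systemd' }}"
```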
There are still some hardcoded docker commands in the code (network plugins, etcd, node role, ...). One of kubespray's goals is to "Deploy a Production Ready Kubernetes Cluster", so it should NOT ship a container engine capable of building new container images by default, for security purposes. Containerd would be a more secure default. In order to make that transition, we need to use `crictl` where `docker` is used today.
Why is this needed:
We need to address technical debt. The code base is large, and some areas are old and unmaintained. I'd like to take the opportunity of the next major release to slim down the code base as much as possible and make the CI more agile to get quicker feedback.
/cc @floryut, @Miouge1, @mattymo, @LuckySB
Remove docker requirements
There are still some hardcoded docker commands in the code (network plugins, etcd, node role, ...). One of kubespray's goals is to "Deploy a Production Ready Kubernetes Cluster", so it should NOT ship a container engine capable of building new container images by default, for security purposes. Containerd would be a more secure default. In order to make that transition, we need to use `crictl` where `docker` is used today.
I'm all in for that. A PR was raised a long time ago to set containerd as the default runtime (but was dropped as too much work and too much of a breaking change). That would allow us to get rid of a lot of default docker commands and at the same time move toward something more CRI-oriented.
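A hedged sketch of what that replacement could look like in a role task; `bin_dir` is an existing kubespray variable, but the tasks themselves are illustrative, not taken from the repo:

```yaml
# Before: hardcoded docker invocation, only works with the docker runtime
- name: List running containers (docker-specific)
  command: docker ps -q
  changed_when: false

# After: CRI-agnostic via crictl, works with containerd as well
- name: List running containers (any CRI runtime)
  command: "{{ bin_dir }}/crictl ps --quiet"
  changed_when: false
```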
RELEASE.md says:
Kubespray doesn't follow semver. [...] Breaking changes, if any introduced by changed defaults or non-contrib ansible roles' playbooks, shall be described in the release notes.
AFAIK we already made non-backwards-compatible changes in the v2.x of Kubespray (when moving to kubeadm, for instance). The "production ready" part is a lot about providing a path for people to move from v2.X to v2.(X+1).
What I'm saying is that we can do breaking changes (like changing default container engine) as long as they are accepted by the community and well documented.
@EppO I thought non-kubeadm was removed in #3811; are there other things that need clean-up? kubeadm is the only supported deployment method since v2.9.
For the GitLab CI `rules:` and `only: changes`: last I checked, GitLab CI (via Failfast) is unaware of the target branch and therefore doesn't know what to compare against; the fallback mechanism explained here is problematic for PRs with multiple commits.
Another area to consider is that Prow has support for such features (see `run_if_changed` in https://github.com/kubernetes/test-infra/blob/master/prow/jobs.md)
For conformance tests, there is `sonobuoy_enabled: true` available, and I think it's enabled on 2 CI jobs currently: config and output
@MarkusTeufelberger has some very valuable input on role design and molecule, raised a couple of issues around it. Examples: #4622 #3961
RELEASE.md says:
Kubespray doesn't follow semver. [...] Breaking changes, if any introduced by changed defaults or non-contrib ansible roles' playbooks, shall be described in the release notes.
AFAIK we already made non-backwards-compatible changes in the v2.x of Kubespray (when moving to kubeadm, for instance). The "production ready" part is a lot about providing a path for people to move from v2.X to v2.(X+1).
Good to know. I was more worried about end-users who may not know this and end up breaking production clusters while trying to upgrade, hence a 3.0 proposal that is more explicit about that kind of breaking change.
@EppO I thought non-kubeadm was removed in #3811; are there other things that need clean-up? kubeadm is the only supported deployment method since v2.9.
I missed it because I hadn't changed my inventory in a while and some deprecated options are still there. I think it would be beneficial for end-users to list deprecated inventory options for each release. I guess I'm not the only one with some old settings :)
For the GitLab CI `rules:` and `only: changes`: last I checked, GitLab CI (via Failfast) is unaware of the target branch and therefore doesn't know what to compare against; the fallback mechanism explained here is problematic for PRs with multiple commits.
Another area to consider is that Prow has support for such features (see `run_if_changed` in https://github.com/kubernetes/test-infra/blob/master/prow/jobs.md)
I hear you. We can't use pipelines for merge requests because we don't create the merge request in GitLab, so that's a dead end. But I'm convinced we should architect the CI around better change detection to get a quicker feedback loop; if Prow is an option, we should look at it.
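For reference, `run_if_changed` is a real Prow field; a hypothetical presubmit for kubespray could look like the sketch below (the job name, path pattern, image, and script are all illustrative assumptions):

```yaml
# Hypothetical Prow presubmit: only run network-plugin jobs when the
# network_plugin roles actually changed in the PR.
presubmits:
  kubernetes-sigs/kubespray:
    - name: pull-kubespray-network-plugins
      run_if_changed: '^roles/network_plugin/'
      decorate: true
      spec:
        containers:
          - image: quay.io/kubespray/kubespray:latest   # assumed image
            command: ["./tests/scripts/testcases_run.sh"]
```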
For conformance tests, there is `sonobuoy_enabled: true` available, and I think it's enabled on 2 CI jobs currently: config and output
I guess we have some work to do in that area then :)
The maximum supported Kubernetes version is 1.16.99, but the server version is v1.18.5. Sonobuoy will continue but unexpected results may occur.
Ideally we should run conformance tests regularly to test various setup combinations and not wait until release time to run the full conformance suite. That's why I was suggesting separating them from the install/upgrade use cases.
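A sketch of what a scheduled conformance-only job could look like in GitLab CI, assuming a hypothetical job name, stage, and entry script (only `sonobuoy_enabled` maps to an existing kubespray variable):

```yaml
# Hypothetical GitLab CI job: run conformance on a schedule, decoupled from
# the install/upgrade test cases.
periodic-conformance:
  stage: deploy-special      # assumed stage name
  only:
    - schedules              # triggered by a GitLab pipeline schedule
  variables:
    SONOBUOY_ENABLED: "true"
  script:
    - ./tests/scripts/testcases_run.sh   # assumed entry point
```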
etcd_kubeadm_enabled: false
What about etcd? Should we change that default to true? It makes etcd upgrades impossible outside of Kubernetes upgrades; kubeadm still doesn't support upgrading etcd without the Kubernetes components AFAIK.
- Add CI job to test scale playbook
I also thought about that; scale and remove need some love from the CI.
Flip default of var kubeadm_control_plane to true and remove "experimental" from code?
`etcd_kubeadm_enabled: true` makes all etcdctl-related use cases stop working.
There is also no backup procedure with kubeadm-managed etcd.
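A backup task along these lines should work against kubeadm-managed etcd too, since etcdctl v3 can snapshot a running member over its client endpoint; this is a sketch, and the cert paths are kubeadm's defaults under /etc/kubernetes/pki/etcd, not kubespray code:

```yaml
# Sketch: snapshot etcd from a control-plane node, works whether etcd runs
# as a kubespray-managed service or a kubeadm static pod.
- name: Snapshot etcd
  command: >-
    etcdctl snapshot save
    /var/backups/etcd-snapshot-{{ ansible_date_time.date }}.db
  environment:
    ETCDCTL_API: "3"
    ETCDCTL_ENDPOINTS: "https://127.0.0.1:2379"
    ETCDCTL_CACERT: /etc/kubernetes/pki/etcd/ca.crt
    ETCDCTL_CERT: /etc/kubernetes/pki/etcd/healthcheck-client.crt
    ETCDCTL_KEY: /etc/kubernetes/pki/etcd/healthcheck-client.key
```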
I started looking at it
Flip default of var kubeadm_control_plane to true and remove "experimental" from code?
That's actually what I was referring to with "Drop non-kubeadm deployment", but I mixed up two different use cases: since 2.9 kubespray _always_ uses kubeadm to provision the cluster, but it doesn't use kubeadm join on the non-first control plane nodes by default (just another run of kubeadm init).
I think the join model is the right way forward.
Personally I'd like to drop a few features that are relatively exotic or easy to work around/implement yourself such as downloading binaries and rsync'ing them around instead of just fetching them on each node. This could really simplify the download role.
Another bigger architectural change could be to change kubespray into a collection (maybe even adding some roles to https://github.com/ansible-collections/community.kubernetes eventually and/or using them here?) and in general switching to Ansible 2.10.
Personally I'd like to drop a few features that are relatively exotic or easy to work around/implement yourself, such as downloading binaries and rsync'ing them around instead of just fetching them on each node. This could really simplify the `download` role.
I'd prefer to rely on the distro package manager when applicable instead of downloading all the stuff, but if you have a better design for the download role, feel free to submit a PR.
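For context, the download-once-and-push behaviour being discussed is driven by existing `download` role toggles; the defaults shown are my recollection and worth double-checking against your kubespray version:

```yaml
# Existing download-role knobs controlling the rsync-style distribution
# that could be dropped to simplify the role:
download_run_once: false    # true = download on one delegate, push to all nodes
download_localhost: false   # true = the delegate is the Ansible control host
download_force_cache: false # true = always use the local download cache
```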
Another bigger architectural change could be to change kubespray into a collection (maybe even adding some roles to https://github.com/ansible-collections/community.kubernetes eventually and/or using them here?) and in general switching to Ansible 2.10.
Ansible 2.10 is not released yet, and we need to be careful about which Ansible version is available on each supported distro.
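If kubespray did ship as a collection, consuming it would look something like the sketch below; the namespace, name, and version are illustrative assumptions, since no such artifact is published today:

```yaml
# Hypothetical requirements.yml for a kubespray collection
collections:
  - name: kubernetes_sigs.kubespray   # assumed namespace/name
    source: https://galaxy.ansible.com
    version: ">=3.0.0"                # assumed version scheme
  - name: community.kubernetes
```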
Regarding the usage of kubespray, I know @Miouge1 wanted to promote the container image use case, where you build your own custom image with your inventory and custom playbooks. That definitely makes sense in a CI pipeline.
Reducing scope and configurability of Kubespray would be nice.
List of features that could be removed:
The more I think about it, the more I'm convinced kubespray should only provision kubernetes clusters on top of kubeadm, so we should only support the following 2 use cases on the etcd front:
That means removing the `etcd_deployment_type` mode kubespray supports today. We would still test the BYO etcd use case in the CI though.
The more I think about it, the more I'm convinced kubespray should only provision kubernetes clusters on top of kubeadm, so we should only support the following 2 use cases on the etcd front:
- BYO etcd (either by using etcdadm or other means, out of scope of kubespray)
- etcd managed by kubeadm
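For the BYO case, kubeadm's ClusterConfiguration already supports external etcd natively, so kubespray would mostly need to render something like this (the endpoint and client-cert paths are illustrative):

```yaml
# kubeadm ClusterConfiguration fragment for the BYO ("external") etcd case
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
      - https://etcd-0.example.com:2379   # assumed endpoint
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```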
That means removing the `etcd_deployment_type` mode kubespray supports today. We would still test the BYO etcd use case in the CI though.
We could formulate the same in some kind of design statement on how Kubespray embraces, uses, and extends kubeadm rather than working around it.
We need to address technical debt. The code base is large, and some areas are old and unmaintained. I'd like to take the opportunity of the next major release to slim down the code base as much as possible and make the CI more agile to get quicker feedback.
Helm 3.x was released since Kubespray 2.x. It no longer requires a tiller pod and is integrated into k8s RBAC. I think it would be better for Kubespray to refocus on its core competency: deploying production Kubernetes. It can include the most widely used plugins (CNI/CSI). But apps that have a decent helm chart should now be deployed using that. Helm vs Ansible for deploying apps to Kubernetes is a no-brainer: thanks to its state, Helm is truly declarative, Ansible is not. For example, uninstall a helm release and your app is removed from k8s; undefine an addon in Kubespray (e.g. cert_manager_enabled=false) and it remains. Most helm charts are better maintained than the addons in this project. I get the desire for Kubespray to be a one-stop-shop, so we could either replace the addons with simple README guidance explaining how to install them using helm, or, if workable, install the helm client and version-pinned helm charts using Kubespray.
Would significantly simplify this project and the maintenance burden.
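As a sketch of the second option, a former addon could be installed through the `community.kubernetes.helm` module against its upstream chart; the chart repo and release details below are the upstream jetstack defaults, used here illustratively:

```yaml
# Sketch: replace the templated cert-manager addon with its upstream chart.
# Assumes "helm repo add jetstack https://charts.jetstack.io" has been done.
- name: Install cert-manager from its upstream chart
  community.kubernetes.helm:
    name: cert-manager
    chart_ref: jetstack/cert-manager
    release_namespace: cert-manager
    create_namespace: true
```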
I think we are very close to be able to use kubeadm managed etcd as the default.
What do you think about that?
Maybe we could deal with Helm apps in a separate GitHub project?
This project would only focus on:
Some attached CI would not require a kubespray deployment: only an inventory plus any Kubernetes cluster should be enough. This would save people from rewriting their own helm addon playbooks and roles.
EDIT: I first mentioned the dashboard as a helm chart; bad example, it is plain yaml, so I removed it. Btw we may think about moving the dashboard out of kubespray's scope in favor of Helm :)
EDIT 2: after searching a bit, it seems there is no helm chart for the dashboard