⚠️ Cluster API maintainers can ask to turn an issue-proposal into a CAEP when necessary. This is to be expected for large changes that impact multiple components, breaking changes, or large new features.
Goals
Non-Goals/Future Work
User Story
As an operator, I want kubeadm to have better support for Cluster API's use cases to reduce the number of failed machines in my infrastructure.
Detailed Description
In a number of environments, machines can intermittently fail to bootstrap. The most common failures occur during control plane joins, which cause temporary changes in etcd and API server availability; how long these last is mediated by the speed of the underlying infrastructure and the particulars of infrastructure load balancers.
Some ugly hacks have been introduced, notably #2763, to retry kubeadm operations. As a long-term solution, Cluster API should be a good kubeadm citizen and make changes in kubeadm itself so that it performs the appropriate retries to cover the variety of infrastructure providers supported by Cluster API. In addition, the KCP controller re-implements some of the kubeadm logic.
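To illustrate the kind of wrapper Cluster API is forced to maintain today, here is a minimal retry sketch around invoking kubeadm; the subcommand, config path, attempt count, and backoff are illustrative and not taken from #2763:

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// runWithRetry retries a kubeadm invocation a few times with a fixed backoff.
// A plain non-zero exit code gives no machine-readable hint about whether the
// failure is transient (e.g. etcd membership churn) or fatal, so all we can do
// is try again.
func runWithRetry(attempts int, delay time.Duration, args ...string) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		out, err := exec.Command("kubeadm", args...).CombinedOutput()
		if err == nil {
			return nil
		}
		lastErr = fmt.Errorf("attempt %d/%d: %v: %s", i+1, attempts, err, out)
		time.Sleep(delay)
	}
	return lastErr
}

func main() {
	// The config path here is illustrative only.
	err := runWithRetry(5, 30*time.Second, "join", "--config", "/run/kubeadm/join-config.yaml")
	if err != nil {
		fmt.Println("kubeadm join failed after retries:", err)
	}
}
```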
Contract changes [optional]
Data model changes [optional]
/kind proposal
cc @neolit123 @fabriziopandini
I would like to add:
Final thought.
Despite all the improvements we can add to kubeadm, a CLI cannot provide the same guarantees a reconciliation loop does. So it is also necessary for Cluster API to implement/improve the capability to detect failures in the CLI invocations and replace failed nodes.
agreed with all of @fabriziopandini's points.
kubeadm follows the philosophy of a CLI tool (like ssh, ftp, etc.) and it cannot anticipate all infrastructure-related failures. but having a sane / best-effort amount of retries in the CLI tool makes sense.
Support the effort to move kubeadm out-of-tree
hopefully scheduled for 1.19. depends a lot on sig-release and partly on sig-arch!
Make kubeadm retry operations based on data gathered from Cluster API users
this can be useful, no doubt. like i've mentioned today, interestingly we have not seen major complaints about the failures CAPI is seeing. users are applying custom timeouts around their cluster creation on custom infrastructure (e.g. "i know what my GCE running cluster needs").
Consider implementing machine-readable output for kubeadm to support #2554
@randomvariable can you expand on this point?
we have a tracking issue to support machine-readable output. but not sure how this relates to the failures. to my understanding one of the major issues we have in CAPI is that we cannot get a signal if kubeadm join returned > 0.
Re-factor relevant parts of kubeadm into a library consumable by the bootstrap and kubeadm control plane controllers
there is a tracking issue for that as well. it will be a long process and the timeline is unclear.
after the move, we can start working on that but for a period of time the exposed library will be unstable.
For #2254 we will likely have some component on the machine call back to an infrastructure API notification service (or back to the management cluster) to provide information about the failure. Providing users with access to the log is one case, but machine-readable output, which may actually amount to expanding the range of error codes, could update a specific condition on the Machine describing the exact kubeadm failure. A controller could then take appropriate remediative action. I agree this is long-term, however.
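As a rough sketch of the reporting side, assuming a hypothetical node-side agent and a hypothetical set of expanded kubeadm exit codes, the agent could translate the exit code into a condition to report back for the Machine; the exit-code values, condition type, and reason strings below are made up for illustration and are not an existing kubeadm contract:

```go
package bootstrapreport

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// conditionForExitCode maps a (hypothetical) expanded kubeadm exit code to a
// condition that a node-side agent could report back to the management
// cluster, so a controller can decide whether to remediate the Machine.
func conditionForExitCode(code int) metav1.Condition {
	c := metav1.Condition{
		Type:               "BootstrapSucceeded",
		LastTransitionTime: metav1.Now(),
	}
	switch code {
	case 0:
		c.Status = metav1.ConditionTrue
		c.Reason = "KubeadmJoinSucceeded"
	case 2: // hypothetical: preflight failure, likely fatal without changes
		c.Status = metav1.ConditionFalse
		c.Reason = "PreflightFailed"
	case 3: // hypothetical: etcd/API server temporarily unavailable, likely transient
		c.Status = metav1.ConditionFalse
		c.Reason = "ControlPlaneUnavailable"
	default:
		c.Status = metav1.ConditionFalse
		c.Reason = "KubeadmJoinFailed"
	}
	return c
}
```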
/area dependency
Removing this as a proposal; it seems more like a future cleanup.
/kind cleanup
WRT:
Some ugly hacks have been introduced, notably #2763 to retry kubeadm operations.
in 1.19 kubeadm merged a number of fixes and backported them to 1.17, 1.18:
https://github.com/kubernetes/kubeadm/issues/2091
https://github.com/kubernetes/kubeadm/issues/2092
https://github.com/kubernetes/kubeadm/issues/2093
https://github.com/kubernetes/kubeadm/issues/2094
/assign @fabriziopandini
for evaluation of this part.
adding up-to-date comments to the rest of the tasks:
Support the effort to move kubeadm out-of-tree
[1] timeline is unclear, we are blocked on the lack of policy for component extractions out of k/k.
we have stakeholders such as sig-arch and sig-release who see this as low-prio.
Make kubeadm retry operations based on data gathered from Cluster API users
the fixes above should address this task.
Consider implementing machine-readable output for kubeadm to support #2554
we did not merge any PRs in 1.19 for MRO as the contributor was busy with other tasks, but the boilerplate is in place.
Re-factor relevant parts of kubeadm into a library consumable by the bootstrap and kubeadm control plane controllers
this is very long term, potentially after [1]
Migrate to kubeadm v1beta2
v1beta1 is scheduled for removal in kubeadm 1.20 and my proposal would be to keep us on track for this effort.
/milestone v0.4.0
@neolit123 thanks for the update!
xref my comment from https://github.com/kubernetes-sigs/cluster-api/issues/3323#issuecomment-656771293:
We should stop exposing the kubeadm v1betax types in our KubeadmConfig/KubeadmControlPlane specs, and instead use our own types. This would allow us to separate what users fill in from which kubeadm API version we end up using in our bootstrap data. As @detiber pointed out, we still have to know which version of the kubeadm types to use when generating our kubeadm yaml file and when interacting with the kubeadm-config ConfigMap.
Some questions: If we go this route, it sounds like we would be hand picking what gets exposed in the capi equivalent of the kubeadm types, right? Is the idea to provide a better capi abstraction to the user? Would the mapping be along the lines of capi types <--> kubeadm types (v1betax) <--> kubeadm configmap? Given the large (and potentially increasing) number of fields that kubeadm exposes, wouldn't this approach lead to the issue of keeping capi types in sync with kubeadm types?
Also, it would be great to keep this issue in mind for any redesigns: https://github.com/kubernetes-sigs/cluster-api/issues/1584
If we go this route, it sounds like we would be hand picking what gets exposed in the capi equivalent of the kubeadm types, right? Is the idea to provide a better capi abstraction to the user?
Yes, but I think we could probably expose the majority of them in a way that makes more sense for our users. For example, with KubeadmControlPlane, we expose the full kubeadm ClusterConfiguration, which includes a field for KubernetesVersion... but we control the control plane version in KubeadmControlPlane.Spec.Version. It makes more sense to me not to expose the full ClusterConfiguration because there are fields we control elsewhere.
Given the large (and potentially increasing) number of fields that kubeadm exposes, wouldn't this approach lead to the issue of keeping capi types in sync with kubeadm types?
I do recognize this adds another layer and will likely duplicate a lot of fields between CAPI and kubeadm. However, we are currently locked to the kubeadm v1beta1 API version, and that version supports a fixed range of Kubernetes versions. v1beta1 eventually won't support newer Kubernetes versions. We know we'll eventually have to move to kubeadm v1beta2, and we'll want to move to newer API versions whenever they're available as well. I think it makes more sense for CAPI to insulate the user from kubeadm API versions: as a user, I don't want to think "My target version is Kubernetes v1.21.x - which kubeadm API version do I need?"
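A minimal sketch of the separation being discussed, assuming hypothetical CAPI-owned types; the field set, package name, and version-selection logic are illustrative, not the actual KubeadmConfig/KubeadmControlPlane API:

```go
package v1alpha4

// ClusterConfiguration is a hypothetical CAPI-owned mirror of the kubeadm
// ClusterConfiguration. It carries only what users legitimately need to set;
// KubernetesVersion is intentionally absent because
// KubeadmControlPlane.Spec.Version owns it, and no kubeadm API version is
// baked into the type.
type ClusterConfiguration struct {
	ControlPlaneEndpoint string          `json:"controlPlaneEndpoint,omitempty"`
	ClusterName          string          `json:"clusterName,omitempty"`
	FeatureGates         map[string]bool `json:"featureGates,omitempty"`
	// ... further fields as needed.
}

// kubeadmAPIVersionFor picks the kubeadm config API version to render for a
// given Kubernetes minor version, so users never have to think about it.
// The version boundary here is illustrative.
func kubeadmAPIVersionFor(kubernetesMinor int) string {
	if kubernetesMinor >= 15 {
		return "kubeadm.k8s.io/v1beta2"
	}
	return "kubeadm.k8s.io/v1beta1"
}
```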
One use case we would want to support:
Discussing with @andrewsykim, there are scenarios with quite a few of the CNIs where you don't want kube-proxy deployed. kubeadm supports this only via the "--skip-phases" CLI flag, so if we are going to provide our own types, we should see which of these CLI flags we may want to expose as API types.
i thought we had a ticket about the support to skip phases via the kubeadm configuration, but apparently we don't.
i can see this being a string slice under JoinConfiguration or InitConfiguration.
💯 yes please, ^^^^^^
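A minimal sketch of what that field could look like in hypothetical CAPI-owned types; the field name and placement are assumptions, and CABPK would have to translate the slice into the corresponding kubeadm CLI flag today:

```go
package v1alpha4

// InitConfiguration sketch: SkipPhases would be rendered by CABPK into
// `kubeadm init --skip-phases=...`.
type InitConfiguration struct {
	// SkipPhases lists kubeadm init phases to skip, e.g. "addon/kube-proxy"
	// for CNIs that ship their own kube-proxy replacement.
	SkipPhases []string `json:"skipPhases,omitempty"`
	// ... remaining init options.
}

// JoinConfiguration would carry the join-time equivalent, assuming kubeadm
// offers (or grows) the same knob for join.
type JoinConfiguration struct {
	SkipPhases []string `json:"skipPhases,omitempty"`
	// ... remaining join options.
}
```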
For reference, another use case for kubeadm as a library would be to simplify CABPK retry logic from a Windows perspective: https://github.com/kubernetes-sigs/cluster-api/pull/3616#discussion_r494571110. Currently the proposed solution is for the InfraMachine Spec to have an OsType that can be looked up by the CABPK controller so it can provide the correct retry script. This additional logic and script in CABPK would not be needed if kubeadm could be called as a library.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
/assign @randomvariable @yastij
We'll pick this up as a requirement for the node agent proposal in v1alpha4
/lifecycle frozen
I know we don't have a label for it, but just for tracking
/area node-agent
@randomvariable: The label(s) area/node-agent cannot be applied, because the repository doesn't have them
@randomvariable We can add one under https://github.com/kubernetes/test-infra/blob/57ffba1efeed46ad1eb03a4f7ea58c2bc530966b/label_sync/labels.yaml#L1469