kustomize re-orders config, causing failures when CRDs or admission webhooks are used

Created on 25 Feb 2019 · 28 comments · Source: kubernetes-sigs/kustomize

If we look at cert-manager, we see that its output contains:

  • CRDs
  • objects
  • webhooks
  • objects using webhooks + CRDs

thus the order must be preserved.

So if I apply kustomize across their input and add

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  namespace: kube-system
  name: selfsigning-issuer
spec:
  selfSigned: {}

to the end, I get a failure:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterissuers.admission.certmanager.k8s.io": the server could not find the requested resource

The attached output file shows the problem: the selfsigning-issuer ClusterIssuer is emitted in the middle, before its webhook is defined.

cert-manager-to-be-applied.yaml.gz

Most helpful comment

This could be avoided if the original input source order were respected, as kubectl apply -f does. For this reason I agree with @anguslees that the original order should be preserved.

I strongly agree with this. Baking in dependency re-ordering requires understanding all types.

All 28 comments

Possible solutions?

  • kubectl gets a 'sort' or 'order' concept
  • kubectl gets a 'begin/commit' phase like SQL
  • kustomize maintains the original order
  • kubectl understands CRDs + webhooks, and orders CRD creation first, then webhook creation
  • kustomize understands ordering and emits CRDs first, then webhooks
  • kubectl gets a 'filter' argument, so I can run three passes: kubectl apply --filter=crd, then kubectl apply --filter=webhook, then everything else

Others?

In the short term I have a workaround that splits the output into two passes:
https://github.com/Agilicus/yaml_filter

kustomize build . | yaml_filter.py -i CustomResourceDefinition,ValidatingWebhookConfiguration | kubectl apply -f -
kustomize build . | yaml_filter.py -o CustomResourceDefinition,ValidatingWebhookConfiguration | kubectl apply -f -

kustomize already sorts things like namespaces for this reason. CRDs and Webhooks should also be sorted IMHO.

This issue is not resolved by that commit; the commit doesn't even add the test case above.

Re-opened.

kubernetes/enhancements#865 has been proposed as a general solution to ordering.

Open to other ideas.

@donbowman You mention that the PR that attempted to close this issue did not include the test data you provided, and indeed it did not.

But the test case you mention was a large tarball of resources - which can be a lot to digest.

The readme suggests a good approach for reporting undesired output, but I realize that it's buried in a bunch of commentary.

Basically, if you convert your data to a unit test in the pkg/target package (e.g. this config map generator test), then we all get multiple wins:

1) The test unambiguously records the bad behavior currently exhibited by the code. The test must _pass_ (so we can merge it), but it should be commented to show why the actual output is _undesired_, and what would be desired instead (e.g. _resource Foo must be declared in the output before resource Bar_). These tests are formulated to be very easy to build and read.

2) When a purported fix comes along, _it must modify this test, encoding the desired behavior_, or the PR cannot be submitted. So proper modification of the test serves as proof that the bug is closeable.

3) The test remains as part of the regression suite forever in either form - either the bad behavior stands as a proven thing, or the fixed behavior stands to protect itself.

Please consider converting your data to such a test. It's not that I don't understand the problem; it's that I want help with regression coverage.

Fwiw, I suggest replacing #822 with something that orders Webhook declarations last.

In a single invocation (and without more human-driven direction), we have to assume that the webhook cannot be invoked until after the Deployments/etc. backing it have started, and we also have to assume that other resources in the same invocation don't require the (mutating) webhook to act on them (that would be a bootstrapping cycle).

(Edit: or better yet for kustomize: preserve original document order)

I changed the order so that ValidatingWebhookConfiguration now comes last.
This makes a cert-manager.yaml installation work with kustomize as well.

However, ordering problems may in theory still appear when using objects whose kinds are defined by CustomResourceDefinitions, where one has a hard dependency on the other.
This could be avoided if the original input source order were respected, as kubectl apply -f does. For this reason I agree with @anguslees that the original order should be preserved.

The question is: why was a kustomize-specific order introduced in the first place?

_Edit: Also, I didn't touch the MutatingWebhookConfiguration order since I had no example to test at hand, but I think it must come last as well._
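
For readers landing here later: much newer kustomize releases expose this ordering as configuration. A minimal kustomization.yaml sketch, assuming the sortOptions field of kustomize v5+; the kind lists are illustrative only:

sortOptions:
  order: legacy
  legacySortOptions:
    orderFirst:
    - Namespace
    - CustomResourceDefinition
    orderLast:
    - ValidatingWebhookConfiguration
    - MutatingWebhookConfiguration

With order: legacy, kinds in orderFirst are emitted first and kinds in orderLast are emitted last; order: fifo instead preserves the input order, which is the behaviour argued for above.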

This could be avoided if the original input source order were respected, as kubectl apply -f does. For this reason I agree with @anguslees that the original order should be preserved.

I strongly agree with this. Baking in dependency re-ordering requires understanding all types.

What was the downside to leaving ordering up to the user, based on the order in which the manifests appear in the kustomization.yaml resources array? What motivated the explicit ordering in gvk.go?

I guess one of the reasons was to make it more declarative/deterministic (but I don't know - one of the original authors should answer this - @monopole?).
After having thought about it a little longer I think that this ordering decision may not be as bad as I first thought because even with kubectl apply -f you hit the limits of what can be deployed using a single command quickly: Usually when you want to apply a custom resource the corresponding crd, API service and deployment(s) need to be available first. Hence, if you don't want to apply everything using --validate=false, you need to apply your crd, API service and deployment first and wait for it to become available before you apply resources that rely on it. This looks like some kind of dependency/deployment management requirement and one could argue that it is a matter of higher level tooling. Still such a feature could be added to kubectl - especially since kubectl is aware of all resource types while kustomize cannot be.
In absence of such a feature within kubectl kustomize forces the user to split her artifacts into smaller separately applied packages or write additional order configuration unnecessarily.
(However I haven't seen such a hard dependency between custom resources yet that would require to break with the current kustomize behaviour and the latter can always be worked around.)

_Is there already tooling that supports the kind of deployment flow described above (waiting for an apiservice/deployment to become available before applying another yaml/package)? (I don't mean helm + shell scripts.)_
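
For what it's worth, plain kubectl can approximate that wait-then-apply flow. A rough sketch, assuming a cert-manager-style layout; the directory, CRD, and deployment names are hypothetical:

# Pass 1: create the CRDs and wait until the API server has established them.
kubectl apply -f crds/
kubectl wait --for=condition=Established crd/clusterissuers.certmanager.k8s.io --timeout=60s

# Pass 2: bring up the webhook backend and wait for its rollout to finish.
kubectl apply -f webhook/
kubectl rollout status -n cert-manager deployment/cert-manager-webhook --timeout=120s

# Pass 3: apply everything that depends on the CRDs and the webhook.
kubectl apply -f app/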

To carry forward what @brichins had suggested: can MutatingWebhookConfiguration be added to orderLast?
This is to support scenarios where a MutatingWebhookConfiguration set with failurePolicy: Fail would prevent a Deployment from being created, because that Deployment is the endpoint the MutatingWebhookConfiguration refers to.

This is to support scenarios where a MutatingWebhookConfiguration set with failurePolicy: Fail would prevent a Deployment from being created, because that Deployment is the endpoint the MutatingWebhookConfiguration refers to.

That is not a good configuration, and creation order is not a reasonable way to mitigate it. See the namespaceSelector option in webhooks for the ability to exclude the namespace containing the webhook's deployment from depending on the webhook.
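
To make that concrete, here is a minimal sketch of the pattern @liggitt describes, using the admissionregistration.k8s.io/v1beta1 API current at the time. All names, the rule, and the control-plane label are hypothetical; the webhook's own namespace is assumed to carry that label:

apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: example-mutating-webhook
webhooks:
- name: mutate.example.com
  failurePolicy: Fail
  clientConfig:
    service:
      name: webhook-svc
      namespace: webhook-system
      path: /mutate
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["deployments"]
  # Exclude the namespace hosting the webhook's own Deployment, so the
  # webhook cannot block its own rollout (assumes webhook-system is
  # labeled control-plane: webhook).
  namespaceSelector:
    matchExpressions:
    - key: control-plane
      operator: NotIn
      values: ["webhook"]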

@liggitt thanks for the info.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

With @monopole's refactorings here, this issue seems to be completely solved and can be closed.

@mgoltzsche @pwittrock @donbowman We are seeking feedback for this KEP.

The corresponding PR: Improve kustomize build --reorder option.

The current issue has been reproduced here. Note that the sort has been configured in such a way that kustomize build . --reorder=none and kustomize build . --reorder=kubectlapply have the same output.
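
Based on the flag named in the PR above, the intended invocation for this thread's original failure would look something like this (treat the exact spelling as a sketch until the PR merges):

kustomize build . --reorder=none | kubectl apply -f -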

I will take a look. What I ended up doing was:
https://github.com/Agilicus/kustomize-plugins/tree/master/agilicus/v1/crdgenerator
I created a Generator that I use in two-pass mode,
i.e. I run kustomize twice, and the first pass creates all the CRDs.
I would have preferred a Transformer, but Transformers can only modify objects, not delete them.
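
Roughly, the two passes look like this; the overlay paths are hypothetical and the plugin-enabling flag spelling is from the kustomize version of that era, so treat it as a sketch:

# Pass 1: the generator emits only the CRDs.
kustomize build --enable_alpha_plugins crds-overlay/ | kubectl apply -f -
# Pass 2: the rest of the config, once the CRDs are established.
kustomize build --enable_alpha_plugins app-overlay/ | kubectl apply -f -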

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

