kustomize re-orders config, causing failures when CRDs or admission webhooks are used

Created on 25 Feb 2019 · 28 comments · Source: kubernetes-sigs/kustomize

If we look at cert-manager, we see that its output contains:

  • CRDs
  • objects
  • webhooks
  • objects using webhooks + CRDs

thus the order must be preserved.

So if I apply kustomize across their input and add

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  namespace: kube-system
  name: selfsigning-issuer
spec:
  selfSigned: {}

to the end, I get a failure:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "clusterissuers.admission.certmanager.k8s.io": the server could not find the requested resource

The attached output file shows the problem: the selfsigning-issuer ClusterIssuer is emitted in the middle, before its webhook is defined.

cert-manager-to-be-applied.yaml.gz

Most helpful comment

This could be avoided if the original input source order were respected, as kubectl apply -f does. For this reason I agree with @anguslees that the original order should be preserved.

I strongly agree with this. Baking in dependency re-ordering requires understanding all types.

All 28 comments

Possible solutions?

  • kubectl gets a 'sort' or 'order' concept
  • kubectl gets a 'begin/commit' phase like SQL
  • kustomize maintains the original order
  • kubectl understands CRDs + webhooks, and orders CRD creation first, then webhook creation
  • kustomize understands ordering and emits CRDs first, then webhooks
  • kubectl gets a 'filter' argument, so I can run three passes: kubectl apply --filter=crd, then kubectl apply --filter=webhook, then everything else

Others?

In the short term I have a workaround that splits the output into two passes:
https://github.com/Agilicus/yaml_filter

kustomize build . | yaml_filter.py -i CustomResourceDefinition,ValidatingWebhookConfiguration | kubectl apply -f -
kustomize build . | yaml_filter.py -o CustomResourceDefinition,ValidatingWebhookConfiguration | kubectl apply -f -

kustomize already sorts things like namespaces for this reason. CRDs and Webhooks should also be sorted IMHO.

This issue is not resolved by that commit; the commit doesn't even add the test case above.

Re-opened.

kubernetes/enhancements#865 has been proposed as a general solution to ordering.

Open to other ideas.

@donbowman You mention that the PR that attempted to close this issue did not include the test data you provided, and indeed it did not.

But the test case you mention was a large tarball of resources - which can be a lot to digest.

The readme suggests a good approach for reporting undesired output, but I realize that it's buried in a bunch of commentary.

Basically, if you convert your data to a unit test in the pkg/target package (e.g. this config map generator test), then we all get multiple wins:

1) The test unambiguously records the bad behavior currently exhibited by the code. The test must _pass_ (so we can merge it), but it should be commented to show why the actual output is _undesired_, and what would be desired instead (e.g. _resource Foo must be declared in the output before resource Bar_). These tests are formulated to be very easy to build and read.

2) When a purported fix comes along, _it must modify this test, encoding the desired behavior_, or the PR cannot be submitted. So proper modification of the test serves as proof that the bug is closeable.

3) The test remains as part of the regression suite forever in either form - either the bad behavior stands as a proven thing, or the fixed behavior stands to protect itself.

Please consider converting your data to such a test. It's not that I don't understand the problem; it's that I want help with regression coverage.

Fwiw, I suggest replacing #822 with something that orders Webhook declarations last.

In a single invocation (and without more human-driven direction), we have to assume that the webhook cannot be invoked until after the Deployments/etc. backing it have started, and we also have to assume that other resources in the same invocation don't require the (mutating) webhook to act on them (that would be a bootstrapping cycle).

(Edit: or better yet for kustomize: preserve original document order)

I changed the order so that ValidatingWebhookConfiguration now comes last.
This makes a cert-manager.yaml installation work with kustomize as well.

However, ordering problems may in theory still appear when using objects whose kinds are defined by CustomResourceDefinitions, where one has a hard dependency on the other.
This could be avoided if the original input source order were respected, as kubectl apply -f does. For this reason I agree with @anguslees that the original order should be preserved.

The question is: why was a kustomize-specific order introduced in the first place?

_Edit: Also, I didn't touch the MutatingWebhookConfiguration order since I had no example to test at hand, but I think it must come last as well._
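
For readers landing here later: much newer kustomize releases expose this ordering as configuration. A minimal kustomization.yaml sketch, assuming the sortOptions field of kustomize v5+; the kind lists are illustrative only:

sortOptions:
  order: legacy
  legacySortOptions:
    orderFirst:
    - Namespace
    - CustomResourceDefinition
    orderLast:
    - ValidatingWebhookConfiguration
    - MutatingWebhookConfiguration

With order: legacy, kinds in orderFirst are emitted first and kinds in orderLast are emitted last; order: fifo instead preserves the input order, which is the behaviour argued for above.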

This could be avoided if the original input source order were respected, as kubectl apply -f does. For this reason I agree with @anguslees that the original order should be preserved.

I strongly agree with this. Baking in dependency re-ordering requires understanding all types.

What was the downside to leaving ordering up to the user, based on the order in which the manifests appear in the kustomization.yaml resources array? What motivated the explicit ordering in gvk.go?

I guess one of the reasons was to make it more declarative/deterministic (but I don't know - one of the original authors should answer this - @monopole?).
After having thought about it a little longer I think that this ordering decision may not be as bad as I first thought because even with kubectl apply -f you hit the limits of what can be deployed using a single command quickly: Usually when you want to apply a custom resource the corresponding crd, API service and deployment(s) need to be available first. Hence, if you don't want to apply everything using --validate=false, you need to apply your crd, API service and deployment first and wait for it to become available before you apply resources that rely on it. This looks like some kind of dependency/deployment management requirement and one could argue that it is a matter of higher level tooling. Still such a feature could be added to kubectl - especially since kubectl is aware of all resource types while kustomize cannot be.
In absence of such a feature within kubectl kustomize forces the user to split her artifacts into smaller separately applied packages or write additional order configuration unnecessarily.
(However I haven't seen such a hard dependency between custom resources yet that would require to break with the current kustomize behaviour and the latter can always be worked around.)

_Is there already tooling that supports the kind of deployment flow described above (waiting for an apiservice/deployment to become available before applying another yaml/package)? (I don't mean helm + shell scripts.)_
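
For what it's worth, plain kubectl can approximate that wait-then-apply flow. A rough sketch, assuming a cert-manager-style layout; the directory, CRD, and deployment names are hypothetical:

# Pass 1: create the CRDs and wait until the API server has established them.
kubectl apply -f crds/
kubectl wait --for=condition=Established crd/clusterissuers.certmanager.k8s.io --timeout=60s

# Pass 2: bring up the webhook backend and wait for its rollout to finish.
kubectl apply -f webhook/
kubectl rollout status -n cert-manager deployment/cert-manager-webhook --timeout=120s

# Pass 3: apply everything that depends on the CRDs and the webhook.
kubectl apply -f app/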

To carry forward what @brichins had suggested: can MutatingWebhookConfiguration be added to orderLast?
This is to support scenarios where a MutatingWebhookConfiguration set with failurePolicy: Fail would prevent a Deployment from being created, because that Deployment is the endpoint the MutatingWebhookConfiguration refers to.

This is to support scenarios where a MutatingWebhookConfiguration set with failurePolicy: Fail would prevent a Deployment from being created, because that Deployment is the endpoint the MutatingWebhookConfiguration refers to.

That is not a good configuration, and creation order is not a reasonable way to mitigate it. See the namespaceSelector option in webhooks for the ability to exclude the namespace containing the webhook's deployment from depending on the webhook.
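
To make that concrete, here is a minimal sketch of the pattern @liggitt describes, using the admissionregistration.k8s.io/v1beta1 API current at the time. All names, the rule, and the control-plane label are hypothetical; the webhook's own namespace is assumed to carry that label:

apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: example-mutating-webhook
webhooks:
- name: mutate.example.com
  failurePolicy: Fail
  clientConfig:
    service:
      name: webhook-svc
      namespace: webhook-system
      path: /mutate
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["deployments"]
  # Exclude the namespace hosting the webhook's own Deployment, so the
  # webhook cannot block its own rollout (assumes webhook-system is
  # labeled control-plane: webhook).
  namespaceSelector:
    matchExpressions:
    - key: control-plane
      operator: NotIn
      values: ["webhook"]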

@liggitt thanks for the info.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

With @monopole's refactorings here, this issue seems to be completely solved and can be closed.

@mgoltzsche @pwittrock @donbowman We are seeking feedback for this KEP.

The corresponding PR: Improve kustomize build --reorder option.

The current issue has been reproduced here. Note that the sort has been configured in such a way that kustomize build . --reorder=none and kustomize build . --reorder=kubectlapply have the same output.
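
Based on the flag named in the PR above, the intended invocation for this thread's original failure would look something like this (treat the exact spelling as a sketch until the PR merges):

kustomize build . --reorder=none | kubectl apply -f -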

I will take a look. What I ended up doing was:
https://github.com/Agilicus/kustomize-plugins/tree/master/agilicus/v1/crdgenerator
I created a Generator that I use in two-pass mode,
i.e. I run kustomize twice, and the first pass creates all the CRDs.
I would have preferred a Transformer, but Transformers can only modify objects, not delete them.
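
Roughly, the two passes look like this; the overlay paths are hypothetical and the plugin-enabling flag spelling is from the kustomize version of that era, so treat it as a sketch:

# Pass 1: the generator emits only the CRDs.
kustomize build --enable_alpha_plugins crds-overlay/ | kubectl apply -f -
# Pass 2: the rest of the config, once the CRDs are established.
kustomize build --enable_alpha_plugins app-overlay/ | kubectl apply -f -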

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/remove-lifecycle rotten

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

