Starting with Kubernetes 1.13, the API dry run is enabled by default. Flux could run kubectl apply --server-dry-run before trying to apply the manifests. We could log the validation errors in a way that is easy to detect with a log parser such as Fluentd, CloudWatch, or Stackdriver (#1340). We could also expose a Prometheus metric with the validation error count (#2199).
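As a rough illustration of the logging and metric idea, here is a minimal Python sketch (Flux itself is written in Go, so this is not Flux code). It turns the stderr of a dry run into one JSON log line per error and bumps a counter. The field names (`component`, `event`) and the `errors_total` counter are hypothetical; a real implementation would presumably use a proper Prometheus client counter instead of a plain `Counter`.

```python
import json
from collections import Counter

# Stand-in for a Prometheus counter; the key name is an assumption.
errors_total = Counter()


def validation_log_records(stderr_text, component="flux-dry-run"):
    """Return one JSON log line per error line found in kubectl's stderr."""
    records = []
    for line in stderr_text.splitlines():
        line = line.strip()
        # kubectl reports failures as "error: ..." or "Error from server ..."
        if not line.lower().startswith("error"):
            continue
        records.append(json.dumps({
            "level": "error",
            "component": component,          # hypothetical field
            "event": "validation_failed",    # hypothetical field
            "message": line,
        }, sort_keys=True))
    return records


def record_validation_errors(stderr_text):
    """Build the log lines and increment the error counter accordingly."""
    lines = validation_log_records(stderr_text)
    errors_total["validation_errors"] += len(lines)
    return lines
```

A fixed `event` value gives a log parser a single key to filter on, instead of matching free-form error text.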
To avoid "no matches" errors for custom resources, the validation and apply should be done in stages:
- server-dry-run on the CRDs
- server-dry-run on all manifests

@squaremo @hiddeco should we proceed with applying the manifests if the dry run fails?
I am familiar with the --server-dry-run flag but not with the logic behind it. Is the server-side validation output _guaranteed_ to be the same as the result of (failing to) apply the manifest?
I think it behaves the same as apply:
Every stage runs as normal, except for the final storage stage. Admission controllers are run to check that the request is valid, mutating controllers mutate the request, merge is performed on PATCH, fields are defaulted, and schema validation occurs. The changes are not persisted to the underlying storage, but the final object which would have been persisted is still returned to the user, along with the normal status code. If the request would trigger an admission controller which would have side effects, the request will be failed rather than risk an unwanted side effect.
See:
I think the server dry run should be opt-in via a Flux command flag. Not every validation controller supports it; see, e.g., https://github.com/open-policy-agent/gatekeeper/issues/128
> If the request would trigger an admission controller which would have side effects, the request will be failed rather than risk an unwanted side effect.
This is a big :heavy_plus_sign: compared to what we have now, and given that there is no difference, it would not make sense to still try to apply the resources that would fail.
The question remains whether we want to apply a partial set (by filtering out whatever makes it fail) or skip the whole apply. I am inclined to choose the latter, as we strive to maintain a valid state.
I vote for skipping the apply altogether if the validation fails.
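The all-or-nothing behaviour proposed here could look roughly like the Python sketch below: run the server-side dry run first and only apply if it succeeds. This is an illustration, not Flux's implementation. The function name and the injectable `run` parameter are assumptions made for testability. The `--server-dry-run` flag is the one used in this thread; newer kubectl releases replaced it with `--dry-run=server`.

```python
import subprocess


def dry_run_then_apply(path, run=subprocess.run):
    """Apply the manifests at `path` only if the server-side dry run passes.

    Returns (applied, stderr). Nothing is applied when validation fails.
    """
    base = ["kubectl", "apply", "-f", path]

    # Stage 1: validate everything on the API server without persisting.
    check = run(base + ["--server-dry-run"], capture_output=True, text=True)
    if check.returncode != 0:
        # Skip the whole apply rather than applying a partial set.
        return False, check.stderr

    # Stage 2: the real apply, reached only after a clean dry run.
    applied = run(base, capture_output=True, text=True)
    return applied.returncode == 0, applied.stderr
```

Passing a fake `run` callable makes it possible to exercise the skip logic without a cluster.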
Looks like we need a two-stage validation/apply procedure, since the custom resources will fail validation if the CRDs are not applied first.
CRD + CR:
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tests.k8s.io
  annotations:
    helm.sh/resource-policy: keep
spec:
  group: k8s.io
  version: v1
  versions:
    - name: v1
      served: true
      storage: true
  names:
    plural: tests
    singular: test
    kind: Test
    categories:
      - all
  scope: Namespaced
---
apiVersion: k8s.io/v1
kind: Test
metadata:
  name: test
  namespace: test
spec:
  some: value
Dry run result:
kubectl apply --server-dry-run -f ./test.yaml
customresourcedefinition.apiextensions.k8s.io/tests.k8s.io created (server dry run)
error: unable to recognize "test.yaml": no matches for kind "Test" in version "k8s.io/v1"
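The failure above is what motivates the two-stage procedure: the custom resource cannot be validated until its CRD exists on the server. A minimal Python sketch of the splitting step follows, assuming the manifests have already been parsed into dictionaries. The helper name `split_stages` is hypothetical, and this is not how Flux orders resources internally.

```python
CRD_KIND = "CustomResourceDefinition"


def split_stages(manifests):
    """Group parsed manifests into ordered stages: CRDs first, the rest after.

    Each stage is meant to be dry-run and applied before the next one starts,
    so custom resources are only validated once their CRDs exist.
    """
    crds = [m for m in manifests if m.get("kind") == CRD_KIND]
    rest = [m for m in manifests if m.get("kind") != CRD_KIND]
    # Drop empty stages so callers never run kubectl with no input.
    return [stage for stage in (crds, rest) if stage]


# The two documents from the example above, reduced to the relevant fields.
example = [
    {"apiVersion": "k8s.io/v1", "kind": "Test",
     "metadata": {"name": "test", "namespace": "test"}},
    {"apiVersion": "apiextensions.k8s.io/v1beta1", "kind": CRD_KIND,
     "metadata": {"name": "tests.k8s.io"}},
]
stages = split_stages(example)
```

Even with this ordering, a newly created CRD can take a moment to become established, so the second stage may need a short wait or retry.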
We've encountered the problem of deployments failing because of validation errors a couple of times, and not having a good way to communicate this back to the right person has been a bit problematic.
Logging this in an easily detectable way would be a great first step. Would it be possible to consider a webhook option as well?
We have a service, release-manager, which is responsible for moving files around in git and for reporting progress back to developers. It would be pretty cool if we could configure Flux to trigger a webhook in our release-manager, so that it could communicate the problem directly to our developers (e.g. via Slack) instead of having them inspect our log management tool.
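To make the webhook request concrete, here is a small Python sketch of what such a notification might carry. Everything about it is an assumption: Flux has no such option in this thread, and the URL, the payload fields (`commit`, `status`, `errors`) and the function name are made up for illustration.

```python
import json
import urllib.request


def build_webhook_request(url, commit, errors):
    """Build (but do not send) a JSON POST describing failed validation."""
    body = json.dumps({
        "commit": commit,                # hypothetical field
        "status": "validation_failed",   # hypothetical field
        "errors": errors,                # raw error lines from the dry run
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical receiver URL; sending would be:
#   urllib.request.urlopen(req, timeout=5)
req = build_webhook_request(
    "http://release-manager.example/hooks/flux",
    "abc123",
    ['error: unable to recognize "test.yaml": no matches for kind "Test" in version "k8s.io/v1"'],
)
```

A receiver like release-manager could then map the commit to its author and forward the error lines to Slack.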
In order to validate the content of a commit before pushing it to our gitops master branch (e.g. in a pull request), I would find it very helpful to be able to call fluxd in a dry-run-only mode. Would that be possible as well?
@tobias-jenkner, I'm in the same boat. I'd like to pass the dry run output (from a CLI command?) to kubeval in a CI pipeline.
We thought it could be a good idea to have flux --dry-run, or even to plug in more validations for branches other than the source branch. That way it could be integrated into a git-flow process, like
So basically we are in the same boat too.
The API server dry run was implemented in the GitOps Toolkit and can be enabled with validation: server. See https://toolkit.fluxcd.io/components/kustomize/kustomization/