Flux: Validate manifests with API server dry run

Created on 22 Jul 2019 · 12 Comments · Source: fluxcd/flux

Starting with Kubernetes 1.13, the API dry run is enabled by default. Flux could run kubectl apply --server-dry-run before trying to apply the manifest. We could log the validation errors in a way that is easy to detect with a log parser such as Fluentd, CloudWatch or Stackdriver (#1340). We could also expose a Prometheus metric with the validation error count (#2199).
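
One way to make validation failures easy for a log parser to pick up is to emit one JSON object per failure and keep a running count that a metrics endpoint could later expose. The following is only an illustrative sketch in Python (Flux itself is written in Go); the ValidationReporter class, the field names and the "validation_error" event name are hypothetical and not part of Flux.

```python
import json
import sys
from collections import Counter


class ValidationReporter:
    """Emit one JSON log line per dry-run failure and count failures per kind."""

    def __init__(self, stream=sys.stderr):
        self.stream = stream
        # Per-kind totals; these could back a counter-style metric.
        self.errors_by_kind = Counter()

    def report(self, kind, name, namespace, message):
        record = {
            "event": "validation_error",  # hypothetical fixed key for log filters
            "kind": kind,
            "name": name,
            "namespace": namespace,
            "message": message,
        }
        # One object per line keeps the output friendly to line-based parsers.
        self.stream.write(json.dumps(record, sort_keys=True) + "\n")
        self.errors_by_kind[kind] += 1
        return record
```

A log pipeline could then filter on the fixed "event" key instead of matching free-form error text.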

To avoid "no matches for kind" errors on custom resources, the validation and apply should be done in stages:

extract all CRDs from the manifest
run server-dry-run on the CRDs
if the validation succeeds apply the CRDs
run server-dry-run on all manifests
if the validation succeeds apply all manifests

enhancement

stefanprodan

👍2

Most helpful comment

In order to validate the content of a commit before pushing it to our gitops master branch (e.g. in a pull request) I would find it very helpful to be able to call fluxd in a dryrun only way. Could that be possible as well ?

tobias-jenkner on 19 Apr 2020

👍5

All 12 comments

@squaremo @hiddeco should we proceed with applying the manifests if the dry run fails?

stefanprodan on 22 Jul 2019

I am familiar with the --server-dry-run flag but not with the logic behind it. Is the server-side validation output _guaranteed_ to be the same as what a (failing) apply would produce?

hiddeco on 22 Jul 2019

I think it behaves the same as apply:

Every stage runs as normal, except for the final storage stage. Admission controllers are run to check that the request is valid, mutating controllers mutate the request, merge is performed on PATCH, fields are defaulted, and schema validation occurs. The changes are not persisted to the underlying storage, but the final object which would have been persisted is still returned to the user, along with the normal status code. If the request would trigger an admission controller which would have side effects, the request will be failed rather than risk an unwanted side effect.

See:

stefanprodan on 22 Jul 2019

👍1

I think the server dry run should be opt-in via a Flux command flag. Not every validation controller supports it, e.g. https://github.com/open-policy-agent/gatekeeper/issues/128

stefanprodan on 22 Jul 2019

👍2

If the request would trigger an admission controller which would have side effects, the request will be failed rather than risk an unwanted side effect.

This is a big plus compared to what we have now. Given that the dry run behaves no differently from a real apply, it would not make sense to still try to apply resources that would fail.

The question remains whether we want to apply a partial set (by filtering out what makes it fail) or skip the whole apply. I am inclined to choose the latter, as we strive to maintain a valid state.

hiddeco on 22 Jul 2019

👍1

I vote for skipping the apply altogether if the validation fails.

stefanprodan on 22 Jul 2019

👍1

Looks like we need a two-stage validation/apply procedure, since the custom resources will fail validation if the CRDs have not been applied.

CRD + CR:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tests.k8s.io
  annotations:
    helm.sh/resource-policy: keep
spec:
  group: k8s.io
  version: v1
  versions:
    - name: v1
      served: true
      storage: true
  names:
    plural: tests
    singular: test
    kind: Test
    categories:
      - all
  scope: Namespaced
---
apiVersion: k8s.io/v1
kind: Test
metadata:
  name: test
  namespace: test
spec:
  some: value

Dry run result:

kubectl apply --server-dry-run -f ./test.yaml
customresourcedefinition.apiextensions.k8s.io/tests.k8s.io created (server dry run)
error: unable to recognize "test.yaml": no matches for kind "Test" in version "k8s.io/v1"

stefanprodan on 22 Jul 2019

We've encountered the problem of deployments failing because of validation errors a couple of times, and not having a good way to communicate this back to the right person has been a bit problematic.

Logging this in an easily detectable way would be a great first step. Would it also be possible to consider an option with a webhook?

We have a service, release-manager, which is responsible for moving files around in git and for reporting progress back to developers. It would be pretty cool if we could configure Flux to trigger a webhook in our release-manager and have it communicate the problem directly to our developers via, e.g., Slack, instead of having them inspect our log management tool.
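
A webhook of the kind described here could be as small as a JSON POST. The sketch below is an assumption-laden Python illustration, not an existing Flux option: the payload fields, the "validation_failed" event type and the function names are hypothetical, and the receiving URL would be whatever the release-manager exposes.

```python
import json
import urllib.request


def build_payload(resource, error, commit):
    """Build a JSON-serialisable notification for a failed validation."""
    return {
        "type": "validation_failed",  # hypothetical event type
        "resource": resource,
        "error": error,
        "commit": commit,
    }


def notify(url, payload, timeout=5):
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Including the commit in the payload would let the receiver look up the author and route the message to the right person.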

kaspernissen on 16 Aug 2019

@tobias-jenkner, I'm in the same boat. I'd like to pass the dry run output (from a CLI command?) to kubeval in a CI pipeline.

marshallford on 5 Jun 2020

We thought it could be a good idea to have flux --dry-run, or even to plug in additional validations for branches other than the source branch. That way it could be integrated into a git-flow process, like:

create branch feature/xxxx1
commit changes to the branch and have Flux validate them
ideally even integrated with GitHub status checks
then the team reviews the changes
then merge to the source branch, which gets synced to the target cluster

So basically we are also in the same boat.

chaliy on 24 Aug 2020

The API server dry-run was implemented in the GitOps toolkit and can be enabled with validation: server. See https://toolkit.fluxcd.io/components/kustomize/kustomization/

stefanprodan on 31 Aug 2020
