Flux: Validate manifests with API server dry run

Created on 22 Jul 2019 · 12 Comments · Source: fluxcd/flux

Starting with Kubernetes 1.13 the API dry run is enabled by default. Flux could run kubectl apply --server-dry-run before trying to apply the manifest. We could log the validation errors in such a way that's easy to detect with a log parser like Fluentd/CloudWatch/Stackdriver/etc (#1340). We could also expose a Prometheus metric with the validation errors count (#2199).
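As a rough illustration of what "easy to detect with a log parser" could mean, here is a minimal Python sketch that turns kubectl error output into one JSON log line per validation error. The function name `validation_log_records`, the `event` value, and the field names are assumptions made for this example and are not part of Flux. The regular expression only targets the "no matches for kind" message quoted later in this thread.

```python
import json
import re
from typing import List

# Matches kubectl errors of the form quoted later in this thread:
#   error: unable to recognize "test.yaml": no matches for kind "Test" in version "k8s.io/v1"
_NO_MATCH = re.compile(
    r'unable to recognize "(?P<file>[^"]+)": '
    r'no matches for kind "(?P<kind>[^"]+)" in version "(?P<version>[^"]+)"'
)


def validation_log_records(stderr: str) -> List[str]:
    """Return one JSON-encoded log line per error line in kubectl output."""
    records = []
    for line in stderr.splitlines():
        line = line.strip()
        # Skip success lines such as "... created (server dry run)".
        if not line.lower().startswith("error"):
            continue
        # Hypothetical record layout; a log parser can filter on "event".
        record = {"event": "manifest_validation_failed", "message": line}
        match = _NO_MATCH.search(line)
        if match:
            record.update(match.groupdict())
        records.append(json.dumps(record, sort_keys=True))
    return records
```

The length of the returned list is the kind of number a validation error counter could be incremented by.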

To avoid "no matches for kind" errors on custom resources, the validation and apply should be done in stages:

  • extract all CRDs from the manifest
  • run server-dry-run on the CRDs
  • if the validation succeeds apply the CRDs
  • run server-dry-run on all manifests
  • if the validation succeeds apply all manifests
enhancement

All 12 comments

@squaremo @hiddeco should we proceed with applying the manifests if the dry run fails?

I am familiar with the --server-dry-run flag but not with the logic behind it. Is the server-side validation result _guaranteed_ to match what an actual apply would return, including failures?

I think it behaves the same as apply:

Every stage runs as normal, except for the final storage stage. Admission controllers are run to check that the request is valid, mutating controllers mutate the request, merge is performed on PATCH, fields are defaulted, and schema validation occurs. The changes are not persisted to the underlying storage, but the final object which would have been persisted is still returned to the user, along with the normal status code. If the request would trigger an admission controller which would have side effects, the request will be failed rather than risk an unwanted side effect.

See:

I think the server dry run should be opt-in via a Flux command flag. Not every validation controller supports it, e.g. https://github.com/open-policy-agent/gatekeeper/issues/128

If the request would trigger an admission controller which would have side effects, the request will be failed rather than risk an unwanted side effect.

This is a big plus compared to what we have now. Given that the dry run behaves the same as a real apply, it would not make sense to still try to apply resources that would fail.

The question remains whether we want to apply a partial set (by filtering out what makes it fail) or skip the whole apply. I am inclined to choose the latter, as we strive to maintain a valid state.

I vote for skipping the apply altogether if the validation fails.

Looks like we need a two-stage validation/apply procedure, since the custom resources will fail validation if the CRDs are not applied first.

CRD + CR:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tests.k8s.io
  annotations:
    helm.sh/resource-policy: keep
spec:
  group: k8s.io
  version: v1
  versions:
    - name: v1
      served: true
      storage: true
  names:
    plural: tests
    singular: test
    kind: Test
    categories:
      - all
  scope: Namespaced
---
apiVersion: k8s.io/v1
kind: Test
metadata:
  name: test
  namespace: test
spec:
  some: value

Dry run result:

kubectl apply --server-dry-run -f ./test.yaml
customresourcedefinition.apiextensions.k8s.io/tests.k8s.io created (server dry run)
error: unable to recognize "test.yaml": no matches for kind "Test" in version "k8s.io/v1"
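The two-stage procedure discussed here could look roughly like the Python sketch below. It is not how Flux does it. `kubectl_apply` and `staged_apply` are hypothetical helpers, and the runner is injectable so the ordering can be exercised without a cluster. The sketch stops at the first failing stage, so a failed dry run of the full manifest set skips the final apply (the CRDs have already been applied by then, as in the staged proposal).

```python
import subprocess
from typing import Callable, List


def kubectl_apply(manifest: str, dry_run: bool) -> bool:
    """Pipe a manifest to kubectl on stdin; return True on exit code 0."""
    cmd = ["kubectl", "apply", "-f", "-"]
    if dry_run:
        # Flag used in this thread; newer kubectl releases use --dry-run=server.
        cmd.append("--server-dry-run")
    return subprocess.run(cmd, input=manifest, text=True).returncode == 0


def staged_apply(
    crds: str,
    everything: str,
    run: Callable[[str, bool], bool] = kubectl_apply,
) -> List[str]:
    """Validate and apply CRDs, then validate and apply all manifests.

    Returns the names of the stages that completed before the first failure.
    """
    stages = [
        ("validate-crds", crds, True),
        ("apply-crds", crds, False),
        ("validate-all", everything, True),
        ("apply-all", everything, False),
    ]
    done = []
    for name, manifest, dry_run in stages:
        # An empty manifest has nothing to run and counts as completed.
        if manifest.strip() and not run(manifest, dry_run):
            break
        done.append(name)
    return done
```

With the test.yaml above, the first two stages would register the `Test` kind, so the dry run of the full manifest no longer hits the "no matches for kind" error.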

We've run into deployments failing because of validation errors a couple of times, and not having a good way to report back to the right person has been a bit problematic.

Logging this in an easily detectable way would be a great first step. Would it be possible to also consider a webhook option?

We have a service, release-manager, which is responsible for moving files around in git and for reporting progress back to developers. It would be pretty cool if we could configure Flux to trigger a webhook in our release-manager, so that it can notify our developers of the problem directly (e.g. via Slack) instead of having them inspect our log management tool.
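To make the webhook idea concrete, here is a small Python sketch of what such a notification might carry. Everything in it is hypothetical: the payload fields, the function names, and the receiving endpoint are assumptions for illustration, not an existing Flux feature.

```python
import json
import urllib.request
from typing import List


def build_notification(commit: str, errors: List[str]) -> bytes:
    """Encode a validation failure as a JSON request body."""
    # Hypothetical payload layout; the receiver decides what it needs.
    payload = {"event": "validation_failed", "commit": commit, "errors": errors}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def notify(webhook_url: str, body: bytes, timeout: float = 5.0) -> int:
    """POST the body to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A receiver such as the release-manager described above could map the commit to its author and forward the errors to a chat channel.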

In order to validate the content of a commit before pushing it to our gitops master branch (e.g. in a pull request), I would find it very helpful to be able to run fluxd in a dry-run-only mode. Would that be possible as well?

@tobias-jenkner, I'm in the same boat. I'd like to pass the dry run output (from a CLI command?) to kubeval in a CI pipeline.

We thought it could be a good idea to have flux --dry-run, or even to plug in more validations for branches other than the source branch. That way it could be integrated into a git-flow process, like:

  • create branch feature/xxxx1
  • commit changes to the branch and have Flux validate them
  • ideally even integrate with GitHub status checks
  • then the team reviews the changes
  • then merge to the source branch, which gets synced to the target cluster

So basically we are also in the same boat.

The API server dry run was implemented in the GitOps Toolkit and can be enabled with validation: server. See https://toolkit.fluxcd.io/components/kustomize/kustomization/
