Kustomize: patchStrategicMerge elides null and {}

Created on 18 Jul 2020  路  14Comments  路  Source: kubernetes-sigs/kustomize

@monopole - sorry for the delay, i was trying to narrow it down, but probably should bring y'all up to date

This is the follow on to #2697

Dropping the -emptyDir: {} leads to a malformed volume declaration in the Deployment object
Dropping creationTimestamp: null might not matter semantically, but it seems odd that a patch that merges is able to remove something

cat >kustomization.yaml <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- resource.yaml
patchesStrategicMerge:
- patch.yaml
EOF
cat >resource.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sas-crunchy-data-postgres-operator
spec:
  replicas: 1
  template:
    metadata:
      creationTimestamp: null
    spec:
      serviceAccountName: postgres-operator
      containers:
      - name: apiserver
        image: sas-crunchy-data-operator-api-server
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8443
        envFrom: []
        volumeMounts:
        - name: security-ssh
          mountPath: /security-ssh
        - mountPath: /tmp
          name: tmp
      imagePullSecrets: []
      volumes:
      - emptyDir: {}
        name: security-ssh
      - emptyDir: {}
        name: tmp
EOF
cat >patch.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sas-crunchy-data-postgres-operator
  labels:
    workload.sas.com/class: stateless
spec:
  template:
    metadata:
      labels:
        workload.sas.com/class: stateless
    spec:
      tolerations:
        - key: "workload.sas.com/class"
          operator: "Equal"
          value: "stateful"
          effect: "NoSchedule"
        - key: "workload.sas.com/class"
          operator: "Equal"
          value: "stateless"
          effect: "NoSchedule"
---
EOF

if i kustomize build . on this, - emptyDir: {} as well as creationTimestamp: null gets removed from the object that is built, even though i'm not patching that part of the object.

if i remove/comment the patchesStrategicMerge from kustomization.yaml, then the - emptyDir: {} and creationTimestamp: null passes thru into the result

almost as if those null empty fields are not retained as you build the object in preparation (or during) the patchesStrategicMerge process

Most helpful comment

I believe HEAD is consistent with option 2 right now, but i'd like to review the tests again to confirm this.

We'll be doing a new release shortly

All 14 comments

I've stumbled upon the same issue, worth noting that 3.8.1 is still affected.

Probably, this issue was introduced with the recent migration from api-machinery to kyaml.

Right now, I've switched back to 3.7.0 which is not affected.

Yes, it's the kyaml.

By default kyaml processing does cleanups that should have no impact on KRM use.
E.g. indents are made canonical, trailing blanks dropped,
quoting might be added or removed based on field types,
map fields may be canonically sorted.

Dropping fields with value null feels like it belongs in the cleanup
category. Dropping a field that has an explicitly empty map or array
also feels like a cleanup, but we've established that the latter
can cause problems for code that wants an array to append to.
We've had no reports that this is a problem for maps.

So options are

1) retain [] (which we already do) but drop {} and null.
1) retain [] and {}, but drop null.
1) retain all three.

With any of these options, a user would still have the option to run the yaml
through a filter that aggressively removed any or all of these cases.

It's just a question of what the default should be.

I like 2 (drop null), but sounds like people like 3?

i don't have an example with null that leads to a malformed kubernetes object if it is dropped, so i'm indifferent on that one.
the emptydir volume example speaks to retaining {} -- dropping it produces objects that don't meet the spec
retaining [] allows people to provide bases that are more easily extended by others in a durable (over time) way

definitely 2. 3 gets us closer to the behaviour we had up the kustomize.370. What do others think?

@Shell32-Natsu , let's let this sit for a week to gather comments.

Default decision is option 2 (which requires a PR to allow {}).

@monopole Got that

Hi @monopole ,

yes this is definitely a bug, an empty object is completely different compared to null which specifies the absense of something. This is the same e.g. in JSON and is also defined like that in the YAML Spec: https://yaml.org/spec/1.2/spec.html#id2803362
Edit: I agree with option 2, [] and {} should be kept, null can be discarded.

Also examples of https://yaml.org/spec/1.2/spec.html#id2805071

A null: null
Also a null: # Empty
Not a null: ""

For us this causes a huge problem with Prometheus Operator as they require an empty object so that rules are picked up everywhere:

            ruleNamespaceSelector:
              description: A label selector is a label query over a set of resources.
                The result of matchLabels and matchExpressions are ANDed. An empty
                label selector matches all objects. A null label selector matches
                no objects.

See

The interesting thing is:
patching an empty object works, but as soon as another patch is applied on the same resource the previously empty object is discarded, I guess that a cleanup happens before the patch.

We would really appreciate a fast fix, thanks!

I believe HEAD is consistent with option 2 right now, but i'd like to review the tests again to confirm this.

We'll be doing a new release shortly

I come across here after an outage on our system. When kustomize 3.8.1 is used to deploy gloo (https://github.com/solo-io/gloo), the default setting entries are removed from the rendered output. Below the diff. generated with kustomize-3.4.0 which shows the removed entries:

diff /tmp/LIVE-522819448/gloo.solo.io.v1.Settings.gloo.default /tmp/MERGED-616571511/gloo.solo.io.v1.Settings.gloo.default
8c8
<   generation: 7
---
>   generation: 8
33a34,36
>   kubernetesArtifactSource: {}
>   kubernetesConfigSource: {}
>   kubernetesSecretSource: {}

gloo then goes with the new setting that missed 3 entries (kubernetesArtifactSource*), and it's actually down.

I'm looking forward to a new release of kustomize that addresses this issue. In the mean time, I'd go excluding some manifests from kustomize-3.8.1 output.

Update:

Fyi, gloo doesn't have very strict CRD/setting validation here (FIMXE). When some setting is wrong they can keep running but not functioning. On the otherhand, if kustomize-3.4.0 is used, there would be another problem with gloo, though not critical

Below is the full object manitest they need.

apiVersion: v1
items:
- apiVersion: gloo.solo.io/v1
  kind: Settings
  metadata:
    annotations:
    labels:
      app: gloo
    name: default
    namespace: gloo
  spec:
    discovery:
      fdsMode: WHITELIST
    discoveryNamespace: gloo
    gateway:
      readGatewaysFromAllNamespaces: false
      validation:
        allowWarnings: true
        alwaysAccept: true
        proxyValidationServerAddr: gloo:9988
    gloo:
      disableKubernetesDestinations: false
      disableProxyGarbageCollection: false
      invalidConfigPolicy:
        invalidRouteResponseBody: Gloo Gateway has invalid configuration. Administrators
          should run `glooctl check` to find and fix config errors.
        invalidRouteResponseCode: 404
      xdsBindAddr: 0.0.0.0:9977
    kubernetesArtifactSource: {}
    kubernetesConfigSource: {}
    kubernetesSecretSource: {}
    refreshRate: 60s

I believe HEAD is consistent with option 2 right now, but i'd like to review the tests again to confirm this.

We'll be doing a new release shortly

Hi @monopole,

is that fixed in 3.8.2 https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv3.8.2 ?

Many thanks

@icy Yes the fix is in 3.8.2.

@icy Yes the fix is in 3.8.2.

Thanks a lot for your information, and thanks the team for the fix.

I can confirm that my gloo-related problem is gone with 3.8.2

Thanks for fixing this so quickly. Was deploying prometheus operator too and had a hell of a time figuring out what happened as I upgraded both kustomize and operator in the same PR. Lesson learned. 3.8.2 fixed

/close

Please reopen if you want to discuss further.

@Shell32-Natsu: Closing this issue.

In response to this:

/close

Please reopen if you want to discuss further.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Was this page helpful?
0 / 5 - 0 ratings