flux 🚀 - Kustomize integration support

There's a few bits of technical design we'd have to figure out:

how does automation (updating the images used) work with kustomize config?

Part of flux's API is that you can ask it to update the images used for particular workloads; and, you can ask for that to be automated. With regular YAMLs this is pretty straightforward. Since image values may be patched arbitrarily with kustomize, I think it would be pretty tricky without a fairly rigid convention for how to factor the config.

Or maybe not? At a glance, it seems like it would be possible to be able to trace where an image was defined, and update it there; you'd probably want to lock base layers, so you're not updating things for any more than one environment. Anyway: needs some thought.

when does kustomize run?

It's not a big deal to run kustomize every time we want to apply the config to the cluster (with a period on the order of a minute). However: we also need YAMLs around for other circumstances, like answering API calls, which can happen arbitrarily.

do we need some kind of plugin mechanism for this?

We have special code for supporting Helm, in fluxd (and a _whole_ operator!). Supporting Helm and kustomize would be great; supporting Helm and kustomize and, I dunno, ksonnet, would be cool; at some point though, baking things in gets pretty monolithic.

They all have similar properties (Helm is a little different because of the indirection via charts, but similar enough). I wonder if there's a mechanism to be invented where the particular flavour of config is supported by a sidecar, or some such. One idea popped up in #1194, though it doesn't cover the automation/update requirement.

squaremo on 1 Aug 2018

It looks like the latest release of kustomize (https://github.com/kubernetes-sigs/kustomize/releases/tag/v1.0.5) has support for setting image tags in the kustomization.yaml file. I wonder if this would help with automatic image changes?

geofflamrock on 9 Aug 2018

FYI: Support for kustomize has just been merged into kubectl: https://github.com/kubernetes/kubernetes/pull/70875

tobru on 26 Dec 2018

> Support for kustomize has just been merged into kubectl:

Interesting! And neat.

squaremo on 27 Dec 2018

This is something that we are very keenly interested in.

andrewl3wis on 9 Jan 2019

> FYI: Support for kustomize has just been merged into kubectl: kubernetes/kubernetes#70875

And now it's reverted :sob: https://github.com/kubernetes/kubernetes/pull/72805

tobru on 15 Jan 2019

I have written a design proposal which, if successful, will result in Flux supporting Kustomize and other manifest factorization technologies.

Comments are open, so please take a look and let me know what you think (answers to comments will be taken in a best-effort manner).

My next step will be to write a proof of concept PR and run it by any interested parties.

2opremio on 11 Feb 2019

It's back in 1.14 as a subcommand: https://github.com/kubernetes/kubernetes/pull/73033

zeeZ on 19 Mar 2019

I have a working solution for using Kustomize, see https://github.com/2opremio/flux-kustomize-demo

Please take a look, if the feedback is positive we will end up merging it.

2opremio on 11 Apr 2019

We started to test this and it has been working ok but it raised a few questions that i would like to ask / see what you think about

my understanding is that for each _git-path_ option passed to flux it will run the commands defined in _.flux.yaml_

Previously, when using rendered yaml manifests, flux was able to recursively look into sub-directories of _git-path_ and apply all the manifests it found. This was very powerful for managing a dynamic set of manifests in a static set of _git-paths_.
Right now we are just defining a single _kustomization.yaml_ file to import a lot of bases to essentially describe the whole cluster, this way we can point flux at a single _git-path_ ... this works but i am not sure it will scale very well and could end up hitting some size limits if the output of _kustomize build ._ becomes too big.
What do you think would be the right way to manage such a situation?
Using the _kubeyaml_ command to update images, as described in your example, seems a bit cumbersome and will require a patch file for every workload defined in the imported bases.

When using a single kustomization.yaml to describe the whole cluster, potentially with hundreds of microservices, this won't scale very well.
We have tried using _kustomize set image_ directly , this works but it will create a new entry in kustomization.yaml for every single image in the _environment_ that then will need to be ported to the production one.
Also this does not work for annotations which makes it not a workable solution

I guess all my problems right now are about understanding how to divide _kustomize_ and _flux_ responsibilities:

Multiple flux with smaller kustomize.yaml ( Maybe one per namespace or something similar )
Single flux with a bigger kustomize.yaml

What are your / the community thoughts about this scenario ?

primeroz on 15 Apr 2019

@primeroz First, thanks for testing it!

> my understanding is that for each _git-path_ option passed to flux it will run the commands defined in _.flux.yaml_

It depends on where you place the .flux.yaml files with respect to the paths passed to --git-path.

Quoting the design document:

_For every flux target path_ [i.e. passed to --git-path]_, flux will look for a .flux.yaml file in the target path and all its parent directories._

If no .flux.yaml file is found, Flux will treat the path _normally_ (recursively looking for _raw_ manifests like it always has).
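
The lookup rule quoted above can be sketched roughly as follows. This is only an illustration of the search order, not Flux's actual code: the function name `find_flux_config` and the choice to stop searching at the repository root are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def find_flux_config(repo_root: Path, target: Path, name: str = ".flux.yaml") -> Optional[Path]:
    """Return the nearest config file at or above `target` (hypothetical helper).

    Assumption: the search starts in the target path, walks up through its
    parent directories, and stops once the repository root has been checked.
    """
    repo_root = repo_root.resolve()
    current = target.resolve()
    while True:
        candidate = current / name
        if candidate.is_file():
            return candidate
        # Stop at the repository root (or the filesystem root as a safety net).
        if current == repo_root or current.parent == current:
            return None
        current = current.parent
```

Under this model, a `None` result corresponds to the fallback described above: the path is treated as a directory of raw manifests.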

> this works but i am not sure it will scale very well and could end up hitting some size limits if the output of _kustomize build ._ becomes too big.

I guess the only way to know is by testing it. We read the output sequentially and my guess is it will be able to withstand quite a big size. If you encounter a performance problem, please let me know, I will be more than happy to address it.

However, if you feel more comfortable, and based on the .flux.yaml search rules above, you can split the generated output into multiple .flux.yaml-governed directories at will (or even keep using raw manifests for the pieces which don't change per environment).

> using the kubeyaml command to update images, as described in your example, seems a bit cumbersome and will require a patch file for every workload defined in the imported bases.

It doesn't require one patch file for every workload; you can put all the workloads in the same patch file. Also, as indicated in the demo's README, my plan is to modify kubeyaml (incorporating a flag like --add-if-not-found) so that it adds the patched resource on-demand if not found.

Note that, however you do it, when using an incremental patch-based solution like kustomize you will somehow need to store the patches, be it in the kustomization.yaml file, a separate patch file (e.g. flux-patch.yaml as I proposed) or somewhere else.

> I guess all my problems right now are about understanding how to divide kustomize and flux responsibilities
> What are your / the community thoughts about this scenario ?

It really depends on your particular use-case. I am happy to make suggestions if you describe your scenario (both qualitatively and quantitatively, so that I can estimate Flux's load), but ultimately it's about making your life easier.

My goal with the proposed solution (and gathering feedback early) is to come up with a generic approach (covering not only kustomize but other tools too), which is easy to use with positive reception from the community. I am happy to make modifications in order to cover common use-cases/scenarios. Also, I am equally happy to bury it and go for a better approach, if I find one while getting feedback.

2opremio on 16 Apr 2019

Hi thanks for your response. i will do a bit more experimentation .

This is what i was thinking to do anyway in terms of structure but i wanted to limit the amount of repetition between _clusters_ overlays

├── bases/
│   ├── app1/
│   │   └── kustomization.yaml
│   ├── app100/
│   │   └── kustomization.yaml
│   ├── app2/
│   │   └── kustomization.yaml
│   ├── dns/
│   │   └── kustomization.yaml
│   └── ingress/
│       └── kustomization.yaml
└── clusters/
    ├── dev1/
    │   └── kustomization.yaml
    ├── dev2/
    │   └── kustomization.yaml
    ├── prod1/
    │   └── kustomization.yaml
    ├── staging1/
    │   └── kustomization.yaml
    └── .flux.yaml

For each cluster I will point flux to the right _git-path_ clusters/XXX directory

Right now i will define a "patch.yaml" in each cluster overlay but it will be tricky until the "--add-if-not-found" flag is supported by kubeyaml because i have loads of services running and i will need to generate that file dynamically
For the Images i can just do a _kustomize set image_ but that won't work for annotations ... i did not dig much into it yet so i am not sure if i do require annotations updates or i can live without it for now
I don't want , at least for now, to auto update images in prod , I am not sure how to keep in sync the image releases from staging to prod.
I guess i could just copy the "patch.yaml" over but that feels more like subversion than git :)
For each cluster this might generate a very big "Yaml stream" but as you say you expect that to not be a problem so i will give it a shot see when / if it breaks

> However, if you feel more comfortable, and based on the .flux.yaml search rules above, you can split the generated output into multiple .flux.yaml-governed directories at will (or even keep using raw manifests for the pieces which don't change per environment).

I am not sure i understand this. I get that I can put .flux.yaml in a parent directory so as to share it between different overlay/cluster definitions.
Would i still need to specify the git-path multiple times for flux?
So, for example, if i change the above structure to

├── bases/
│   ├── app1/
│   │   └── kustomization.yaml
│   ├── app100/
│   │   └── kustomization.yaml
│   ├── app2/
│   │   └── kustomization.yaml
│   ├── dns/
│   │   └── kustomization.yaml
│   └── ingress/
│       └── kustomization.yaml
└── clusters/
    ├── dev1/
    │   ├── apps/
    │   │   └── kustomization.yaml
    │   └── core/
    │       └── kustomization.yaml
    ├── dev2/
    ├── prod1/
    ├── staging1/
    └── .flux.yaml

Would i need to pass two _git-path_ entries to flux in dev1 (one for apps and one for core), or would specifying just one _git-path_ be enough for flux to find the two subdirs _apps_ and _core_?
With raw manifests that's what i was doing :)

Thanks for the hard work though, this looks really good for us!

primeroz on 16 Apr 2019

> This is what i was thinking to do anyway in terms of structure but i wanted to limit the amount of repetition between _clusters_ overlays

Looks good

> Right now i will define a "patch.yaml" in each cluster overlay but it will be tricky until the "--add-if-not-found" flag is supported by kubeyaml because i have loads of services running and i will need to generate that file dynamically

If the approach is validated I will definitely implement --add-if-not-found in kubeyaml or an equivalent solution. I can also do it earlier if it's a showstopper for your experimentation.

> For the Images i can just do a _kustomize set image_ but that won't work for annotations ... i did not dig much into it yet so i am not sure if i do require annotations updates or i can live without it for now

Out of curiosity, why can't you? Is it because kustomize edit add annotation adds the annotation to all resources?

This is how I originally implemented the demo (after modifying kustomize slightly https://github.com/kubernetes-sigs/kustomize/pull/950). There are ways around that problem but the solution I found (add a separate kustomization.yaml file per resource patched) was really cumbersome. Using kubeyaml in patch files is much cleaner and simpler.

> I don't want , at least for now, to auto update images in prod , I am not sure how to keep in sync the image releases from staging to prod.

Weave Cloud offers this feature, by using Flux's API to propagate changes across environments. Note that this is a separate problem to using .flux.yaml files or not.

> I guess i could just copy the "patch.yaml" over but that feels more like subversion than git :)
>
> For each cluster this might generate a very big "Yaml stream" but as you say you expect that to not be a problem so i will give it a shot see when / if it breaks

This sounds low-tech, but if you only want to propagate image versions it seems like a pretty good approach.

Note that it can be a bit fragile if you don't have the same workloads in each environment (otherwise, you may end up deleting the image overlay of a workload in the destination without corresponding workload in the origin).

Similarly, for this approach to allow separate annotations between environments I would use two separate patch files, one for annotations and one for images.

> > However, if you feel more comfortable, and based on the .flux.yaml search rules above, you can split the generated output into multiple .flux.yaml-governed directories at will (or even keep using raw manifests for the pieces which don't change per environment).
>
> I am not sure i understand this. I get that I can put .flux.yaml in a parent directory so as to share it between different overlay/cluster definitions.
> Would i still need to specify the git-path multiple times for flux?
> So, for example, if i change the above structure to [...]
>
> Would i need to pass two _git-path_ entries to flux in dev1 (one for apps and one for core), or would specifying just one _git-path_ be enough for flux to find the two subdirs _apps_ and _core_?
> With raw manifests that's what i was doing :)

It depends on what you put in your .flux.yaml file but, if its content is the same as in https://github.com/2opremio/flux-kustomize-demo, yes, you will have to provide the two paths. You could also modify the file to work in both scenarios, but let's not get into that.

The key to understanding this is that the git-path entries are used as the current working directory of the commands run in the .flux.yaml files. There was a typo in the design document which I just corrected. I also made the explanation more friendly.

Quoting the updated design document:
_The working directory (aka CWD) of the commands executed from a .fluxctl.yaml file will be set to the target path (--git-path entry) used when searching the .fluxctl.yaml file._
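
To make the working-directory rule concrete, here is a minimal sketch of running a generator command with the target path as its CWD. It is not Flux's implementation: the helper name `run_generator` is hypothetical, and running the command through a shell is an assumption.

```python
import subprocess
from pathlib import Path


def run_generator(command: str, target_path: Path) -> str:
    """Run a generator command with the --git-path entry as its working directory.

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(
        command,
        shell=True,           # assumption: the configured command is a shell string
        cwd=target_path,      # CWD is the target path, not the config file's directory
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

With the second layout above, a call such as `run_generator("kustomize build .", Path("clusters/dev1"))` would be expected to fail, because there is no kustomization.yaml directly in clusters/dev1; passing clusters/dev1/apps and clusters/dev1/core as separate paths gives each command a directory it can build.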

2opremio on 16 Apr 2019

> We read the output sequentially and my guess is it will be able to withstand quite a big size

Upon further checking that's not really true (right now we read the full file while parsing, which we can easily change) but I think it will tolerate large sizes. Can you report on the output sizes you will have? If it's in the order of a few dozen megabytes I think it should be fine.

2opremio on 16 Apr 2019

Sorry for late reply, holidays in between!

> If the approach is validated I will definitely implement --add-if-not-found in kubeyaml or an equivalent solution. I can also do it earlier if it's a showstopper for your experimentation.

That would be great for us

> Out of curiosity, why can't you? Is it because kustomize edit add annotation adds the annotation to all resources?

Correct. That would create a common annotation that would apply to any resource created by that kustomization file.

> Upon further checking that's not really true (right now we read the full file while parsing, which we can easily change) but I think it will tolerate large sizes. Can you report on the output sizes you will have? If it's in the order of a few dozen megabytes I think it should be fine.

Just over 1MB right now and is working fine. I guess i was worrying about something before checking if it was a real deal. I will keep an eye on it and possibly create an example where i can reach a >24MB size and see how it goes ( That will be a lot of resources! )

primeroz on 23 Apr 2019

Would be nice to see this implemented as kustomize seems to be becoming the de facto standard for declarative templating with it being merged into kubectl.

rdubya16 on 23 Apr 2019

@rdubya16 by _this_ you mean the working implementation mentioned above? :)

2opremio on 23 Apr 2019

@2opremio This is my first look at flux so I can't really speak to this implementation. We are managing yaml files with kustomize checked into git but want to make the move toward gitops in the next few months, and were hoping there was kustomize support so we wouldn't have to do something hacky like OP.

rdubya16 on 23 Apr 2019

> Sorry for late reply, holidays in between!
>
> > If the approach is validated I will definitely implement --add-if-not-found in kubeyaml or an equivalent solution. I can also do it earlier if it's a showstopper for your experimentation.
>
> That would be great for us

@primeroz Is this the only blocker? Would you be happy to use it without any other modifications?

2opremio on 23 Apr 2019

@rdubya16 did you check https://github.com/2opremio/flux-kustomize-demo ?

2opremio on 23 Apr 2019

@2opremio yeah i think so. I am getting back onto this right now since i spent most of my last week on porting our self-made terraform kubernetes onto GKE and only got our "bootstrap flux" done for now ( which just uses jsonnet generated yaml manifests)

I am getting onto the apps section now , for which i want to use kustomize , so i will update you once i got more info

from my first experiments though i think that is the only blocker i see

primeroz on 24 Apr 2019

Also doing a proof of concept (with promotion from dev -> staging -> prod now using this). I think we want to achieve a similar pattern to https://www.weave.works/blog/managing-helm-releases-the-gitops-way , but using kustomize instead of Helm :)

@2opremio just wondering on the use-case for annotations in the demo.

At the moment we would have annotations for various services defined in kustomize overlays, like

  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.my-container: glob:staging-*

In general, I don't think flux would ever need to change this, unless we were to use something like fluxctl to change the policies.

Is that fairly common? I.e., could we just leave out the annotation commands in our .flux.yaml?

nabadger on 24 Apr 2019

I think it would be great to improve error management out of the kustomize / flux implementation

I had an issue where I had not added the key to a remote repo that the repo cloned by flux uses for remote Kustomize bases.

It took a long time before getting any errors in the logs :

flux-apps-644d6cd98d-hnqs6 flux ts=2019-04-24T13:15:05.965356274Z caller=images.go:24 component=sync-loop error="getting unlocked automated resources: error executing generator command \"kustomize build .\" from file \"dev/vault/.flux.yaml\": exit status 1\nerror output:\nError: couldn't make loader for [email protected]//cluster-config/dev/services/vault?ref=vault: trouble cloning [email protected]//cluster-config/dev/services/vault?ref=vault: exit status 128\n\n\ngenerated output:\nError: couldn't make loader for [email protected]//cluster-config/dev/services/vault?ref=vault: trouble cloning [email protected]//cluster-config/dev/services/vault?ref=vault: exit status 128\n\n"

Also once i fixed the issue ( adding the key to the remote ) it never recovered ( retried on its own ) until i killed flux
Or at least it did not retry to run kustomize in 10 minutes i waited

primeroz on 24 Apr 2019

> I think it would be great to improve error management out of the kustomize / flux implementation

I totally agree, it's on the list. See https://docs.google.com/document/d/1ebAjaZF84G3RINYvw8ii6dO4-_zO645j8Hb7mvrPKPY/edit#heading=h.7fpfjkabanhy

> It took a long time before getting any errors in the logs

You can force a sync by running fluxctl sync

> Also once i fixed the issue ( adding the key to the remote ) it never recovered

Uhm, please make sure to let me know if it happens again. It may be a bug in the implementation.

2opremio on 24 Apr 2019

> In general, I don't think flux would ever need to change this, unless we were to use something like fluxctl to change the policies.
>
> Is that fairly common? I.e., could we just leave out the annotation commands in our .flux.yaml?

@nabadger The .flux.yaml files are designed so that you can leave the updaters out:

_Generators and updaters are intentionally independent in case a matching updater cannot be provided. It is too ambitious to make updaters work for all possible factorization technologies (particularly Configuration-As-Code)._

If you don't care about automatic releases and fluxctl commands which update the resources (e.g. fluxctl automate, fluxctl release), then just omit the updaters.

2opremio on 24 Apr 2019

> > Also once i fixed the issue ( adding the key to the remote ) it never recovered
>
> Uhm, please make sure to let me know if it happens again. It may be a bug in the implementation.

FYI i think this was my fault, i forgot i increased the "sync-interval" to 15m

I will surely report if it happens again.

primeroz on 25 Apr 2019

> I will definitely implement --add-if-not-found in kubeyaml or an equivalent solution. I can also do it earlier if it's a showstopper for your experimentation.

I started implementing this, but I hit a wall in containerImage because specifying the container name is not enough for all cases:

When generating the container of a HelmRelease you also need the format in which to specify the container in the values section.
In some workloads (Deployments) you can have normal containers and init containers. The container name doesn't indicate which kind of container it is.

So, I need to rethink this. Maybe we can supply extra environment variables (e.g. the yaml path to the container) but it's going to be non-trivial.

2opremio on 25 Apr 2019

Just to understand the issue: don't we have the same problem of _init_ vs _workload_ containers when dealing with raw manifests?

Is this a problem just for creating the YAML patch, since kubeyaml does not know if it is an init or a workload container?

Would an assumption of wanting to update the _workload_ containers 99.9% of the time be bad? :)

primeroz on 25 Apr 2019

> Is this a problem just for creating the YAML patch, since kubeyaml does not know if it is an init or a workload container?

Yes; kubeyaml is told only the container name, and without an existing entry, it won't know whether the entry should be a container or initContainer.

squaremo on 25 Apr 2019

Running into an issue when trying to test this.

I have a repo called flux-ci-promotion under the workload named dev-echo (it echoes headers back).

WORKLOAD                      CONTAINER  IMAGE                                                                       RELEASE  POLICY
default:deployment/dev-echo   echo       registry.gitlab.com/<redacted>/flux-ci-promotion:dev-test  ready    
default:deployment/flux       flux       docker.io/2opremio/flux:generators-releasers-8baf8bd0                       ready    
default:deployment/memcached  memcached  memcached:1.4.25                                                            ready

 fluxctl list-images -w default:deployment/dev-echo
WORKLOAD                     CONTAINER  IMAGE                                                              CREATED
default:deployment/dev-echo  echo       registry.gitlab.com/<redacted>/flux-ci-promotion  
                                        '-> dev-test                                                       29 May 16 05:03 UTC
                                            dev-test-1                                                     29 May 16 05:03 UTC
                                            dev-test-2                                                     29 May 16 05:03 UTC

At this point I want to release the dev-test-1 tag, as I want to look at how the updater hook and kubeyaml work.

 fluxctl -vv release --update-image=registry.gitlab.com/<redacted>/flux-ci-promotion:dev-test-1 --workload=default:deployment/dev-echo
Submitting release ...
Error: verifying changes: failed to verify changes: the image for container "echo" in resource "default:deployment/dev-echo" should be "registry.gitlab.com/<redacted>/flux-ci-promotion:dev-test-1", but is "registry.gitlab.com/<redacted>/flux-ci-promotion:dev-test"
Run 'fluxctl release --help' for usage.

I get a similar message (well the same) on the flux-controller logs

flux-56f4db559-wm9ct flux ts=2019-04-25T17:48:10.214338085Z caller=loop.go:123 component=sync-loop jobID=8e927b0f-e5d8-15a3-9118-b04e78f1f945 state=done success=false err="verifying changes: failed to verify changes: the image for container \"echo\" in resource \"default:deployment/dev-echo\" should be \"registry.gitlab.com/<redacted>flux-ci-promotion:dev-test-1\", but is \"registry.gitlab.com/<redacted>/flux-ci-promotion:dev-test\""

The running pod is using flux-ci-promotion:dev-test, and deploys fine from the initial configuration via the flux-deployment.

I'm not sure if this is related, but if I try to automate or de-automate the workload via fluxctl, I can't get it to work.

fluxctl automate --workload=default:deployment/dev-echo                              
Error: no changes made in repo                                                                    
Run 'fluxctl automate --help' for usage

WORKLOAD                      CONTAINER  IMAGE                                                                       RELEASE  POLICY
default:deployment/dev-echo   echo       registry.gitlab.com/<redacted>/flux-ci-promotion:dev-test  ready

The policy doesn't change.

EDIT

I re-tested this with git-path pointing to some raw manifests (just the output of kustomize build .), still using 2opremio/flux:generators-releasers-8baf8bd0. Both the fluxctl release and fluxctl (de)automate commands worked as expected. Will keep trying :)

EDIT 2

Ok found the issue. The error output was from kubeyaml.py.

The cause of my issue is that in my kustomize overlay I was specifying namePrefix: dev-.

I suspect you can reproduce the same issue if you set this in your demo example?

This might pose a problem as namePrefix is very common.

It looks like kubeyaml was being passed --name "echo" which throws the error, whereas if I were to pass dev-echo (which includes the namePrefix), it works. The name comes from $FLUX_WL_NAME.

Not too sure if this can be fixed.

nabadger on 25 Apr 2019

I also raised this issue https://github.com/kubernetes-sigs/kustomize/issues/1015 because if it were possible to have the "hard work" done in kustomize, it would benefit us all :)

Unless I've missed something, such a feature (in kustomize) would be great, regardless of how it was called (flux or whatever).

nabadger on 25 Apr 2019

@nabadger good job finding the issue!

> The cause of my issue is that in my kustomize overlay I was specifying namePrefix: dev-.

Can you elaborate on the cause of the problem? Also, if possible, can you share the repo you were using?

2opremio on 26 Apr 2019

@primeroz I think I have found a solution for the patch generation. Instead of having kubeyaml edit flux-patch.yaml, we can make it generate a Strategic Merge Patch (SMP) from the output of the kustomize build . command. Then we can pass the SMP to kustomize and apply it to flux-patch.yaml.

I will work on a PR to kubeyaml
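
As a rough illustration of deriving a patch from generated output, the sketch below compares a generated manifest before and after an image update and keeps only the containers whose image changed. It is a deliberately simplified stand-in, not how kubeyaml or any other tool computes a Strategic Merge Patch: the function names are hypothetical, and it ignores namespaces, init containers, and patch directives such as `$setElementOrder`.

```python
from typing import Any, Dict, List, Optional

Manifest = Dict[str, Any]


def _containers(manifest: Manifest) -> List[Dict[str, Any]]:
    """Return the containers of a Deployment-like manifest (empty if absent)."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    return pod_spec.get("containers", [])


def image_patch(before: Manifest, after: Manifest) -> Optional[Manifest]:
    """Build a minimal patch holding only the containers whose image changed.

    Hypothetical helper: containers are matched by name, which is the merge
    key that strategic merge patches use for container lists.
    """
    old_images = {c["name"]: c.get("image") for c in _containers(before)}
    changed = [
        {"name": c["name"], "image": c["image"]}
        for c in _containers(after)
        if "image" in c and old_images.get(c["name"]) != c["image"]
    ]
    if not changed:
        return None  # nothing to record in the patch file
    return {
        "apiVersion": after["apiVersion"],
        "kind": after["kind"],
        "metadata": {"name": after["metadata"]["name"]},
        "spec": {"template": {"spec": {"containers": changed}}},
    }
```

The appeal of this direction is that the patch file only ever records differences, so no entry has to exist up front for each workload.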

2opremio on 26 Apr 2019

@2opremio sure I've copied my repo to github and cleaned it up a bit.

https://github.com/nabadger/flux-ci-promotion

In my example I have multiple applications pulled in from bases ( https://github.com/nabadger/flux-ci-promotion/blob/master/kustomize/dev/kustomization.yaml ) so would require the ability to edit (or create) multiple patch files.

The --add-if-not-found would help resolve the issues I was having.

I was trying to mimic the --add-if-not-found option by creating a template patch file, and using it to create real patch files.

See https://github.com/nabadger/flux-ci-promotion/blob/master/kustomize/.flux.yaml

My issue here is that my template has a metadata.name value of name (its generic), but the FLUX_WL_NAME that is passed in will obviously never match this, so kubeyaml won't generate the manifest.

If we use namePrefix, that generates a different FLUX_WL_NAME, but it's really just the same issue.

I suspect my issues would be helped by https://github.com/weaveworks/flux/issues/1261#issuecomment-487013445

Would be interesting to see how this copes with different resource types (StatefulSet/Deployment etc). I think that would be ok, but we may struggle with any CRDs we have (they tend to be a special case anyway).

nabadger on 26 Apr 2019

@nabadger @primeroz It took forever to figure out, but I finally have a solution! It required creating a new tool to compute strategic merge patches (which I called kubedelta).

Anyways, to give it a try please:

Take a look at the patch-bookeeping-kubedelta branch: https://github.com/2opremio/flux-kustomize-demo/tree/patch-bookeeping-kubedelta .
Make sure to update the image to the one indicated at https://github.com/weaveworks/flux/pull/1848

I am looking forward to your feedback! The performance of the updaters will be worse but I think it will still support large YAML streams (that's the price to pay for not doing any bookkeeping :) ).

2opremio on 7 May 2019

@2opremio thanks will take a look today

nabadger on 7 May 2019

@2opremio Thanks for the effort you have put into this :)

I've done some more testing on a small app I have. Currently I find that there is an issue when namePrefix is used in the kustomization.yaml - but if that is left out, this works well (i.e. sync works, releases work, the patch files generated look good so far).

The namePrefix is an interesting one because I think it's a common feature with kustomize.

I think it will be the same issue as before and something that kubeyaml is not yet able to handle (perhaps).

I think you will run into the same issue if you were to specify namePrefix in your demo ( https://github.com/2opremio/flux-kustomize-demo/blob/patch-bookeeping-kubedelta/staging/kustomization.yaml for example).

Generally speaking this is the setup that we have (I don't think it's uncommon, but I also don't know anyone that is doing this style of thing with gitops and commits back to the repo which flux can do).

We define a base "generic" app deployment. It will have some defaults like anti-affinity, replicas, resources, health checks.

./common/generic-app/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  template:
    spec:
      containers:
      - args: []
        image: container-image
        livenessProbe:
          httpGet:
            path: /healthz
            port: http
        name: container-name
        ports:
        - containerPort: 8000
          name: http
        readinessProbe:
          httpGet:
            path: /readiness
            port: http
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi

We define an overlay which could be in the same repo or a remote one.

In our example (with flux) this could probably include all apps for a particular environment (i.e. our entire cluster state for staging).

./kustomization.yaml

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

bases:
- ./common/generic-app
# - ./common/another-app
# - ./common/and-another

patchesStrategicMerge:
- patch-replicas.yaml
- flux-patch.yaml

Here's a typical patch that flux creates now

flux-patch.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  template:
    spec:
      $setElementOrder/containers:
      - name: container-name
      containers:
      - image: registry.gitlab.com/..../echoheaders:v0.0.2
        name: container-name

This is where it gets interesting. The patch needs to reference the original metadata.name, which in this example is just app.

The above works, because our generic base is app, and without any prefixing, so is the overlay.

If however we set namePrefix to review, the kubeyaml workload passed in will be review-app. This will not match app, so it fails with the following error:

flux-59c4dcd7c6-2vggj flux ts=2019-05-07T19:02:11.6408159Z caller=loop.go:123 component=sync-loop jobID=59314baa-7825-83f1-d5c8-20281816c244 state=done success=false err="loading resources after updates: error executing generator command \"kustomize build .\" from file \"review/.flux.yaml\": exit status 1\nerror output:\nError: failed to find an object with apps_v1_Deployment|review-app to apply the patch\n\ngenerated output:\nError: failed to find an object with apps_v1_Deployment|review-app to apply the patch\n"
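
To make the mismatch concrete, here is a minimal sketch of an overlay that would trigger it. The review/ directory name comes from the log above; the relative base path and the exact prefix string review- are assumptions, chosen so that app renders as review-app.

```yaml
# review/kustomization.yaml (sketch; paths are assumed)
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

# Renames every resource from the base: app -> review-app
namePrefix: review-

bases:
- ../common/generic-app

patchesStrategicMerge:
# flux writes the final name (review-app) into this file, while the
# error suggests kustomize looks the target up before prefixing (app).
- flux-patch.yaml
```

With this layout, a flux-patch.yaml whose metadata.name is review-app has nothing to match, which is consistent with the "failed to find an object" error.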

I'll work on a more concrete example. I'm not yet convinced of an easy answer to this when multiple deployments are involved which share a common base (I don't think it can be done with a single kustomization.yaml per environment like we have now, as I think the patches would conflict as they would all reference app).

nabadger on 7 May 2019

@nabadger if the prefix is set from the very beginning it should work shouldn't it? (then flux won't even be aware of identifiers without the prefix)

2opremio on 7 May 2019

@nabadger a concrete example would help

2opremio on 7 May 2019

I think I now understand where the problem is. ~I haven't tested it, but if~ Kustomize applies the prefix at the very end, it expects the patch not to have the prefix but kubeyaml will be provided the final name (with the prefix). ~Please confirm.~

This can be solved in two ways:

1. You can remove the prefix from the environment variable before invoking the script.
2. I plan to add patch-applying capabilities to kubedelta. This will allow us to apply the patch after kustomize is invoked (in particular after the prefix is added). In fact, this will allow for a generic approach supporting any manifest-generation technology, not just kustomize.

@nabadger please go for (1) for now

2opremio on 8 May 2019

There is also:

Create an extra overlay for the patch, on top of the one with the namePrefix, to ensure the prefix is added before applying the flux patch.
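
A rough sketch of that extra overlay, assuming a hypothetical review-patched/ directory next to the review/ overlay, and assuming kustomize matches the patch against the already-prefixed name once the prefix has been applied in the lower layer:

```yaml
# review-patched/kustomization.yaml (hypothetical directory name)
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

bases:
# The overlay that sets namePrefix, so names arrive here already prefixed.
- ../review

patchesStrategicMerge:
# Written by flux; targets the final name, e.g. review-app.
- flux-patch.yaml
```

flux would then be pointed at review-patched/ rather than review/, and flux-patch.yaml would be dropped from the patch list in review/kustomization.yaml.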

2opremio on 8 May 2019

@2opremio thanks I'll give 1) a try and provide more concrete examples.

nabadger on 8 May 2019

@2opremio here's a working example (just kustomize not flux) on the layout we are essentially working with:

https://github.com/nabadger/flux-kustomize-example

nabadger on 9 May 2019

@nabadger Thanks.

2. I plan to add patch-applying capabilities to kubedelta. This will allow us to apply the patch after kustomize is invoked (in particular after the prefix is added). In fact, this will allow for a generic approach supporting any manifest-generation technology, not just kustomize.

In the end I designed something better, a predefined updater called mergePatchUpdater which implicitly stores and applies the modifications from Flux into a merge patch file.

I haven't implemented it yet, but for Kustomize it would look like:

---
version: 1
generators:
  - command: kustomize build .
updaters:
  - mergePatchUpdater: flux-patch.yaml

2opremio on 9 May 2019

I have finally implemented the patch-based updater (after some design discussions it got transformed into patchUpdated configuration files vs commandUpdated ones, see the design document for more details). For instance:

version: 1
patchUpdated:
  generators:
    - command: kustomize build .
  patchFile: flux-patch.yaml

Please use the last image indicated at #1848 and give it a try!

@primeroz @nabadger This should hopefully fix all the problems you mentioned.

2opremio on 10 May 2019

Great - looking forward to testing this early next week, thanks @2opremio :)

nabadger on 11 May 2019

removed comment - my segfault was a result of having a bad .flux.yaml configured against the latest image.

nabadger on 13 May 2019

@2opremio I am running 2opremio/generators-releasers-bb344048 and it is working like a charm.

I did a bit of testing of the release manager with @nabadger and that is looking good as well. Thanks a lot for all your work!

Two things I would like to highlight in case they make a difference before this code makes it to master:

1. We need more error output :) Humans are bad, and for about half an hour we had a wrong .flux.yaml (the one from this comment rather than the one from this) and there was no error or anything else in the logs; it was all quiet and nothing was happening.
2. With this new "external" way of applying the patch through flux on top of the YAML outputted by the tool, we no longer have any way to render the manifests as they will look when flux applies them. We used to render the manifests with kustomize build in CI and check the diffs. While this is not a huge deal, it is kind of annoying. We will look for some tool to replicate the patching in CI before the diffs are shown.

Again, thanks for this!

primeroz on 14 May 2019

@2opremio I did a couple of tests which both worked.

1 - follows your example where you may just be patching a single app.
2 - is another example (which is my use-case) where we have multiple apps from a base.

This worked well and generated the expected patch-file like so:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-echoheaders-app
  namespace: sandbox
spec:
  template:
    spec:
      $setElementOrder/containers:
      - name: container-name
      containers:
      - image: registry.gitlab.com/<redacted>/echoheaders:v0.0.4
        name: container-name
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: staging-echoheaders-app
  namespace: sandbox
spec:
  template:
    spec:
      $setElementOrder/containers:
      - name: container-name
      containers:
      - image: registry.gitlab.com/<redacted>/echoheaders:v0.0.3
        name: container-name

nabadger on 14 May 2019

Something I noticed whilst testing (which might be a general flux thing) was the number of temporary build directories left over by flux in /tmp (inside the flux container)

i.e. it creates

/tmp/kustomize-380447668

I noticed these hang around for any failed builds (which happen when we mess up our deploy keys).

I had about 50 - wondering if flux will clean these up automatically, or whether that may cause issues for a long-running flux instance?

This is an issue with kustomize.

Created upstream issue https://github.com/kubernetes-sigs/kustomize/issues/1076

nabadger on 14 May 2019

I guess this is close to being merged now? :) We really want it before we start to setup new environments, so we don't have to do workarounds to handle multi-environment repos.

guzmo on 15 May 2019

Good morning @guzmo,

We started working with milestones a couple of weeks ago to provide you (users) with more predictable release dates. The release of this feature is planned for the next major release of Flux on June 5.

hiddeco on 15 May 2019

Morning! :)
Great news. Is it possible to install your dev image until it's released (once this is merged)?
I would like to get started with it, hehe
/ Andreas

guzmo on 15 May 2019

There are prerelease builds available that tend to be stable although no guarantee is given.

hiddeco on 15 May 2019

removed comment - my segfault was a result of having a bad .flux.yaml configured against the latest image.

Please repost. Flux should never crash, you should have gotten an error indicating what's wrong. Mind sharing the config file as well?

2opremio on 15 May 2019

There are prerelease builds available that tend to be stable although no guarantee is given.

@guzmo that only applies to code which has been merged to master. If you are interested in using it before that, please use the image indicated in PR #1848 until it's merged.

2opremio on 15 May 2019

Two things I would like to highlight in case they make a difference before this code makes it to master:

1. We need more error output :) Humans are bad, and for about half an hour we had a wrong .flux.yaml (the one from this comment rather than the one from this) and there was no error or anything else in the logs; it was all quiet and nothing was happening.

2. With this new "external" way of applying the patch through flux on top of the YAML outputted by the tool, we no longer have any way to render the manifests as they will look when flux applies them. We used to render the manifests with kustomize build in CI and check the diffs. While this is not a huge deal, it is kind of annoying. We will look for some tool to replicate the patching in CI before the diffs are shown.

Again, thanks for this!

@primeroz thanks for bringing it up. Yes, we need to improve debuggability. It's planned, but will be done after #1848 lands. You can read more about it in the debugging section of https://docs.google.com/document/d/1ebAjaZF84G3RINYvw8ii6dO4-_zO645j8Hb7mvrPKPY/. I think it will cover both (1) and (2). Let me know what you think by commenting directly in the document.

2opremio on 15 May 2019

Something I noticed whilst testing (which might be a general flux thing) was the number of temporary build directories left over by flux in /tmp (inside the flux container)

i.e. it creates
/tmp/kustomize-380447668

That's probably a bug in kustomize; maybe it's already fixed upstream. I will take a look.

2opremio on 15 May 2019

removed comment - my segfault was a result of having a bad .flux.yaml configured against the latest image.

Please repost. Flux should never crash, you should have gotten an error indicating what's wrong. Mind sharing the config file as well?

@nabadger I went through the history and found the problem, there is a fix for it at https://github.com/weaveworks/flux/pull/1848/commits/bd7fa1477e977b9927eb444d77f743cc3ad759f4

2opremio on 15 May 2019

That's probably a bug in kustomize; maybe it's already fixed upstream. I will take a look.

@nabadger I've only found https://github.com/kubernetes-sigs/kustomize/issues/566, whose fix has been released in Kustomize 2.0.3 (which is the version we ship with flux in this PR).

Would you mind creating an upstream issue?

2opremio on 15 May 2019

@2opremio Ye, I'll wait for it to be merged, so at least this feature is kinda stable :)

guzmo on 15 May 2019

@2opremio Is there a reason that the patchFile item is required in .flux.yaml? It seems to work when you point it to an empty YAML file, but it won't work with that item missing. If you don't need any flux-specific edits on top of your kustomize files, it seems kind of strange to require it, unless I'm missing something.

rdubya16 on 15 May 2019

Will this support the helm operator as well? Or is that a feature to do when this one is complete?

guzmo on 15 May 2019

Will this support the helm operator as well?

@guzmo Support in what way?

squaremo on 15 May 2019

Well, will it be possible to change the "spec.values" key inside "kind: HelmRelease" yaml files. Maybe it's obvious it does :P But since that's another container running ( helm operator ) I thought maybe it will not be supported in the first version.

guzmo on 15 May 2019

will it be possible to change the "spec.values" key inside "kind: HelmRelease" yaml files

If Kustomize can do that, this PR will be able to do that. (whether Kustomize can or not is unclear to me, but I think there's a pretty good chance)

squaremo on 15 May 2019

@2opremio Is there a reason that the patchFile item is required in .flux.yaml? It seems to work when you point it to an empty YAML file, but it won't work with that item missing. If you don't need any flux-specific edits on top of your kustomize files, it seems kind of strange to require it, unless I'm missing something.

@rdubya16 It only works with an empty YAML file because of a bug I fixed in bd7fa14. If you want to include the generators and omit the updaters, use a commandUpdated configuration file instead of a patchUpdated one.
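
For reference, a generators-only file might look like the following. This is a sketch that reuses the version and generators keys from the patchUpdated example earlier in the thread; that updaters can be left out entirely is taken from the comment above and has not been verified separately.

```yaml
version: 1
commandUpdated:
  generators:
    - command: kustomize build .
  # No updaters section: flux only renders the manifests and has no
  # command to run for image or policy updates.
```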

2opremio on 15 May 2019

will it be possible to change the "spec.values" key inside "kind: HelmRelease" yaml files

If Kustomize can do that, this PR will be able to do that. (whether Kustomize can or not is unclear to me, but I think there's a pretty good chance)

@guzmo Kustomize or any commands included in the flux image (e.g. templating using a shell script). In the future (when Ephemeral containers land in Kubernetes) we will let you use any commands supplied by a container image you specify.

EDIT: @guzmo we can add additional commands you think will be generally useful for factorizing manifests.

2opremio on 15 May 2019

@2opremio Found an issue where fluxctl can wipe out the flux-patch.yaml file.

fluxctl policy --k8s-fwd-ns flux-apps -n sandbox -w sandbox:deployment/staging-echoheaders-app                                      
WORKLOAD                                    STATUS   UPDATES
sandbox:deployment/staging-echoheaders-app  success

I failed to supply the --tag option here, but the command went through. I'm not actually sure what flux is meant to do without an option specified here?

This results in a commit which deletes the contents of flux-patch.yaml

git pull 
...
Updating b42d644..041c022
Fast-forward
 environments/test-gke/nb1/flux-patch.yaml | 30 ------------------------------
 1 file changed, 30 deletions(-)

nabadger on 19 May 2019

@nabadger I need to check the code (or an example). Do you have a simplified reproduction? Otherwise, can you show me the logs and the contents of .flux.yaml and prior contents of flux-patch.yaml?

2opremio on 20 May 2019

@2opremio I think fluxctl release command fails to push to the repository when we merge multiple services at the same time ( guess because it has to fix merges in the flux-patch.yaml? ). Is this something you have noticed? The problem might go away when the flux-patch has most of the services, but it's boring to see a lot of failed pipelines and rerun them because it might be this :)

guzmo on 20 May 2019

@guzmo No, I haven't seen that. I am more than happy to take a look but I need specifics of how to reproduce (ideally with a totally reproducible step by step example).

2opremio on 20 May 2019

@2opremio I haven't been able to reproduce my earlier errors related to the deletion of flux-patch.yaml yet. The same command at the moment returns:

fluxctl policy -n default -w default:deployment/web-app-1-app
Error: git commit: exit status 1
Run 'fluxctl policy --help' for usage.

Which is probably expected error handling of some sort?

nabadger on 21 May 2019

I have however found an issue with automation via weave annotations when using the kustomize commonAnnotations.

Let me know if I should keep raising issues in this case or separate ones.

I noticed that flux is not updating images based on how you define the annotations.

It might be useful to set automation policies in a base kustomization for a specific type of environment, like so (I'm fine not doing this, but others might like such a feature):

kustomization.yaml

commonAnnotations:
 flux.weave.works/automated: "true"
 flux.weave.works/tag.container-name: 'semver: ~0.0'

Whilst flux will think the workload is automated, it doesn't apply any new images. I set the various sync options to 1m and waited long enough for flux to apply any image updates it thinks it needs to.

WORKLOAD                          CONTAINER       IMAGE                                                  RELEASE  POLICY
default:deployment/flux           flux            docker.io/2opremio/flux:generators-releasers-bb344048  ready
default:deployment/memcached      memcached       memcached:1.4.25                                       ready
default:deployment/web-app-1-app  container-name  nabadger/podinfo:0.0.9                                 ready    automated

If I bump the image version for a demo app I get the following. It should jump to 0.0.11 from 0.0.9:

WORKLOAD                          CONTAINER       IMAGE             CREATED
default:deployment/web-app-1-app  container-name  nabadger/podinfo
                                                  |   0.0.11        21 May 19 14:26 UTC
                                                  '-> 0.0.9         21 May 19 14:26 UTC
                                                      0.0.8         21 May 19 14:26 UTC
                                                      0.0.5         21 May 19 14:26 UTC
                                                      0.0.4         21 May 19 12:29 UTC

If however I add these annotations as a patch onto the deployment (and not using commonAnnotations) like so, it ends up working straight away:

kustomization.yaml

patches:
- deploy-patch.yaml

deploy-patch.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.container-name: 'semver: ~0.0'
  name: app

Could this be related to the image-poller checking annotations on non Deployment kinds? If we use commonAnnotations in this manner, then they get applied to all resource types (ingress, service, etc etc).
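
To illustrate the concern: commonAnnotations is applied to every resource kustomize emits, so a Service in the same kustomization would be rendered roughly like this. The Service itself is hypothetical and only here for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    # Copied onto every kind, not only Deployments
    flux.weave.works/automated: "true"
    flux.weave.works/tag.container-name: 'semver: ~0.0'
spec:
  selector:
    app: app
  ports:
  - name: http
    port: 80
    targetPort: http
```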

I see this warning in the flux logs which means it at least attempts to deal with this:

flux-5774f5cc85-srhvd flux ts=2019-05-21T15:14:22.41693528Z caller=images.go:18 component=sync-loop msg="polling images"
flux-5774f5cc85-srhvd flux ts=2019-05-21T15:14:22.696128944Z caller=images.go:34 component=sync-loop error="checking workloads for new images: Unsupported kind service"
flux-5774f5cc85-srhvd flux ts=2019-05-21T15:14:36.081429712Z caller=loop.go:111 component=sync-loop event=refreshed url=git@github.com:nabadger/flux-kustomize-example.git branch=test-policy-set HEAD=cd72ded861feb1b50f4b3fc148478dda6d9496fe

nabadger on 21 May 2019

@2opremio I haven't been able to reproduce my earlier errors related to the deletion of flux-patch.yaml yet. The same command at the moment returns
Error: git commit: exit status 1
Run 'fluxctl policy --help' for usage.
Which is probably expected error handling of some sort?

It's hard to tell what the exact problem is, but please retry with Flux 1.12.3 (which includes #2054 ) to be released later today.

I will also update the image in #1848 to include it

2opremio on 22 May 2019

Whilst flux will think the workload is automated, it doesn't apply any new images. I set the various sync options to 1m and waited long enough for flux to apply any image updates it thinks it needs to.

That's strange, since Flux doesn't distinguish where the annotations come from

Could this be related to the image-poller checking annotations on non Deployment kinds? If we use commonAnnotations in this manner, then they get applied to all resource types (ingress, service, etc etc).

Ah, it could be. I don't think we should fail on that, I believe it should be a warning. However, this isn't directly related to the current issue. Could you create a separate ticket for it?

2opremio on 22 May 2019

@2opremio I've been playing around with this build for the last week or so. It seems like it will suit our purposes. I've ported most of our existing YAML over to use it in a sandbox environment and it seems to be working well. I don't have any suggested changes; I just want to add my support for this implementation.

rdubya16 on 23 May 2019

Sorry for the late reply. I've got no time right now to create something reproducible, I'm afraid :( The only thing we did was merge two services at the same time, so our build pipeline ran fluxctl release for both at almost the same time. So my guess is that flux-patch.yaml had a merge conflict when flux tried to push.
Otherwise I'd say the branch works great and I'm looking forward to it being merged :)

guzmo on 23 May 2019

Ah, it could be. I don't think we should fail on that, I believe it should be a warning. However, this isn't directly related to the current issue. Could you create a separate ticket for it?

@nabadger, @hiddeco beat you to it :) #2092

2opremio on 24 May 2019

@rdubya16 @guzmo Great that it's working well for you!

2opremio on 24 May 2019

@2opremio - just letting you know that this is still working well for us.

Something I noticed is that listing workloads and listing images via fluxctl seem to cause the flux pod to run a git clone of the repo and a kustomize build .

I was wondering if this is expected, since I thought maybe this information (at least for images) could be fetched from memcache instead.

The impact of this can slow response times for fluxctl. This is especially true when using remote bases in kustomize (since multiple git clones are required). This is something for other users to be aware of, but it's a problem of kustomize/git (not flux).

In our use-case, fluxctl commands like listing images were taking up to 10 seconds. We've since switched to a monorepo for our kustomize configuration, so now it's back down to around 1 second.
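
For anyone comparing the two layouts, the difference is roughly the following. The organisation, repository name, ref, and relative path are placeholders, not our real ones.

```yaml
# Remote base: kustomize has to clone the repository to build
bases:
- github.com/example-org/example-config//common/generic-app?ref=v1.0.0
---
# Local base in a monorepo: no extra clone needed
bases:
- ../../common/generic-app
```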

nabadger on 30 May 2019

Yeah, we haven't looked too deeply into performance yet. The first step is to release a functional and (at least reasonably) stable feature. Thanks for the report though.

@squaremo will be in charge of merging it toaster soon

2opremio on 30 May 2019

@squaremo will be in charge of merging it toaster soon

Very toaster!

squaremo on 30 May 2019

Hahahaha, I meant "to master", damn corrector.

2opremio on 30 May 2019

:confetti_ball: #1848 is merged! :tada: You can now use a flux image from the official pre-releases repo: https://hub.docker.com/r/weaveworks/flux-prerelease/tags, and set the --manifest-generation flag, to try this feature out.
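
As a sketch, enabling it on the flux container could look like the fragment below. The tag is a placeholder to be taken from the pre-releases page, the other flux arguments are omitted, and the =true form of the flag is an assumption.

```yaml
# Fragment of the flux Deployment's pod spec
containers:
- name: flux
  image: docker.io/weaveworks/flux-prerelease:<tag>
  args:
  - --manifest-generation=true
```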

squaremo on 30 May 2019
