Flux: Flag for refreshing image DB before attempting release

Created on 12 Sep 2018 · 14 Comments · Source: fluxcd/flux

I am trying to get Flux to update the images of all containers in a multi-container Deployment to a specific version in one go.

This is required because the container images are version-married: rolling out a newer version of the sidecar while the main container is still on an older version could cause nasty issues, and vice versa.

Attempts I have made

1. automation

Automation is not possible because images are polled individually: as soon as Flux detects a change to image-1, it pushes an update.

Having an option to add constraints on when Flux should roll out an automated release would solve this issue, but I do not think this feature would serve a large part of the community.

2. --update-all-images flag to the rescue

I had high hopes for this one, but it does not seem to work either. First, 1.5.0 did not respect tag filters, and after upgrading to 1.6.0 the Flux daemon still seems to depend on the cache.

From a pipeline perspective this behavior is unwanted, since you often want to build, push and release within a short timeframe. If you run fluxctl release immediately after your push, the cache is not yet up to date, which causes the same issues described above.

3. multiple --update-image=<specific image> flags and loop

My last attempt was a fluxctl release with multiple --update-image flags and a nasty do loop. This did not seem to work either, as only the last flag was picked up. And indeed, as @squaremo told me on Slack:

> It's not possible to provide more than one. Historical limitation (our first use case for flux was to update each image after it had been built in CI)


The easiest fix from a user perspective would be to have --update-all-images skip the cache completely and just make a request to the registry, but I am open to alternatives.

Labels: blocked-design, dogfood, enhancement


All 14 comments

> The easiest fix from a user perspective would be to have --update-all-images skip the cache completely and just make a request to the registry

The reason we don't do this is that it's very expensive, and fragile. Expensive because it means fetching the manifest for every tag of every image, to make sure we can correctly calculate which are the most recent. Fragile because if we can't fetch even one of the manifests, we have to abandon the whole operation.
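To make the cost concrete, here is a minimal sketch of picking the "latest" tag under a glob filter. It is a hypothetical shell function, not Flux's actual code, and it assumes the image metadata is available as one `TAG CREATED` pair per line, with `CREATED` an ISO-8601 UTC timestamp so that plain string sorting orders by time. Every tag that matches the filter has to be ranked, so every one needs metadata, and if any single matching tag lacks a timestamp the whole selection is abandoned, mirroring the fragility described above.

```shell
#!/bin/sh
# Hypothetical illustration, not Flux's implementation.
# Reads "TAG CREATED" lines on stdin (CREATED in ISO-8601 UTC, e.g.
# 2018-09-12T10:00:00Z) and prints the newest tag matching a glob pattern.
latest_matching_tag() {
  pattern=$1
  matches=""
  while read -r tag created; do
    # Skip tags that do not match the filter; the unquoted $pattern is
    # deliberately treated as a glob pattern by `case`.
    case $tag in
      $pattern) ;;
      *) continue ;;
    esac
    # Without a creation time this tag cannot be ranked, so give up entirely.
    if [ -z "$created" ]; then
      echo "no metadata for tag $tag; abandoning" >&2
      return 1
    fi
    matches="$matches$created $tag
"
  done
  # No tag matched the filter at all.
  [ -n "$matches" ] || return 1
  # ISO-8601 UTC timestamps sort lexically in time order; keep the newest.
  printf '%s' "$matches" | sort | tail -n 1 | cut -d ' ' -f 2
}
```

For example, `printf 'v1 2018-09-01T00:00:00Z\nv2 2018-09-10T00:00:00Z\nlatest 2018-09-12T00:00:00Z\n' | latest_matching_tag 'v*'` prints `v2`: `latest` is newer but is excluded by the filter.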

Going back to the use case, it sounds like you know exactly which images you want to update to (attempt 3.), but the problem is that at the time of asking, it's not guaranteed that flux will have noticed all of the new images.

Would it help if you could call fluxctl release --update-image=foo:v2 --update-image=bar:v2 and tell it to go look for the images if it doesn't already know about them?

It would look like this (see -i and --refresh):

```
$ fluxctl help release
Release a new version of a controller.

Usage:
  fluxctl release [flags]

Examples:
  fluxctl release -n default --controller=deployment/foo --update-image=library/hello:v2
  fluxctl release --all --update-image=library/hello:v2 --update-image=library/goodbye:v2
  fluxctl release --controller=default:deployment/foo --update-all-images

Flags:
      --all                     Release all controllers
  -c, --controller strings      List of controllers to release <namespace>:<kind>/<name>
      --dry-run                 Do not release anything; just report back what would have been done
      --exclude strings         List of controllers to exclude
  -f, --force                   Disregard locks and container image filters (has no effect when used with --all or --update-all-images)
  -h, --help                    help for release
      --interactive             Select interactively which containers to update
  -m, --message string          attach a message to the update
  -n, --namespace string        Controller namespace (default "default")
      --refresh                 Refresh metadata for the mentioned images, before attempting the release
      --update-all-images       Update all images to latest versions
  -i, --update-image []string   Update specific images
      --user string             override the user reported as initiating the update (default "Michael Bridgen mikeb@squaremobius.net")
  -v, --verbose count           include skipped (and ignored, with -vv) controllers in output
```

> The reason we don't do this is that it's very expensive, and fragile. Expensive because it means fetching the manifest for every tag of every image, to make sure we can correctly calculate which are the most recent. Fragile because if we can't fetch even one of the manifests, we have to abandon the whole operation.

Understandable now you describe it.

> Would it help if you could call fluxctl release --update-image=foo:v2 --update-image=bar:v2 and tell it to go look for the images if it doesn't already know about them?
>
> It would look like this (see -i and --refresh):

:+1: this would be perfect.


For the time being, I am going to loop on fluxctl list-images to detect when the image metadata has been updated, which is quite nasty but ~should theoretically work~ works.

Although I'm loth to jump to solutions too quickly, I'm renaming this bug since we seem to agree on how to address the problem.

Notes:

  • this'll take a bit of extra protocol between the daemon code and the image DB, since at present the latter can only be told to refresh repos asynchronously
  • semantics: if the named image can't be found, the operation should fail
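The two notes above can be sketched as follows. This is purely hypothetical: none of these functions exist in Flux, the "image DB" is modelled as a plain text file of known image references (`IMAGE_DB`), and `fetch_from_registry` is a stand-in for a synchronous registry lookup, driven here by an assumed `REGISTRY_CONTENTS` variable. Each named image is looked up in the DB; on a miss it is refreshed synchronously, and if it still cannot be found the whole release fails before anything is changed.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed refresh semantics; not Flux code.
# The "image DB" is a text file with one known image reference per line.
IMAGE_DB=${IMAGE_DB:-./image-db.txt}

# Stand-in for a synchronous registry lookup: succeeds only for references
# listed (one per line) in the assumed REGISTRY_CONTENTS variable.
fetch_from_registry() {
  printf '%s\n' "$REGISTRY_CONTENTS" | grep -qxF -- "$1"
}

# Resolve one image: use the DB if possible, otherwise refresh just this
# image synchronously; fail if it cannot be found at all.
resolve_image() {
  ref=$1
  if grep -qxF -- "$ref" "$IMAGE_DB" 2>/dev/null; then
    return 0
  fi
  if fetch_from_registry "$ref"; then
    printf '%s\n' "$ref" >> "$IMAGE_DB"
    return 0
  fi
  echo "image \"$ref\" not found; aborting release" >&2
  return 1
}

# All named images must resolve before anything is released.
release_with_refresh() {
  for ref in "$@"; do
    resolve_image "$ref" || return 1
  done
  echo "would release: $*"
}
```

Checking every image before releasing any of them keeps the operation all-or-nothing, which is what version-married images like the ones in this issue need.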

@squaremo, I'm wondering if any work has already been done on this? We (@arranf and I) are hitting this issue right now: we have a CI pipeline which pushes the image and then runs fluxctl release... a little while later. It seems to fail every time, but rerunning the job makes it pass; I'm pretty sure this is because of the image polling. I don't really want to put a sleep 30 in there...

I am having a similar problem: newly built images show up as missing right after being pushed. A way to refresh the DB as soon as the image is pushed to the registry would be a great help.

Are there any updates on this? We want to do something similar where we push a tag and then run a fluxctl release from our CI/CD pipelines.

I'm happy to give the --refresh flag a shot

> this'll take a bit of extra protocol between the daemon code and the image DB, since at present the latter can only be told to refresh repos asynchronously

@squaremo do you have any thoughts around this and can maybe point me in the right direction?

Also, thanks for making flux available and all the hard work on it!

Could this be related to why I am randomly seeing:
Error: image "DOMAIN/REPO:TAG" does not exist: invalid image ID

When doing:
fluxctl release --k8s-fwd-ns flux --watch --timeout 2m --workload=NAMESPACE:deployment/DEPLOYMENTNAME --update-image=DOMAIN/REPO:TAG

in my pipeline, where the previous step pushed the new image to my repository?

I have a GitLab pipeline where one job builds and pushes a new image with a specific tag.
In the following job I do the fluxctl release.
I just tried adding a sleep 15 to account for timing, but the problem persists.

@MikaelElkiaer Unless you have webhooks set up, the time between pushing a new image and the scanner knowing about it is likely to be O(minutes). The scanner has to get around to that image repo (it processes them one by one), and then it has to fetch metadata for the new tag and possibly others.

You can reduce this lag by installing webhooks - Weave Cloud does it for you, or the (rather new, so _caveat emptor_) https://github.com/fluxcd/flux-recv. This means fluxd will begin scanning as soon as it hears that a new image is available. But it is still not synchronous, so you'd have to do a sleep -- just a shorter one.

Found an acceptable workaround, by continuously checking fluxctl list-images.

For example (pardon my weak sh-fu):

```shell
# Wait until the image list for $WORKLOAD mentions $IMAGE_TAG (variables quoted)
until fluxctl --k8s-fwd-ns "$FLUXNS" list-images -w "$WORKLOAD" | grep -qF -- "$IMAGE_TAG"; do sleep "$WAIT_TIME"; done
```

Edit: Damn, didn't notice this was already mentioned earlier. :/
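The loop above waits forever if the tag never shows up (for example, after a typo in the tag or a failed push). A slightly more defensive variant, sketched below, adds a deadline. `wait_for_image_tag` is a hypothetical helper name; it assumes `fluxctl` is on the `PATH` and accepts the same `--k8s-fwd-ns` and `list-images -w` options used in the loop above. It also uses `grep -w` (supported by GNU and BSD grep, though not required by POSIX) so that a tag like `v4` does not also match `v42`.

```shell
#!/bin/sh
# Hypothetical helper: poll fluxctl until TAG appears in WORKLOAD's image
# list, or give up after TIMEOUT seconds (default 300, polling every 10s).
wait_for_image_tag() {
  flux_ns=$1 workload=$2 tag=$3
  timeout=${4:-300} interval=${5:-10}
  waited=0
  # grep -w: match the tag as a whole word; -F: treat it as a fixed string.
  until fluxctl --k8s-fwd-ns "$flux_ns" list-images -w "$workload" | grep -qwF -- "$tag"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $tag" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

In a pipeline step, calling `wait_for_image_tag "$FLUXNS" "$WORKLOAD" "$IMAGE_TAG"` before `fluxctl release` lets a non-zero exit fail the job instead of hanging it. This is still polling, so it only bounds the wait; it does not remove it.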


> Expensive because it means fetching the manifest for every tag of every image

What do you actually mean? Fetching the manifest from the repository for every tag of every image?

I only deploy one image. Why not check just that "version" tag, in that particular manifest?

> > Expensive because it means fetching the manifest for every tag of every image
>
> What do you actually mean? Fetching the manifest from the repository for every tag of every image?

This is specifically about the --update-all-images flag, which looks for the latest image to apply. To do this, it needs the metadata for all tags of the images, so that it can find the tag that both matches any image tag filter that may have been configured and is the latest available image for that filter.

This is my own rephrasing of what Michael already said:

> The reason we don't do this is that it's very expensive, and fragile. Expensive because it means fetching the manifest for every tag of every image, to make sure we can correctly calculate which are the most recent. Fragile because if we can't fetch even one of the manifests, we have to abandon the whole operation.


> I only deploy one image. Why not check just that "version" tag, in that particular manifest?

This is exactly the suggestion Michael made by saying:

> Would it help if you could call fluxctl release --update-image=foo:v2 --update-image=bar:v2 and tell it to go look for the images if it doesn't already know about them?

@stealthybox and I were talking about the issue of updating to images that are not yet in the cache. Stealthybox suggested a possible --force flag that would skip the image cache entirely, something to cut the wait down to zero.

We build a new image and immediately deploy it via fluxctl release; we know the image tag and everything. Being able to bypass the cache, or cut the wait to zero, would be a huge help, and I think some kind of bypass would help in many of the cases noted here. flux-recv certainly cuts the wait down, but we still have to wait, which isn't the best experience for users deploying new code to the cluster.

Hi everyone,

I am also struggling with a long wait between docker push and deployment.
flux-recv does not work for me, as I use the GitLab container registry.

Is there any hope that this issue gets solved?

As I understand it, we currently have only these two options:

  • lower the Docker registry scan interval
  • use flux-recv (the GitLab registry does not seem to be supported)