Flux: Flux doesn't update images consistently

Created on 8 May 2019 · 19 Comments · Source: fluxcd/flux

Describe the bug
Flux doesn't auto-update images consistently (it worked as expected for a week or so, then stopped).

To Reproduce
Steps to reproduce the behaviour:
Configure flux the same as original except the following diff

The HelmRelease is as below:

apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: myapp
  namespace: dev
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: regexp:\b[0-9a-f]{7}\b
    flux.weave.works/tag.proxy: regexp:\b[0-9a-f]{7}\b
spec:
  chart:
    git: [email protected]:myorg/my-releases.git
    path: charts/myapp
    ref: master
  values:
    image:
      repository: asia.gcr.io/myproj/myservice1
      tag: "1d83c62"
    proxy:
      image:
        tag: 925726a
        repository: asia.gcr.io/myproj/myservice2
      pausingThreshold: 24h
      pausingGracePeriod: 1h
    replicaCount: 1
    istioSidecarInject: "true"
    prometheus:
      port: 10003
      enable: true

This config worked perfectly well for a week or so. After that, new docker images stopped being automatically updated in the git repo and hence couldn't be deployed.
There were no errors on the fluxd pod.
Workaround: after restarting the flux pod, it immediately updated as expected.
However, I would like to know if there is any issue with the configuration.
The same issue was observed for other HelmRelease files as well.

Expected behavior
Flux should automatically update images in the git repo and perform a release consistently whenever there is a new image in the container registry.

Logs
No relevant logs

Additional context

Flux version: 1.12.0
Helm Operator version: 0.8.0
Kubernetes version: 1.12.6-gke.10
Git provider: github
Container registry provider: Google Container Registry

Source

koustubhg

All 19 comments

There doesn't appear to be any problem with the configuration.

Things that could be going wrong:

1. fluxd stops scanning for new images
2. fluxd doesn't see new images when it scans for them
3. fluxd doesn't recognise new tags as being more recent than what is deployed
4. the release process fails

If you grep the fluxd logs for of_which_missing you will see all the log lines about updating the image database. Those should indicate whether fluxd is still scanning, and if so, whether it notices new tags or not (cases 1 and 2). If these log messages stop being output, that's a sign that it's stopped scanning.

When fluxd makes an automated release of a new image, it logs the decision with the message "added update to automation run", so you can grep for that to see if it has attempted to do automated releases. If not, it may have messages marked as warning or error suggesting what happened.
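The two greps described above can be done in one pass over the log. This is only a minimal sketch: the function name `flux_log_summary` is made up for illustration, and the only things taken from the advice above are the two search strings.

```shell
# Hypothetical helper: count the two kinds of fluxd log lines mentioned above.
# Reads a fluxd log on stdin and prints one "name=count" line per category.
flux_log_summary() {
  awk '
    /of_which_missing/               { scans++ }    # image database update lines
    /added update to automation run/ { updates++ }  # automated release decisions
    END {
      printf "image_scans=%d\n", scans
      printf "automated_updates=%d\n", updates
    }
  '
}
```

It could be fed with something like `kubectl logs -n flux <fluxd-pod> | flux_log_summary`, where the namespace and pod name depend on your installation. A scan count that stays flat over time would point at cases 1 and 2; scans without any automated updates would point at cases 3 and 4.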

squaremo on 8 May 2019

We are also seeing similar behaviour - the excerpted helm release below is stuck on 0.1.58 despite the latest tag being 0.1.150:

apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: semver:~0.1
  name: bar
  namespace: bar
spec:
  chart:
    git: [email protected]:foo/bar
    path: a/b/c
    ref: master
  releaseName: bar
  values:
    image:
      tag: 0.1.58

grepping the logs for of_which_missing as suggested above shows Flux is no longer scanning for updates - what would cause it to stop?

benhartley on 8 May 2019

> Flux is no longer scanning for updates - what would cause it to stop?

Prior to 1.12.1, the image scanning could get stuck when the image registry kept an HTTP connection open indefinitely (and possibly other similar conditions). fluxd now puts timeouts on all those operations, so shouldn't get stalled in that way. To date we'd only seen that behaviour with the Azure image registry.

@benhartley Are you using fluxd <1.12.1?

squaremo on 8 May 2019

Flux has been upgraded in the cluster to 1.12.2 (incrementally - so prior to today it was on 1.12.1).

It's possible that this problem started some time ago (hence the large drift in version numbers for the helm release). So we might have been on <1.12.1 when it started.

Is it possible that if it started a long time ago, it would persist despite Flux having since been upgraded?

benhartley on 8 May 2019

> Is it possible that if it started a long time ago, it would persist despite Flux having since been upgraded?

Any upgrade would imply restarting the fluxd pod, and restarting the pod would drop any connections that were causing problems. How far does it get before it stops logging messages about scanning? Are there any log messages about rate limiting? (you would see "quota exceeded" or "reducing rate limit" in the message)
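Both questions can be answered from the log with plain grep. A rough sketch, assuming the log is piped in on stdin; the function names are hypothetical and only the search strings come from this thread.

```shell
# Hypothetical helpers; each reads a fluxd log on stdin.

# Print the last line about updating the image database, i.e. how far
# scanning got before it went quiet (prints nothing if there is none).
flux_last_scan_line() {
  grep 'of_which_missing' | tail -n 1
}

# Print lines that mention registry rate limiting.
# Exit status is 0 if at least one line matched, 1 otherwise.
flux_rate_limit_lines() {
  grep -E 'quota exceeded|reducing rate limit'
}
```

The `ts=` field at the start of the line printed by `flux_last_scan_line` shows when scanning was last logged.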

squaremo on 8 May 2019

I only have the logs for today since the 1.12.1 upgrade, and there is no mention of scanning for docs at all

No log messages around rate limiting either

benhartley on 8 May 2019

@benhartley two weeks have passed since your last comment, did the issue surface again or is everything working as expected?

hiddeco on 22 May 2019

Thanks for following up @hiddeco - our issue is fixed. The problem was that I was mistakenly expecting Flux to discover the image repository from the Helm chart values. Once I added it to the HelmRelease the automated updates started working again.
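For anyone checking for the same omission, a crude test is whether the manifest sets a `repository:` key at all. This is a line-based heuristic rather than a YAML parser, and the function name is hypothetical.

```shell
# Hypothetical check: does a HelmRelease manifest (read from stdin) contain a
# non-empty "repository:" key anywhere? Line-based heuristic, not a YAML
# parser, so it cannot tell which image block the key belongs to.
helmrelease_has_image_repository() {
  grep -Eq '^[[:space:]]*repository:[[:space:]]*[^[:space:]]'
}
```

For example, `helmrelease_has_image_repository < bar.yaml || echo "no image repository set"`, where `bar.yaml` stands for your own manifest file. The excerpt posted earlier, which only sets `tag: 0.1.58`, would fail this check.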

benhartley on 22 May 2019

@benhartley awesome! Glad it has been resolved.

I'll close this issue to get it off our radar; if you encounter a new problem, you know where to find us.

hiddeco on 22 May 2019

I should point out I wasn't the OP on this issue, don't know whether @koustubhg still has an issue

benhartley on 22 May 2019

> I should point out I wasn't the OP on this issue

You are right, _reopens_.

@koustubhg, did the issue surface again or is everything working as expected?

hiddeco on 22 May 2019

Having similar behavior with pure yaml (not helm charts).
Seems to be related to image creation timestamp issues (zero ts in metadata, taking ts from first layer, etc.)

rjanovski on 3 Jun 2019

I'm having a similar issue to @rjanovski

J-Lou on 19 Jun 2019

@rjanovski @J-Lou do you have fluxctl installed? What is the output of fluxctl list-images --workload <namespace>:<kind>/<name>?

hiddeco on 20 Jun 2019

> Seems to be related to image creation timestamp issues (zero ts in metadata, taking ts from first layer, etc.)

@rjanovski Can you elaborate on what you think is happening, please?

squaremo on 20 Jun 2019

@squaremo for example, if building with Kaniko, --reproducible strips timestamps from layers for bit-level layer reproducibility.

airmap-madison on 22 Jun 2019

> for example, if building with Kaniko, --reproducible strips timestamps from layers for bit-level layer reproducibility.

I see, OK. Yes, that would certainly cause flux problems -- it would just not do any upgrades, because it can't tell if any image is more recent than any other image. Is that what you are seeing @rjanovski?
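One way to check whether an image is affected is to look at its creation timestamp. A small sketch, assuming a stripped timestamp appears either as the Unix epoch or as year 0001; that is an assumption, and other build tools may use different placeholder values. The function name is hypothetical.

```shell
# Hypothetical check: is an image creation timestamp effectively unset?
# Assumption: "unset" shows up as an empty string, the Unix epoch, or
# year 0001; other build tools may use different placeholder values.
is_zero_timestamp() {
  case "$1" in
    ''|1970-01-01T00:00:00*|0001-01-01T00:00:00*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With a locally pulled image this could be used as `is_zero_timestamp "$(docker inspect --format '{{.Created}}' <image>)" && echo "no usable creation time"`. The CREATED column of fluxctl list-images shows what fluxd itself has recorded.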

squaremo on 24 Jun 2019

I am experiencing the same issue with flux 1.14.2.
Previously it was happening due to this bug: 81, which no longer occurs.
After updating helm operator to 1.0.0-rc2 it went away for a while. Now, judging by the of_which_missing occurrences in the fluxd logs, it seems to detect the image update, but it's hard to trace the rest of the flow and where it fails.

mar1n3r0 on 31 Oct 2019

I'm having this issue currently.

I can see the new image is available:

kminehart-arch ~ » fluxctl list-images -n metabase
WORKLOAD                                  CONTAINER    IMAGE              CREATED
metabase:deployment/metabase-metabase     metabase     metabase/metabase  
                                                       |   latest         13 Jan 20 22:57 UTC
                                                       |   v0.34.1        13 Jan 20 22:57 UTC
                                                       '-> v0.34.0        20 Dec 19 01:11 UTC
                                                           v0.33.7.3      17 Dec 19 01:36 UTC
                                                           v0.33.7.2      17 Dec 19 00:47 UTC
                                                           v0.33.7.1      14 Dec 19 10:32 UTC
                                                           v0.34.0-rc1    14 Dec 19 02:36 UTC
                                                           v0.33.7        13 Dec 19 23:04 UTC
                                                           v0.33.6        19 Nov 19 21:24 UTC
                                                           v0.33.5.1      13 Nov 19 21:25 UTC
metabase:helmrelease/metabase-production  chart-image  metabase/metabase  
                                                       |   latest         13 Jan 20 22:57 UTC
                                                       |   v0.34.1        13 Jan 20 22:57 UTC
                                                       '-> v0.34.0        20 Dec 19 01:11 UTC
                                                           v0.33.7.3      17 Dec 19 01:36 UTC
                                                           v0.33.7.2      17 Dec 19 00:47 UTC
                                                           v0.33.7.1      14 Dec 19 10:32 UTC
                                                           v0.34.0-rc1    14 Dec 19 02:36 UTC
                                                           v0.33.7        13 Dec 19 23:04 UTC
                                                           v0.33.6        19 Nov 19 21:24 UTC
                                                           v0.33.5.1      13 Nov 19 21:25 UTC

Flux adds it to its release queue:

kminehart-arch ~ » kubectl logs -n flux flux-5bd5676869-d7vg4 | grep v0.34.1                      
...
ts=2020-01-21T22:01:52.409789362Z caller=images.go:111 component=sync-loop workload=metabase:helmrelease/metabase-production container=chart-image repo=metabase/metabase pattern=glob:* current=metabase/metabase:v0.34.0 info="added update to automation run" new=metabase/metabase:v0.34.1 reason="latest v0.34.1 (2020-01-13 22:57:42.226921 +0000 UTC) > current v0.34.0 (2019-12-20 01:11:38.1905654 +0000 UTC)"

The workload for chart-image is set to automated.

kminehart-arch ~ » fluxctl list-workloads -n metabase
WORKLOAD                                  CONTAINER    IMAGE                      RELEASE   POLICY
metabase:deployment/metabase-metabase     metabase     metabase/metabase:v0.34.0  ready     
metabase:helmrelease/metabase-production  chart-image  metabase/metabase:v0.34.0  deployed  automated

And if I manually trigger the deployment:

kminehart-arch ~ » fluxctl release --workload=metabase:helmrelease/metabase-production --update-image=metabase/metabase:v0.34.1
Submitting release ...
WORKLOAD                                  STATUS   UPDATES
metabase:helmrelease/metabase-production  success  chart-image: metabase/metabase:v0.34.0 -> v0.34.1
Commit pushed:  5e7c3d5
Commit applied: 5e7c3d5

It applies the commits.

Is there anything I missed? Flux is able to see the new image, it's able to write to my git repository, the helmrelease is set to automated. It randomly stopped working on December 31st.

kminehart on 21 Jan 2020

👍4
