Describe the bug
Flux stops auto-updating images after working as expected for a week or so.
To Reproduce
Steps to reproduce the behaviour:
Configure flux the same as original except the following diff

The HelmRelease is as below:
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: myapp
  namespace: dev
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: regexp:\b[0-9a-f]{7}\b
    flux.weave.works/tag.proxy: regexp:\b[0-9a-f]{7}\b
spec:
  chart:
    git: [email protected]:myorg/my-releases.git
    path: charts/myapp
    ref: master
  values:
    image:
      repository: asia.gcr.io/myproj/myservice1
      tag: "1d83c62"
    proxy:
      image:
        tag: 925726a
        repository: asia.gcr.io/myproj/myservice2
    pausingThreshold: 24h
    pausingGracePeriod: 1h
    replicaCount: 1
    istioSidecarInject: "true"
    prometheus:
      port: 10003
      enable: true
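To rule out the tag filter itself, here is a minimal Python sketch of which tags the regexp from the annotations would accept. It assumes an unanchored search (like Python's re.search); whether fluxd anchors the pattern is not confirmed here, and the helper name matches_tag_filter is made up for illustration.

```python
import re

# Pattern copied from the flux.weave.works/tag.* annotations above.
TAG_FILTER = re.compile(r"\b[0-9a-f]{7}\b")


def matches_tag_filter(tag: str) -> bool:
    """Return True if the tag contains a standalone run of 7 lowercase hex chars."""
    # Assumption: unanchored search. fluxd's own matching may differ.
    return TAG_FILTER.search(tag) is not None


if __name__ == "__main__":
    # The two tags from the HelmRelease, plus a few that should be rejected.
    for tag in ["1d83c62", "925726a", "latest", "1D83C62", "1d83c62f"]:
        print(f"{tag:10} {matches_tag_filter(tag)}")
```

Both tags currently in the HelmRelease pass the filter, so short git SHAs of the same shape should be picked up as candidates.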
This config worked perfectly well for a week or so. After that, new docker images were no longer automatically written to the git repo and hence couldn't be deployed.
There were no errors on the fluxd pod.
Workaround: after restarting the flux pod, it immediately updated as expected.
However, I would like to know if there is any issue with the configuration.
The same issue was observed for other HelmRelease files as well.
Expected behavior
Flux should automatically update images in the git repo and perform a release consistently whenever there is a new image in the container registry.
Logs
No relevant logs
Additional context
There doesn't appear to be any problem with the configuration.
Things that could be not happening:
If you grep the fluxd logs for of_which_missing you will see all the log lines about updating the image database. Those should indicate whether fluxd is still scanning, and if so, whether it notices new tags or not (cases 1 and 2). If these log messages stop being output, that's a sign that it's stopped scanning.
When fluxd makes an automated release of a new image, it logs the decision with the message "added update to automation run", so you can grep for that to see if it has attempted to do automated releases. If not, it may have messages marked as warning or error suggesting what happened.
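The two greps suggested above can be combined into one pass over a saved log. This is only a sketch: it assumes the log was saved to a file first (for example with kubectl logs) and that each line carries a ts= field as in the fluxd log lines quoted in this thread. The function name summarize_fluxd_log and the keys "scan" and "automation" are made up for illustration.

```python
import re
from typing import Dict

# Matches the ts=... field at the start of fluxd's key=value log lines.
TS_FIELD = re.compile(r"\bts=(\S+)")

# Substrings to look for, as suggested in the comment above.
MARKERS = {
    "scan": "of_which_missing",
    "automation": "added update to automation run",
}


def summarize_fluxd_log(log_text: str) -> Dict[str, Dict[str, object]]:
    """Count lines containing each marker and remember the last timestamp seen."""
    summary = {name: {"count": 0, "last_ts": None} for name in MARKERS}
    for line in log_text.splitlines():
        ts_match = TS_FIELD.search(line)
        for name, marker in MARKERS.items():
            if marker in line:
                summary[name]["count"] += 1
                if ts_match:
                    summary[name]["last_ts"] = ts_match.group(1)
    return summary
```

A "scan" count of zero, or a last_ts that is hours old, would suggest scanning has stopped. A healthy "scan" count with no "automation" entries points at the release step instead.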
We are also seeing similar behaviour - the excerpted helm release below is stuck on 0.1.58 despite the latest tag being 0.1.150:
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: semver:~0.1
  name: bar
  namespace: bar
spec:
  chart:
    git: [email protected]:foo/bar
    path: a/b/c
    ref: master
  releaseName: bar
  values:
    image:
      tag: 0.1.58
Grepping the logs for of_which_missing as suggested above shows that Flux is no longer scanning for updates. What would cause it to stop?
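As a sanity check on the semver:~0.1 filter in the HelmRelease above, the following Python sketch confirms that 0.1.150 falls inside the range and ranks above 0.1.58. It assumes ~0.1 means ">= 0.1.0, < 0.2.0", which is the usual reading of a tilde range but is not verified against fluxd here. It only handles plain MAJOR.MINOR.PATCH tags, and all function names are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH tag into a tuple of ints."""
    major, minor, patch = (int(part) for part in tag.split("."))
    return major, minor, patch


def in_tilde_range(tag: str, major: int, minor: int) -> bool:
    """Check a tag against ~MAJOR.MINOR, read here as >=MAJOR.MINOR.0, <MAJOR.(MINOR+1).0."""
    return (major, minor, 0) <= parse_version(tag) < (major, minor + 1, 0)


def newest_in_range(tags: Iterable[str], major: int, minor: int) -> Optional[str]:
    """Return the highest tag inside the tilde range, or None if none qualify."""
    candidates = [t for t in tags if in_tilde_range(t, major, minor)]
    return max(candidates, key=parse_version) if candidates else None


if __name__ == "__main__":
    print(newest_in_range(["0.1.58", "0.1.150", "0.2.0"], 0, 1))
```

Note that the comparison has to be numeric: compared as plain strings, "0.1.58" sorts after "0.1.150".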
> Flux is no longer scanning for updates - what would cause it to stop?
Prior to 1.12.1, the image scanning could get stuck when the image registry kept an HTTP connection open indefinitely (and possibly other similar conditions). fluxd now puts timeouts on all those operations, so shouldn't get stalled in that way. To date we'd only seen that behaviour with the Azure image registry.
@benhartley Are you using fluxd <1.12.1?
Flux has been upgraded in the cluster to 1.12.2 (incrementally - so prior to today it was on 1.12.1).
It's possible that this problem started some time ago (hence the large drift in version numbers for the helm release). So we might have been on <1.12.1 when it started.
Is it possible that if it started a long time ago, it would persist despite Flux having since been upgraded?
> Is it possible that if it started a long time ago, it would persist despite Flux having since been upgraded?
Any upgrade would imply restarting the fluxd pod, and restarting the pod would drop any connections that were causing problems. How far does it get before it stops logging messages about scanning? Are there any log messages about rate limiting? (you would see "quota exceeded" or "reducing rate limit" in the message)
I only have the logs for today since the 1.12.1 upgrade, and there is no mention of scanning for docs at all
No log messages around rate limiting either
@benhartley two weeks have passed since your last comment, did the issue surface again or is everything working as expected?
Thanks for following up @hiddeco - our issue is fixed. The problem was that I was mistakenly expecting Flux to discover the image repository from the Helm chart values. Once I added it to the HelmRelease the automated updates started working again.
@benhartley awesome! Glad it has been resolved.
I'll close this issue to get it off our radar, if you encounter a new problem, you know where to find us.
I should point out I wasn't the OP on this issue, don't know whether @koustubhg still has an issue
> I should point out I wasn't the OP on this issue
You are right, _reopens_.
@koustubhg, did the issue surface again or is everything working as expected?
Having similar behavior with pure yaml (not helm charts).
Seems to be related to image creation timestamp issues (zero ts in metadata, taking ts from first layer, etc.)
I'm having a similar issue to @rjanovski
@rjanovski @J-Lou do you have fluxctl installed? What is the output of fluxctl list-images --workload <namespace>:<kind>/<name>?
> Seems to be related to image creation timestamp issues (zero ts in metadata, taking ts from first layer, etc.)
@rjanovski Can you elaborate on what you think is happening, please?
@squaremo for example, if building with Kaniko, --reproducible strips timestamps from layers for bit-level layer reproducibility.
> for example, if building with Kaniko, --reproducible strips timestamps from layers for bit-level layer reproducibility.
I see, OK. Yes, that would certainly cause flux problems -- it would just not do any upgrades, because it can't tell if any image is more recent than any other image. Is that what you are seeing @rjanovski?
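To illustrate why stripped timestamps block upgrades, here is a small Python sketch of a "newest by creation time" choice. It is not fluxd's actual selection code; ZERO_TS is a hypothetical stand-in for a missing timestamp and pick_update is a made-up name. The real timestamps are taken from the metabase log line quoted later in this thread.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Hypothetical stand-in for an image whose creation timestamp was stripped.
ZERO_TS = datetime(1, 1, 1, tzinfo=timezone.utc)


def pick_update(created: Dict[str, datetime], current: str) -> Optional[str]:
    """Return the tag created strictly later than `current`, or None if there is none."""
    newest = max(created, key=created.get)
    return newest if created[newest] > created[current] else None


if __name__ == "__main__":
    real = {
        "v0.34.0": datetime(2019, 12, 20, 1, 11, 38, tzinfo=timezone.utc),
        "v0.34.1": datetime(2020, 1, 13, 22, 57, 42, tzinfo=timezone.utc),
    }
    stripped = {tag: ZERO_TS for tag in real}
    print(pick_update(real, "v0.34.0"))      # real timestamps: an update is found
    print(pick_update(stripped, "v0.34.0"))  # identical timestamps: nothing is newer
```

When every image reports the same timestamp, no tag is strictly newer than the current one, so no update is ever proposed.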
I am experiencing the same issue with flux 1.14.2.
Previously it was happening due to this bug: 81, which no longer occurs.
After updating helm operator to 1.0.0-rc2 it went away for a while. Now it seems to detect the image update, judging by the of_which_missing occurrences in the fluxd logs. But it's hard to trace the rest of the flow and where it fails.
I'm having this issue currently.
I can see the new image is available:
kminehart-arch ~ » fluxctl list-images -n metabase
WORKLOAD                                  CONTAINER    IMAGE              CREATED
metabase:deployment/metabase-metabase     metabase     metabase/metabase
                                                       |   latest         13 Jan 20 22:57 UTC
                                                       |   v0.34.1        13 Jan 20 22:57 UTC
                                                       '-> v0.34.0        20 Dec 19 01:11 UTC
                                                           v0.33.7.3      17 Dec 19 01:36 UTC
                                                           v0.33.7.2      17 Dec 19 00:47 UTC
                                                           v0.33.7.1      14 Dec 19 10:32 UTC
                                                           v0.34.0-rc1    14 Dec 19 02:36 UTC
                                                           v0.33.7        13 Dec 19 23:04 UTC
                                                           v0.33.6        19 Nov 19 21:24 UTC
                                                           v0.33.5.1      13 Nov 19 21:25 UTC
metabase:helmrelease/metabase-production  chart-image  metabase/metabase
                                                       |   latest         13 Jan 20 22:57 UTC
                                                       |   v0.34.1        13 Jan 20 22:57 UTC
                                                       '-> v0.34.0        20 Dec 19 01:11 UTC
                                                           v0.33.7.3      17 Dec 19 01:36 UTC
                                                           v0.33.7.2      17 Dec 19 00:47 UTC
                                                           v0.33.7.1      14 Dec 19 10:32 UTC
                                                           v0.34.0-rc1    14 Dec 19 02:36 UTC
                                                           v0.33.7        13 Dec 19 23:04 UTC
                                                           v0.33.6        19 Nov 19 21:24 UTC
                                                           v0.33.5.1      13 Nov 19 21:25 UTC
Flux adds it to its release queue:
kminehart-arch ~ » kubectl logs -n flux flux-5bd5676869-d7vg4 | grep v0.34.1
...
ts=2020-01-21T22:01:52.409789362Z caller=images.go:111 component=sync-loop workload=metabase:helmrelease/metabase-production container=chart-image repo=metabase/metabase pattern=glob:* current=metabase/metabase:v0.34.0 info="added update to automation run" new=metabase/metabase:v0.34.1 reason="latest v0.34.1 (2020-01-13 22:57:42.226921 +0000 UTC) > current v0.34.0 (2019-12-20 01:11:38.1905654 +0000 UTC)"
The workload for chart-image is set to automated.
kminehart-arch ~ » fluxctl list-workloads -n metabase
WORKLOAD                                  CONTAINER    IMAGE                      RELEASE   POLICY
metabase:deployment/metabase-metabase     metabase     metabase/metabase:v0.34.0  ready
metabase:helmrelease/metabase-production  chart-image  metabase/metabase:v0.34.0  deployed  automated
And if I manually trigger the deployment:
kminehart-arch ~ » fluxctl release --workload=metabase:helmrelease/metabase-production --update-image=metabase/metabase:v0.34.1
Submitting release ...
WORKLOAD                                  STATUS   UPDATES
metabase:helmrelease/metabase-production  success  chart-image: metabase/metabase:v0.34.0 -> v0.34.1
Commit pushed: 5e7c3d5
Commit applied: 5e7c3d5
It applies the commits:

Is there anything I missed? Flux is able to see the new image, it's able to write to my git repository, the helmrelease is set to automated. It randomly stopped working on December 31st.