Describe the bug
Can't perform a fluxctl release for a new image tag that has not yet been committed to git.
To Reproduce
Steps to reproduce the behaviour:
Expected behavior
The workload changes to the new tag, and fluxd commits the new tag to git.
Logs
$ fluxctl list-images --workload backend:statefulset/backend
E0805 18:46:19.076371 6706 portforward.go:385] error copying from local connection to remote stream: read tcp4 127.0.0.1:39745->127.0.0.1:38966: read: connection reset by peer
WORKLOAD CONTAINER IMAGE CREATED
backend:statefulset/backend backend gcr.io/titanium-messenger-001/mmts-backend
| k8s-master-7a7a4419fbca5e3c9dd307443b7fac0de1f5a6d3 05 Aug 19 09:13 UTC
| k8s-WEL-220-futures-to-tasks-76a18b9b071b5b80d2f7d4f45565ac53cd6c9422 05 Aug 19 07:16 UTC
| k8s-WEL-220-futures-to-tasks-3a5d445a8c57b22832316def36e5ec611a261248 05 Aug 19 07:13 UTC
| k8s-WEL-220-futures-to-tasks-22eb692819799d733c80efb575f89278c6a511bc 05 Aug 19 07:12 UTC
| k8s-WEL-220-futures-to-tasks-0e1d936ae8dae40638475dd0c5155a204d64345a 05 Aug 19 07:03 UTC
| k8s-latest 02 Aug 19 11:54 UTC
| k8s-master-40f8de44031ab06b2aec060cf603e38a65a44803 02 Aug 19 11:54 UTC
'-> k8s-master-1d85e954134a0b576c8e3715a69d40a7cd362573 02 Aug 19 11:27 UTC
k8s-fix-program-durations-26e19d3426245d17864f2a9487913f0796c876ea 01 Aug 19 10:42 UTC
k8s-master-41e13458a943122e70ccdeb28c510271bc30a1fd 31 Jul 19 12:36 UTC
check-postgres-availability sorintlab/stolon
| v0.14.0-pg11 31 Jul 19 09:42 UTC
| v0.14.0-pg10 31 Jul 19 09:42 UTC
| v0.14.0-pg9.6 31 Jul 19 09:42 UTC
| v0.14.0-pg9.5 31 Jul 19 09:42 UTC
| v0.14.0-pg9.4 31 Jul 19 09:42 UTC
| master-pg9.4 31 Jul 19 09:36 UTC
| master-pg9.5 31 Jul 19 09:36 UTC
| master-pg9.6 31 Jul 19 09:36 UTC
'-> master-pg10 31 Jul 19 09:36 UTC
master-pg11 31 Jul 19 09:35 UTC
$ fluxctl release --workload=backend:statefulset/backend --update-image=gcr.io/titanium-messenger-001/mmts-backend:k8s-master-7a7a4419fbca5e3c9dd307443b7fac0de1f5a6d3
Submitting release ...
Error: verifying changes: failed to verify changes: the image for container "backend" in resource "backend:statefulset/backend" should be "gcr.io/titanium-messenger-001/mmts-backend:k8s-master-7a7a4419fbca5e3c9dd307443b7fac0de1f5a6d3", but is "gcr.io/titanium-messenger-001/mmts-backend:k8s-master-1d85e954134a0b576c8e3715a69d40a7cd362573"
Run 'fluxctl release --help' for usage.
$ k logs -f deploy/flux
ts=2019-08-05T15:46:32.513445458Z caller=loop.go:119 component=sync-loop jobID=7a7325ed-234d-87a5-7b3e-82ab5781c465 state=in-progress
ts=2019-08-05T15:46:35.070573653Z caller=releaser.go:59 component=sync-loop jobID=7a7325ed-234d-87a5-7b3e-82ab5781c465 type=release updates=1
ts=2019-08-05T15:46:36.968192639Z caller=loop.go:129 component=sync-loop jobID=7a7325ed-234d-87a5-7b3e-82ab5781c465 state=done success=false err="verifying changes: failed to verify changes: the image for container \"backend\" in resource \"backend:statefulset/backend\" should be \"gcr.io/titanium-messenger-001/mmts-backend:k8s-master-7a7a4419fbca5e3c9dd307443b7fac0de1f5a6d3\", but is \"gcr.io/titanium-messenger-001/mmts-backend:k8s-master-1d85e954134a0b576c8e3715a69d40a7cd362573\""
ts=2019-08-05T15:47:07.863810739Z caller=loop.go:111 component=sync-loop event=refreshed [email protected]:oktossm/gitops.git branch=master HEAD=17ec311f1391b321f1e4af855ee2a86861e928d5
Additional context
Thanks a bunch in advance!
This kind of problem is occasionally due to the YAML update program not coping with a particular file. Do you mind posting (or gisting) the YAML file in question?
Sure! But I'm going to post a different one, which has the same issue:
$ fluxctl list-images --workload stolon:deployment/stolon-proxy
WORKLOAD CONTAINER IMAGE CREATED
stolon:deployment/stolon-proxy stolon-proxy sorintlab/stolon
| v0.14.0-pg11 31 Jul 19 09:42 UTC
| v0.14.0-pg10 31 Jul 19 09:42 UTC
| v0.14.0-pg9.6 31 Jul 19 09:42 UTC
| v0.14.0-pg9.5 31 Jul 19 09:42 UTC
| v0.14.0-pg9.4 31 Jul 19 09:42 UTC
| master-pg9.4 31 Jul 19 09:36 UTC
| master-pg9.5 31 Jul 19 09:36 UTC
| master-pg9.6 31 Jul 19 09:36 UTC
'-> master-pg10 31 Jul 19 09:36 UTC
master-pg11 31 Jul 19 09:35 UTC
$
$ fluxctl release --workload=stolon:deployment/stolon-proxy --update-image=sorintlab/stolon:v0.14.0-pg10
Submitting release ...
Error: verifying changes: failed to verify changes: the image for container "stolon-proxy" in resource "stolon:deployment/stolon-proxy" should be "sorintlab/stolon:v0.14.0-pg10", but is "sorintlab/stolon:master-pg10"
Run 'fluxctl release --help' for usage.
$
stolon-proxy-deployment.yaml:
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: stolon-proxy
  namespace: stolon
spec:
  replicas: 2
  template:
    metadata:
      labels:
        component: stolon-proxy
        stolon-cluster: kube-stolon
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      containers:
        - name: stolon-proxy
          image: sorintlab/stolon:master-pg10
          command:
            - "/bin/bash"
            - "-ec"
            - |
              exec gosu stolon stolon-proxy
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: STPROXY_CLUSTER_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['stolon-cluster']
            - name: STPROXY_STORE_BACKEND
              value: "kubernetes"
            - name: STPROXY_KUBE_RESOURCE_KIND
              value: "configmap"
            - name: STPROXY_LISTEN_ADDRESS
              value: "0.0.0.0"
            - name: STPROXY_METRICS_LISTEN_ADDRESS
              value: "0.0.0.0:8080"
          ports:
            - name: postgres
              containerPort: 5432
            - name: proxy
              containerPort: 8080
          readinessProbe:
            tcpSocket:
              port: postgres
            initialDelaySeconds: 10
            timeoutSeconds: 5
          livenessProbe:
            tcpSocket:
              port: postgres
            initialDelaySeconds: 10
            timeoutSeconds: 5
          resources:
            limits:
              cpu: "100m"
              memory: 128Mi
            requests:
              cpu: "100m"
              memory: 128Mi
BTW, we do have manifest-generation=true and sync-garbage-collection enabled.
Could the error be at a higher level? I don't think all my YAML files are broken, since they are checked with yamllint and kubeval.
$ fluxctl list-images --workload kurento:deployment/kurento
WORKLOAD CONTAINER IMAGE CREATED
kurento:deployment/kurento kurento kurento/kurento-media-server
| 6 19 Jul 19 19:15 UTC
| 6.11 19 Jul 19 19:15 UTC
| 6.11.0 19 Jul 19 19:15 UTC
| 6.11.0-20190719195344 19 Jul 19 19:15 UTC
| latest 19 Jul 19 19:15 UTC
'-> 6.10 04 Apr 19 13:15 UTC
6.10.0 04 Apr 19 13:15 UTC
6.10.0-20190404150939 04 Apr 19 13:15 UTC
6.9.0 19 Dec 18 13:09 UTC
6.9.0-20181219 19 Dec 18 13:09 UTC
$ fluxctl release --workload=kurento:deployment/kurento --update-image=kurento/kurento-media-server:6.11.0
Submitting release ...
Error: verifying changes: failed to verify changes: the image for container "kurento" in resource "kurento:deployment/kurento" should be "kurento/kurento-media-server:6.11.0", but is "kurento/kurento-media-server:6.10"
Run 'fluxctl release --help' for usage.
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kurento
  namespace: kurento
  labels:
    app: kurento
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kurento
  template:
    metadata:
      labels:
        app: kurento
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - kurento
              topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 1800
      containers:
        - name: kurento
          image: kurento/kurento-media-server:6.10
          args:
            - "--modules-config-path=/etc/kurento-modules/..data"
          imagePullPolicy: Always
          ports:
            - containerPort: 8888
              name: intra-node
          resources:
            limits:
              cpu: "100m"
              memory: 256Mi
            requests:
              cpu: "100m"
              memory: 256Mi
          readinessProbe:
            exec:
              command:
                - pgrep
                - kurento
            initialDelaySeconds: 30
            timeoutSeconds: 5
          livenessProbe:
            tcpSocket:
              port: intra-node
            initialDelaySeconds: 30
            timeoutSeconds: 5
          volumeMounts:
            - name: kurento-conf-json
              mountPath: "/etc/kurento"
            - name: kurento-modules
              mountPath: "/etc/kurento-modules"
      volumes:
        - name: kurento-conf-json
          configMap:
            name: kurento-conf-json
        - name: kurento-modules
          configMap:
            name: kurento-modules
I created a new cluster on GKE with flux.
I deployed only kurento; the full deployment YAML is here: https://gist.github.com/yellowmegaman/5847e9f79ce783cccb5b50d77fda9b4e
I tried to update it, with the same result.
Full flux args:
name = "flux"
image = "docker.io/weaveworks/flux:1.13.1"
args = ["--memcached-service=", "--git-timeout=100s", "--ssh-keygen-dir=/var/fluxd/keygen", "[email protected]:oktossm/gitops.git", "--git-branch=${var.cluster_name}", "--listen-metrics=:3031", "--git-poll-interval=3m0s", "--sync-interval=3m0s", "--sync-garbage-collection", "--git-path=defaultcluster", "--git-ci-skip-message=[SKIP CI]", "--git-label=flux${var.cluster_name}", "--manifest-generation=true"]
If you place the deployment spec in its own file without any separators, does it work?
@stefanprodan OK, that's what I did.
If I remove the --- separators, the kubeval check doesn't pass, and kubectl won't accept the file either:
$ kubectl apply -f workloads/kurento/kurento-Combined.yaml
error: error validating "workloads/kurento/kurento-Combined.yaml": error validating data: ValidationError(Service): unknown field "data" in io.k8s.api.core.v1.Service; if you choose to ignore these errors, turn validation off with --validate=false
@stefanprodan I just understood what you actually meant. Previously, when I first reported this, all resources were in their own separate files in the workloads/kurento folder.
But I had to add separators because I use manifest-generation=true and, according to the docs, flux just reads the generators' output from STDIN. The resources have to be separated there, since they are all merged into one stream.
Here is my .flux.yaml:
$ cat .flux.yaml
---
version: 1
commandUpdated:
  generators:
    - command: >-
        cat namespaces/* | sed 's/envplaceholder/'$ENVNAME'/g'
    - command: >-
        cd workloads && cat */* | sed 's/envplaceholder/'$ENVNAME'/g' | sed 's/lbipplaceholder/'$LBIP'/g' | sed 's/domainplaceholder/'$ENVDOMAIN'/g'
It is used to achieve basic templating with env variables supplied to the flux container.
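For readers who find the sed pipeline hard to follow, here is a rough Python equivalent of what the generators do. It is only a sketch: the `render` function and the sample namespace file are hypothetical, and it applies all three substitutions to every file, whereas the first generator above only replaces envplaceholder. It also shows that the transformation is one-way: output is produced for flux to read, and nothing is ever written back to the files in git.

```python
from typing import Dict, List

# Placeholder -> environment variable, as in the sed commands above.
PLACEHOLDERS = {
    "envplaceholder": "ENVNAME",
    "lbipplaceholder": "LBIP",
    "domainplaceholder": "ENVDOMAIN",
}


def render(files: List[str], env: Dict[str, str]) -> str:
    """Concatenate manifest files and substitute placeholders, like `cat | sed`."""
    stream = "".join(files)  # `cat` simply concatenates the file contents
    for placeholder, variable in PLACEHOLDERS.items():
        # An unset variable expands to an empty string in the shell.
        stream = stream.replace(placeholder, env.get(variable, ""))
    return stream


# Hypothetical file content; each file starts with `---` so that the
# concatenated stream stays a valid multi-document YAML stream.
namespace_file = "---\nkind: Namespace\napiVersion: v1\nmetadata:\n  name: envplaceholder\n"
print(render([namespace_file], {"ENVNAME": "staging"}))
```

Because the files are only read, a release has no way to record a new image tag unless the configuration also says how to write changes back.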
I was able to reproduce the issue with the plain deployment manifest from https://github.com/fluxcd/flux/issues/2324#issuecomment-521669796. It is not yet clear precisely what goes wrong, except that it happens during the calculation of updates.
However, I did succeed in releasing a lean deployment. @yellowmegaman are the workloads you are trying to release in a healthy state?
@hiddeco sorry, didn't see comment update. Yeah, they are 100% healthy.
I'm now encountering this issue too; the info relevant to what's causing it is posted below. Unfortunately the image is in a private repo and is tagged with the commit SHA, so I cannot share it.
Flux version: 1.14.2
--sync-garbage-collection is on.
Command
fluxctl release --force --workload=redoc:deployment/api-docs -i eu.gcr.io/project/folder/api-spec:1c2884e26f1f5184796b67116665c6f6b8cc1671 --k8s-fwd-ns flux
Logs
ts=2019-10-07T14:11:27.031542995Z caller=images.go:111 component=sync-loop workload=redoc:deployment/api-docs container=redoc repo=eu.gcr.io/project/folder/api-spec pattern=glob:* current=eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251 info="added update to automation run" new=eu.gcr.io/project/folder/api-spec:1c2884e26f1f5184796b67116665c6f6b8cc1671 reason="latest 1c2884e26f1f5184796b67116665c6f6b8cc1671 (2019-10-07 11:39:58.311378805 +0000 UTC) > current 3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251 (2019-10-07 09:46:36.468467058 +0000 UTC)"
...
ts=2019-10-07T14:15:51.986109515Z caller=loop.go:144 component=sync-loop jobID=be2b89b3-bea6-6d32-bb77-5d6321dda3d3 state=done success=false err="verifying changes: failed to verify changes: the image for container "redoc" in resource "redoc:deployment/api-docs" should be "eu.gcr.io/project/folder/api-spec:1c2884e26f1f5184796b67116665c6f6b8cc1671", but is "eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251""
deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-docs
  labels:
    app: api-docs
  annotations:
    fluxcd.io/automated: "true"
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: api-docs
  template:
    metadata:
      labels:
        app: api-docs
    spec:
      containers:
        - name: redoc
          image: eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 10m
              memory: 64Mi
.flux.yaml
version: 1
commandUpdated:
  generators:
    - command: kustomize build .
  patchFile: patch.yaml
List images
$ fluxctl list-images --workload=redoc:deployment/api-docs --k8s-fwd-ns flux
WORKLOAD CONTAINER IMAGE CREATED
legacy-api:deployment/api-docs redoc eu.gcr.io/project/folder/api-spec
| 1c2884e26f1f5184796b67116665c6f6b8cc1671 07 Oct 19 11:39 UTC
| 3751eed958e151b091da864ee808ccf7a22bb773 07 Oct 19 11:30 UTC
'-> 3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251 07 Oct 19 09:46 UTC
2820bbd012208ebc0e6c089e073ba9215890ca4b 27 Sep 19 10:21 UTC
List workloads
$ fluxctl list-workloads -a --k8s-fwd-ns flux | grep redoc
redoc:deployment/api-docs redoc eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251 ready automated
I have run into this too. Is it resolved?
I see the error is sourced from https://github.com/fluxcd/flux/blob/07b3d3d608dc53738d9778d6dd23355e8f50e550/pkg/release/releaser.go#L124-L126
But surely when releasing, the image is _expected_ to change?
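One way to read the error: the image is indeed expected to change, and the check fails precisely because it did not. The sketch below is not the code in releaser.go; the function names are hypothetical, and it assumes the check simply compares the requested image with whatever the manifests contain after the update step has run.

```python
from typing import Optional


def container_image(manifest: dict, container: str) -> Optional[str]:
    """Return the image of the named container in a Deployment-like manifest."""
    for c in manifest["spec"]["template"]["spec"]["containers"]:
        if c["name"] == container:
            return c["image"]
    return None


def verify_image(manifest: dict, container: str, wanted: str) -> None:
    """Raise if the manifest does not carry the image the release asked for."""
    actual = container_image(manifest, container)
    if actual != wanted:
        raise ValueError(
            'the image for container "%s" should be "%s", but is "%s"'
            % (container, wanted, actual)
        )


# The manifest as it looks when the update step has written nothing.
kurento = {
    "kind": "Deployment",
    "metadata": {"name": "kurento", "namespace": "kurento"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "kurento", "image": "kurento/kurento-media-server:6.10"},
    ]}}},
}

try:
    verify_image(kurento, "kurento", "kurento/kurento-media-server:6.11.0")
except ValueError as err:
    print("release failed:", err)
```

Under that assumption, the message points at an update that was silently skipped, not at a problem with the requested image.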
@tobbbles kubeyaml, the Python tool we use to selectively patch the YAML, doesn't work well when there is no namespace present in the resource Flux is trying to write changes to. I expect things to work once you resolve this (and we should adjust kubeyaml so that it works properly without a namespace present).
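To illustrate why a missing namespace can matter, here is a small sketch. It is not kubeyaml's code: `parse_resource_id` and `find_manifest` are hypothetical helpers, and the strict namespace comparison is an assumption made purely for illustration.

```python
from typing import List, Optional, Tuple


def parse_resource_id(resource_id: str) -> Tuple[str, str, str]:
    """Split an ID such as 'redoc:deployment/api-docs' into its parts."""
    namespace, rest = resource_id.split(":", 1)
    kind, name = rest.split("/", 1)
    return namespace, kind, name


def find_manifest(manifests: List[dict], resource_id: str) -> Optional[dict]:
    """Find the manifest for a resource ID, matching the namespace strictly."""
    namespace, kind, name = parse_resource_id(resource_id)
    for manifest in manifests:
        meta = manifest.get("metadata", {})
        if (manifest.get("kind", "").lower() == kind
                and meta.get("name") == name
                and meta.get("namespace") == namespace):  # assumed strict match
            return manifest
    return None


# As in the deployment.yaml posted above: no metadata.namespace.
without_ns = {"kind": "Deployment", "metadata": {"name": "api-docs"}}
with_ns = {"kind": "Deployment",
           "metadata": {"name": "api-docs", "namespace": "redoc"}}

print(find_manifest([without_ns], "redoc:deployment/api-docs"))  # nothing found
print(find_manifest([with_ns], "redoc:deployment/api-docs") is with_ns)
```

If a lookup behaves like this, the file that should be updated is never found, so nothing is written.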
@tobbbles In your case, the release is failing (at least) because you have commandUpdated in .flux.yaml, rather than patchUpdated, so the patch file is ignored.
This issue is labelled integrations/kustomize because the fear was that kustomize configurations will often have manifests missing a namespace, with the namespace added later by a kustomization. And this is certainly a problem if you use commandUpdated, because it will either try to operate on the base files (no namespace) or it'll just not attempt any updates to files, thus the complaint about no changes. The problem is really that commandUpdated isn't suited to working with kustomize configurations, unless you build your own patches in an update command.
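The idea behind patchUpdated can be sketched as follows: the generated manifests are never edited, and image changes are kept separately and layered on top of the generator output. This is a simplification under stated assumptions: `apply_image_patch` is a hypothetical name, and the dictionary used as the patch is a stand-in for illustration, not the actual format of the patch file.

```python
import copy
from typing import Dict, List, Tuple

# (kind, name) -> {container name: image}; a simplified stand-in for a patch.
ImagePatch = Dict[Tuple[str, str], Dict[str, str]]


def apply_image_patch(generated: List[dict], patch: ImagePatch) -> List[dict]:
    """Return a copy of the generated manifests with image overrides applied."""
    result = copy.deepcopy(generated)  # generator output itself stays untouched
    for manifest in result:
        key = (manifest["kind"], manifest["metadata"]["name"])
        overrides = patch.get(key, {})
        for c in manifest["spec"]["template"]["spec"]["containers"]:
            if c["name"] in overrides:
                c["image"] = overrides[c["name"]]
    return result


OLD = "eu.gcr.io/project/folder/api-spec:3fdd26d30ea2017a6cc35cd2268dfa73c2d2e251"
NEW = "eu.gcr.io/project/folder/api-spec:1c2884e26f1f5184796b67116665c6f6b8cc1671"

generated = [{
    "kind": "Deployment",
    "metadata": {"name": "api-docs"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "redoc", "image": OLD},
    ]}}},
}]

patched = apply_image_patch(generated, {("Deployment", "api-docs"): {"redoc": NEW}})
print(patched[0]["spec"]["template"]["spec"]["containers"][0]["image"])
```

A release then only has to record the override, which is why this mode fits configurations whose final manifests are assembled by a tool such as kustomize.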
For the purpose of making it work with kustomize better, I can offer a couple of things:
.flux.yaml files could be checked against a schema, so that problems like the mismatched commandUpdated/patchFile are exposed.
@yellowmegaman You appear to have a similar problem: you haven't supplied any commands to update the files, so when fluxd tries to do a release, nothing gets changed. The second point above will make this more obvious: if there are no update commands given, fluxd should refuse update operations.
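A check along these lines could look like the sketch below. It is an illustration, not an official schema: `check_flux_config` is a hypothetical function, the rules are my reading of this thread, and the `updaters` key name is an assumption based on my recollection of the manifest-generation docs. The input is the already-parsed .flux.yaml as a dictionary.

```python
from typing import List


def check_flux_config(config: dict) -> List[str]:
    """Return a list of human-readable problems found in a parsed .flux.yaml."""
    problems = []
    if config.get("version") != 1:
        problems.append("version must be 1")
    modes = [m for m in ("commandUpdated", "patchUpdated") if m in config]
    if len(modes) != 1:
        problems.append("expected exactly one of commandUpdated or patchUpdated")
        return problems
    mode = modes[0]
    body = config[mode] or {}
    if not body.get("generators"):
        problems.append(mode + ": no generators given")
    if mode == "commandUpdated":
        if "patchFile" in body:
            problems.append(
                "commandUpdated: patchFile is ignored; did you mean patchUpdated?")
        if not body.get("updaters"):  # key name assumed, see note above
            problems.append(
                "commandUpdated: no updaters given, so releases cannot change any file")
    elif "patchFile" not in body:
        problems.append("patchUpdated: patchFile is required")
    return problems


# The kustomize configuration posted earlier in this thread.
thread_config = {
    "version": 1,
    "commandUpdated": {
        "generators": [{"command": "kustomize build ."}],
        "patchFile": "patch.yaml",
    },
}
for problem in check_flux_config(thread_config):
    print(problem)
```

Both configurations posted in this thread would be flagged by the "no updaters" rule, and the kustomize one additionally by the misplaced patchFile.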
@squaremo Thank you; that was absolutely the issue, and using patchUpdated resolved all of the release issues I was experiencing.
I think being able to validate .flux.yaml (potentially from fluxctl) would be very valuable, especially for new users such as myself.