Hello,
I have a problem releasing a Helm chart that uses an ECR-based image.
According to this FAQ: https://github.com/weaveworks/flux/blob/master/site/faq.md
Flux should support ECR:
Amazon Elastic Container Registry (ECR) has its own authentication using IAM. If your worker nodes can read from ECR, then Flux will be able to access it too.
But in my case, this command:
fluxctl --k8s-fwd-ns=flux release \
--controller=XXX:helmrelease/XXX \
--update-image=YYY.dkr.ecr.eu-west-1.amazonaws.com/XXX:latest
gives me:
ts=2019-03-13T11:35:24.66966609Z caller=loop.go:123 component=sync-loop jobID=31682c88-c410-0c6e-1cce-09f82159044b state=done success=false err="image \"YYY.dkr.ecr.eu-west-1.amazonaws.com/XXX:latest\" does not exist: invalid image ID"
When I try:
fluxctl --k8s-fwd-ns=flux list-images -c XXX:helmrelease/XXX
I get:
ts=2019-03-13T11:36:49.33549696Z caller=images.go:155 component=daemon err="fetching image metadata for YYY.dkr.ecr.eu-west-1.amazonaws.com/XXX: Get https://YYY.dkr.ecr.eu-west-1.amazonaws.com/v2/XXX/tags/list: no basic auth credentials"
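For context on that error: listing tags is an HTTP GET against the registry's /v2/<repo>/tags/list endpoint, and ECR accepts HTTP Basic credentials on it. The minimal Go sketch below (Go being the language Flux is written in) only builds such a request to show what has to be attached; the "AWS" username convention and the placeholder token are assumptions for illustration, and this is not Flux's actual code.

```go
package main

import (
	"fmt"
	"net/http"
)

// tagsListRequest builds, but does not send, a Docker Registry HTTP API v2
// request for GET /v2/<repo>/tags/list with HTTP Basic credentials attached.
// For ECR the username is conventionally "AWS" and the password is a
// short-lived token; the values used in main are placeholders.
func tagsListRequest(host, repo, user, password string) (*http.Request, error) {
	endpoint := fmt.Sprintf("https://%s/v2/%s/tags/list", host, repo)
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	// A client with no credentials to answer the registry's Basic auth
	// challenge fails with an error like "no basic auth credentials".
	req.SetBasicAuth(user, password)
	return req, nil
}

func main() {
	req, err := tagsListRequest("YYY.dkr.ecr.eu-west-1.amazonaws.com", "XXX", "AWS", "hypothetical-token")
	if err != nil {
		panic(err)
	}
	_, _, hasAuth := req.BasicAuth()
	fmt.Println(req.Method, req.URL, "basic auth set:", hasAuth)
}
```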
But when the Helm chart is installed from a local Git checkout, the image is pulled correctly and the package works.
These Flux values also work:
ecr:
  region: eu-west-1
  includeId: YYY
  excludeId: 602401143452
The EC2 instances have access to ECR. I am also running kiam inside EKS, so I have granted rights (via an IAM role in kiam) to both pods: flux and flux-helm-operator.
I installed Flux using these commands:
helm repo add weaveworks https://weaveworks.github.io/flux
helm install --name flux --namespace flux -f ./flux-values.yaml weaveworks/flux
Can you please advise where the problem is and how to solve it, so that Flux has access to the ECR image?
Thx
Thanks for the great bug report! It has everything: what you tried, what you expected to happen, relevant log lines, etc. This makes my job so much easier :smile:
But when HELM chart is installed from downloaded GIT, image is downloaded correctly and package works.
Just to check: is this stating that when you install the Helm chart using helm, all the deployments and so on in the chart start up and run fine? (i.e., does this demonstrate that the worker nodes can fetch the images you're using?)
What happens if you remove the region: field from the values you give to the chart? Does it detect that it's in eu-west-1? You should see a log line like
... info="detected cluster region", region=eu-west-1
Assuming it does detect the region correctly, you should see log lines like
... info="attempting to refresh auth tokens", region=eu-west-1, account-ids="YYY"
Do you see those?
If all the above is OK, I think we have some subtle IAM permissions mismatch to puzzle through.
@squaremo thanks for the quick response!
just to check: is this stating that when you install the Helm chart using helm, all the deployments and so on in the chart start up and run fine? (i.e., does this demonstrate that the worker nodes can fetch the images you're using?)
Yes, and not only that: when I install the same chart using Flux, it installs as well. Only the upgrade fails, and so does fluxctl --k8s-fwd-ns=flux list-images -c XXX:helmrelease/XXX
What happens if you remove the region:
With or without the kiam role, the result is the same:
ts=2019-03-13T12:43:32.78466456Z caller=aws.go:69 component=aws warn="no AWS region configured, or detected as cluster region" err="EC2MetadataError: failed to make EC2Metadata request\ncaused by: request blocked by whitelist-route-regexp \"^$\": /latest/meta-data/placement/availability-zone\n"
ts=2019-03-13T12:43:32.784698675Z caller=main.go:288 warning="AWS authorization not used; pre-flight check failed"
Also, when I use
dockercfg:
  enabled: true
  secretName: "ecr-docker-secret"
in the values, no volume is created, although one should be according to this template:
https://github.com/weaveworks/flux/blob/master/chart/flux/templates/deployment.yaml
{{- if .Values.registry.dockercfg.enabled }}
- name: docker-credentials
  secret:
    secretName: "{{ .Values.registry.dockercfg.secretName }}"
{{- end }}
This log line:
ts=2019-03-13T12:43:32.78466456Z caller=aws.go:69 component=aws warn="no AWS region configured, or detected as cluster region" err="EC2MetadataError: failed to make EC2Metadata request\ncaused by: request blocked by whitelist-route-regexp \"^$\": /latest/meta-data/placement/availability-zone\n"
indicates that the pod doesn't have access to the EC2 metadata API. Supplying the region in .registry.ecr.region (rather than letting fluxd detect it with the metadata API) gets around this -- but the failure makes me suspect other APIs are not available either. Are there any warnings in the logs about fetching credentials when you explicitly set the region? It would look like
... error="fetching credentials for AWS region", region=eu-west-1, err="..."
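A minimal Go sketch of the detection step being discussed: ask the metadata API for the availability zone, then drop the trailing zone letter to get the region. It assumes an IMDSv1-style unauthenticated GET on 169.254.169.254 and the simple "strip the last character" convention (which would not handle Local Zone names); it illustrates the idea and is not fluxd's implementation. Behind kiam's default whitelist, this GET is the request that gets blocked.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// metadataBase is the EC2 instance metadata endpoint (IMDSv1-style access assumed).
const metadataBase = "http://169.254.169.254"

// regionFromAZ derives a region from an availability zone name by dropping
// the trailing zone letter, e.g. "eu-west-1a" -> "eu-west-1".
func regionFromAZ(az string) (string, error) {
	az = strings.TrimSpace(az)
	if len(az) < 2 {
		return "", fmt.Errorf("unexpected availability zone %q", az)
	}
	return az[:len(az)-1], nil
}

// detectRegion asks the metadata API for the availability zone.
func detectRegion(client *http.Client, base string) (string, error) {
	resp, err := client.Get(base + "/latest/meta-data/placement/availability-zone")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("metadata API returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return regionFromAZ(string(body))
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	region, err := detectRegion(client, metadataBase)
	if err != nil {
		// This is the situation where setting registry.ecr.region explicitly helps.
		fmt.Println("region not detected:", err)
		return
	}
	fmt.Println("detected cluster region", region)
}
```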
@squaremo
I have also tried: https://github.com/bzon/ecr-k8s-secret-creator/blob/master/FLUX_GUIDE.md
If I run cat /etc/fluxd/docker/config.json inside the pod,
I get this (it seems correct):
{
  "auths": {
    "https://XXX.dkr.ecr.eu-west-1.amazonaws.com": {
      "auth": "REDACTED"
    }
  }
}
fluxd is also running inside the pod with the proper parameters:
1 root 0:00 /sbin/tini -- fluxd --ssh-keygen-dir=/var/fluxd/keygen --k8s-secret-name=flux-git-deploy --memcached-hostname=flux-memcached --memcached-service= [email protected]:XXX --git-branch=staging --git-path= --git-user=Weave Flux [email protected] --git-set-author=false --git-poll-interval=5m --git-timeout=20s --sync-interval=5m --git-ci-skip=false --registry-poll-interval=5m --registry-rps=200 --registry-burst=125 --registry-trace=false --docker-config=/etc/fluxd/docker/config.json --registry-ecr-region=eu-west-1 --registry-ecr-include-id=YYY --registry-ecr-exclude-id=6.02401143452e+11
Flux is still able to deploy the Helm chart, but
fluxctl --k8s-fwd-ns=flux list-images -c XXX:helmrelease/XXX
still gives me "image data not available",
nor am I able to upgrade the chart. I still get:
err="image \"YYY.dkr.ecr.eu-west-1.amazonaws.com/XXX:latest\" does not exist: invalid image ID"
> kubectl -n flux logs flux-58d5cdf67-24rth | grep "fetch"
returns nothing.
But I think it is because of kiam
https://github.com/uswitch/kiam
Denies access to all other AWS Metadata API paths by default (but can be whitelisted via flag)
But I think it is because of kiam
https://github.com/uswitch/kiam
Denies access to all other AWS Metadata API paths by default (but can be whitelisted via flag)
Ah yes, that's a bit of a smoking gun. If you allow /latest/meta-data/placement/availability-zone then we may be able to find out more about the authentication request that happens after.
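One way to sanity-check a whitelist pattern before redeploying kiam is Go's regexp package (RE2 syntax, as used by Go programs such as kiam). This sketch assumes kiam matches the whitelist-route-regexp setting named in the log against the request path as printed there (/latest/meta-data/placement/availability-zone); the pattern placement/availability-zone$ is a hypothetical example, not a value taken from kiam's documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// allowed reports whether a whitelist pattern would let a metadata path through.
func allowed(pattern, path string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(path), nil
}

func main() {
	path := "/latest/meta-data/placement/availability-zone"
	// "^$" only matches the empty string, so it blocks every non-empty path.
	for _, pattern := range []string{"^$", "placement/availability-zone$"} {
		ok, err := allowed(pattern, path)
		if err != nil {
			panic(err)
		}
		fmt.Printf("pattern %q allows %s: %v\n", pattern, path, ok)
	}
}
```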
@squaremo
I have uninstalled kiam, but I still have the same issue.
I removed
ecr:
  region:
and deployed again:
ts=2019-03-13T14:47:37.292135293Z caller=main.go:249 component=cluster host=https://172.20.0.1:443 version=kubernetes-v1.11.8-eks-7c34c0
ts=2019-03-13T14:47:37.292168694Z caller=main.go:261 component=cluster kubectl=/usr/local/bin/kubectl
ts=2019-03-13T14:47:37.293378727Z caller=main.go:269 component=cluster ping=true
ts=2019-03-13T14:47:37.29447817Z caller=aws.go:74 component=aws info="detected cluster region" region=eu-west-1
ts=2019-03-13T14:47:37.294511853Z caller=aws.go:78 component=aws info="restricting ECR registry scans" regions=eu-west-1 include-ids=redacted exclude-ids=6.02401143452e+11
It seems the region was detected, but I still cannot get this working for ECR.
One small change -- I don't think this will fix the whole thing, but: you should quote the exclude-id (excludeId: "602401143452"), so it doesn't get parsed as a number. Note the exclude-ids=6.02401143452e+11 in the log above.
Are there any log lines in kubectl logs ... | grep fetch now?
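Why quoting matters: the 6.02401143452e+11 in the fluxd arguments and in the "restricting ECR registry scans" log line is what Go prints for the number 602401143452 once it has been decoded as a float64. A minimal Go sketch, assuming (as that output suggests) that the unquoted value reached the template as an untyped float64, reproduces the formatting with encoding/json and %v:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// renderValue decodes a JSON document into untyped Go values and formats
// one top-level key with %v, which is how text/template prints a value.
// Untyped JSON numbers decode to float64.
func renderValue(doc, key string) (string, error) {
	var values map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &values); err != nil {
		return "", err
	}
	return fmt.Sprintf("%v", values[key]), nil
}

func main() {
	unquoted, err := renderValue(`{"excludeId": 602401143452}`, "excludeId")
	if err != nil {
		panic(err)
	}
	quoted, err := renderValue(`{"excludeId": "602401143452"}`, "excludeId")
	if err != nil {
		panic(err)
	}
	fmt.Println(unquoted) // 6.02401143452e+11
	fmt.Println(quoted)   // 602401143452
}
```

A quoted value stays a string and is passed through verbatim, which is why excludeId: "602401143452" yields the intended flag value.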
That was it :-)
Now it works!!!
So now I need to figure out KIAM.
Thanks.
Yay -- an unexpectedly effective fix!
If you do get KIAM interoperating with flux, it would be wonderful if you could report back on what you did. Then we could put together a bit of documentation on it, to help other folks.
@squaremo
I have a problem where the kiam agent and server run on the same node, which effectively blocks API calls from that node to the AWS API (via iptables), but it seems Flux now works.
So it seems it was due to the missing quotes in:
ecr:
  region: eu-west-1
  includeId: "YYY"
  excludeId: "602401143452"
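The quotes matter because account IDs end up being compared as strings against the account part of the registry hostname. The following is a hypothetical filter, not Flux's code, assuming exact string comparison and standard-partition ECR hostnames. It shows how a float-formatted ID silently stops matching: in an include list it would stop credentials being fetched for your own account, and in the exclude list it stops excluding the EKS account.

```go
package main

import (
	"fmt"
	"regexp"
)

// ecrHost matches hostnames of the form <account-id>.dkr.ecr.<region>.amazonaws.com.
var ecrHost = regexp.MustCompile(`^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com$`)

// contains reports whether s is an exact element of list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// shouldFetchCredentials is a hypothetical include/exclude filter: the host
// must be an ECR registry in an allowed region, not excluded, and (if an
// include list is given) included.
func shouldFetchCredentials(host string, regions, includeIDs, excludeIDs []string) bool {
	m := ecrHost.FindStringSubmatch(host)
	if m == nil {
		return false
	}
	account, region := m[1], m[2]
	if len(regions) > 0 && !contains(regions, region) {
		return false
	}
	if contains(excludeIDs, account) {
		return false
	}
	return len(includeIDs) == 0 || contains(includeIDs, account)
}

func main() {
	host := "123456789012.dkr.ecr.eu-west-1.amazonaws.com" // hypothetical account
	regions := []string{"eu-west-1"}
	fmt.Println("quoted include:", shouldFetchCredentials(host, regions, []string{"123456789012"}, nil))
	fmt.Println("float include:", shouldFetchCredentials(host, regions, []string{"1.23456789012e+11"}, nil))
}
```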
Facing this issue after upgrading from helm repo v0.5.1 to v0.6.3. Previously it was working correctly.
I tested specifying the ECR region, includeId and excludeId, using quotes, and erasing and redeploying the flux chart.
I'm using kube2iam, but I don't see any errors there and it doesn't seem related. I still get the same errors:
ts=2019-03-27T00:28:42.825981605Z caller=loop.go:123 component=sync-loop jobID=66f3876e-3c9c-d9a7-7d92-3c87338ed88b state=done success=false err="image \"xxx.dkr.ecr.eu-west-1.amazonaws.com/oauth2-proxy:latest\" does not exist: invalid image ID"
ts=2019-03-27T00:28:43.487384425Z caller=warming.go:192 component=warmer canonical_name=xxx.dkr.ecr.eu-west-1.amazonaws.com/oauth2-proxy auth={map[]} err="requesting tags: Get https://xxx.dkr.ecr.eu-west-1.amazonaws.com/v2/oauth2-proxy/tags/list: no basic auth credentials"
Here is how I deploy Flux:
helm upgrade --install flux \
--set git.url=ssh://[email protected]/example/k8s \
--set git.user=example-github-bot \
--set [email protected] \
--set registry.ecr.region=eu-west-1 \
--set registry.ecr.excludeId=602401143452 \
--set registry.ecr.includeId=xxx \
--set helmOperator.createCRD=false \
--set helmOperator.create=true \
--set prometheus.enabled=true \
--namespace core \
weaveworks/flux
I'm also facing this issue, and I believe it started happening when I changed the ECR repository for one of my automated workloads. Although it is configured correctly everywhere, I get the 'no basic auth credentials' error when trying to list tags for this repository. All IAM roles seem to be correctly defined for pods running in the cluster...