Describe the bug
After deleting the flux pod, the new flux pod keeps crashing with status CrashLoopBackOff.
To Reproduce
Steps to reproduce the behaviour:
Expected behavior
The new flux pod should be started correctly.
Logs
ts=2020-03-10T15:49:27.500040924Z caller=main.go:256 version=1.18.0
ts=2020-03-10T15:49:27.500146713Z caller=main.go:396 msg="using kube config: \"/root/.kube/config\" to connect to the cluster"
ts=2020-03-10T15:49:27.511119866Z caller=main.go:474 err="failed to chmod identity file: chmod /etc/fluxd/ssh/identity: read-only file system"
Additional context
I'm having this problem too, and so far my investigation is pointing to Istio (or more specifically, the pod spec when istio sidecar injection is enabled). @lazybetrayer, are you running Istio as well?
I haven't nailed down the precise issue, but it appears that when the sidecar is injected, the defaultMode parameter that should mount the secret with a mode of 256 (0400) is being interpreted differently. The pod appears to have the right mode when I query it, but the net effect is that the identity file has mode 0440.
Perhaps the webhook is flipping between octal and decimal when it reads and re-emits the pod spec? I'll post if I discover more.
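For reference, the decimal 256 seen in a JSON pod spec and the octal 0400 are the same permission bits, since JSON has no octal literals. A minimal Python sketch that shows both forms side by side (the helper name describe_mode is made up for this illustration):

```python
import stat


def describe_mode(decimal_mode):
    """Return (octal string, ls-style string) for a mode given in decimal."""
    octal = format(decimal_mode, "04o")
    # S_IFREG marks the value as a regular file so filemode() prints a leading '-'.
    ls_style = stat.filemode(stat.S_IFREG | decimal_mode)
    return octal, ls_style


if __name__ == "__main__":
    for value in (256, 288):
        print(value, *describe_mode(value))
```

Here 256 corresponds to 0400 (owner read only) and 288 to 0440 (owner and group read), which is the mode the identity file ends up with.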
TL;DR: it appears to be an istio 1.5 problem, but there isn't an obvious fix on the horizon.
Update: I've been able to find a fix, though automating it through Istio may be a problem.
Here's the full explanation and repro instructions. Turns out it has nothing to do with decimal vs octal.
Given podinfo.yaml in this gist, the secret mounts the way flux wants to see it:
ubuntu@microk8s-vm:~$ kubectl exec -i -t -n istio-system podinfo-5c98d75558-q7bjf -- ls -la /foo/..data/
total 4
drwxr-xr-x 2 root root 60 Mar 17 03:03 .
drwxrwxrwt 3 root root 100 Mar 17 03:03 ..
-r-------- 1 root root 1823 Mar 17 03:03 identity
(note that my istio-system namespace does not have injection enabled whereas my fluxcd namespace does)
When istioctl kube-inject is run on it (or when done automatically via sidecar injection), many things get added to the deployment manifest, but of particular interest is the security context (see podinfo.injected.yaml excerpt in the gist).
securityContext:
  fsGroup: 1337
When I run that pod, the same command shows the identity file having mode 0440, even though nothing in the secret declaration has changed:
ubuntu@microk8s-vm:~$ kubectl exec -i -t -n fluxcd podinfo-5c98d75558-mjt49 -- ls -la /foo/..data/
Defaulting container name to podinfo.
Use 'kubectl describe pod/podinfo-5c98d75558-mjt49 -n fluxcd' to see all of the containers in this pod.
total 4
drwxr-sr-x 2 root 1337 60 Mar 17 01:53 .
drwxrwsrwt 3 root 1337 100 Mar 17 01:53 ..
-r--r----- 1 root 1337 1823 Mar 17 01:53 identity
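The two listings are consistent with group bits being ORed into each mode once fsGroup is set. The Python sketch below reproduces that arithmetic. The masks are an assumption inferred from the listings in this comment, not copied from the Kubernetes source, and apply_fs_group is a hypothetical name.

```python
import stat

# Assumed masks, inferred from the ls output above (not from Kubernetes source).
RO_FILE_MASK = 0o440                 # owner and group read
DIR_EXTRA_MASK = stat.S_ISGID | 0o110  # setgid plus owner/group execute


def apply_fs_group(mode, is_dir=False):
    """Return the mode after ORing in the assumed fsGroup masks."""
    mask = RO_FILE_MASK
    if is_dir:
        mask |= DIR_EXTRA_MASK
    return mode | mask


if __name__ == "__main__":
    # identity file: requested 0400, observed 0440
    print(stat.filemode(stat.S_IFREG | apply_fs_group(0o400)))
    # ..data directory: 0755 without fsGroup
    print(stat.filemode(stat.S_IFDIR | apply_fs_group(0o755, is_dir=True)))
```

If the assumption holds, no defaultMode stricter than 0440 can survive on a secret volume in a pod that sets fsGroup, which would explain why the secret declaration itself looks unchanged.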
While checking the Istio issues for similar reports, I found this issue, which, while not the same problem, did come to the initial conclusion "the injector shouldn't be adding a security context as of istio v1.5". I took the injected deployment manifest, removed the security context, and re-created the deployment, and voila, the secret is mounted properly:
ubuntu@microk8s-vm:~$ kubectl get -o json -n fluxcd po podinfo-5c98d75558-mjt49 | jq .spec.securityContext
{
  "fsGroup": 1337
}
ubuntu@microk8s-vm:~$ kubectl get -o json -n istio-system po podinfo-87f499dc5-2r8xv | jq .spec.securityContext
{}
ubuntu@microk8s-vm:~$ kubectl exec -i -t -n istio-system podinfo-87f499dc5-2r8xv -- ls -la /foo/..data/
Defaulting container name to podinfo.
Use 'kubectl describe pod/podinfo-87f499dc5-2r8xv -n istio-system' to see all of the containers in this pod.
total 4
drwxr-xr-x 2 root root 60 Mar 17 03:10 .
drwxrwxrwt 3 root root 100 Mar 17 03:10 ..
-r-------- 1 root root 1823 Mar 17 03:10 identity
I'm not even sure where to begin reporting and fixing this. Istio does need the security context to function properly. Someone has reported the bug in Kubernetes (though specifically defaultMode as it applies to projected volumes). FluxCD is just caught up in the middle of this, though there are some workarounds that could be applied while istio and k8s get their issues solved.
Flux could put the chmod behavior behind a flag, giving end users the ability to skip the chmod and thus avoid the crash.
The helm chart (which is what I'm using) could expose --k8s-secret-volume-mount-path (the setting of which would disable the secret mount), which would allow me to do some gymnastics to achieve what I want with a key of my own creation. Basically I'd change where the key was expected to live, then use an initContainer to extract it from one of my secrets and copy it to a memory-backed emptyDir volume and give it the correct perms. Just using the initContainer with the existing chart doesn't work, because as the daemon notes, it can't chmod a file on a volume mounted from a secret (ever since the CVE that made k8s change all ephemeral volumes to read only).
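A rough sketch of the copy-and-restrict step such an initContainer would perform, written in Python purely for illustration (a real initContainer would more likely run cp and chmod from a shell). The function name and any paths passed to it are hypothetical.

```python
import os
import shutil
import stat


def install_identity(src, dst, mode=0o400):
    """Copy a key from a read-only secret mount to a writable volume and restrict it.

    Returns the permission bits of the copy so the caller can verify them.
    """
    # copyfile copies contents only, so the 0440 mode of the source is not carried over.
    shutil.copyfile(src, dst)
    # The destination lives on a writable volume (e.g. an emptyDir), so chmod can succeed.
    os.chmod(dst, mode)
    return stat.S_IMODE(os.stat(dst).st_mode)
```

The important design point is that the chmod happens on the copy, never on the secret mount itself, which stays read-only.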
I'm using istio 1.5.1 and have the same issue.
There is no reason why Flux should be injected with the Envoy sidecar as Flux only talks to the Kubernetes API.
When creating the flux namespace, disable Istio injection with:
kubectl create ns flux
kubectl label namespace flux istio-injection=disabled
Thanks, that's what I was thinking
Istio does many things beyond traffic management, so just blindly turning off the sidecar is a bit of a blunt solution.
At the end of the day, the flux code is attempting an operation that cannot succeed (a chmod on a read-only filesystem) and crashing fatally because of it.
A simple call to unix.Access(keyfile, unix.W_OK) could prevent this crash from happening.
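To illustrate that guard, here is a sketch in Python, whose os.access wraps the same access(2) check as the suggested unix.Access call. It is not Flux's code: ensure_key_mode and its return strings are invented for the example, and it assumes that skipping the chmod is acceptable when the file cannot be written.

```python
import os
import stat


def ensure_key_mode(keyfile, want=0o400):
    """Try to set the key's mode, but skip instead of failing when that is impossible.

    Returns "ok" if the mode is already right, "fixed" if chmod ran,
    and "skipped" if the file is not writable (e.g. a read-only filesystem).
    """
    current = stat.S_IMODE(os.stat(keyfile).st_mode)
    if current == want:
        return "ok"
    # access(2) with W_OK fails on a read-only filesystem, so chmod is never attempted there.
    # It is a conservative test: it can also fail for an owner who could still chmod the file.
    if not os.access(keyfile, os.W_OK):
        return "skipped"
    os.chmod(keyfile, want)
    return "fixed"
```

On a secret volume the function would return "skipped" and leave the decision about the 0440 key to the caller, rather than exiting fatally.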
For those who need Istio observability and can't remove the sidecar, there are two possible fixes: use Istio 1.4.x, or access your git repo using https instead of git. The latter is what I did and I'm running fine with Istio 1.5.1.
Bumping this because I have this exact issue: in a multi-tenancy configuration, turning off sidecar injection is not really possible for the “tenants” flux instances, as they run in the namespace of the service being deployed.
It’d be great to see a fix to this issue! 🙂
EDIT:
In the flux-patch, one can add:
metadata:
  annotations:
    sidecar.istio.io/inject: "false"
in spec.template to disable the Istio injection just for the Flux pod.
It works for me though I think Flux should work anyway w/ Istio injection.
Agree with @Frizlab that for multi-tenancy it's not viable to mandate turning off the sidecar at the namespace level.
It would also be useful to be able to use the Istio egress gateway with sidecar injection on flux to monitor traffic levels between flux and GitHub, which isn't possible with either the namespace-wide or the flux-patch fix.
Otherwise the flux-patch.yaml edit is working, thanks.
Stumbled over this issue as well. Is there any solution besides the workaround?