Flux: Helm operator suddenly loses service account privileges?

Created on 24 Jan 2019 · 2 Comments · Source: fluxcd/flux

I installed Flux with the Helm chart from this repo's GitHub Pages, using these values:

helmOperator:
  create: "true"
  createCRD: "true"

Cleanup:

helm del --purge mongodb

Restart all flux pods (flux, helm-operator, memcached):

kubectl -n flux delete pods --all

Soon mongodb shows up in the list of helm revisions, with 1 release.

ts=2019-01-24T06:14:31.713343559Z caller=operator.go:172 component=operator debug="PROCESSING item [\"demo/mongodb\"]"
ts=2019-01-24T06:14:31.713397532Z caller=operator.go:229 component=operator debug="Starting to sync cache key demo/mongodb"
ts=2019-01-24T06:14:34.156496619Z caller=release.go:139 component=release info="processing release mongodb-temp" action=CREATE options="{DryRun:true ReuseName:false}" timeout=300s
ts=2019-01-24T06:14:34.202857451Z caller=operator.go:214 component=operator info="Successfully synced 'demo/mongodb'"
...
I0124 06:14:34.203643       7 event.go:221] Event(v1.ObjectReference{Kind:"HelmRelease", Namespace:"demo", Name:"mongodb", UID:"217861fc-1f86-11e9-8fda-42010aa20fe8", APIVersion:"flux.weave.works/v1beta1", ResourceVersion:"899958", FieldPath:""}): type: 'Normal' reason: 'ChartSynced' Chart managed by HelmRelease processed successfully

But then no new updates will work (e.g. 4.0.2 -> 4.0.3), and even before I attempt an update this is in the logs:

ts=2019-01-24T06:17:29.365855484Z caller=release.go:139 component=release info="processing release mongodb" action=CREATE options="{DryRun:false ReuseName:false}" timeout=300s
ts=2019-01-24T06:17:29.558709839Z caller=release.go:186 component=release error="Chart release failed: mongodb: &status.statusError{Code:2, Message:\"release mongodb failed: namespaces \\\"demo\\\" is forbidden: User \\\"system:serviceaccount:kube-system:default\\\" cannot get namespaces in the namespace \\\"demo\\\"\", Details:[]*any.Any(nil)}"
ts=2019-01-24T06:17:29.962887958Z caller=chartsync.go:351 component=chartsync warning="Failed to install chart" namespace=demo name=mongodb error="rpc error: code = Unknown desc = configmaps is forbidden: User \"system:serviceaccount:kube-system:default\" cannot list configmaps in the namespace \"kube-system\""

Seems like for some reason it is using the default SA? When I exec into the pod, I can run kubectl -n kube-system get configmaps, for example.
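Exec'ing into the operator pod only exercises that pod's own service account, not the system:serviceaccount:kube-system:default identity named in the error. One way to test that identity directly is to impersonate it with kubectl auth can-i. This is a minimal sketch, assuming your own user is allowed to impersonate service accounts (cluster-admin is); the helper name sa_can is made up for illustration.

```shell
# Ask the API server whether a service account may perform a verb on a resource.
# Usage: sa_can <sa-namespace> <sa-name> <verb> <resource> <target-namespace>
# (sa_can is a hypothetical helper name, not part of kubectl or flux.)
sa_can() {
  kubectl auth can-i "$3" "$4" \
    --as="system:serviceaccount:$1:$2" \
    --namespace="$5"
}

# Example calls for the identity in the error message and for a dedicated
# Tiller account (assumed here to be named "tiller"):
# sa_can kube-system default list configmaps kube-system
# sa_can kube-system tiller  list configmaps kube-system
```

kubectl prints yes or no for each call, so a no for the default account alongside a yes for the dedicated one would suggest that whatever issued the failing request is not running under the account you expected.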

question

All 2 comments

That error comes from the Tiller that runs in the kube-system namespace. Is your Tiller running under its own SA, as described here: https://github.com/weaveworks/flux/blob/master/site/helm-get-started.md#prerequisites ?

I had manually installed Helm with proper SA via the following (obtained via a dry run):

# May need to elevate your GCP to admin privileges to apply this manifest
# kubectl create clusterrolebinding "cluster-admin-$(whoami)" --clusterrole=cluster-admin --user="$(gcloud config get-value core/account)"
# helm init --service-account tiller
# See also https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control#defining_permissions_in_a_role
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: helm
    name: tiller
  # Default was tiller-deploy but can be changed as per https://github.com/helm/helm/blob/440e79ff95/pkg/helm/portforwarder/portforwarder.go#L38
  name: tiller
  namespace: kube-system
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      labels:
        app: helm
        name: tiller
    spec:
      automountServiceAccountToken: true
      containers:
      - env:
        - name: TILLER_NAMESPACE
          value: kube-system
        - name: TILLER_HISTORY_MAX
          value: "0"
        image: gcr.io/kubernetes-helm/tiller:v2.12.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /liveness
            port: 44135
          initialDelaySeconds: 1
          timeoutSeconds: 1
        name: tiller
        ports:
        - containerPort: 44134
          name: tiller
        - containerPort: 44135
          name: http
        readinessProbe:
          httpGet:
            path: /readiness
            port: 44135
          initialDelaySeconds: 1
          timeoutSeconds: 1
        resources: {}
      serviceAccountName: tiller
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: helm
    name: tiller
  name: tiller
  namespace: kube-system
spec:
  ports:
  - name: tiller
    port: 44134
    targetPort: tiller
  selector:
    app: helm
    name: tiller
  type: ClusterIP
...

 
Notice I changed tiller-deploy to tiller; the problem is that when I used terraform-provider-helm it then launched its own tiller-deploy deployment.

It appears that the flux helm operator then started to have issues picking the correct tiller (they both have the same labels: app: helm, name: tiller).
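A quick way to spot this situation is to list every pod carrying the Tiller labels together with the service account it runs as. This is a rough sketch: the helper name list_tiller_pods is made up for illustration, and the namespace defaults to kube-system as in the manifest above.

```shell
# List all pods labelled as Tiller, with the service account each one runs as.
# Usage: list_tiller_pods [namespace]   (defaults to kube-system)
# (list_tiller_pods is a hypothetical helper name.)
list_tiller_pods() {
  kubectl --namespace "${1:-kube-system}" get pods \
    --selector app=helm,name=tiller \
    --output custom-columns='POD:.metadata.name,SERVICE_ACCOUNT:.spec.serviceAccountName'
}

# list_tiller_pods
```

More than one pod in the output, with different values in the SERVICE_ACCOUNT column, would mean that anything selecting on app=helm,name=tiller (such as the Service in the manifest above) can end up talking to a Tiller running as default rather than tiller.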

terraform-provider-helm calls installer.Installer from helm and checks for an AlreadyExists error, which would only be thrown if a duplicate deployment was created (which was not the case since I changed the name).

When I changed that deployment name I knew it would probably bite later. I had incorrectly assumed Helm would check for an existing service with correct labels (as the client does).

Sorry for the irrelevant issue; hopefully someone else searching for "cannot list" lands here one day, though! The Helm provider lets you provide overrides on the deployment, so I could use that to set the name.

Lots of trouble just because I didn't like the redundant -deploy aha :)

FWIW my bootstrap process going forward will probably be something like:

  1. Deploy GKE cluster and node pools with Terraform
  2. Deploy helm with kubernetes provider
  3. Deploy helm and flux with helm provider
  4. (Use a terraform local data source to grab fluxctl identity; then use that for creating a gitlab deploy key)
  5. Flux does its thing

The idea being devs can fork a repo/copy a template to get their own flux-based cluster playground.
