K3s: Enable traefik dashboard - helm-install-traefik pod fails with "Error: unknown flag: --purge"

Created on 21 Jan 2020 · 20 comments · Source: k3s-io/k3s

Version:

k3s version v1.17.0+k3s.1 (0f644650)

Server install command: curl -sfL https://get.k3s.io | K3S_TOKEN=abc123 sh -s - server --cluster-init
Agent install command: curl -sfL https://get.k3s.io | K3S_TOKEN=abc123 K3S_URL=https://server:6443/ sh -s -

Describe the bug
On my server I modified /var/lib/rancher/k3s/server/manifests/traefik.yaml, adding dashboard.enabled: "true". After saving, the change seems to get picked up and I see the traefik install pod start, but it fails and gets stuck in a crash loop.

To Reproduce
Modify /var/lib/rancher/k3s/server/manifests/traefik.yaml to enable the dashboard.

Expected behavior
Traefik Dashboard - I'm not 100% sure my procedure here is right. I found several blog posts talking about enabling the traefik dashboard in k3s, but this seemed like the easiest option.

Actual behavior
Crash loop. Pod log:

2020-01-21T12:11:34.98497809-06:00 stderr F CHART=$(sed -e "s/%{KUBERNETES_API}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/g" <<< "${CHART}")
2020-01-21T12:11:34.987297604-06:00 stderr F set +v -x
2020-01-21T12:11:34.98736353-06:00 stderr F + cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates/
2020-01-21T12:11:34.988936567-06:00 stderr F + update-ca-certificates
2020-01-21T12:11:35.018340191-06:00 stderr F WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
2020-01-21T12:11:35.037636413-06:00 stderr F + + tiller export --listen=127.0.0.1:44134 HELM_HOST=127.0.0.1:44134--storage=secret
2020-01-21T12:11:35.037673091-06:00 stderr F
2020-01-21T12:11:35.037681251-06:00 stderr F + HELM_HOST=127.0.0.1:44134
2020-01-21T12:11:35.037687228-06:00 stderr F + helm_v2 init --skip-refresh --client-only
2020-01-21T12:11:35.121554954-06:00 stderr F [main] 2020/01/21 18:11:35 Starting Tiller v2.12.3 (tls=false)
2020-01-21T12:11:35.121726446-06:00 stderr F [main] 2020/01/21 18:11:35 GRPC listening on 127.0.0.1:44134
2020-01-21T12:11:35.121816536-06:00 stderr F [main] 2020/01/21 18:11:35 Probes listening on :44135
2020-01-21T12:11:35.121909529-06:00 stderr F [main] 2020/01/21 18:11:35 Storage driver is Secret
2020-01-21T12:11:35.12200519-06:00 stderr F [main] 2020/01/21 18:11:35 Max history per release is 0
2020-01-21T12:11:35.131376676-06:00 stdout F Creating /root/.helm
2020-01-21T12:11:35.131474012-06:00 stdout F Creating /root/.helm/repository
2020-01-21T12:11:35.131535049-06:00 stdout F Creating /root/.helm/repository/cache
2020-01-21T12:11:35.131604154-06:00 stdout F Creating /root/.helm/repository/local
2020-01-21T12:11:35.131659886-06:00 stdout F Creating /root/.helm/plugins
2020-01-21T12:11:35.131738067-06:00 stdout F Creating /root/.helm/starters
2020-01-21T12:11:35.131784603-06:00 stdout F Creating /root/.helm/cache/archive
2020-01-21T12:11:35.131892813-06:00 stdout F Creating /root/.helm/repository/repositories.yaml
2020-01-21T12:11:35.131906147-06:00 stdout F Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
2020-01-21T12:11:35.131993983-06:00 stdout F Adding local repo with URL: http://127.0.0.1:8879/charts
2020-01-21T12:11:35.133100698-06:00 stdout F $HELM_HOME has been configured at /root/.helm.
2020-01-21T12:11:35.133122309-06:00 stdout F Not installing Tiller due to 'client-only' flag having been set
2020-01-21T12:11:35.133129971-06:00 stdout F Happy Helming!
2020-01-21T12:11:35.13528301-06:00 stderr F ++ helm_v2 ls --all '^traefik$' --output json
2020-01-21T12:11:35.135508412-06:00 stderr F ++ jq -r '.Releases | length'
2020-01-21T12:11:35.219538766-06:00 stderr F [storage] 2020/01/21 18:11:35 listing all releases with filter
2020-01-21T12:11:35.300276818-06:00 stderr F + EXIST=
2020-01-21T12:11:35.300352736-06:00 stderr F + '[' '' == 1 ']'
2020-01-21T12:11:35.300436873-06:00 stderr F + '[' '' == v2 ']'
2020-01-21T12:11:35.300513923-06:00 stderr F + helm_repo_init
2020-01-21T12:11:35.30069198-06:00 stderr F + grep -q -e 'https\?://'
2020-01-21T12:11:35.302207441-06:00 stderr F + echo 'chart path is a url, skipping repo update'
2020-01-21T12:11:35.302256859-06:00 stdout F chart path is a url, skipping repo update
2020-01-21T12:11:35.302262238-06:00 stderr F + helm_v3 repo remove stable
2020-01-21T12:11:35.371111262-06:00 stderr F Error: no repositories configured
2020-01-21T12:11:35.372877998-06:00 stderr F + true
2020-01-21T12:11:35.372918017-06:00 stderr F + return
2020-01-21T12:11:35.373007592-06:00 stderr F + helm_update install --set-string kubernetes.ingressEndpoint.useDefaultPublishedService=true --set-string metrics.prometheus.enabled=true --set-string rbac.enabled=true --set-string ssl.enabled=true
2020-01-21T12:11:35.373122624-06:00 stderr F + '[' helm_v3 == helm_v3 ']'
2020-01-21T12:11:35.373746776-06:00 stderr F ++ helm_v3 ls --all -f '^traefik$' --output json
2020-01-21T12:11:35.373961792-06:00 stderr F ++ jq -r '"\(.[0].app_version),\(.[0].status)"'
2020-01-21T12:11:35.374192578-06:00 stderr F ++ tr '[:upper:]' '[:lower:]'
2020-01-21T12:11:35.567083121-06:00 stderr F + LINE=1.7.19,failed
2020-01-21T12:11:35.567718274-06:00 stderr F ++ echo 1.7.19,failed
2020-01-21T12:11:35.567878271-06:00 stderr F ++ cut -f1 -d,
2020-01-21T12:11:35.568939264-06:00 stderr F + INSTALLED_VERSION=1.7.19
2020-01-21T12:11:35.569532961-06:00 stderr F ++ echo 1.7.19,failed
2020-01-21T12:11:35.569741132-06:00 stderr F ++ cut -f2 -d,
2020-01-21T12:11:35.570669853-06:00 stderr F + STATUS=failed
2020-01-21T12:11:35.570697496-06:00 stderr F + '[' -e /config/values.yaml ']'
2020-01-21T12:11:35.570774756-06:00 stderr F + '[' install = delete ']'
2020-01-21T12:11:35.57080816-06:00 stderr F + '[' -z 1.7.19 ']'
2020-01-21T12:11:35.570858876-06:00 stderr F + '[' -z '' ']'
2020-01-21T12:11:35.570885467-06:00 stderr F + '[' failed = deployed ']'
2020-01-21T12:11:35.57090561-06:00 stderr F + '[' failed = failed ']'
2020-01-21T12:11:35.570952853-06:00 stderr F + '[' helm_v3 == 'helm_v3]'
2020-01-21T12:11:35.570979572-06:00 stderr F /usr/bin/entry: line 43: [: missing `]'
2020-01-21T12:11:35.57106419-06:00 stderr F + helm_v3 install --set-string kubernetes.ingressEndpoint.useDefaultPublishedService=true --set-string metrics.prometheus.enabled=true --set-string rbac.enabled=true --set-string ssl.enabled=true --purge traefik
2020-01-21T12:11:35.643826637-06:00 stderr F Error: unknown flag: --purge

All 20 comments

I tried what seemed worth a shot: helm rollback traefik 1 --namespace kube-system

and got this error:

Error: failed to replace object: Service "traefik" is invalid: spec.clusterIP: Invalid value: "": field is immutable

Is there a way to remove traefik and get it reinstalled using the defaults like when I deployed k3s? I'm assuming it's not just a vanilla traefik helm install?

EDIT:

I decided to try uninstalling and reinstalling traefik from scratch. I ran helm delete traefik --namespace kube-system and k3s automatically redeployed traefik (I should have guessed it would do that), so I didn't have to reinstall.

Traefik install still failed with the dashboard.enabled: "true" line so I removed that and things are mostly back to normal (some ingress problem that'll probably be fixed by recreating the ingress).

FWIW, this is the YAML I was trying to use. K3s is my first time using traefik, so this may just not be a valid configuration, but it looks valid from reading the traefik helm chart:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: kube-system
spec:
  chart: https://%{KUBERNETES_API}%/static/charts/traefik-1.81.0.tgz
  set:
    rbac.enabled: "true"
    ssl.enabled: "true"
    metrics.prometheus.enabled: "true"
    kubernetes.ingressEndpoint.useDefaultPublishedService: "true"
    dashboard.enabled: "true"

I've had issues with Helm charts where the install Job fails in an odd way that causes the remove to also fail. I had to remove the manifest from the filesystem, delete the HelmChart, delete the stuck remove Job, then go edit the chart to remove the finalizer before it would actually finish deleting. Once that was done I was able to move the fixed manifest back into place.
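
Roughly, that cleanup dance looks like the following sketch (resource names assume the packaged traefik chart in kube-system; the stuck job may be the install or the delete variant, adjust accordingly):

    mv /var/lib/rancher/k3s/server/manifests/traefik.yaml /tmp/
    kubectl -n kube-system delete helmcharts.helm.cattle.io traefik
    kubectl -n kube-system delete job helm-install-traefik
    # if the HelmChart hangs in Terminating, clear its finalizer so deletion can finish
    kubectl -n kube-system patch helmcharts.helm.cattle.io traefik --type=merge -p '{"metadata":{"finalizers":[]}}'
    mv /tmp/traefik.yaml /var/lib/rancher/k3s/server/manifests/traefik.yaml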

After a few rounds of that I gave up on trying to iterate on Helm charts since updates don't seem to work right. I'm just rendering the template out to the manifests dir using helm template and letting the addon controller work its magic.

@jonstelly have you tried deploying the cluster without traefik (--no-deploy=traefik) and installing traefik manually?

Not yet. I'll try that out in the next few days along with the suggestion to try removing the traefik.yaml manifest and giving the cluster some time to get everything cleaned up, then drop the file back in the directory with the dashboard enabled.

I'm guessing removing the manifest for a short time is similar to specifying --no-deploy=traefik? Maybe some internal plumbing differences. Judging by the default traefik.yaml there doesn't seem to be a lot of customization from the chart defaults, but I'll try this out in a temporary cluster; I'd rather not have to recreate my existing cluster for this change.
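
For reference, combining that flag with the server install command from the top of the report would look roughly like this (an untested sketch):

    curl -sfL https://get.k3s.io | K3S_TOKEN=abc123 sh -s - server --cluster-init --no-deploy=traefik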

And I just saw that #1347 says this might be fixed in v1.17.2-alpha3+k3s1 so I'll try that out first. Will report back with results.

Resolved with v1.17.2-alpha3+k3s1. Closing issue.

I've verified that simply editing /var/lib/rancher/k3s/server/manifests/traefik.yaml works in k3s v1.17.2+k3s1. I added the following:

    dashboard.enabled: "true"
    dashboard.domain: "traefik.me.somewhere"

The change gets deployed and ingress through traefik.me.somewhere shows me the traefik dashboard. Super easy, very cool!
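
For reference, the full HelmChart manifest with those two keys merged into the default traefik.yaml looks roughly like this (the domain is a placeholder; the chart URL is the one k3s ships, as shown earlier in this thread):

    apiVersion: helm.cattle.io/v1
    kind: HelmChart
    metadata:
      name: traefik
      namespace: kube-system
    spec:
      chart: https://%{KUBERNETES_API}%/static/charts/traefik-1.81.0.tgz
      set:
        rbac.enabled: "true"
        ssl.enabled: "true"
        metrics.prometheus.enabled: "true"
        kubernetes.ingressEndpoint.useDefaultPublishedService: "true"
        dashboard.enabled: "true"
        dashboard.domain: "traefik.me.somewhere"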

I've verified it works on a single node, but in a k3s cluster, editing a manifest file ends up with a total mess. I am not sure if it's fighting with different configurations on the servers. I tried to edit the file on all servers and it crashed them all, with the helm-install pods in CrashLoopBackOff. So the question is: how do you edit the manifest to enable the dashboard, set up ACME, etc. in such a deployment? Should you disable everything and go with a traditional helm deployment?

I asked the same question elsewhere. The docs don't cover how to figure out which node is running the helm controller, or any of the other custom controllers for that matter. I think they use an endpoint lock annotation to hold the master election, but I can't recall for sure.

Then would you suggest to completely disable traefik and use standard helm to deploy it? I found the helm-install concept interesting at first.

I've verified that simply editing /var/lib/rancher/k3s/server/manifests/traefik.yaml works in k3s v1.17.2+k3s1. I added the following:

    dashboard.enabled: "true"
    dashboard.domain: "traefik.me.somewhere"

The change gets deployed and ingress through traefik.me.somewhere shows me the traefik dashboard. Super easy, very cool!

Hi, how did you deploy the changes?

@ozeta All you need to do is edit the manifest file; the deploy controller picks up the changes automatically. If there are any errors in the manifest, you should see something marginally useful in the k3s logs. If there's an error in the actual Helm output, that goes into a job container that's harder to get at.
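
A couple of ways to get at both, assuming systemd and the default job name (illustrative, not the only way):

    journalctl -u k3s | grep -i helm
    kubectl -n kube-system logs job/helm-install-traefik
    kubectl -n kube-system get events --sort-by=.lastTimestamp | grep -i traefik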

OK, so on a cluster, this is what I get when updating the YAML manifest. It tries to install/deploy the new Helm release in a loop and crashes in a loop. This works correctly on a single node.

kube-system   1s          Normal    Created                  pod/helm-install-traefik-8w2qz   Created container helm
kube-system   1s          Normal    Started                  pod/helm-install-traefik-8w2qz   Started container helm
kube-system   1s          Normal    SuccessfulCreate         job/helm-install-traefik         Created pod: helm-install-traefik-g8ntq
kube-system   <unknown>   Normal    Scheduled                pod/helm-install-traefik-g8ntq   Successfully assigned kube-system/helm-install-traefik-g8ntq to kiddy
kube-system   1s          Normal    Pulled                   pod/helm-install-traefik-g8ntq   Container image "rancher/klipper-helm:v0.2.3" already present on machine
kube-system   0s          Normal    Created                  pod/helm-install-traefik-g8ntq   Created container helm
kube-system   0s          Normal    Started                  pod/helm-install-traefik-g8ntq   Started container helm
kube-system   0s          Normal    SuccessfulCreate         job/helm-install-traefik         Created pod: helm-install-traefik-f24xx
kube-system   <unknown>   Normal    Scheduled                pod/helm-install-traefik-f24xx   Successfully assigned kube-system/helm-install-traefik-f24xx to stuffy
kube-system   0s          Normal    Pulled                   pod/helm-install-traefik-f24xx   Container image "rancher/klipper-helm:v0.2.3" already present on machine
kube-system   0s          Normal    Created                  pod/helm-install-traefik-f24xx   Created container helm
kube-system   0s          Normal    Started                  pod/helm-install-traefik-f24xx   Started container helm
kube-system   0s          Normal    SuccessfulCreate         job/helm-install-traefik         Created pod: helm-install-traefik-cgwlp
kube-system   0s          Normal    Pulled                   pod/helm-install-traefik-f24xx   Container image "rancher/klipper-helm:v0.2.3" already present on machine
kube-system   0s          Normal    Created                  pod/helm-install-traefik-f24xx   Created container helm
kube-system   0s          Warning   Failed                   pod/helm-install-traefik-f24xx   Error: failed to create containerd task: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/acf4cb5b-b945-4c25-bb5c-50c86ed289c3/volumes/kubernetes.io~secret/helm-traefik-token-wvq2l\\\" to rootfs \\\"/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/helm/rootfs\\\" at \\\"/var/run/secrets/kubernetes.io/serviceaccount\\\" caused \\\"stat /var/lib/kubelet/pods/acf4cb5b-b945-4c25-bb5c-50c86ed289c3/volumes/kubernetes.io~secret/helm-traefik-token-wvq2l: no such file or directory\\\"\"": unknown
kube-system   <unknown>   Normal    Scheduled                pod/helm-install-traefik-cgwlp   Successfully assigned kube-system/helm-install-traefik-cgwlp to kiddy
kube-system   1s          Normal    Pulled                   pod/helm-install-traefik-cgwlp   Container image "rancher/klipper-helm:v0.2.3" already present on machine
kube-system   0s          Normal    Created                  pod/helm-install-traefik-cgwlp   Created container helm
kube-system   0s          Normal    Started                  pod/helm-install-traefik-cgwlp   Started container helm

Can you provide more info on your cluster? It looks like one of the nodes isn't working properly.

Also getting a similar error on v1.17.4:

CHART=$(sed -e "s/%{KUBERNETES_API}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/g" <<< "${CHART}")
set +v -x
+ cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates/
+ update-ca-certificates
WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 init --skip-refresh --client-only
+ tiller --listen=127.0.0.1:44134 --storage=secret
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Adding local repo with URL: http://127.0.0.1:8879/charts 
$HELM_HOME has been configured at /root/.helm.
Not installing Tiller due to 'client-only' flag having been set
Happy Helming!
++ helm_v2 ls --all '^traefik$' --output json
++ jq -r '.Releases | length'
[main] 2020/04/15 13:28:17 Starting Tiller v2.12.3 (tls=false)
[main] 2020/04/15 13:28:17 GRPC listening on 127.0.0.1:44134
[main] 2020/04/15 13:28:17 Probes listening on :44135
[main] 2020/04/15 13:28:17 Storage driver is Secret
[main] 2020/04/15 13:28:17 Max history per release is 0
[storage] 2020/04/15 13:28:18 listing all releases with filter
+ EXIST=1
+ '[' 1 == 1 ']'
+ HELM=helm_v2
+ NAME_ARG=--name
+ JQ_CMD='"\(.Releases[0].AppVersion),\(.Releases[0].Status)"'
+ helm_repo_init
+ grep -q -e 'https\?://'
+ '[' helm_v2 == helm_v3 ']'
+ helm_v2 repo update --strict
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈ 
+ '[' -n '' ']'
+ helm_update install
+ '[' helm_v2 == helm_v3 ']'
++ helm_v2 ls --all '^traefik$' --output json
++ ++ tr '[:upper:]' jq '[:lower:]'
-r '"\(.Releases[0].AppVersion),\(.Releases[0].Status)"'
[storage] 2020/04/15 13:28:24 listing all releases with filter
+ LINE=1.7.19,failed
++ ++ cut -f1 -d,
echo 1.7.19,failed
+ INSTALLED_VERSION=1.7.19
++ echo 1.7.19,failed
++ cut -f2 -d,
+ STATUS=failed
+ '[' -e /config/values.yaml ']'
+ VALUES='--values /config/values.yaml'
+ '[' install = delete ']'
+ '[' -z 1.7.19 ']'
+ '[' -z '' ']'
+ '[' failed = deployed ']'
+ '[' failed = failed ']'
+ '[' helm_v2 == helm_v3 ']'
+ helm_v2 install --purge traefik
Error: unknown flag: --purge 

This can be fixed by removing traefik.yaml from the manifests folder, removing the traefik resources, and putting the YAML back:

mv /var/lib/rancher/k3s/server/manifests/traefik.yaml /tmp/traefik.yaml
kubectl -n kube-system delete helmcharts.helm.cattle.io traefik
kubectl delete -n kube-system service traefik-dashboard
kubectl delete -n kube-system ingress traefik-dashboard
mv /tmp/traefik.yaml /var/lib/rancher/k3s/server/manifests/traefik.yaml

So it seems it's not an issue with the manifests or their deployment; it's updating them that fails.

@kalgecin I found that I had to do this remove/cleanup/replace dance with the manifest whenever I ran into an issue with a Helm chart using the built-in controller.

The integrated deploy and helm controllers are great for bootstrapping the cluster, but issues like this one, the fact that they overwrite the templates on restart, and the poor logging make it really difficult to use them for extensive customization.

"and the fact that it overwrites the templates on restart" wait a minute, the design is that the yaml is erased with the default one everytime you restart the server?

Yup, I have a script that copies my YAML back after the server has restarted.
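
Something along these lines, assuming the customized manifest is kept in a hypothetical /etc/k3s/ directory (a sketch, not the exact script):

    #!/bin/sh
    # wait for the k3s API to answer, then restore the customized traefik manifest
    until kubectl get nodes >/dev/null 2>&1; do sleep 5; done
    cp /etc/k3s/traefik.yaml /var/lib/rancher/k3s/server/manifests/traefik.yaml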

Thanks. I guess I will disable it and deploy it with helm myself, then.

Yeah, I've taken to disabling all the built-in templates and just copying stuff into the manifests folder when I want it deployed. For Helm charts I just use helm template to render the chart locally and then copy the generated YAML over.
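
A minimal sketch of that render step, assuming Helm 3 and the upstream stable traefik chart (chart source, values, and output filename are illustrative; adjust to your own setup):

    helm repo add stable https://kubernetes-charts.storage.googleapis.com
    helm template traefik stable/traefik --namespace kube-system --set dashboard.enabled=true > /var/lib/rancher/k3s/server/manifests/traefik-custom.yaml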
