Version:
k3s version v1.17.0+k3s.1 (0f644650)
Server install command: curl -sfL https://get.k3s.io | K3S_TOKEN=abc123 sh -s - server --cluster-init
Agent install command: curl -sfL https://get.k3s.io | K3S_TOKEN=abc123 K3S_URL=https://server:6443/ sh -s -
Describe the bug
On my server I modified /var/lib/rancher/k3s/server/manifests/traefik.yaml, adding dashboard.enabled: "true" under spec.set. After saving, the change gets picked up and I see the traefik install pod start, but it fails and gets stuck in a crash loop.
To Reproduce
Modify /var/lib/rancher/k3s/server/manifests/traefik.yaml to enable the dashboard.
Expected behavior
The Traefik dashboard is enabled. I'm not 100% sure my procedure here is right; I found several blog posts talking about enabling the traefik dashboard in k3s, but this seemed like the easiest option.
Actual behavior
Crashloop, pod log:
2020-01-21T12:11:34.98497809-06:00 stderr F CHART=$(sed -e "s/%{KUBERNETES_API}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/g" <<< "${CHART}")
2020-01-21T12:11:34.987297604-06:00 stderr F set +v -x
2020-01-21T12:11:34.98736353-06:00 stderr F + cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates/
2020-01-21T12:11:34.988936567-06:00 stderr F + update-ca-certificates
2020-01-21T12:11:35.018340191-06:00 stderr F WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
2020-01-21T12:11:35.037636413-06:00 stderr F + tiller --listen=127.0.0.1:44134 --storage=secret
2020-01-21T12:11:35.037673091-06:00 stderr F + export HELM_HOST=127.0.0.1:44134
2020-01-21T12:11:35.037681251-06:00 stderr F + HELM_HOST=127.0.0.1:44134
2020-01-21T12:11:35.037687228-06:00 stderr F + helm_v2 init --skip-refresh --client-only
2020-01-21T12:11:35.121554954-06:00 stderr F [main] 2020/01/21 18:11:35 Starting Tiller v2.12.3 (tls=false)
2020-01-21T12:11:35.121726446-06:00 stderr F [main] 2020/01/21 18:11:35 GRPC listening on 127.0.0.1:44134
2020-01-21T12:11:35.121816536-06:00 stderr F [main] 2020/01/21 18:11:35 Probes listening on :44135
2020-01-21T12:11:35.121909529-06:00 stderr F [main] 2020/01/21 18:11:35 Storage driver is Secret
2020-01-21T12:11:35.12200519-06:00 stderr F [main] 2020/01/21 18:11:35 Max history per release is 0
2020-01-21T12:11:35.131376676-06:00 stdout F Creating /root/.helm
2020-01-21T12:11:35.131474012-06:00 stdout F Creating /root/.helm/repository
2020-01-21T12:11:35.131535049-06:00 stdout F Creating /root/.helm/repository/cache
2020-01-21T12:11:35.131604154-06:00 stdout F Creating /root/.helm/repository/local
2020-01-21T12:11:35.131659886-06:00 stdout F Creating /root/.helm/plugins
2020-01-21T12:11:35.131738067-06:00 stdout F Creating /root/.helm/starters
2020-01-21T12:11:35.131784603-06:00 stdout F Creating /root/.helm/cache/archive
2020-01-21T12:11:35.131892813-06:00 stdout F Creating /root/.helm/repository/repositories.yaml
2020-01-21T12:11:35.131906147-06:00 stdout F Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
2020-01-21T12:11:35.131993983-06:00 stdout F Adding local repo with URL: http://127.0.0.1:8879/charts
2020-01-21T12:11:35.133100698-06:00 stdout F $HELM_HOME has been configured at /root/.helm.
2020-01-21T12:11:35.133122309-06:00 stdout F Not installing Tiller due to 'client-only' flag having been set
2020-01-21T12:11:35.133129971-06:00 stdout F Happy Helming!
2020-01-21T12:11:35.13528301-06:00 stderr F ++ helm_v2 ls --all '^traefik$' --output json
2020-01-21T12:11:35.135508412-06:00 stderr F ++ jq -r '.Releases | length'
2020-01-21T12:11:35.219538766-06:00 stderr F [storage] 2020/01/21 18:11:35 listing all releases with filter
2020-01-21T12:11:35.300276818-06:00 stderr F + EXIST=
2020-01-21T12:11:35.300352736-06:00 stderr F + '[' '' == 1 ']'
2020-01-21T12:11:35.300436873-06:00 stderr F + '[' '' == v2 ']'
2020-01-21T12:11:35.300513923-06:00 stderr F + helm_repo_init
2020-01-21T12:11:35.30069198-06:00 stderr F + grep -q -e 'https\?://'
2020-01-21T12:11:35.302207441-06:00 stderr F + echo 'chart path is a url, skipping repo update'
2020-01-21T12:11:35.302256859-06:00 stdout F chart path is a url, skipping repo update
2020-01-21T12:11:35.302262238-06:00 stderr F + helm_v3 repo remove stable
2020-01-21T12:11:35.371111262-06:00 stderr F Error: no repositories configured
2020-01-21T12:11:35.372877998-06:00 stderr F + true
2020-01-21T12:11:35.372918017-06:00 stderr F + return
2020-01-21T12:11:35.373007592-06:00 stderr F + helm_update install --set-string kubernetes.ingressEndpoint.useDefaultPublishedService=true --set-string metrics.prometheus.enabled=true --set-string rbac.enabled=true --set-string ssl.enabled=true
2020-01-21T12:11:35.373122624-06:00 stderr F + '[' helm_v3 == helm_v3 ']'
2020-01-21T12:11:35.373746776-06:00 stderr F ++ helm_v3 ls --all -f '^traefik$' --output json
2020-01-21T12:11:35.373961792-06:00 stderr F ++ jq -r '"\(.[0].app_version),\(.[0].status)"'
2020-01-21T12:11:35.374192578-06:00 stderr F ++ tr '[:upper:]' '[:lower:]'
2020-01-21T12:11:35.567083121-06:00 stderr F + LINE=1.7.19,failed
2020-01-21T12:11:35.567718274-06:00 stderr F ++ echo 1.7.19,failed
2020-01-21T12:11:35.567878271-06:00 stderr F ++ cut -f1 -d,
2020-01-21T12:11:35.568939264-06:00 stderr F + INSTALLED_VERSION=1.7.19
2020-01-21T12:11:35.569532961-06:00 stderr F ++ echo 1.7.19,failed
2020-01-21T12:11:35.569741132-06:00 stderr F ++ cut -f2 -d,
2020-01-21T12:11:35.570669853-06:00 stderr F + STATUS=failed
2020-01-21T12:11:35.570697496-06:00 stderr F + '[' -e /config/values.yaml ']'
2020-01-21T12:11:35.570774756-06:00 stderr F + '[' install = delete ']'
2020-01-21T12:11:35.57080816-06:00 stderr F + '[' -z 1.7.19 ']'
2020-01-21T12:11:35.570858876-06:00 stderr F + '[' -z '' ']'
2020-01-21T12:11:35.570885467-06:00 stderr F + '[' failed = deployed ']'
2020-01-21T12:11:35.57090561-06:00 stderr F + '[' failed = failed ']'
2020-01-21T12:11:35.570952853-06:00 stderr F + '[' helm_v3 == 'helm_v3]'
2020-01-21T12:11:35.570979572-06:00 stderr F /usr/bin/entry: line 43: [: missing `]'
2020-01-21T12:11:35.57106419-06:00 stderr F + helm_v3 install --set-string kubernetes.ingressEndpoint.useDefaultPublishedService=true --set-string metrics.prometheus.enabled=true --set-string rbac.enabled=true --set-string ssl.enabled=true --purge traefik
2020-01-21T12:11:35.643826637-06:00 stderr F Error: unknown flag: --purge
I tried what seemed worth a shot: helm rollback traefik 1 --namespace kube-system
and got this error:
Error: failed to replace object: Service "traefik" is invalid: spec.clusterIP: Invalid value: "": field is immutable
Is there a way to remove traefik and get it reinstalled using the defaults like when I deployed k3s? I'm assuming it's not just a vanilla traefik helm install?
EDIT:
I decided to try uninstalling and reinstalling traefik from scratch. I ran helm delete traefik --namespace kube-system, and k3s automatically redeployed traefik (I should have guessed it would do that), so I didn't have to reinstall.
The traefik install still failed with the dashboard.enabled: "true" line, so I removed it and things are mostly back to normal (some ingress problem remains that will probably be fixed by recreating the ingress).
FWIW, this is the YAML I was trying to use. K3s is my first time using traefik, so this may just not be a valid configuration, but it looks valid judging from the traefik helm chart:
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: kube-system
spec:
  chart: https://%{KUBERNETES_API}%/static/charts/traefik-1.81.0.tgz
  set:
    rbac.enabled: "true"
    ssl.enabled: "true"
    metrics.prometheus.enabled: "true"
    kubernetes.ingressEndpoint.useDefaultPublishedService: "true"
    dashboard.enabled: "true"
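For anyone debugging the same thing, here is a sketch of how to see what the controller did with the edited manifest. These are standard kubectl commands; the helm-install-traefik job name matches what the packaged chart produced in the logs above, and job-name is the label the Job controller puts on its pods.

```shell
# Inspect the HelmChart object the deploy controller created from the manifest:
kubectl -n kube-system get helmchart traefik -o yaml

# Check the install job and the output of its pods:
kubectl -n kube-system get job helm-install-traefik
kubectl -n kube-system logs -l job-name=helm-install-traefik --tail=50
```

The spec.set entries show up in that job output as the --set-string flags passed to helm, which is how the failing install command above was assembled.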
I've had issues with Helm charts where the install Job fails in an odd way that causes the removal to also fail. I had to remove the manifest from the filesystem, delete the HelmChart, delete the stuck remove Job, then edit the chart to remove the finalizer before it would actually finish deleting. Once that was done, I was able to move the fixed manifest back into place.
After a few rounds of that I gave up on trying to iterate on Helm charts since updates don't seem to work right. I'm just rendering the template out to the manifests dir using helm template and letting the addon controller work its magic.
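In case anyone wants the same workflow, a rough sketch of the render-to-manifests approach follows. The chart version and values here are just examples, and the helm 2 command form is shown since that's what this k3s version ships alongside; with helm 3 the template invocation takes the release name as the first argument instead of --name.

```shell
# Fetch the chart and render it locally instead of letting the
# helm-install job run inside the cluster:
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm fetch stable/traefik --version 1.81.0

# Render with whatever values you want baked in, then drop the plain
# manifests where the k3s addon controller will apply them:
helm template traefik-1.81.0.tgz \
  --name traefik --namespace kube-system \
  --set dashboard.enabled=true \
  > /var/lib/rancher/k3s/server/manifests/traefik-rendered.yaml
```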
@jonstelly have you tried deploying the cluster without traefik (--no-deploy=traefik) and installing traefik manually?
Not yet. I'll try that out in the next few days, along with the suggestion to remove the traefik.yaml manifest, give the cluster some time to get everything cleaned up, then drop the file back into the directory with the dashboard enabled.
I'm guessing removing the manifest for a short time is similar to specifying --no-deploy=traefik? Maybe with some internal plumbing differences. Judging by the default traefik.yaml there doesn't seem to be much customization beyond the chart defaults. I'll try this out in a temporary cluster; I'd rather not have to recreate my existing cluster for this change.
And I just saw that #1347 says this might be fixed in v1.17.2-alpha3+k3s1 so I'll try that out first. Will report back with results.
Resolved with v1.17.2-alpha3+k3s1. Closing issue.
I've verified that simply editing /var/lib/rancher/k3s/server/manifests/traefik.yaml works in k3s v1.17.2+k3s1. I added the following:
dashboard.enabled: "true"
dashboard.domain: "traefik.me.somewhere"
The change gets deployed and ingress through traefik.me.somewhere shows me the traefik dashboard. Super easy, very cool!
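For completeness, here is a sketch of how to sanity-check the result. The traefik-dashboard service/ingress names are what the chart creates, the hostname comes from the dashboard.domain value above, and NODE_IP is a placeholder for one of your node addresses.

```shell
# Confirm the dashboard service and ingress were created:
kubectl -n kube-system get service,ingress traefik-dashboard

# Hit the dashboard through the ingress with a Host header
# (NODE_IP is a placeholder; substitute a real node address):
curl -s -H 'Host: traefik.me.somewhere' http://NODE_IP/ | head -n 5
```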
I've verified it works on a single node, but in a multi-server k3s cluster, editing a manifest file ends up as a total mess. I'm not sure if it's fighting between different configurations on the servers. I edited the file on all servers and the helm-install pods all crashed with CrashLoopBackOff. So the question is: how do you edit a manifest to enable the dashboard, set up ACME, etc. in such a deployment? Should you disable everything and go with a traditional helm deployment?
I asked the same question elsewhere. The docs don't cover how to figure out which node is running the helm controller, or any of the other custom controllers for that matter. I think they use an endpoint lock annotation to hold the leader election, but I can't recall for sure.
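If it is the standard client-go style lock, something like this might show the holder. This is a guess: client-go leader election records the current holder in a control-plane.alpha.kubernetes.io/leader annotation on the lock object, and scanning all kube-system ConfigMaps avoids having to know the lock's exact name.

```shell
# List kube-system ConfigMaps that carry a leader-election record,
# printing the lock name and the holder identity JSON:
kubectl -n kube-system get configmap -o json \
  | jq -r '.items[]
      | select(.metadata.annotations["control-plane.alpha.kubernetes.io/leader"])
      | "\(.metadata.name): \(.metadata.annotations["control-plane.alpha.kubernetes.io/leader"])"'
```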
Then would you suggest completely disabling traefik and using standard helm to deploy it? I found the helm-install concept interesting at first.
Hi, how did you deploy the changes?
@ozeta All you need to do is edit the manifest file, the deploy controller picks up the changes automatically. If there are any errors in the manifest, you should see something marginally useful in the k3s logs. If there's an error in the actual Helm output, that goes into a job container that's harder to get at.
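To save some digging, a sketch of getting at that job output without chasing pod names. This assumes the default helm-install-traefik job name; job-name is the standard label a Job puts on its pods, and --previous helps once a crash-looping container has restarted.

```shell
# Current logs from the helm-install job's pods:
kubectl -n kube-system logs -l job-name=helm-install-traefik

# Logs from the previous (crashed) container instances:
kubectl -n kube-system logs -l job-name=helm-install-traefik --previous
```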
OK, so on a cluster, this is what I get when updating the yaml manifest. It tries to install/deploy the new helm release in a loop and crashes in a loop. This works correctly on a single node.
kube-system 1s Normal Created pod/helm-install-traefik-8w2qz Created container helm
kube-system 1s Normal Started pod/helm-install-traefik-8w2qz Started container helm
kube-system 1s Normal SuccessfulCreate job/helm-install-traefik Created pod: helm-install-traefik-g8ntq
kube-system <unknown> Normal Scheduled pod/helm-install-traefik-g8ntq Successfully assigned kube-system/helm-install-traefik-g8ntq to kiddy
kube-system 1s Normal Pulled pod/helm-install-traefik-g8ntq Container image "rancher/klipper-helm:v0.2.3" already present on machine
kube-system 0s Normal Created pod/helm-install-traefik-g8ntq Created container helm
kube-system 0s Normal Started pod/helm-install-traefik-g8ntq Started container helm
kube-system 0s Normal SuccessfulCreate job/helm-install-traefik Created pod: helm-install-traefik-f24xx
kube-system <unknown> Normal Scheduled pod/helm-install-traefik-f24xx Successfully assigned kube-system/helm-install-traefik-f24xx to stuffy
kube-system 0s Normal Pulled pod/helm-install-traefik-f24xx Container image "rancher/klipper-helm:v0.2.3" already present on machine
kube-system 0s Normal Created pod/helm-install-traefik-f24xx Created container helm
kube-system 0s Normal Started pod/helm-install-traefik-f24xx Started container helm
kube-system 0s Normal SuccessfulCreate job/helm-install-traefik Created pod: helm-install-traefik-cgwlp
kube-system 0s Normal Pulled pod/helm-install-traefik-f24xx Container image "rancher/klipper-helm:v0.2.3" already present on machine
kube-system 0s Normal Created pod/helm-install-traefik-f24xx Created container helm
kube-system 0s Warning Failed pod/helm-install-traefik-f24xx Error: failed to create containerd task: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/acf4cb5b-b945-4c25-bb5c-50c86ed289c3/volumes/kubernetes.io~secret/helm-traefik-token-wvq2l\\\" to rootfs \\\"/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/helm/rootfs\\\" at \\\"/var/run/secrets/kubernetes.io/serviceaccount\\\" caused \\\"stat /var/lib/kubelet/pods/acf4cb5b-b945-4c25-bb5c-50c86ed289c3/volumes/kubernetes.io~secret/helm-traefik-token-wvq2l: no such file or directory\\\"\"": unknown
kube-system <unknown> Normal Scheduled pod/helm-install-traefik-cgwlp Successfully assigned kube-system/helm-install-traefik-cgwlp to kiddy
kube-system 1s Normal Pulled pod/helm-install-traefik-cgwlp Container image "rancher/klipper-helm:v0.2.3" already present on machine
kube-system 0s Normal Created pod/helm-install-traefik-cgwlp Created container helm
kube-system 0s Normal Started pod/helm-install-traefik-cgwlp Started container helm
Can you provide more info on your cluster? It looks like one of the nodes isn't working properly.
I'm also getting a similar error on v1.17.4:
CHART=$(sed -e "s/%{KUBERNETES_API}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/g" <<< "${CHART}")
set +v -x
+ cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates/
+ update-ca-certificates
WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 init --skip-refresh --client-only
+ tiller --listen=127.0.0.1:44134 --storage=secret
Creating /root/.helm
Creating /root/.helm/repository
Creating /root/.helm/repository/cache
Creating /root/.helm/repository/local
Creating /root/.helm/plugins
Creating /root/.helm/starters
Creating /root/.helm/cache/archive
Creating /root/.helm/repository/repositories.yaml
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /root/.helm.
Not installing Tiller due to 'client-only' flag having been set
Happy Helming!
++ helm_v2 ls --all '^traefik$' --output json
++ jq -r '.Releases | length'
[main] 2020/04/15 13:28:17 Starting Tiller v2.12.3 (tls=false)
[main] 2020/04/15 13:28:17 GRPC listening on 127.0.0.1:44134
[main] 2020/04/15 13:28:17 Probes listening on :44135
[main] 2020/04/15 13:28:17 Storage driver is Secret
[main] 2020/04/15 13:28:17 Max history per release is 0
[storage] 2020/04/15 13:28:18 listing all releases with filter
+ EXIST=1
+ '[' 1 == 1 ']'
+ HELM=helm_v2
+ NAME_ARG=--name
+ JQ_CMD='"\(.Releases[0].AppVersion),\(.Releases[0].Status)"'
+ helm_repo_init
+ grep -q -e 'https\?://'
+ '[' helm_v2 == helm_v3 ']'
+ helm_v2 repo update --strict
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
+ '[' -n '' ']'
+ helm_update install
+ '[' helm_v2 == helm_v3 ']'
++ helm_v2 ls --all '^traefik$' --output json
++ jq -r '"\(.Releases[0].AppVersion),\(.Releases[0].Status)"'
++ tr '[:upper:]' '[:lower:]'
[storage] 2020/04/15 13:28:24 listing all releases with filter
+ LINE=1.7.19,failed
++ echo 1.7.19,failed
++ cut -f1 -d,
+ INSTALLED_VERSION=1.7.19
++ echo 1.7.19,failed
++ cut -f2 -d,
+ STATUS=failed
+ '[' -e /config/values.yaml ']'
+ VALUES='--values /config/values.yaml'
+ '[' install = delete ']'
+ '[' -z 1.7.19 ']'
+ '[' -z '' ']'
+ '[' failed = deployed ']'
+ '[' failed = failed ']'
+ '[' helm_v2 == helm_v3 ']'
+ helm_v2 install --purge traefik
Error: unknown flag: --purge
This can be fixed by removing traefik.yaml from the manifests folder, deleting the traefik resources, and putting the yaml back:
mv /var/lib/rancher/k3s/server/manifests/traefik.yaml /tmp/traefik.yaml
kubectl -n kube-system delete helmcharts.helm.cattle.io traefik
kubectl delete -n kube-system service traefik-dashboard
kubectl delete -n kube-system ingress traefik-dashboard
mv /tmp/traefik.yaml /var/lib/rancher/k3s/server/manifests/traefik.yaml
So it seems it's not an issue with the manifests or their deployment; it's updating them that fails.
@kalgecin I found that I had to do this remove/cleanup/replace dance with the manifest whenever I ran into an issue with a Helm chart using the built-in controller.
The integrated deploy and helm controllers are great for bootstrapping the cluster, but there are some issues, like this one, the fact that it overwrites the templates on restart, and the poor logging, that make it really difficult to use them for extensive customization.
"and the fact that it overwrites the templates on restart" Wait a minute: is the design that the yaml is replaced with the default one every time you restart the server?
Yup, I have a script that copies my yaml back after the server has restarted.
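Roughly what such a script looks like, as a sketch. The destination path is the k3s default; the source directory and function name are made up for illustration, and it would be run after restarting the k3s service.

```shell
# sync_manifests SRC DEST: copy customized *.yaml files back into the
# k3s manifests dir after a restart has overwritten them with defaults.
sync_manifests() {
  src=$1
  dest=$2
  for f in "$src"/*.yaml; do
    [ -e "$f" ] || continue   # glob didn't match: nothing to copy
    cp "$f" "$dest/$(basename "$f")"
  done
}

# e.g. after `systemctl restart k3s`:
# sync_manifests /etc/k3s-custom-manifests /var/lib/rancher/k3s/server/manifests
```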
Thanks. I guess I will disable and deploy it with helm myself then.
Yeah, I've taken to disabling all the built-in templates and just copying stuff into the manifests folder when I want it deployed. For Helm charts I use helm template to render them locally and then copy the generated yaml over.