We had built a cluster with 3 master nodes and a bunch of worker nodes. Overnight, 2 of the masters died and didn't come back up. Trying to start the k3s.service via systemctl (Debian 10) is to no avail; the process gets killed immediately.
```
Dec 18 17:33:15 master-3 systemd[1]: Starting Lightweight Kubernetes...
Dec 18 17:33:15 master-3 k3s[5645]: time="2019-12-18T17:33:15.329364828+01:00" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 18 17:33:15 master-3 k3s[5645]: time="2019-12-18T17:33:15.329884440+01:00" level=info msg="Cluster bootstrap already complete"
Dec 18 17:33:17 master-3 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 18 17:33:17 master-3 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 18 17:33:17 master-3 systemd[1]: Failed to start Lightweight Kubernetes.
```
Calling the k3s binary directly does basically the same:
```
/usr/local/bin/k3s server '--no-deploy=traefik,local-storage,servicelb' '--flannel-backend=wireguard' '--cluster-init' -v 10000
I1218 17:36:26.952042 6205 interface.go:384] Looking for default routes with IPv4 addresses
I1218 17:36:26.952132 6205 interface.go:392] Default route transits interface "eth0"
I1218 17:36:26.952262 6205 interface.go:196] Interface eth0 is up
I1218 17:36:26.952327 6205 interface.go:244] Interface "eth0" has 3 addresses :[88.xx.xx.xx/32 2a01:xxx:xxx:352c::1/64 fe80::xx:xx:xx:b26b/64].
I1218 17:36:26.952359 6205 interface.go:211] Checking addr 88.xx.xx.xx/32.
I1218 17:36:26.952369 6205 interface.go:218] IP found 88.xx.xx.xx
I1218 17:36:26.952384 6205 interface.go:250] Found valid IPv4 address 88.xx.xx.xx for interface "eth0".
I1218 17:36:26.952392 6205 interface.go:398] Found active IP 88.xx.xx.xx
I1218 17:36:26.952418 6205 services.go:45] Setting service IP to "10.43.0.1" (read-write).
INFO[2019-12-18T17:36:26.952441448+01:00] Starting k3s v1.0.0 (18bd921c)
I1218 17:36:26.973940 6205 services.go:45] Setting service IP to "10.43.0.1" (read-write).
I1218 17:36:26.975450 6205 interface.go:384] Looking for default routes with IPv4 addresses
I1218 17:36:26.975470 6205 interface.go:392] Default route transits interface "eth0"
I1218 17:36:26.975540 6205 interface.go:196] Interface eth0 is up
I1218 17:36:26.975593 6205 interface.go:244] Interface "eth0" has 3 addresses :[88.xx.xx.xx/32 2a01:xxx:xxx:352c::1/64 fe80::xx:xx:xx:b26b/64].
I1218 17:36:26.975617 6205 interface.go:211] Checking addr 88.xx.xx.xx/32.
I1218 17:36:26.975627 6205 interface.go:218] IP found 88.xx.xx.xx/
I1218 17:36:26.975640 6205 interface.go:250] Found valid IPv4 address 88.xx.xx.xx for interface "eth0".
I1218 17:36:26.975656 6205 interface.go:398] Found active IP 88.xx.xx.xx
Segmentation fault
```
The cluster was built by running:
```
curl -sfL https://get.k3s.io | K3S_TOKEN=<TOKEN> INSTALL_K3S_EXEC="server --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --cluster-init" sh -
```
We can't get the cluster healthy again, because these two masters won't start their k3s again. Any ideas how to fix this?
I have more or less exactly the same issue on a bunch of Rock64 boards. Out of 10, I made three masters and the rest workers.
When I went to bed last night all looked well.
```
NAME        STATUS   ROLES    AGE   VERSION         INTERNAL-IP    EXTERNAL-IP   OS-IMAGE   KERNEL-VERSION   CONTAINER-RUNTIME
rock64-10   Ready
rock64-1    Ready    master   64m   v1.16.3-k3s.2   192.168.1.30
rock64-2    Ready    master   54m   v1.16.3-k3s.2   192.168.1.31
rock64-3    Ready    master   16m   v1.16.3-k3s.2   192.168.1.32
rock64-4    Ready
rock64-5    Ready
rock64-6    Ready
rock64-7    Ready
rock64-8    Ready
rock64-9    Ready
```
When I got up this morning, k3s was just segfaulting at start:
```
Dec 19 09:26:55 localhost systemd[1]: Starting Lightweight Kubernetes...
Dec 19 09:26:56 localhost k3s[2597]: time="2019-12-19T09:26:56.915987995Z" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 09:26:57 localhost systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 09:26:57 localhost systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 09:26:57 localhost systemd[1]: Failed to start Lightweight Kubernetes.
```
Could you please start k3s with debug logs? Otherwise, it's hard to guess what's happening. You can pass `-v 3` to the server, so for example:
```
curl -sfL https://get.k3s.io | K3S_TOKEN=<TOKEN> INSTALL_K3S_EXEC="server -v 3 --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --cluster-init" sh -
```
```
[INFO] Finding latest release
[INFO] Using v1.0.0 as release
[INFO] Downloading hash https://github.com/rancher/k3s/releases/download/v1.0.0/sha256sum-amd64.txt
[INFO] Skipping binary downloaded, installed k3s matches hash
[INFO] Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
Job for k3s.service failed because a fatal signal was delivered to the control process.
See "systemctl status k3s.service" and "journalctl -xe" for details.
`systemctl status k3s.service`:
```
● k3s.service - Lightweight Kubernetes
Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: signal) since Thu 2019-12-19 11:26:17 CET; 3s ago
Docs: https://k3s.io
Process: 32762 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
Process: 32763 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Process: 32764 ExecStart=/usr/local/bin/k3s server -v 3 --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --server https://195.201.223.208:6443
Main PID: 32764 (code=killed, signal=SEGV)
```
`journalctl -xe`:
```
Dec 19 11:26:44 master-3 systemd[1]: Stopped Lightweight Kubernetes.
-- Subject: A stop job for unit k3s.service has finished
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A stop job for unit k3s.service has finished.
--
-- The job identifier is 869556 and the job result is done.
Dec 19 11:26:44 master-3 systemd[1]: Starting Lightweight Kubernetes...
-- Subject: A start job for unit k3s.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit k3s.service has begun execution.
--
-- The job identifier is 869556.
Dec 19 11:26:44 master-3 k3s[386]: time="2019-12-19T11:26:44.335172229+01:00" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 11:26:44 master-3 k3s[386]: time="2019-12-19T11:26:44.335378931+01:00" level=info msg="Cluster bootstrap already complete"
Dec 19 11:26:44 master-3 sshd[340]: Failed password for root from 222.186.173.215 port 61980 ssh2
Dec 19 11:26:46 master-3 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit k3s.service has exited.
--
-- The process' exit code is 'killed' and its exit status is 11.
Dec 19 11:26:46 master-3 systemd[1]: k3s.service: Failed with result 'signal'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit k3s.service has entered the 'failed' state with result 'signal'.
Dec 19 11:26:46 master-3 systemd[1]: Failed to start Lightweight Kubernetes.
-- Subject: A start job for unit k3s.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit k3s.service has finished with a failure.
--
-- The job identifier is 869556 and the job result is failed.
```
Mine is no more helpful either:
```
[INFO] Finding latest release
[INFO] Using v1.0.0 as release
[INFO] Downloading hash https://github.com/rancher/k3s/releases/download/v1.0.0/sha256sum-arm64.txt
[INFO] Skipping binary downloaded, installed k3s matches hash
[INFO] Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO] Skipping /usr/local/bin/ctr symlink to k3s, already exists
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO] systemd: Starting k3s
Job for k3s.service failed because a fatal signal was delivered to the control process.
See "systemctl status k3s.service" and "journalctl -xe" for details.
root@rock64-1:~# tail /var/log/syslog
Dec 19 10:31:25 localhost systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 10:31:25 localhost systemd[1]: Starting Lightweight Kubernetes...
Dec 19 10:31:26 localhost k3s[6774]: time="2019-12-19T10:31:26.698005505Z" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 10:31:27 localhost systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 10:31:27 localhost systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 10:31:27 localhost systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 10:31:32 localhost systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 10:31:32 localhost systemd[1]: k3s.service: Scheduled restart job, restart counter is at 5.
Dec 19 10:31:32 localhost systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 10:31:32 localhost systemd[1]: Starting Lightweight Kubernetes...
```
Can you please also provide `journalctl -u k3s`? It will be a bit easier to follow.
Mine just repeats itself:
```
Dec 19 09:57:51 rock64-1 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 09:57:51 rock64-1 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 5.
Dec 19 09:57:51 rock64-1 systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 09:57:51 rock64-1 systemd[1]: Starting Lightweight Kubernetes...
Dec 19 09:57:52 rock64-1 k3s[1398]: time="2019-12-19T09:57:52.227235578Z" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 09:57:52 rock64-1 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 09:57:52 rock64-1 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 09:57:52 rock64-1 systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 09:57:58 rock64-1 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 09:57:58 rock64-1 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 6.
Dec 19 09:57:58 rock64-1 systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 09:57:58 rock64-1 systemd[1]: Starting Lightweight Kubernetes...
Dec 19 09:57:59 rock64-1 k3s[1415]: time="2019-12-19T09:57:59.221760455Z" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 09:57:59 rock64-1 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 09:57:59 rock64-1 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 09:57:59 rock64-1 systemd[1]: Failed to start Lightweight Kubernetes.
```
Of course, see here:
```
Dec 19 11:35:50 master-3 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 11:35:50 master-3 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 11:35:50 master-3 systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 11:35:55 master-3 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 11:35:55 master-3 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 89.
Dec 19 11:35:55 master-3 systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 11:35:55 master-3 systemd[1]: Starting Lightweight Kubernetes...
Dec 19 11:35:55 master-3 k3s[2032]: time="2019-12-19T11:35:55.352137616+01:00" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 11:35:55 master-3 k3s[2032]: time="2019-12-19T11:35:55.352358530+01:00" level=info msg="Cluster bootstrap already complete"
Dec 19 11:35:57 master-3 systemd[1]: k3s.service: Main process exited, code=killed, status=11/SEGV
Dec 19 11:35:57 master-3 systemd[1]: k3s.service: Failed with result 'signal'.
Dec 19 11:35:57 master-3 systemd[1]: Failed to start Lightweight Kubernetes.
Dec 19 11:36:02 master-3 systemd[1]: k3s.service: Service RestartSec=5s expired, scheduling restart.
Dec 19 11:36:02 master-3 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 90.
Dec 19 11:36:02 master-3 systemd[1]: Stopped Lightweight Kubernetes.
Dec 19 11:36:02 master-3 systemd[1]: Starting Lightweight Kubernetes...
Dec 19 11:36:02 master-3 k3s[2052]: time="2019-12-19T11:36:02.351682007+01:00" level=info msg="Starting k3s v1.0.0 (18bd921c)"
Dec 19 11:36:02 master-3 k3s[2052]: time="2019-12-19T11:36:02.352019332+01:00" level=info msg="Cluster bootstrap already complete"
```
This seems to be the same as #1181.
I'm pretty sure this has to do with dqlite. I was digging around on my other nodes, which ended up failing too, and I saw a reference somewhere in the logs saying something similar to "unable to elect master". I recently abandoned an LXD Raspberry Pi cluster for exactly the same reason: even if I just rebooted a single node on the LXD cluster, the whole thing would become broken because dqlite couldn't seem to get its act together.
Anyway, I switched to an external Postgres DB yesterday and everything has been working just fine since.
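For anyone wanting to try the same workaround, here is a sketch of what the switch can look like with the installer used earlier in this thread. The `--datastore-endpoint` flag tells k3s to use an external datastore instead of the embedded dqlite (so `--cluster-init` is dropped); the Postgres user, password, and host are placeholders:
```
curl -sfL https://get.k3s.io | K3S_TOKEN=<TOKEN> INSTALL_K3S_EXEC="server --no-deploy=traefik,local-storage,servicelb --flannel-backend=wireguard --datastore-endpoint=postgres://<USER>:<PASSWORD>@<DB_HOST>:5432/k3s" sh -
```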
I was able to reproduce the issue, but unfortunately I ran into problems compiling k3s on my OrangePi while trying to fix it :( So I'll have to back off from this; hopefully someone from Rancher can follow along. The issue seems to lie in Canonical's go-dqlite library.
`panic: runtime error: makeslice: len out of range` happens when calling the getBlob func.
But I also couldn't find where the "CREATE /registry/health" happens, which pops up just before the panic - maybe that would tell a bit more.
P.S.
So something is trying to create a slice with a length outside the supported range for a byte slice, hence the "len out of range", but I can't find what. I don't have a Raspberry Pi, and my OrangePi has a 32-bit CPU, so I can't test, but maybe the Raspberry CPU is 64-bit for instructions only, with 32-bit pointers and 32-bit plain int/uint types?
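To illustrate the failure mode I suspect - this is a standalone sketch, not k3s code, and the wire value is made up: on a 32-bit build, Go's `int` (and with it the maximum slice length) is 32 bits wide, so a 64-bit blob size read off the wire can wrap to a negative length before `make` is called, which panics with exactly this message.
```go
package main

import "fmt"

func main() {
	// Hypothetical 64-bit blob size as it might arrive off the wire.
	var wireLen uint64 = 0xFFFFFFFF80000000

	// On a 32-bit platform int is 32 bits, so this conversion wraps
	// and the huge unsigned value becomes a negative length.
	n := int(wireLen)
	fmt.Println("converted length:", n)

	// make with a negative (or over-large) length panics with
	// "makeslice: len out of range" - the same error as in the trace.
	b := make([]byte, n)
	_ = b
}
```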
Anyway, here are the debug logs:
```
root@orangepilite:~# k3s --debug server
DEBU[0000] Asset dir /var/lib/rancher/k3s/data/182bf1607a98af006c64bf65c7e0aeaa6fef00309ac072b56edef511f34d2ac4
DEBU[0000] Running /var/lib/rancher/k3s/data/182bf1607a98af006c64bf65c7e0aeaa6fef00309ac072b56edef511f34d2ac4/bin/k3s-server [k3s --debug server]
INFO[2019-12-22T10:27:05.883284862Z] Starting k3s v1.0.1 (e94a3c60)
INFO[2019-12-22T10:27:06.322994959Z] Testing connection to peers [192.168.1.16:6443]
DEBU[2019-12-22T10:27:06.324675242Z] connected address=192.168.1.16:6443 attempt=0
DEBU[2019-12-22T10:27:06.329403269Z] connected address=192.168.1.16:6443 attempt=0
DEBU[2019-12-22T10:27:06.361818933Z] connected address=192.168.1.16:6443 attempt=0
DEBU[2019-12-22T10:27:06.367374769Z] CREATE /registry/health, size=17, lease=0 => rev=1, err=<nil>
panic: runtime error: makeslice: len out of range
goroutine 1 [running]:
github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol.(*Message).getBlob(0x70d88bc, 0x0, 0x585d720, 0x7673b10)
/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:356 +0x3c
github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol.(*Rows).Next(0x760c3d4, 0x72507e0, 0xb, 0xb, 0x7bf5c, 0x978ca8)
/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/internal/protocol/message.go:557 +0x2ec
github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver.(*Rows).Next(0x760c3c0, 0x72507e0, 0xb, 0xb, 0x0, 0x0)
/go/src/github.com/rancher/k3s/vendor/github.com/canonical/go-dqlite/driver/driver.go:585 +0x40
database/sql.(*Rows).nextLocked(0x7b4e1e0, 0x970000)
/usr/local/go/src/database/sql/sql.go:2767 +0xb4
database/sql.(*Rows).Next.func1()
/usr/local/go/src/database/sql/sql.go:2745 +0x2c
database/sql.withLock(0x3795d08, 0x7b4e1f8, 0x7b2104c)
/usr/local/go/src/database/sql/sql.go:3184 +0x60
database/sql.(*Rows).Next(0x7b4e1e0, 0x7673b00)
/usr/local/go/src/database/sql/sql.go:2744 +0x78
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.RowsToEvents(0x7b4e1e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:221 +0xc8
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog.(*SQLLog).List(0x760c270, 0x37bbfd8, 0x7165c80, 0x2f04e21, 0x10, 0x0, 0x0, 0x1, 0x0, 0x0, ...)
/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/sqllog/sql.go:188 +0xe0
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured.(*LogStructured).get(0x73f0170, 0x37bbfd8, 0x7165c80, 0x2f04e21, 0x10, 0x0, 0x0, 0x7a36001, 0x70d8400, 0xa83bd4, ...)
/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/logstructured.go:55 +0x80
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured.(*LogStructured).Create(0x73f0170, 0x37bbfd8, 0x7165c80, 0x2f04e21, 0x10, 0x76945a0, 0x11, 0x11, 0x0, 0x0, ...)
/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/logstructured.go:88 +0xfc
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured.(*LogStructured).Start(0x73f0170, 0x37bbfd8, 0x7165c80, 0x6, 0x781a059)
/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/logstructured/logstructured.go:36 +0xd0
github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/endpoint.Listen(0x37bbfd8, 0x7165c80, 0x0, 0x0, 0x0, 0x781a050, 0x48, 0x0, 0x0, 0x0, ...)
/go/src/github.com/rancher/k3s/vendor/github.com/rancher/kine/pkg/endpoint/endpoint.go:58 +0xe4
github.com/rancher/k3s/pkg/cluster.(*Cluster).startStorage(0x70e3040, 0x37bbfd8, 0x7165c80, 0x0, 0x0)
/go/src/github.com/rancher/k3s/pkg/cluster/cluster.go:62 +0x60
github.com/rancher/k3s/pkg/cluster.(*Cluster).Start(0x70e3040, 0x37bbfd8, 0x7165c80, 0x0, 0x0)
/go/src/github.com/rancher/k3s/pkg/cluster/cluster.go:53 +0xa8
github.com/rancher/k3s/pkg/daemons/control.prepare(0x37bbfd8, 0x7165c80, 0x79ed204, 0x7482c60, 0x5847fe8, 0x1a)
/go/src/github.com/rancher/k3s/pkg/daemons/control/server.go:337 +0xfa8
github.com/rancher/k3s/pkg/daemons/control.Server(0x37bbfd8, 0x7165c80, 0x79ed204, 0x37bbfd8, 0x7165c80)
/go/src/github.com/rancher/k3s/pkg/daemons/control/server.go:83 +0x168
github.com/rancher/k3s/pkg/server.StartServer(0x37bbfd8, 0x7165c80, 0x79ed200, 0x7165c80, 0x2)
/go/src/github.com/rancher/k3s/pkg/server/server.go:51 +0x70
github.com/rancher/k3s/pkg/cli/server.run(0x730bb80, 0x5848a48, 0x1786c8, 0x37b5698)
/go/src/github.com/rancher/k3s/pkg/cli/server/server.go:173 +0xc58
github.com/rancher/k3s/pkg/cli/server.Run(0x730bb80, 0x5604a10, 0x0)
/go/src/github.com/rancher/k3s/pkg/cli/server/server.go:35 +0x44
github.com/rancher/k3s/vendor/github.com/urfave/cli.HandleAction(0x29d5a98, 0x30a7bb0, 0x730bb80, 0x730bb80, 0x0)
/go/src/github.com/rancher/k3s/vendor/github.com/urfave/cli/app.go:514 +0xac
github.com/rancher/k3s/vendor/github.com/urfave/cli.Command.Run(0x2eea252, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2f1866f, 0x15, 0x7896e40, ...)
/go/src/github.com/rancher/k3s/vendor/github.com/urfave/cli/command.go:171 +0x370
github.com/rancher/k3s/vendor/github.com/urfave/cli.(*App).Run(0x731a700, 0x700c0a0, 0x3, 0x4, 0x0, 0x0)
/go/src/github.com/rancher/k3s/vendor/github.com/urfave/cli/app.go:265 +0x510
main.main()
/go/src/github.com/rancher/k3s/cmd/server/main.go:46 +0x2ec
```
Running into the same issue on three different k3s cluster masters (all set up with `k3sup install --ip $SERVER_IP --sudo=false --cluster --k3s-extra-args '--no-deploy traefik'`).
```
$ /usr/local/bin/k3s --debug server --cluster-init --tls-san $SERVER_IP --no-deploy traefik
DEBU[0000] Asset dir /var/lib/rancher/k3s/data/HASH
DEBU[0000] Running /var/lib/rancher/k3s/data/HASH/bin/k3s-server [/usr/local/bin/k3s --debug server --cluster-init --tls-san $SERVER_IP --no-deploy traefik]
INFO[2020-02-05T19:34:01.279721010+01:00] Starting k3s v1.17.2+k3s1 (cdab19b0)
Segmentation fault
```
`strace`:
```
[..]
futex(0x6d87010, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d86f10, FUTEX_WAKE_PRIVATE, 1) = 1
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
recvfrom(5, {{len=20, type=NLMSG_DONE, flags=NLM_F_MULTI, seq=1, pid=32278}, 0}, 4096, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, [112->12]) = 20
getsockname(5, {sa_family=AF_NETLINK, nl_pid=32278, nl_groups=00000000}, [112->12]) = 0
close(5) = 0
getuid() = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
openat(AT_FDCWD, "/etc//localtime", O_RDONLY) = 5
read(5, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\t\0\0\0\0"..., 4096) = 2335
read(5, "", 4096) = 0
close(5) = 0
write(2, "\33[36mINFO\33[0m[2020-02-05T19:38:4"..., 97INFO[2020-02-05T19:38:45.395473030+01:00] Starting k3s v1.17.2+k3s1 (cdab19b0)
) = 97
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
futex(0xc0004712c8, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_SETMASK, ~[HUP INT QUIT ILL TRAP ABRT BUS FPE SEGV TERM STKFLT CHLD PROF SYS RTMIN RT_1], NULL, 8) = 0
futex(0xc0004712c8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d88228, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc000186848, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x6d88228, FUTEX_WAIT_PRIVATE, 0, NULL) = ?
+++ killed by SIGSEGV +++
Segmentation fault
```
What platform (architecture) is this on?
Debian 9 and 10, latest patch level, x86-64.
I've seen this error when using a cluster with dqlite on an RPi 4 and on x86-64. I don't have logs because I removed k3s and will install it again with an external DB.
v1.17.3+k3s1 unfortunately didn't fix the issue for me :(
Same issue on RHEL7 amd64 v1.17.3+k3s1. SEGV after "Cluster bootstrap already complete".
The issue still exists on Debian 10 amd64 with v1.17.4+k3s1 in a multi-server environment with the embedded dqlite DB.