K3s: /var/lib/rancher/k3s/data/ extraction failed

Created on 7 Aug 2019 · 14 Comments · Source: k3s-io/k3s

Describe the bug
After updating the master with curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.8.0 sh -, the setup script does not finish with a working cluster: the k3s service fails to start.

$ curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.8.0 sh -
[INFO]  Using v0.8.0 as release
[INFO]  Downloading hash https://github.com/rancher/k3s/releases/download/v0.8.0/sha256sum-arm.txt
[INFO]  Downloading binary https://github.com/rancher/k3s/releases/download/v0.8.0/k3s-armhf
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping /usr/local/bin/kubectl symlink to k3s, already exists
[INFO]  Skipping /usr/local/bin/crictl symlink to k3s, already exists
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
Job for k3s.service failed because the control process exited with error code.
See "systemctl status k3s.service" and "journalctl -xe" for details.

The journalctl log prints the following error message:

[ lines omitted ]
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit k3s.service has begun execution.
--
-- The job identifier is 4707.
Aug 07 18:49:30 hostname k3s[28794]: time="2019-08-07T18:49:30+02:00" level=fatal msg="exec: \"k3s-server\": executable file not found in $PATH"
Aug 07 18:49:30 hostname systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit k3s.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Aug 07 18:49:30 hostname systemd[1]: k3s.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
[ lines omitted ]
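In output like this, the useful line is the one logged with level=fatal. Below is a minimal sketch that extracts just those messages from journal output. It assumes the key=value format shown above (time=..., level=..., msg="..."), and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the msg="..." payload of every level=fatal line read from stdin.
# Assumes the key=value log format seen in the k3s journal output above.
fatal_messages() {
  grep 'level=fatal' | sed -n 's/.*msg="\(.*\)"$/\1/p'
}
```

For example, journalctl -u k3s --no-pager | fatal_messages would print only the fatal messages from the unit's log.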

To Reproduce
Set up a cluster with v0.6.1 or earlier and run curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v0.8.0 sh -, as described in the documentation on "Updates".

Expected behavior
systemctl start k3s exits with status 0 and no errors.

Additional context
I just wanted to migrate from v0.6.0 to v0.8.0 by rerunning the install command, as described in the docs.
Running the k3s server command directly produced the same error:

pi@hostname:~ $ /usr/local/bin/k3s server
FATA[0000] exec: "k3s-server": executable file not found in $PATH

All 14 comments

Sorry, I am not able to reproduce. Are you out of disk space? Or is /var/lib/rancher/k3s/data not writable?

root@k3s-base:~# find /var/lib/rancher/ -iname k3s-server
/var/lib/rancher/k3s/data/851e5f8445c14de3de589e307de8893a789cfadd7bdd5d0683cd7629b9f0684b/bin/k3s-server
/var/lib/rancher/k3s/data/3efc0e0a1184289333969a97781692ee504192da04c7beffa44702a3b8129766/bin/k3s-server

I've checked the filesystem permissions. Neither a normal user nor root can start the service, although root can write to /var/lib/rancher/k3s/data.

root@hostname:~# ls -lisa /var/lib/rancher/k3s/data
total 16
125223 4 drwxr-xr-x 4 root root 4096 Aug  7 18:49 .
125222 4 drwxr-xr-x 5 root root 4096 Jul  4 19:04 ..
125220 4 drwxr-xr-x 3 root root 4096 Aug  7 18:48 3efc0e0a1184289333969a97781692ee504192da04c7beffa44702a3b8129766
125224 4 drwxr-xr-x 3 root root 4096 Jul  4 19:02 d2e8e7878560889a4fc633485f8481f23fdc4c02707a41bc6e3b1ff98fdca189

And k3s-server is present, though only in the older (Jul 4) data directory:

root@hostname:~# find /var/lib/rancher/ -iname k3s-server
/var/lib/rancher/k3s/data/d2e8e7878560889a4fc633485f8481f23fdc4c02707a41bc6e3b1ff98fdca189/bin/k3s-server
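The listings above show two hash directories, but only one of them contains bin/k3s-server. Below is a minimal sketch that reports, for every directory under a data root, whether bin/k3s-server is actually executable. It assumes the <hash>/bin/k3s-server layout seen in these listings, and the function name is hypothetical. A directory that fails the check is a likely sign of an incomplete extraction.

```shell
#!/usr/bin/env bash
# Report which k3s data directories contain an executable bin/k3s-server.
# Assumes the layout <data_root>/<hash>/bin/k3s-server seen in the listings above.
# -x follows symlinks, so a dangling link also counts as missing.
# Returns non-zero if any directory is missing the binary.
check_data_dirs() {
  local root=${1:-/var/lib/rancher/k3s/data}
  local dir status=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue      # glob matched nothing
    dir=${dir%/}
    if [ -x "$dir/bin/k3s-server" ]; then
      echo "ok: $(basename "$dir")"
    else
      echo "MISSING: $(basename "$dir")"
      status=1
    fi
  done
  return "$status"
}
```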

What about your filesystem permissions? Did they change between the 0.6.0 and 0.8.0 releases? (I'm guessing not.)

What is the output of df -h?

It looks like the SD card is either failing or out of space.

You can remove /var/lib/rancher/k3s/data and restart k3s; the 3efc0e0a1184289333969a97781692ee504192da04c7beffa44702a3b8129766 directory will be recreated, hopefully with all of its contents.
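A more cautious variant is to move the suspect directory out of the way instead of deleting it, so it can still be inspected later. Below is a minimal sketch. It assumes k3s has been stopped first and that, as described above, k3s re-extracts a missing <hash> directory on its next start. The function name and backup location are hypothetical.

```shell
#!/usr/bin/env bash
# Move one k3s data directory out of the data root so k3s re-extracts it on next start.
# Hypothetical helper: stop k3s before calling it and start k3s afterwards.
# Prints the backup path on success.
quarantine_data_dir() {
  local root=$1 hash=$2 backup_root=$3
  local src="$root/$hash"
  if [ ! -d "$src" ]; then
    echo "no such data directory: $src" >&2
    return 1
  fi
  mkdir -p "$backup_root" || return 1
  local dest="$backup_root/$hash.$(date +%Y%m%d%H%M%S)"
  mv "$src" "$dest" && echo "$dest"
}
```

Used together with the service commands, the sequence would be: systemctl stop k3s, then quarantine_data_dir /var/lib/rancher/k3s/data <new-hash> /root/k3s-data-backup, then systemctl start k3s.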

The filesystem was not out of space, but removing the directory solved it. Thank you!

Awesome! I am going to re-open this. I have seen it before, and there is some logic we can add to make sure the directory is valid. It seems more likely to happen on a Pi but could happen on any system.
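One common way to guarantee that a data directory is only used once it is complete is to extract into a temporary sibling directory and rename it into place as the last step. A rename within one filesystem is atomic, so an interrupted extraction leaves behind only a temporary directory, never a half-filled <hash> directory. Below is a generic sketch of that technique, not k3s's actual extraction code. The function name and the use of a tarball are hypothetical.

```shell
#!/usr/bin/env bash
# Extract a tarball into $dest atomically: populate a temp sibling dir, then rename.
# A crash mid-extraction leaves only a "$dest.tmp.*" directory, never a partial $dest.
extract_atomically() {
  local tarball=$1 dest=$2
  local parent tmp
  parent=$(dirname "$dest")
  mkdir -p "$parent" || return 1
  if [ -e "$dest" ]; then
    return 0                                   # already in place, reuse it
  fi
  tmp=$(mktemp -d "$dest.tmp.XXXXXX") || return 1  # same filesystem as $dest
  # sync flushes the extracted data before the rename makes it visible.
  if tar -xf "$tarball" -C "$tmp" && sync; then
    mv "$tmp" "$dest" && return 0
  fi
  rm -rf "$tmp"
  return 1
}
```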

Thanks a lot! If you need further information, let me know.

I am facing this issue when upgrading k3s from v1.17 to v1.18 using system-upgrade-controller.

@jkraj is there enough space on the device that /var/lib/rancher/k3s/data is on? Are you seeing this from within system-upgrade-controller? If so please share the log.

Checking for free disk space could be a useful enhancement for install.sh and the SUC (system-upgrade-controller). There's also the fact that we never clean up old data directories, as referenced here: https://github.com/rancher/k3s/pull/1786#issuecomment-640253324
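A pre-flight check along those lines could compare the free space on the filesystem holding the data directory against a required minimum before the installer or an upgrade job proceeds. Below is a minimal sketch. It assumes POSIX df -P output, and the function name is hypothetical. The 500 MiB default threshold is an arbitrary placeholder, not a figure from k3s.

```shell
#!/usr/bin/env bash
# Fail if the filesystem holding $dir has less than $min_kib KiB available.
# Walks up to the nearest existing ancestor, so it also works before $dir is created.
check_free_space() {
  local dir=${1:-/var/lib/rancher/k3s/data} min_kib=${2:-512000}
  while [ ! -e "$dir" ]; do dir=$(dirname "$dir"); done
  local avail
  # -P forces one line per filesystem; with -k, column 4 is available 1K blocks.
  avail=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  if [ -z "$avail" ] || [ "$avail" -lt "$min_kib" ]; then
    echo "insufficient space under $dir: ${avail:-unknown} KiB available, need $min_kib" >&2
    return 1
  fi
  echo "ok: ${avail} KiB available under $dir"
}
```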

I don't see any space issue on the device. I deleted the /var/lib/rancher/k3s/data/<new-hash> directory and restarted k3s; everything is back to normal.

I didn't check the logs of the Job started by SUC at the time, and it has since been cleaned up.

> There's also the fact that we don't ever clean up old data directories

I think the issue is with the new data directory, not the old ones. I tried running k3s check-config, but check-config is not present in the new data directory, so I copied it from the old data directory into the new one. I got the output below before cleaning up.

Verifying binaries in /var/lib/rancher/k3s/data/3a8d3d90c0ac3531edbdbde77ce4a85062f4af8865b98cedc30ea730715d9d48/bin:
- sha256sum: does not match (fail)
  ... sha256sum: aux/wg-add.sh: No such file or directory
  ... aux/wg-add.sh: FAILED open or read
  ... blkid: OK
  ... busybox: OK
  ... charon: OK
  ... check-config: OK
  ... cni: OK
  ... conntrack: OK
  ... sha256sum: containerd: No such file or directory
  ... containerd: FAILED open or read
  ... containerd-shim: OK
  ... sha256sum: containerd-shim-runc-v2: No such file or directory
  ... containerd-shim-runc-v2: FAILED open or read
  ... coreutils: OK
  ... sha256sum: ebtables: No such file or directory
  ... ebtables: FAILED open or read
  ... find: OK
  ... sha256sum: ip: No such file or directory
  ... ip: FAILED open or read
  ... ipset: OK
  ... losetup: OK
  ... sha256sum: pigz: No such file or directory
  ... pigz: FAILED open or read
  ... runc: OK
  ... slirp4netns: OK
  ... socat: OK
  ... swanctl: OK
  ... sha256sum: xtables-legacy-multi: No such file or directory
  ... xtables-legacy-multi: FAILED open or read
  ... sha256sum: WARNING: 7 listed files could not be read
- links: arch should link to busybox (fail)
- links: arp should link to busybox (fail)
- links: ash should link to busybox (fail)
- links: aux/mount should link to ../busybox (fail)
- links: awk should link to busybox (fail)
- links: basename should link to coreutils (fail)
- links: bunzip2 should link to busybox (fail)
- links: cat should link to coreutils (fail)
- links: cksum should link to coreutils (fail)
- links: clear should link to busybox (fail)
- links: cmp should link to busybox (fail)
- links: crontab should link to busybox (fail)
- links: csplit should link to coreutils (fail)
- links: dd should link to coreutils (fail)
- links: deallocvt should link to busybox (fail)
- links: deluser should link to busybox (fail)
- links: df should link to coreutils (fail)
- links: dir should link to coreutils (fail)
- links: dnsd should link to busybox (fail)
- links: dumpkmap should link to busybox (fail)
- links: egrep should link to busybox (fail)
- links: env should link to coreutils (fail)
- links: expand should link to coreutils (fail)
- links: fallocate should link to busybox (fail)
- links: fdformat should link to busybox (fail)
- links: fold should link to coreutils (fail)
- links: freeramdisk should link to busybox (fail)
- links: fsfreeze should link to busybox (fail)
- links: fstrim should link to busybox (fail)
- links: getopt should link to busybox (fail)
- links: gunzip should link to busybox (fail)
- links: gzip should link to busybox (fail)
- links: hdparm should link to busybox (fail)
- links: i2cdetect should link to busybox (fail)
- links: i2cdump should link to busybox (fail)
- links: i2cset should link to busybox (fail)
- links: ifdown should link to busybox (fail)
- links: inetd should link to busybox (fail)
- links: init should link to busybox (fail)
- links: ipaddr should link to busybox (fail)
- links: iplink should link to busybox (fail)
- links: iproute should link to busybox (fail)
- links: iptables-restore should link to xtables-legacy-multi (fail)
- links: k3s-server should link to containerd (fail)
- links: less should link to busybox (fail)
- links: login should link to busybox (fail)
- links: lsattr should link to busybox (fail)
- links: lsusb should link to busybox (fail)
- links: lzopcat should link to busybox (fail)
- links: md5sum should link to coreutils (fail)
- links: mkdir should link to coreutils (fail)
- links: mkdosfs should link to busybox (fail)
- links: mke2fs should link to busybox (fail)
- links: mkpasswd should link to busybox (fail)
- links: modprobe should link to busybox (fail)
- links: mv should link to coreutils (fail)
- links: nl should link to coreutils (fail)
- links: nslookup should link to busybox (fail)
- links: numfmt should link to coreutils (fail)
- links: partprobe should link to busybox (fail)
- links: passwd should link to busybox (fail)
- links: pidof should link to busybox (fail)
- links: pipe_progress should link to busybox (fail)
- links: pr should link to coreutils (fail)
- links: printf should link to coreutils (fail)
- links: ptx should link to coreutils (fail)
- links: rdate should link to busybox (fail)
- links: run-init should link to busybox (fail)
- links: setarch should link to busybox (fail)
- links: setfattr should link to busybox (fail)
- links: setkeycodes should link to busybox (fail)
- links: sha384sum should link to coreutils (fail)
- links: sha3sum should link to busybox (fail)
- links: sha512sum should link to coreutils (fail)
- links: shred should link to coreutils (fail)
- links: sort should link to coreutils (fail)
- links: stat should link to coreutils (fail)
- links: strings should link to busybox (fail)
- links: svc should link to busybox (fail)
- links: swapon should link to busybox (fail)
- links: tee should link to coreutils (fail)
- links: tftp should link to busybox (fail)
- links: time should link to busybox (fail)
- links: timeout should link to coreutils (fail)
- links: top should link to busybox (fail)
- links: true should link to coreutils (fail)
- links: unpigz should link to pigz (fail)
- links: unzip should link to busybox (fail)
- links: usleep should link to busybox (fail)
- links: vi should link to busybox (fail)
- links: w should link to busybox (fail)
- links: wget should link to busybox (fail)
- links: which should link to busybox (fail)
- links: xxd should link to busybox (fail)
- links: yes should link to coreutils (fail)

System:
- /sbin iptables v1.6.1: older than v1.8
- swap: disabled
- routes: ok

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

info: reading kernel config from /boot/config-5.3.0-1023-aws ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- /sbin/apparmor_parser
apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: missing (fail)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: missing (fail)
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_SET: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

STATUS: 98 (fail)
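The binary checks in the check-config output above come down to two tests on the bin directory. First, every file listed in a checksum manifest must exist and match. Second, each tool name must be a symlink to the expected multi-call binary (busybox, coreutils, and so on). Below is a minimal sketch of both checks, not check-config's own code. It assumes GNU sha256sum and a manifest in sha256sum format stored next to the binaries. The manifest name .sha256sums and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Verify binaries in $bin against a sha256sum-format manifest (hypothetical name .sha256sums).
# --quiet prints only failures, including missing files ("FAILED open or read").
verify_checksums() {
  local bin=$1 manifest=${2:-.sha256sums}
  (cd "$bin" && sha256sum -c --quiet "$manifest" 2>&1)
}

# Check that $bin/$name is a symlink pointing at $target (e.g. awk -> busybox).
verify_link() {
  local bin=$1 name=$2 target=$3
  if [ "$(readlink "$bin/$name")" = "$target" ]; then
    return 0
  fi
  echo "links: $name should link to $target (fail)"
  return 1
}
```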

> I think the issue is with new data directory not the old ones.

I was thinking of this from a disk space perspective. If /var/lib/rancher/k3s/data/ is on a smaller partition, leaving old copies of k3s-root around might contribute to running out of space over time.
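To see how much space the old copies take up, the following sketch lists every data directory except the one to keep, along with its size. It only prints candidates and deletes nothing. The caller must supply the hash of the directory in use, since this thread does not say how to determine it. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# List data directories other than $keep with their sizes in KiB, largest first.
# Dry run only: prints "<KiB><TAB><hash>" lines and removes nothing.
list_stale_data_dirs() {
  local root=$1 keep=$2 dir name
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    if [ "$name" = "$keep" ]; then
      continue
    fi
    printf '%s\t%s\n' "$(du -sk "$dir" | cut -f1)" "$name"
  done | sort -rn
}
```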

I also see exec: "k3s-server": executable file not found in $PATH.
For me it is not disk space but what appears to be a borked SD card:

root@raspberrypi:~# ls /var/lib/rancher/k3s/
ls: reading directory '/var/lib/rancher/k3s/': Input/output error
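An Input/output error like this points at the storage itself rather than at k3s. Below is a minimal sketch for spotting it. It assumes that reading every file under the directory is enough to surface such errors, though reads served from the page cache may hide them. The kernel log (dmesg) may also show errors for the underlying device. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read every regular file under $dir and report any that cannot be read.
# Returns non-zero if the directory itself or any file is unreadable.
check_readable() {
  local dir=$1 status=0 f
  if ! ls "$dir" >/dev/null; then
    return 1                     # e.g. "Input/output error" on the directory itself
  fi
  while IFS= read -r -d '' f; do
    if ! cat -- "$f" >/dev/null 2>&1; then
      echo "unreadable: $f"
      status=1
    fi
  done < <(find "$dir" -type f -print0)
  return "$status"
}
```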

Closing due to age. Issue appears to be most frequently caused by bad underlying storage.