Kind: Create cluster fails - kind-control-plane does not work on zfs

Created on 9 Jul 2020 · 21 comments · Source: kubernetes-sigs/kind

What happened: cluster create failed

failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

What you expected to happen: successful cluster creation

How to reproduce it (as minimally and precisely as possible):

$ kind create cluster 

Anything else we need to know?:

I'm using ZFS. I have read the other (now closed as resolved) issue about using zfs and I can see that /dev/mapper is bind-mounted into the kind-control-plane container. I'm running 0.8.1 so I believe that I should have the version that should work with ZFS.
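
For reference, this is how I verified the bind mount (a quick check with docker inspect; the grep is just to filter the output):

$ docker inspect kind-control-plane --format '{{ json .Mounts }}' | grep -o '/dev/mapper'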

Environment:

  • kind version: (use kind version):
    kind v0.8.1 go1.14.2 linux/amd64
  • Kubernetes version: (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"archive", BuildDate:"2020-04-23T22:11:11Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64"}

  • Docker version: (use docker info):
    Server Version: 19.03.8-ce
    containerd version: d76c121f76a5fc8a462dc64594aea72fe18e1178.m

  • OS (e.g. from /etc/os-release): Arch Linux
    5.6.11-arch1-1 #1 SMP PREEMPT Wed, 06 May 2020 17:32:37 +0000 x86_64 GNU/Linux

To capture more information, I ran with kind create cluster --loglevel=debug --retain. Here is some further log data...

In kind-control-plane/containerd.log

Jul 08 12:53:09 kind-control-plane containerd[127]: time="2020-07-08T12:53:09.553579553Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-kind-control-plane,Uid:350cc499f8fb9468ea828e0b13035d50,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to mount rootfs component &{overlay overlay [workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/82/work upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/82/fs lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs]}: invalid argument: unknown"

Also it appears to be trying to use zfs but doesn't have the zfs executable:

Jul 09 14:57:36 kind-control-plane containerd[161]: time="2020-07-09T14:57:36.850165016Z" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
Jul 09 14:57:36 kind-control-plane containerd[161]: time="2020-07-09T14:57:36.850516545Z" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="exec: \"zfs\": executable file not found in $PATH: \"zfs fs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset system/storage/docker\" => "
Jul 09 14:57:36 kind-control-plane containerd[161]: time="2020-07-09T14:57:36.850555778Z" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
Jul 09 14:57:36 kind-control-plane containerd[161]: time="2020-07-09T14:57:36.850594684Z" level=warning msg="could not use snapshotter zfs in metadata plugin" error="exec: \"zfs\": executable file not found in $PATH: \"zfs fs list -Hp -o name,origin,used,available,mountpoint,compression,type,volsize,quota,referenced,written,logicalused,usedbydataset system/storage/docker\" => "
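
A quick way to confirm the binary really is absent inside the node container (a sketch using plain docker exec):

$ docker exec kind-control-plane sh -c 'command -v zfs || echo "zfs not in PATH"'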

kind/bug lifecycle/active

Most helpful comment

so setting the snapshotter to zfs doesn't work (it complains about a missing metadata.db when actually trying to take snapshots). however, I noticed this from the microk8s project: https://github.com/ubuntu/microk8s/commit/a5ec1f9540dbc6500e39dbdf30c79027f8e99239#diff-e263cbd0de8da1f880f701684ae8b035R35-R36

and sure enough,
```sh
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "native"
EOF
```

works without modifying the base image.

All 21 comments

I made a new kind-control-plane image by copying zfs and the necessary library dependencies from my host into a running container which I committed to an image. I then ran with kind create cluster --loglevel=debug --retain --image=kind-control-plane-zfs. The zfs errors are gone but it is still trying to use overlayfs. I've attached a zip of the logs...
287006805.tar.gz

Adding for reference, the following is what I copied in...

$ docker cp /usr/bin/zfs kind-control-plane:/usr/bin/zfs
$ docker cp /usr/lib/libnvpair.so.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libnvpair.so.1.0.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libuutil.so.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libuutil.so.1.0.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libzfs.so.2 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libzfs.so.2.0.0 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libzfs_core.so.1 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libzfs_core.so.1.0.0 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libtirpc.so.3 kind-control-plane:/usr/lib/
$ docker cp /usr/lib/libtirpc.so.3.0.0 kind-control-plane:/usr/lib/
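
...and then committed the running container to the image used above (sketch; the tag matches the --image flag I passed):

$ docker commit kind-control-plane kind-control-plane-zfs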

one thing to consider: I'm not sure it's safe for us to ship zfs binaries with kind. Besides the legal questions, I'm not sure you can use a ZFS binary that doesn't match the dkms module, and the kernel / module are going to come from the host.

in the short term you may have to run kind on some other filesystem that overlay functions on (most of them?)

I don't think containerd/CRI will use ZFS unless forced to, kind is leaving the defaults here.

kind create cluster --config=config.yaml

config.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "zfs"

I get what you're saying about zfs and that is not a surprise given the whole thing surrounding the zfs license. It'd be very difficult to have zfs prepackaged inside the container without tying heavily to specific kernel/toolchain versions on the host.

I've also been trying k3d, and with that I have a wrapper script that uses docker-volume-loopback to create sparse ext4 loopback volumes that are passed into k3d. That works, but I don't know enough (anything?) about how kind works to port that usage to it.
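
The core of that workaround is roughly the following (a minimal sketch of the loopback idea; the paths and size are illustrative, not taken from docker-volume-loopback):

```sh
# Create a sparse file on the ZFS pool and format it as ext4,
# giving overlayfs-friendly storage without repartitioning.
truncate -s 20G /var/lib/kind-ext4.img
mkfs.ext4 -F /var/lib/kind-ext4.img   # -F: target is a regular file, not a block device
mkdir -p /var/lib/kind-ext4
mount -o loop /var/lib/kind-ext4.img /var/lib/kind-ext4
```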

I might have to put kind down until I can use another filesystem.

I think licensing wise we're actually probably fine, since we don't ship a kernel and wouldn't ship the DKMS or binary kernel module, it looks like ubuntu does have a package for just the CLI utils so we'd just ship that, and some automatic tweak to the config.

I'm having difficulty determining if it's safe to mix the zfs utils version versus the kernel module, I've never used them out of sync before (and barely at all).

Can you test if this works using the above cluster config to enable the patch? This is similar to the customization needed for microk8s

if

cat <<EOF | kind create cluster --config=- --image=your-image-with-zfs-binaries
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "zfs"
EOF

works, then we can look at automating this, (pending also if it's OK to ship the ZFS CLIs without regard to the host)

so setting the snapshotter to zfs doesn't work (it complains about a missing metadata.db when actually trying to take snapshots). however, I noticed this from the microk8s project: https://github.com/ubuntu/microk8s/commit/a5ec1f9540dbc6500e39dbdf30c79027f8e99239#diff-e263cbd0de8da1f880f701684ae8b035R35-R36

and sure enough,
```sh
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "native"
EOF
```

works without modifying the base image.

thanks!

the "native" snapshotter used to be called "naive" and isn't really meant to be used beyond simple testing IIRC, but that's probably an OK fallback on ZFS at least.

so setting the snapshotter to zfs doesn't work (it complains about a missing metadata.db when actually trying to take snapshots).

this is while using a modified base image w/ the ZFS CLI installed?

this is while using a modified base image w/ the ZFS CLI installed?

yep. there may be some mounts from the host missing? I didn't have time to dig into it further.
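
one avenue to check would be binding the ZFS control device into the node via extraMounts (untested sketch; /dev/zfs is only a guess at what might be missing):

```sh
cat <<EOF | kind create cluster --config=- --image=your-image-with-zfs-binaries
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /dev/zfs        # ZFS control device; a guess, untested
    containerPath: /dev/zfs
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "zfs"
EOF
```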

Thank you, I think we can automate the fallback to the native snapshot driver on ZFS, similar to microk8s, pretty easily.

/assign


/lifecycle active

The "native" snapshotter fallback is working only for kubernetes versions from 1.15.11 on:

$ cat /tmp/kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".containerd]
    snapshotter = "native"
networking:
  apiServerAddress: "0.0.0.0"
nodes:
- role: control-plane
$ kind create cluster --name kind --config /tmp/kind-config.yaml --image kindest/node:v1.14.10@sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.14.10) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✗ Starting control-plane 🕹️
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

Here is the command output: kind-create-cluster-1.14.10-on-zfs.txt

I would like to know whether Kubernetes versions older than 1.15.11 cannot work on top of ZFS at all, or whether there is something that can be done to make them work.

Environment:

  • kind version: (use kind version):
    kind v0.8.1 go1.14.2 linux/amd64

  • Kubernetes version: (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T14:19:25Z", GoVersion:"go1.13.13", Compiler:"gc", Platform:"linux/amd64"}

  • Docker version: (use docker info):
    Server Version: 19.03.12

  • OS (e.g. from /etc/os-release): Ubuntu Linux
    Ubuntu 20.04.1 LTS

@teoincontatto if you run with --retain and then kind export logs I'd be happy to take a look but I have no idea what changed wrt this in 1.14.10 => 1.15.11. I'm not going to have time to do this myself.
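
i.e. something like (sketch; the log directory name is arbitrary):

```sh
kind create cluster --retain --config /tmp/kind-config.yaml \
  --image kindest/node:v1.14.10@sha256:6cd43ff41ae9f02bb46c8f455d5323819aec858b99534a290517ebc181b443c6
kind export logs ./kind-logs   # bundles node and containerd logs for debugging
```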

I can't actually verify this myself at the moment but a fix based on this thread should be in v0.9.0 (later today?)

@BenTheElder was hoping to test 0.9.0 over the weekend, any idea if that's going to be published soon? I might have some time this week to do a test.

the last fix PR is out right now (WIP) so probably tomorrow. took a little longer than expected.


it's relatively safe to go ahead and try from HEAD. kind.sigs.k8s.io/dl/latest/kind-linux-amd64 has prebuilt nightly binaries (not intended for stable third party use, but kubernetes CI uses this internally), or you can make build (~zero dependencies) and use bin/kind from a clone.
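
concretely, either of these should work (sketch; the nightly URL is the one above):

```sh
# Option 1: grab a nightly binary
curl -Lo kind https://kind.sigs.k8s.io/dl/latest/kind-linux-amd64
chmod +x kind

# Option 2: build from a clone
git clone https://github.com/kubernetes-sigs/kind
cd kind && make build   # produces bin/kind
./bin/kind create cluster
```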


I finally got around to being able to test this. Using the above config snippet, it worked for me. I haven't gone further than firing it up and running an nginx hello example, but it appears to work.

Thanks!
This should be automatic in 0.9.0+ now 👍

~Could this be pushed to the gcloud sdk version as well?~

I still have to add the containerdConfigPatches or kind fails to start with v0.9.0.

ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged cubs-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

Syslog has a bunch of errors like below when starting without the patch:

[ 5192.792035] overlayfs: filesystem on '/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/36/fs' not supported as upperdir
[ 5200.782779] overlayfs: filesystem on '/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/39/fs' not supported as upperdir
[ 5204.773607] overlayfs: filesystem on '/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/40/fs' not supported as upperdir

I have the kind export logs if necessary.
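
For reference, a quick way to confirm what filesystem backs containerd inside the node (sketch; the node name is taken from my error above):

```sh
# Prints the filesystem type under containerd's snapshot storage;
# "zfs" would explain the overlayfs upperdir rejections above.
docker exec cubs-control-plane stat -f -c %T /var/lib/containerd
```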
