Kind: Enable Simulation of automatically provisioned ReadWriteMany PVs

Created on 17 Apr 2020 · 24 comments · Source: kubernetes-sigs/kind

What would you like to be added: A method to provide automatically provisioned ReadWriteMany PVs that are available on all workers.

Currently the storage provisioner that is being used can only provision ReadWriteOnce volumes.

Why is this needed: The volume provisioner currently in use only supports creating ReadWriteOnce volumes. This is because kind uses the rancher local-path-provisioner, which hard-codes its provisioner to disallow any PVCs with an access mode other than ReadWriteOnce. Many managed Kubernetes providers supply some type of distributed file system. I'm currently using Azure Storage File (which is SMB/CIFS under the hood) for this use case in production. Google's Kubernetes Engine offers ReadOnlyMany out of the box.
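For concreteness, a claim like the one below (using kind's default "standard" storage class, which is backed by the local-path provisioner; the claim name is only an example) is refused by the provisioner and stays Pending:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: shared-data-claim
spec:
  # kind's default class, served by rancher.io/local-path
  storageClassName: standard
  accessModes:
    # refused: local-path-provisioner only allows ReadWriteOnce
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi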

Possible solutions: Could we have the control plane node start up an NFS container backed by a ReadWriteOnce?

Thanks for your time!

kind/feature lifecycle/stale priority/backlog


All 24 comments

NFS from an overlayfs requires a 4.15+ kernel IIRC.
Currently kind imposes no additional requirements on kernel version beyond what kubernetes does upstream.

I don't think we want to start imposing any kernel requirement yet, or the overhead of running & managing NFS by default.

kind of course supports installing additional drivers, preferably with CSI.

IMHO it makes more sense to run this as an addon. cc @msau42 @pohly.

We can discuss other Read* modes upstream in the rancher project.

Ah, I hadn't had a need for RWM. https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

even ReadOnlyMany is going to require some kind of network attached storage or something, since the "many" is nodes not pods (my mistake)

I don't think rancher / local storage is going to do read across nodes :sweat_smile:

probably the best solution here is to document some yaml to apply for getting an NFS provisioner installed on top of a standard kind cluster.

@BenTheElder ah, didn't know NFS required a newer kernel in this instance. Would it be possible to do something similar with docker volumes instead? The following docker-compose example should back the containers with a shared volume that is consistent-ish:

version: "2.3"
services:
  control-plane0:
    image: k8s.gcr.io/pause
    volumes:
      - rwmpvc:/rwmpvc
  worker0:
    image: k8s.gcr.io/pause
    volumes:
      - rwmpvc:/rwmpvc
  worker1:
    image: k8s.gcr.io/pause
    volumes:
      - rwmpvc:/rwmpvc

volumes:
  rwmpvc:

The output of docker-compose up && docker container inspect <container> will show:

        ...
        "Mounts": [
            {
                "Type": "volume",
                "Name": "test_rwmpvc",
                "Source": "/var/lib/docker/volumes/test_rwmpvc/_data",
                "Destination": "/rwmpvc",
                "Driver": "local",
                "Mode": "rw",
                "RW": true,
                "Propagation": ""
            }
        ],
        ...

Using an approach like this would not require any NFS server to be run inside the containers. The PV provisioner would just need to derive the host path consistently, e.g. /rwmpvc/<uuid>, on each host.
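As a rough sketch of that idea (not an existing kind feature; the volume name, path suffix, and class name below are made up), such a provisioner would essentially create hostPath PVs pointing into the shared Docker volume:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: rwm-shared-0001
spec:
  storageClassName: shared-hostpath   # placeholder class name
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    # the same Docker volume is mounted at /rwmpvc in every node container,
    # so this directory is visible to pods on any node
    path: /rwmpvc/0001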

This will work in backends where the nodes are all on a single machine (which we may not guarantee in the future) IF we write a custom provisioner.

IMHO it's better to just provide an opt-in NFS solution you can deploy and document it.

It should just be a kubectl apply away from installing an NFS provisioner as long as you have an updated kernel.

Agreed, I think an opt-in NFS tutorial would be the best option here for users that need it.

We don't have any great options from the sig-storage perspective; most solutions already assume you have an NFS server set up somewhere.

  • nfs external provisioner: this repo is deprecated and in the process of being migrated to its own repo. This uses ganesha to provision nfs servers, but still requires some sort of stable disk to back it.
  • nfs-client external provisioner: this repo is deprecated and in the process of being migrated to its own repo. This takes an existing nfs share and carves out subdirectories from it as PVs.
  • nfs csi driver: currently does not support dynamic provisioning, but there are plans to add in an nfs-client-like provisioner in the near future. can potentially add snapshots support in the future too.

I don't know if this is possible but would there be some way to abstract this from the end user using some method of packaging and enabling "addons" similar to minikube? I don't know about the long-term goals of kind, but from an outsider's perspective it seems like a wonderful way to deploy an ephemeral copy of software in a CI stage. I was investigating it as a method to run some end-to-end integration testing on my company's software. I'd really like it if the configurations I end up applying to the created cluster very closely match what I'd push to a real cluster; otherwise I'd be worried about running into the same issues you hit when you build "dev" and "production" versions of a binary and only test against your "dev" builds, never your production build.

I don't know if addons are a clean way of accomplishing this goal, but I think the utility of kind for the in-CI-deployment workflow would be greatly helped by something that completely hides from the end user that this isn't a real managed kube cluster. Obviously, though, having some way to do this is better than having no way of doing this.

Interested in your thoughts.

I don't know if this is possible but would there be some way to abstract this from the end user using some method of packaging and enabling "addons" similar to minikube?

Hi, regarding addons: we're not bundling addons at this time.

That approach tends to be problematic for users as it couples the lifecycle of the addons to the version of the cluster tool.

SIG Cluster Lifecycle seems to agree, and the future of addon work there appears to be the cluster addons project, which involves a generic system on top of any cluster. We're tracking that work and are happy to integrate when it's ready: https://github.com/kubernetes-sigs/kind/issues/253

In the meantime addons tend to be no different from any other cluster workload; they can be managed with kubectl, helm, kustomize, kpt, etc.

For an example of a more involved "addon" that isn't actually bundled with kind (it has kind config dependencies), see https://kind.sigs.k8s.io/docs/user/ingress/

I don't know about the long-term goals of kind, but from an outsider's perspective it seems like a wonderful way to deploy an ephemeral copy of software in a CI stage.

This gives a rough idea of where our priorities are, which do, more or less, include supporting this:
https://kind.sigs.k8s.io/docs/contributing/project-scope/

I was investigating it as a method to run some end-to-end integration testing on my company's software. I'd really like it if the configurations I end up applying to the created cluster very closely match what I'd push to a real cluster; otherwise I'd be worried about running into the same issues you hit when you build "dev" and "production" versions of a binary and only test against your "dev" builds, never your production build.

We have a KubeCon talk about this: https://kind.sigs.k8s.io/docs/user/resources/#testing-your-k8s-apps-with-kind--benjamin-elder--james-munnelly

I don't know if addons are a clean way of accomplishing this goal, but I think the utility of kind for the in-CI-deployment workflow would be greatly helped by something that completely hides from the end user that this isn't a real managed kube cluster. Obviously, though, having some way to do this is better than having no way of doing this.

Clusters have a standard API in KUBECONFIG and the API endpoint.
Unfortunately, for portability reasons we can't quite hide that this isn't the same as your real cluster; a lot of extension points break down here, including but not limited to:

  • ingress
  • loadbalancer
  • storage classes (nonstandard ones in your prod environment, k8s really only has default as something of a standard)

For these you'll want to provide your own wrapper of some sort to ensure that the kind cluster matches your prod more closely (e.g. mimicking the custom storage classes from your prod cluster, trying to run a similar or the same ingress..)
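For the storage class case, one simple trick is to create a class in the kind cluster with the same name your prod manifests reference, but backed by kind's bundled local-path provisioner (the class name below is only an example; note it is still ReadWriteOnce only):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-premium              # example: whatever name your prod manifests expect
provisioner: rancher.io/local-path   # kind's bundled local-path provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete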

nfs-common will be installed on the nodes going forward, which should enable NFS volumes. You still need to run an NFS server somehow.

(also confirmed that it works, the kubernetes NFS e2e tests pass)

Just did a verification of this feature.

I first made sure kubernetes was cloned to ${GOPATH}/src/k8s.io/kubernetes as described in https://kind.sigs.k8s.io/docs/user/working-offline/#prepare-kubernetes-source-code

I then built my own node-image using the latest base-image with nfs-common via the following (takes a while!)

kind build node-image --image kindest/node:master --base-image kindest/base:v20200610-99eb0617 --kube-root "${GOPATH}/src/k8s.io/kubernetes"

Next I created a cluster using the new node-image via

kind create cluster --config kind-config.yaml

Using the following kind-config.yaml

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:master

I then pulled and loaded the nfs-provisioner image to prepare for installation

docker pull quay.io/kubernetes_incubator/nfs-provisioner
kind load docker-image quay.io/kubernetes_incubator/nfs-provisioner

The provisioner could then be installed via Helm (Helm was installed separately).

helm repo add stable https://kubernetes-charts.storage.googleapis.com/
helm install nfs-provisioner stable/nfs-server-provisioner 

And I was then finally able to provision an NFS volume via the following PVC:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-dynamic-volume-claim
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi

Everything worked like a charm - looking forward to the next Kind release :)
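One quick way to exercise the ReadWriteMany semantics is a deployment with several replicas all mounting that same claim; a minimal sketch (the image and names are arbitrary):

kind: Deployment
apiVersion: apps/v1
metadata:
  name: rwm-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rwm-test
  template:
    metadata:
      labels:
        app: rwm-test
    spec:
      containers:
      - name: writer
        image: busybox
        # each replica appends its pod name to the same shared file
        command: ["sh", "-c", "echo $(hostname) >> /data/hosts.txt && sleep 3600"]
        volumeMounts:
        - name: shared
          mountPath: /data
      volumes:
      - name: shared
        persistentVolumeClaim:
          claimName: test-dynamic-volume-claim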

Nice! I am currently looking for this. When will this be released?

0.9.0 - delayed for various reasons. We'll re-evaluate and set a new target date soon.


@BenTheElder any updates on the new target date? Trying to determine whether to base some internal setup on our own build of kind or whether there will be a release in the near future we can use instead.

Sorry I missed this comment (sweeping issues now). v0.9.0 was re-scheduled to match k8s v1.19, but some last-minute fixes are still pending, so we didn't cut the release today (k8s did). I expect to have those merged by tomorrow.

This is a side note, but it might be useful for someone. When I updated the node image from 1.18.8 to 1.19.1, the NFS Helm chart stopped working properly: memory fills up in a few seconds. I investigated the problem and it seems rpc.statd from the nfs-utils package is outdated and leaking memory.

That's unfortunate. We're shipping the latest available in the distro at the moment (Ubuntu 20.10); if it's fixed in Ubuntu we'll pick it up in a future kind image.

@BenTheElder Now I think it might be something different. Here is how I reproduce the issue:

$ kind create cluster --image [NODE_IMAGE]
$ helm install stable/nfs-server-provisioner --generate-name
# wait 30s until 100% memory is filled up

The issue is present when I use the most recent node images:

  • kindest/node:~~v1.19.0~~ v1.19.1 (98cf52888646)
  • kindest/node:v1.18.8 (f4bcc97a0ad6)

List of node images that work without problems:

Note: I tried building the latest node image from _kind:v0.9.0_ sources and it works fine :confused:

1.19.0 isn't the latest image (please see the kind release notes as usual), and all of the current images were built with the same version; there were no changes to the base image or the node-image build process between those builds and tagging the release.

@BenTheElder Sorry, I pasted the correct digest 98cf52888646 but the wrong (lower) version - it should be the latest, v1.19.1:

$ docker pull kindest/node:v1.19.1
v1.19.1: Pulling from kindest/node
Digest: sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600
Status: Image is up to date for kindest/node:v1.19.1
docker.io/kindest/node:v1.19.1

So the issue is present for the latest node image. I am trying to track down what changed in the latest node image update.

I'm almost sure it is because of this:
https://github.com/kubernetes-sigs/kind/pull/1799

but I keep thinking it is an NFS bug :smile:
https://github.com/kubernetes-sigs/kind/pull/760#issuecomment-519299299

@koxu1996 you should limit the file descriptors at the OS level

@aojea Indeed, I bisected _KinD_ commits and this is the culprit: https://github.com/kubernetes-sigs/kind/commit/2f17d2532084a11472bb464ccdc1285caa7c4583.

I use Arch BTW :laughing: and the kernel limit on file descriptors is really high:

$ sudo sysctl -a | grep "fs.nr_open"
fs.nr_open = 1073841816

To work around the _NFS issue_ you can change the kernel-level limit, e.g.

sudo sysctl -w fs.nr_open=1048576

or you could use a custom node image.

Edit:

I asked the _nfs-utils_ maintainer about this bug and got the following reply:

This was fixed by the following libtirpc commit:

commit e7c34df8f57331063b9d795812c62cec3ddfbc17 (tag: libtirpc-1-2-7-rc3)
Author: Jaime Caamano Ruiz jcaamano@suse.com
Date: Tue Jun 16 13:00:52 2020 -0400

libtirpc: replace array with list for per-fd locks

Which is in the latest RC release libtirpc-1-2-7-rc4

Looks like that libtirpc release is not packaged yet. I'm not sure how we want to proceed here.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
