Zero-to-jupyterhub-k8s: Experiment and add pointers to storage options (NFS Ganesha, Rook)

Created on 15 May 2018 · 10 comments · Source: jupyterhub/zero-to-jupyterhub-k8s

It would be great to integrate a storage solution in the z2jh Helm chart that does not lock us into a particular cloud provider. The Rook project could allow us to do this!

Note that a very common challenge is providing user storage that can be read and written by multiple users at the same time (ReadWriteMany). NFS could support that, but perhaps CephFS, implemented efficiently on the cluster by Rook without cloud vendor lock-in, would be better?

Relevant presentations from KubeCon 2018

Recent relevant presentation from KubeCon 2018 (December)

/cc: @yuvipanda :heart: for introducing me to NFS options like Rook on Gitter before.

architecture


All 10 comments

Thanks for the links, @consideRatio!

I think we should definitely expand our storage section to include options like Rook / NFS. I would prefer links to guides for Rook rather than having that content inline - supporting and running any Storage Solution is Extremely Serious Business that you shouldn't do unless you absolutely have to.

CephFS is Ceph + user-space NFS (NFS Ganesha), so it isn't that much different from running NFS Ganesha on top of cloud provider storage - which I would recommend for its simplicity.

@yuvipanda ah excellent input, I'm really happy about being able to draw from your experience! I'm aiming to put in a lot of learning effort on this.


I am +1 to adding links to Z2JH!

@consideRatio Rook is definitely an interesting project, which I plan to eventually try out. Having said that, I would suggest being careful with the idea of using Rook as a backend for Z2JH's persistent storage. Here is my rationale and some related thoughts (take them with a grain of salt):

  • Unlike Kubernetes, which is clearly the _de facto_ standard, Rook is not a mainstream project
  • Even when/if Rook becomes mainstream (or, at least, popular enough), focusing on a single persistence project would, in my opinion, introduce a vendor-lock-in-like project and deployment dependence
  • NFS support in Rook is a work in progress, still in an early phase and with no clear target time frame
  • Rook's current focus on Ceph as a storage backend introduces dependence on Ceph, which would increase the minimal and recommended resource requirements (that is, multiple nodes are needed) for Z2JH deployments; a similar concern applies to CephFS- and GlusterFS-based K8s solutions
  • TL;DR: I suggest using K8s' NFS ReadWriteMany solution as the standard Z2JH approach, while providing instructions on using other relevant solutions (CephFS, GlusterFS) for enabling Z2JH persistent storage (a sketch of the plain-NFS approach follows below)
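
To make the TL;DR concrete, here is a minimal sketch of the plain-Kubernetes NFS approach: a pre-existing NFS export wrapped in a ReadWriteMany PV/PVC pair. The server address, export path, and sizes are placeholders, not something from this thread:

```yaml
# Sketch only: expose an existing NFS export as a ReadWriteMany volume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-home
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.2        # placeholder: the NFS server's address
    path: /export/home      # placeholder: the exported directory
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-home
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # bind to the static PV above, not a provisioner
  volumeName: nfs-home
  resources:
    requests:
      storage: 100Gi
```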

I think this is why it's a good idea to link out to those docs / projects, rather than to "officially" support them with our own instructions. We can give people general tips, but leave it at that.

I've learned some more now. I still lack a lot of knowledge about NFS solutions.

About storage types

  1. User storage should not be provided by _object storage_, but could be provided by _block storage_ (ReadWriteOnce) for private user storage or _file system storage_ (ReadWriteMany) for shared user storage (see the access-mode sketch below).
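
For concreteness, the difference mostly shows up in the accessModes a PVC requests; a minimal sketch (names and sizes are arbitrary):

```yaml
# Private per-user storage: block storage, mounted by one pod at a time.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-private
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
# Shared storage: file system storage, mountable by many pods at once.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-shared
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
```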

About Rook

  1. Rook can provide us with a block storage _storageclass_, but does not yet allow us to use PVs referenced by PVCs as its underlying storage (see the sketch after this list for how such a storageclass would be consumed).
  2. Rook can provide us with a file system storage storageclass supporting the access mode ReadWriteMany, but it isn't convenient.
  3. Rook's use of _erasure coded storage_ can allow you to reduce the required amount of storage to 1.5x or 2x of the actually used storage (from 3x, I think), but it would also degrade performance and add complexity to the setup.
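
If Rook does hand you a block storage storageclass (point 1 above), the z2jh side of it would just be pointing user storage at that class. A sketch, assuming a storageclass named `rook-ceph-block` - the actual name depends entirely on how Rook was set up:

```yaml
# z2jh values sketch: request per-user block volumes from a Rook storageclass.
singleuser:
  storage:
    type: dynamic
    capacity: 10Gi
    dynamic:
      storageClass: rook-ceph-block   # placeholder name from the Rook setup
```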

About NFS

  1. NFS is older but well-tested tech; most people seem to move away from it if possible.
  2. The following Helm chart exists to set up NFS from scratch (see the sketch after this list for how it could be wired into z2jh).

    • NFS provisioner Helm chart: https://github.com/helm/charts/tree/master/stable/nfs-server-provisioner

    • WARNING: I don't think it supports running multiple replicas: as far as I can tell, that would create new, decoupled NFS servers that are still reached through the same Kubernetes Service resource. My theory is that you could end up with different storage across restarts of a pod consuming it. This means this solution relies on a single pod staying up, as compared to Rook's solution, as far as I understand it.

  3. GCP has managed NFS (Cloud Filestore).
  4. What to utilize?
  5. How to do it?
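
A sketch of how the nfs-server-provisioner chart (item 2) could be wired into z2jh. The values below follow that chart's README as I read it, and the `nfs` storageclass name is its usual default - verify both against the chart before relying on this:

```yaml
# Values sketch for stable/nfs-server-provisioner: back the single NFS server
# pod with a persistent disk and expose an "nfs" storageclass.
persistence:
  enabled: true
  size: 200Gi
storageClass:
  name: nfs
---
# Corresponding z2jh values sketch: per-user volumes come from that class,
# carved out of the one NFS-backed disk.
singleuser:
  storage:
    type: dynamic
    dynamic:
      storageClass: nfs
```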

In reality, I'm investigating all this because I'm struggling to grasp whether the following pain point...

  1. When your utilization % (% of total users active at any time) is very low, you end up spending more on storage than on compute.

... could be resolved by:

  • Rook's block storage...
  • Rook's shared file system storage...
  • Google's Filestore + nfs-client-provisioner...
  • @yuvipanda's nfs-flex-volume along with something more...
  • Google Filestore, ubuntu boot image, init containers - and this solution here by @yuvipanda: https://github.com/pangeo-data/dev.pangeo.io-deploy/issues/25

My single requirement is to consume only 1.23 GB of persistent storage for a user who has written only 1.23 GB of stuff to storage.

The advantage of NFS is that it's relatively easy to set up, it's good enough for user storage, and you only need a single volume.
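
Roughly, the single-volume setup could look like this in z2jh values - a sketch, where `nfs-home` is a placeholder for the claim on that one NFS-backed volume:

```yaml
# Sketch: all users share one NFS-backed PVC, each getting a subdirectory.
singleuser:
  storage:
    type: static
    static:
      pvcName: nfs-home            # placeholder: the shared NFS-backed claim
      subPath: "home/{username}"   # per-user directory inside the volume
```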

The main disadvantage of dynamic NFS provisioners is that NFS is not recommended for sqlite, so the Hub needs some other storage.
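
In other words, you'd keep the hub's sqlite database on ordinary block storage; a sketch, with `standard` standing in for whatever non-NFS storageclass the cluster provides:

```yaml
# Sketch: hub database on a regular (non-NFS) dynamically provisioned PVC.
hub:
  db:
    type: sqlite-pvc
    pvc:
      storageClassName: standard   # placeholder non-NFS storageclass
```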

Another factor is whether you care about backups and data recovery. If you do, you'll need to know which filesystems need to be backed up. For NFS it's relatively easy since everything is in one volume, as long as you've got a copy of the Kubernetes PV/PVC objects. For distributed filesystems it's complicated.

Rook has progressed since this issue was first opened; I think it is now possible to use it with PVCs, dynamic provisioning, etc., which was one hurdle for me when I considered using Rook.

https://github.com/rook/rook/issues/2107

Closing this in favor of a summary referencing this issue.
