Zero-to-jupyterhub-k8s: Collecting and Grading Assignments on JupyterHub

Created on 12 Jun 2018 · 12 Comments · Source: jupyterhub/zero-to-jupyterhub-k8s

Hello. I teach an introductory Python class at Berkeley, and I would very much like to use JupyterHub for this fall's iteration. I've worked through the Zero to Data 8 guide to the point that I have a bare-bones deployment running on Google Cloud. But I still don't understand how to distribute, collect, and grade assignments on the hub.

In the current design of the course, there are 12 homework assignments located in a private GitHub repository. We use two methods of autograding. The first three assignments are graded by running custom Python scripts (letting us check whether students use input and print statements correctly). Some of the other assignments are graded with nbgrader.

On the distribution side, I've seen how to use interact links to pull notebooks, but I want to know if this can be done with a private repo.

For collection and grading, I'm wondering what the best practices are. Is nbgrader compatible with the zero-to-data-8 setup? Is there a manual way to collect submissions from student PVCs? I noticed that Data 8 uses okpy for grading, and I'm wondering why that approach was chosen.

Thanks - any advice would be very appreciated!

Contents of config.yaml

hub:
  extraConfig: |
    c.JupyterHub.admin_access = True

proxy:
  secretToken: 
singleuser:
  image:
    name: berkeleydsep/datahub-user
    tag: 21be6ff
  memory:
    guarantee: 1G
    limit: 1G
  storage:
    capacity: 2Gi
auth:
  type: github
  github:
    clientId:
    clientSecret: 
    callbackUrl: 
    org_whitelist:
      - "MIDS-INFO-W18"
  whitelist:
    users:
      - 
  admin:
    users:
      - 


All 12 comments

On the distribution side, I've seen how to use interact links to pull notebooks, but I want to know if this can be done with a private repo.

nbgitpuller currently does not support private repositories.

Is nbgrader compatible with the zero-to-data-8 setup?

It isn't incompatible. nbgrader assumes in places that there is a shared filesystem, i.e. a shared volume that all users can write to. You'd need to attach this volume to user pods with a ReadWriteMany access mode. In Data100 this was accomplished with hostPath.
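
To make that concrete, here is a minimal sketch of an nbgrader_config.py that points the exchange at such a shared volume. The /srv/nbgrader/exchange path is an assumption, not the actual Data100 config; it must match wherever the ReadWriteMany volume is mounted inside every user pod:

```python
# nbgrader_config.py -- minimal sketch, not the actual Data100 config.
# The path below is an assumption; it must match the in-pod mount point
# of the shared ReadWriteMany volume.
c = get_config()  # provided by the traitlets config loader

# release/fetch/submit/collect all go through this directory, so it
# must resolve to the same shared filesystem in every pod.
c.Exchange.root = "/srv/nbgrader/exchange"
```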

I noticed that data 8 uses okpy for grading, and I'm wondering why this was the approach.

One of Data 8's instructors created okpy for use in his CS classes and wanted to leverage that in DS.

Thanks Ryan, this is extremely helpful!!!

Let me see if I understand. It sounds like a student's PVC is not directly accessible to instructors at the filesystem level. One workaround is to create a separate PVC to hold the nbgrader exchange directory. I should create a file like

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nbgrader
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi

save that as disk.yaml and run kubectl apply -f disk.yaml. Then I add these lines to my helm chart:

singleuser:
  storage:
    type: hostPath
    extraVolumes:
      - name: nbgrader
        hostPath:
          path: /data/homes/_nbgrader
    extraVolumeMounts:
      - name: nbgrader
        mountPath: /srv/nbgrader

Rather than using the data 8 image, I should try the data100 image, since it includes nbgrader.

I have a few follow-up questions if you don't mind. I see that the data100 helm chart has two extra volumes, one called home and one called nbgrader. What is the purpose of the home volume?

Since the nbgrader volume is ReadWriteMany, can students read, change, or delete files copied in from other students? Are more steps needed to prevent this?

If I follow the steps above, can students use nbgrader's assignment list extension for distribution (allowing us to keep our assignments private)?

If students' PVCs are not directly accessible to instructors, is there an alternative in which all student files are stored on a single PVC?

Thanks so much!

I should clarify that we used hostPath in combination with an NFS server running in the cluster. We did not use the default PV for a singleuser pod, i.e., one PV/PVC/disk per student. The nodes mounted an NFS export onto /data/homes, and then Kubernetes would attach /data/homes/{user} onto /home/jovyan within the pod. In addition, /data/homes/_nbgrader was mounted onto /srv/nbgrader. Using an NFS server allowed us to save a lot of cloud credits and reduced user server startup time: disk attachment caused user server startup to take around 15s, while with hostPath mounting it took more like 1-2s.

In Data8 and Data100, we did the NFS mounting with Ansible -- after manual node provisioning and outside of Kubernetes. In Data8x, and in Data8 for this summer, we used an nfs-mounter container and a daemonset:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nfs-mounter
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%
  template:
    metadata:
      name: nfs-mounter
      labels:
        app: nfs-mounter
    spec:
      hostPID: true
      # Temporarily, I think in production we want to give this more time to exit!
      terminationGracePeriodSeconds: 0
      containers:
        - image: gcr.io/our-gke-project-name/mounter:v5
          name: nfs-mounter
          env:
          # These two variables changing will restart all the daemonset pods
          - name: FILESERVER
            value: "a.b.c.d"
          - name: MOUNT_PATH
            value: "/mnt/homes"
          securityContext:
            privileged: true
          workingDir: /srv/script

so that when a new node came up it would run the nfs-mounter pod and attach the NFS volume.

Since /srv/nbgrader was writable, students could in theory have altered it, which was a concern, but they would have had to know how to manipulate the nbgrader shared directory in order to profit. It wasn't an issue for us. (We think.)

We did use nbgrader for distribution.

With this and a config stanza like:

hub:
  extraConfigMap:
    volumes:
      homes:
        hostPath: "/data/homes"
        mountPath: "/srv/homes"
        users:
          - the.instructor
          - a.gsi

the instructor and a GSI would have the whole user tree, /data/homes, mounted onto /srv/homes within their pod. So this way all student files would be visible to them.

@ryanlovett :heart: this summary!

@paul-laskowski note that very few storage options support ReadWriteMany; NFS does, though.
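
For reference, surfacing an NFS export as a ReadWriteMany volume might look like the sketch below. The server address, export path, and sizes are all placeholders, not values from any of the deployments discussed here:

```yaml
# Sketch only: an NFS export surfaced as a ReadWriteMany PV/PVC pair.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: homes-nfs
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: a.b.c.d        # placeholder NFS server address
    path: /export/homes    # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: homes-nfs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""     # bind to the pre-provisioned PV above
  volumeName: homes-nfs
  resources:
    requests:
      storage: 100Gi
```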

Wow, thanks for all this great info Ryan! Thanks for the contextual note @consideRatio!

I'll try to work on the solution you've outlined - makes sense for me to follow in your footsteps!

@paul-laskowski this one is still a moving target for everybody :-) I'm hoping to add some kind of suggestion to the Z2D8 guide soon... it would be great to hear from others what has worked, what hasn't, etc.

Hi Ryan and everyone, I'm writing with an update. First, I picked up a copy of Kubernetes in Action by Marko Luksa and read half of it on my vacation. I really recommend this to other beginners, since I feel like my comprehension has improved a lot.

After some thinking, I've decided that it makes more sense for students to push and pull assignments from GitHub for submission. This matches other versions of the course we're teaching and will let us reuse a lot of material - it may not be the best decision for other intro courses.

One part of this I'm still struggling with. For part of our course, students will need to find their own dataset for analysis. How can they transfer files to their volumes in the cloud? Is there a way for them to sftp into their pod?

How can they transfer files to their volumes in the cloud? Is there a way for them to sftp into their pod?

Not at the moment. I am trying to solve this for a course as well, and I'm thinking that the first piece of advice for students will be to find the URL of the dataset and include downloading it as part of their work (wget, curl, urllib, ...). This won't work for all datasets, as some require you to log in or click buttons on a website. For those cases I have https://docs.syncthing.net/ on my list of tools to experiment with.
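
For the URL route, a small helper along these lines could live right in a student notebook. The function name and paths are just for illustration:

```python
import urllib.request
from pathlib import Path

def fetch_dataset(url: str, dest: str) -> Path:
    """Download ``url`` to ``dest`` and return the local path.

    Works for plain http(s) URLs; datasets behind logins or
    click-through pages need another route (e.g. syncthing).
    """
    path, _headers = urllib.request.urlretrieve(url, dest)
    return Path(path)
```

In a notebook cell, something like fetch_dataset("https://.../data.csv", "/home/jovyan/data.csv") leaves the file on the student's persistent volume, so it survives server restarts.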

Aside: we are also working on a notebook authoring, distribution, hand-in, and autograding setup that works well on a z2jh cluster running on k8s. We have an authoring tool based on nbclean, distribution by nbgitpuller, handing in (a small custom hub service, TBD), and autograding (custom, based on https://github.com/data-8/gradememaybe, TBD). I will post some updates once we have something for each step.

Thanks @betatim - this is great info! As an instructor, can I just use kubectl cp to manually get students the files they want?

I'm definitely very interested in the setup you're working on! Will look forward to learning more!

can I just use kubectl cp to manually get students the files they want?

Should work, if(!) all the student containers are running.
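
For illustration, that might look like the following (pod names follow z2jh's jupyter-<username> convention; the "jhub" namespace and the file names are placeholders):

```
# Copy a local file into a running student pod; "jhub" and
# "student1" are placeholders for your namespace and username.
kubectl cp ./measurements.csv jhub/jupyter-student1:/home/jovyan/measurements.csv

# Confirm it arrived.
kubectl exec -n jhub jupyter-student1 -- ls -l /home/jovyan/measurements.csv
```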

Thanks Tim, that makes sense!

