Binderhub: use docker-in-docker to get docker builds to run in pods

Created on 7 Dec 2017 · 5 Comments · Source: jupyterhub/binderhub

Right now, we run docker builds by simply mounting the docker socket into a container. Would using docker-in-docker (dind) allow us to run builds inside a pod? If so, we could use init containers and apply the other restrictions we already place on user pods to builds as well. This could be deployment-specific, but I imagine some amount of support would be needed here.

I have no idea if this would work, but it's an idea!

architecture enhancement

Most helpful comment

ok, I have this working now!

  1. Set up a daemonset of DIND:
apiVersion: apps/v1  # originally extensions/v1beta1, which no longer serves DaemonSet as of Kubernetes 1.16
kind: DaemonSet
metadata:
  name: dind
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      name:  dind
  template:
    metadata:
      labels:
        name: dind
    spec:
      containers:
      - name: dind
        image: docker:dind
        args:
          - dockerd
          - --storage-driver=overlay2
          - -H unix:///var/run/dind/docker.sock
        securityContext:
          privileged: true
        volumeMounts:
        - name: varlibdocker
          mountPath: /var/lib/docker
        - name: rundind
          mountPath: /var/run/dind
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlibdocker
        emptyDir: {}
      - name: rundind
        hostPath:
          path: /var/run/dind/
  2. Cherry pick #319
  3. Set c.BinderHub.docker_api_url = "/var/run/dind/docker.sock"
  4. Run BinderHub and check it out!
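
Before pointing BinderHub at the new socket, it can help to confirm that the DIND daemon actually answers on it. The following is a minimal sketch, not part of BinderHub: it uses only the Python standard library to send the Docker Engine API's `GET /_ping` request over the Unix socket. The class and function names (`UnixHTTPConnection`, `docker_ping`) are hypothetical, and the default path assumes the hostPath from the DaemonSet above.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # The host name is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def docker_ping(socket_path="/var/run/dind/docker.sock"):
    """Return True if a Docker daemon answers GET /_ping on the given socket."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/_ping")
        resp = conn.getresponse()
        return resp.status == 200 and resp.read().strip() == b"OK"
    except (OSError, http.client.HTTPException):
        # Socket missing, permission denied, daemon not listening, or bad reply.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("dind reachable:", docker_ping())
```

Run this on a node (or in a pod that mounts /var/run/dind) after the DaemonSet is up; a `False` result usually means the socket path or the hostPath mount does not match.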

Note that if you're using the debug mode where you aren't using a registry, the hub will fail to launch the built image, since Kubernetes won't find it: the node's own docker daemon and the DIND daemon have entirely separate image caches.

All 5 comments

Hah, I've been thinking about this too!

The biggest disadvantage I could find for DIND was that we would not retain layer cache across builds. But then I realized that's only true if we used a sidecar DIND per build, but since we're on k8s we don't have to do that!

So instead I've been playing with running an additional docker daemon per host as a daemonset. We can use an emptyDir volume to mount /var/lib/docker and a special hostPath to provide /var/run/docker.sock. This would allow repo2docker to easily connect to this new docker daemon. This gives us several advantages:

  1. Separates the docker daemon used to run k8s pods from the daemon used to build images, adding a layer of security + stability
  2. Allows us to use initcontainers and what not to restrict network usage of the build pods. This is great!
  3. We are no longer restricted to the docker version provided by GKE, which is always going to lag the latest. We can just ship a DIND daemonset in the BinderHub chart, which lets us control what version of docker is shipping easily.
  4. We can simply restart the daemonset pods every so often, which clears out the cache and possibly any bugs. This is much simpler than recycling nodes!

To determine:

  1. Is there any performance difference between building on the host vs building inside the daemonset pod?
  2. How do we set up memory requests here to prevent overloading of any given host?
  3. How can we easily collect metrics from this extra docker daemon?
  4. Does running two daemons fuck up the host?
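
For the metrics question, one possible starting point (a sketch only, not something the thread settled on) is the Docker Engine API's `GET /info` endpoint, which reports counts such as `Containers`, `ContainersRunning` and `Images`. The function names `fetch_info` and `summarize_info`, the choice of keys, and the default socket path are all assumptions made for illustration.

```python
import http.client
import json
import socket

# Fields of the /info response picked here as basic health indicators.
METRIC_KEYS = ("Containers", "ContainersRunning", "Images", "Driver", "ServerVersion")


def fetch_info(socket_path="/var/run/dind/docker.sock", timeout=5.0):
    """Fetch the daemon's /info document over a Unix socket and parse it as JSON."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    conn = http.client.HTTPConnection("localhost")
    # Pre-setting the socket stops http.client from opening its own TCP connection.
    conn.sock = sock
    try:
        sock.connect(socket_path)
        conn.request("GET", "/info")
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"unexpected status {resp.status} from /info")
        return json.loads(resp.read())
    finally:
        conn.close()


def summarize_info(info):
    """Reduce an /info document to the selected keys; missing keys become None."""
    return {key: info.get(key) for key in METRIC_KEYS}


if __name__ == "__main__":
    print(summarize_info(fetch_info()))
```

The summary dict could then be logged or exported by whatever monitoring is already in place; how to wire that up is left open here, as it is in the list above.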

But so far it looks very promising!

I believe this will also help us get rid of the 'disk space full' errors, since restarting the DIND pods will get rid of all state!

This was implemented with #319. It hasn't been turned on yet for mybinder.org however.

We do this now, and @minrk has turned it on for mybinder.org!
