Zero-to-jupyterhub-k8s: Support ingress-based proxy implementation

Created on 6 Jun 2018 · 6 Comments · Source: jupyterhub/zero-to-jupyterhub-k8s

KubeSpawner has an implementation of the kubernetes proxy using ingress. It's probably appropriate to support choosing this proxy implementation (and thus disabling the proxy pod and service) in the helm chart.
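
For reference, outside the helm chart this proxy can be selected in a plain `jupyterhub_config.py`. The fragment below is a minimal sketch, assuming kubespawner is installed and that the class is exposed as `kubespawner.proxy.KubeIngressProxy`. How the chart should expose this choice is exactly what this issue asks for, so no chart value is shown.

```python
# jupyterhub_config.py (fragment; `c` is the config object JupyterHub
# provides when it loads this file)

# Route through Kubernetes Ingress objects instead of the default
# configurable-http-proxy. Assumes kubespawner is importable by the hub.
c.JupyterHub.proxy_class = "kubespawner.proxy.KubeIngressProxy"
```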

enhancement

All 6 comments

It sounds great to avoid having to schedule the proxy pod.

The autohttps deployment would still be needed for automatic setup with Let's Encrypt though, right? kube-lego is involved here, and as I currently understand it, its successor cert-manager should replace it.

I've mixed feelings about that proxy implementation :) It has too many layers of abstraction in there - for each pod we create an endpoints object, a service object & an ingress object. These are all created asynchronously, and we have to wait for them to propagate. Each service object also adds a few iptables rules on each node, which I'm not happy about. You also get wildly different behaviors based on which ingress provider you are using, which also sucks... It definitely works and is more scalable than the current solution, but I'm wary of supporting it long term.

What I would like is:

  1. A proxy object that just puts an annotation / label on each pod with extra data we need
  2. We write a small shim for Envoy that watches all the pods, and routes based on the info in (1).

This gives us a large number of benefits:

  1. 0 extra kubernetes objects, so everything is much simpler
  2. The proxy implementation in JupyterHub is going to be very small & simple
  3. The Envoy shim / provider is generic, not tied to JupyterHub & easy to test
  4. You can run any number of envoy + shim proxies, scaling up & down as necessary.
  5. Pods & their routes do not go out of sync at all, since the route info is on the pod directly!

I'd rather do this and ship that as the default eventually than recommend the current ingress based solution.
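
To make the idea above concrete, here is a minimal sketch in plain Python of the routing logic such a shim might use. It does not talk to Kubernetes or Envoy. The annotation keys, the `Pod` stand-in, the default port of 8888 and the function names are all assumptions made up for illustration, not anything that exists in JupyterHub or KubeSpawner.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

# Hypothetical annotation keys; the thread does not name any.
ROUTE_ANNOTATION = "example.org/proxy-route"
PORT_ANNOTATION = "example.org/proxy-port"


@dataclass
class Pod:
    """Minimal stand-in for the pod fields a watcher would read."""
    name: str
    ip: Optional[str]
    annotations: Dict[str, str] = field(default_factory=dict)


def build_route_table(pods: Iterable[Pod]) -> Dict[str, str]:
    """Derive {route prefix: upstream URL} purely from pod annotations."""
    routes: Dict[str, str] = {}
    for pod in pods:
        prefix = pod.annotations.get(ROUTE_ANNOTATION)
        if prefix is None or pod.ip is None:
            continue  # not routable: unannotated, or no IP assigned yet
        port = pod.annotations.get(PORT_ANNOTATION, "8888")  # assumed default
        routes[prefix] = f"http://{pod.ip}:{port}"
    return routes


def resolve(routes: Dict[str, str], path: str) -> Optional[str]:
    """Return the upstream for the longest prefix matching the request path."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None
    return routes[max(matches, key=len)]


pods = [
    Pod("jupyter-alice", "10.0.0.5", {ROUTE_ANNOTATION: "/user/alice/"}),
    Pod("jupyter-bob", None, {ROUTE_ANNOTATION: "/user/bob/"}),  # no IP yet
    Pod("unrelated", "10.0.0.9"),  # no route annotation, ignored
]
routes = build_route_table(pods)
print(routes)
print(resolve(routes, "/user/alice/lab"))
```

Since the table is recomputed from the pods themselves, deleting a pod removes its route with it, which is the "cannot go out of sync" property in (5). A real shim would feed this from a pod watch and push the result to Envoy; both parts are left out here.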

Ah thanks for the summary @yuvipanda !!!

If we / someone wants to secure JupyterHub further, Istio may be a good fit, and Istio uses Envoy as its foundation. If some new tech is to be added, it feels good that it could serve multiple purposes.

@minrk @yuvipanda I really appreciate being able to learn from you! :heart:

Related: KubeCon 2018 presentation about Envoy / Istio

IPVS-Based In-Cluster Service Load Balancing Graduates to General Availability

In this release, IPVS-based in-cluster service load balancing has moved to stable. IPVS (IP Virtual Server) provides high-performance in-kernel load balancing, with a simpler programming interface than iptables. This change delivers better network throughput, better programming latency, and higher scalability limits for the cluster-wide distributed load-balancer that comprises the Kubernetes Service model. IPVS is not yet the default but clusters can begin to use it for production traffic.


Is this relevant to consider? I'm lacking understanding in this domain currently.

related discussion about ambassador-based proxy approach: https://github.com/kubeflow/kubeflow/issues/239#issuecomment-394121234
