Cheers once again,
Reading through the documentation (z2jh as well as the jupyterhub readthedocs itself), I could not really find what role the proxy has when it comes to outgoing connections from the hub and single user servers.
So all incoming connections from users to their single user servers are routed through the jupyterhub proxy.
But are outgoing connections from the single user server just going from the container directly to the destination? I thought that they would be routed via the jupyterhub proxy too, making the whole jupyterhub "cluster" interfaced by the proxy. But I cannot find a reference for which of the two possibilities is the case.
Also, the jupyterhub documentation states that by default the hub and single user servers only communicate via localhost.
It might also help people who are just digging into jupyterhub and kubernetes to have a section stating that on a z2jh setup the hub and single user servers communicate via HTTPS, since, if I am not mistaken, the kubernetes environment lets its nodes communicate via HTTPS (if configured correctly).
Thanks for reading, and I'd be happy about any feedback!
I'm clueless and now also very curious about the answer to this :D
Presumably the jupyterhub-proxy only handles inbound connections and the corresponding response traffic, and all pods are able to make independent outbound connections routed using the Kubernetes network plugin, so from the point of view of an external server the source IP would be one of the Kubernetes nodes. That's why you can for example limit outbound network connections from singleuser-servers using a network policy on the singleuser-servers only.
If pod-pod connections aren't encrypted by the process running in the pod, then it depends on the K8s network plugin: some can encrypt everything, many don't.
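To illustrate the point about limiting outbound connections with a network policy on the singleuser-servers only, here is a rough sketch that builds such a policy as a Python dict and prints it as JSON (which `kubectl apply -f` accepts). The pod labels (`component: singleuser-server`, `component: hub`), the hub port 8081, the policy name, and the CIDR values are assumptions for illustration; check them against your own deployment. A policy like this only has an effect if the cluster's network plugin enforces NetworkPolicy.

```python
import json


def build_singleuser_egress_policy(hub_port=8081,
                                   allowed_cidr="0.0.0.0/0",
                                   blocked_cidrs=("169.254.169.254/32",)):
    """Return a NetworkPolicy manifest restricting egress from user pods.

    All label values, the port and the CIDRs are illustrative assumptions.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "singleuser-egress-example"},  # hypothetical name
        "spec": {
            # The policy applies to the single user server pods only.
            "podSelector": {"matchLabels": {"component": "singleuser-server"}},
            "policyTypes": ["Egress"],
            "egress": [
                # User servers must always be able to reach the hub.
                {
                    "to": [{"podSelector": {"matchLabels": {"component": "hub"}}}],
                    "ports": [{"protocol": "TCP", "port": hub_port}],
                },
                # DNS lookups.
                {
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ]
                },
                # Everything else: an allowed range minus blocked ranges.
                {
                    "to": [{
                        "ipBlock": {
                            "cidr": allowed_cidr,
                            "except": list(blocked_cidrs),
                        }
                    }]
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_singleuser_egress_policy(), indent=2))
```

Because the policy selects only the user pods, it leaves the hub and the proxy untouched, which matches the idea that outbound traffic never passes through the proxy.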
This might indeed be worth a section in the documentation I guess. I would have never stumbled upon that information (and still am unsure what the kubernetes network plugin exactly is, but will investigate).
@manics's summary is correct, or at the very least it is also how I understand things. There has been recent work on JupyterHub itself to enable encrypted connections between users' servers and the hub (and the proxy), but I've not used that in practice yet.
@dkipping do you want to make a PR adding a section about the normal traffic between hub, user servers and the proxy? It might be worth checking the JupyterHub docs as they might already explain this. In that case linking to them might be the best thing to do.
> I could not really find what role the proxy has when it comes to outgoing connections from the hub and single user servers.
None at all. The proxy is only for incoming traffic.
> Also the jupyterhub documentation states that per default hub and single user servers only communicate via localhost.
This is the default for jupyterhub, but not a requirement (and not the case with kubernetes).
I think an overview of networking probably mostly belongs in an 'architecture diagram' in the jupyterhub docs. This would have helped put together our network policy that's currently in this repo!
Here's a quick sketch of who talks directly to whom and why in jupyterhub:
user-outgoing must include the hub, but typically includes the world if you want user code to be able to access public resources, etc. It doesn't need to, though, and can be limited to varying degrees via dns/cidr whitelists, SNI-validating proxies, etc. All of these are up to your deployment, though, and not part of this helm chart.
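As a small illustration of what a CIDR whitelist for user-outgoing traffic boils down to, here is a sketch of the matching logic using Python's standard `ipaddress` module. The network ranges and the function name are made up for the example; in a real deployment this check is enforced by the network plugin or a proxy, not by code like this.

```python
import ipaddress

# Hypothetical allowlist: an internal range plus one documentation range.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]


def is_destination_allowed(address, allowed=ALLOWED_NETWORKS):
    """Return True if the destination IP falls inside any allowed network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in allowed)


if __name__ == "__main__":
    for destination in ("10.1.2.3", "192.0.2.55", "8.8.8.8"):
        print(destination, is_destination_allowed(destination))
```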
The same goes for encryption. That's maybe a kubernetes-level question ("I want to encrypt internal communication on my cluster") because jupyterhub itself uses HTTP. JupyterHub 1.0 implements support for internal TLS, where all internal communication is encrypted, but this takes some doing to set up. After JupyterHub 1.0 is released, we may want to enable internal TLS by default in this chart.
@minrk thank you very much for this elaboration! It is also very good to know that you are already planning on HTTPS in the hub for the future. I guess securing the kubernetes cluster with HTTPS between the kubelets should be enough for the time being, as this already secures the jhub communication to everything outside of the cluster (having HTTPS from the hub itself, enabled by default, would of course make setup easier and secure the jhub better within the cluster).
@betatim I can do this, especially after the explanation by min.
So I guess the architecture diagram would go directly into the jupyterhub documentation under the technical reference, maybe as a section under the subsystems. The z2jh documentation could then reference it, with the explanation that on kubernetes the kubelets talk to each other via HTTP(S) and that this currently has to be configured and secured at the kubernetes level.
I do not know if foreshadowing that HTTPS with jupyterhub 1.0 is planned would be good in the docs, I guess I'll leave that out for now.
I've added a note about providing an overview of the networking to another issue that references this one, and will close this in favor of that.