I want to use Spark on top of Mesos with this pyspark notebook, but it gives me this error:
Failed to connect to /172.17.0.7:55575
I set the properties that the README says must be set:
`# Spawn user containers from this image
c.DockerSpawner.container_image = 'jupyter/all-spark-notebook'
c.DockerSpawner.extra_create_kwargs.update({
    'command': '/usr/local/bin/start-singleuser.sh'
})
`
But when it spawns a container for a specific user, the container runs in bridge mode. I want to change that to host mode, so that Spark tries to connect to the right IP address.
Any ideas on how to do that?
/cc @jtyberg
Our README covers how to configure a single jupyter/all-spark-notebook container with the options needed to make SparkContext work with a Mesos-managed cluster of Spark executors here: https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook#connecting-to-a-spark-cluster-on-mesos
I imagine you need to get the JupyterHub DockerSpawner to configure the same options like host networking and host pids. But I don't know if you can do that today with the DockerSpawner.
@minrk might have some advice also.
If you add:
c.DockerSpawner.extra_host_config = { 'network_mode': 'host' }
it should start with host networking.
Yes, { 'network_mode': 'host' } will work, to a degree.
DockerSpawner creates and starts the containers, then it tries to get the port from the container config. In cases where it cannot determine the Notebook server port, it assumes 8888.
https://github.com/jupyterhub/dockerspawner/blob/master/dockerspawner/dockerspawner.py#L430
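To illustrate the fallback described above, here is a minimal sketch. It is not DockerSpawner's actual code: `resolve_port` is a hypothetical helper, and it assumes the usual shape of `docker inspect` output, where `NetworkSettings.Ports` maps `"8888/tcp"` to a list of host bindings and is empty when the container uses host networking.

```python
DEFAULT_PORT = 8888  # the port the Notebook server is assumed to use


def resolve_port(inspect_data, default=DEFAULT_PORT):
    """Return the published host port, or `default` if none is recorded.

    Hypothetical helper for illustration only; `inspect_data` is assumed
    to look like the dict returned by `docker inspect` for one container.
    """
    ports = (inspect_data.get("NetworkSettings") or {}).get("Ports") or {}
    bindings = ports.get("%i/tcp" % default) or []
    if bindings:
        return int(bindings[0]["HostPort"])
    # Nothing published (e.g. host networking): fall back to the default.
    return default


bridge = {"NetworkSettings": {"Ports": {
    "8888/tcp": [{"HostIp": "0.0.0.0", "HostPort": "32768"}]}}}
host = {"NetworkSettings": {"Ports": {}}}

bridge_port = resolve_port(bridge)  # taken from the port binding
host_port = resolve_port(host)      # falls back to the default
```

With host networking there is no binding to read, so every container ends up reported on the same default port.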
Also, the singleuser.sh script that launches the Notebook server hard codes the port, assuming it's 8888 inside a container.
https://github.com/jupyterhub/dockerspawner/blob/master/singleuser/singleuser.sh#L11
So you could actually spawn the first Notebook container using port 8888 on the host, but the second container will fail to start due to the port conflict (8888 already taken). This is because the jupyterhub-singleuser script that spawns the Notebook server itself for use with JupyterHub has the port retry disabled.
https://github.com/jupyterhub/jupyterhub/blob/master/scripts/jupyterhub-singleuser#L178
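The port conflict is easy to reproduce outside Docker. This is a rough sketch using only the standard library; `try_bind` is a hypothetical helper, and it uses an OS-assigned port instead of 8888 so it does not depend on what is already running on the machine.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Return a listening socket on (host, port), or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        # Address already in use (or not permitted): give up, no retry.
        sock.close()
        return None


# First "server": port 0 asks the OS for any free port.
first = try_bind(0)
port = first.getsockname()[1]

# Second "server" on the same port: with no retry logic, it simply fails.
second = try_bind(port)
conflict = second is None

if second is not None:
    second.close()
first.close()
```

Two processes sharing the host's network namespace hit the same situation, which is why the second container cannot start on 8888.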
Clearly, I've been playing around with this. I got something working by assigning random ports in DockerSpawner and modifying the singleuser.sh script to pass the port assignment along. I'm not sure this is the best solution though.
Using docker links may be preferable to host networking.
Unfortunately, Spark networking requires all executor nodes to be able to connect back to the driver, which in this case is the kernel running in the notebook container. And, typically, in a Mesos cluster, all the executors are distributed across many host nodes which Docker link doesn't support (except maybe in Swarm but deploying Mesos + Spark on Swarm makes little sense).
EDIT: Spark networking requirements link: http://spark.apache.org/docs/latest/cluster-overview.html#components
I see. In that case, when using host-networking with docker-spawner, the Spawner picking the port prior to launching probably makes the most sense, as is done in the LocalSpawner case. This can be added to the launch command with '--port=%i' % port.
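As a rough sketch of that idea, using only the standard library: `pick_free_port` and `build_command` are hypothetical names, and `pick_free_port` only approximates what a helper like JupyterHub's `random_port` does.

```python
import socket


def pick_free_port():
    """Ask the OS for a currently unused TCP port and return its number."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("", 0))  # port 0 means "any free port"
    port = sock.getsockname()[1]
    sock.close()
    return port


def build_command(base_cmd, port):
    """Append the port flag to the single-user launch command."""
    return list(base_cmd) + ["--port=%i" % port]


port = pick_free_port()
cmd = build_command(["jupyterhub-singleuser"], port)
```

Note that the port is released before the server starts, so another process could grab it in between. On a single host that window is small, but it is one reason this approach is only a best-effort assignment.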
I got the random port idea from the default LocalSpawner. However, I'm not convinced that having JupyterHub assign the ports is the path forward. It's one thing to assign ports if we assume JupyterHub spawns multiple notebook servers on a single host, but if DockerSpawner's DOCKER_HOST is pointed at a Swarm cluster, it is now assigning ports for all the cluster nodes, and who knows what else might be running on them.
I don't think we can assume a single-host JupyterHub deployment when Spark gets thrown into the mix. Multiple users running multiple Spark drivers (notebook kernels) on the same host, all serializing who knows what back-and-forth from potentially multiple Spark clusters? Eek. I would opt to deploy notebook servers to some cluster scheduler, and let it handle port assignment.
We've actually deployed notebook servers to Mesos using Marathon, which has its own way of spawning containers and handling ports. We were not using JupyterHub, but if we were, we probably would have come up with a custom spawner.
@zbence @minrk is correct, setting the following in jupyterhub_config.py will spawn a single notebook server on port 8888 on the host, but I don't think you'll get farther than that. Any more than one would result in port conflicts unless you use a custom spawner or some other way to manage port assignments.
c.DockerSpawner.extra_host_config = { 'network_mode': 'host' }
c.DockerSpawner.extra_start_kwargs = { 'network_mode': 'host' }
Hi!
Thanks, you guys are very helpful. I managed to spawn a container with host network mode. I used these additional settings:
c.DockerSpawner.extra_host_config = { 'network_mode': 'host' }
c.DockerSpawner.extra_start_kwargs = { 'network_mode': 'host' }
c.DockerSpawner.use_internal_ip = True
c.DockerSpawner.network_name = 'host'
For the second user I can't spawn a new container because of this error:
ERROR: the notebook server could not be started because no available port could be found.
So I'm wondering whether my router could assign a new IP address to each newly spawned container. These IPs don't have to be static, just normal addresses visible on my local network.
That way every new user would have a different IP address and the port could remain the default 8888.
Can I have an IP address assigned via DHCP to a Docker container when I spawn it?
Every Docker container gets its own private IP on the Docker bridge network. I don't know of a way to bridge those containers to the public interface so that they can get an IP address from your router.
I'm going to close this issue since it has been inactive for a few months. The answer given in the comments is to use a custom spawner. (I think with JupyterHub 0.7, spawners may also pick their own ports and report them back to the hub instead of vice versa.)
For anyone who finds this through googling the error messages I was seeing when trying to get host networking to work in DockerSpawner, the magic combination of options I needed was:
c.DockerSpawner.extra_host_config = {'network_mode': 'host'}
c.DockerSpawner.use_internal_ip = True
c.DockerSpawner.network_name = 'host'
Extra host config to tell it to use the host, use_internal_ip so it would inform jupyterhub which port to use, and network_name so it doesn't try to default to 'bridge' after setting use_internal_ip.
I also had to create a custom spawner, something roughly like this, to solve the re-using ports issue others brought up above.
from dockerspawner import DockerSpawner
from jupyterhub.utils import random_port
from tornado import gen

class custom_spawner(DockerSpawner):
    @gen.coroutine
    def get_ip_and_port(self):
        # report the port chosen in start() back to JupyterHub
        return self.container_ip, self.container_port

    @gen.coroutine
    def start(self, *args, **kwargs):
        # pick a free host port before launching, so users don't collide on 8888
        self.container_port = random_port()
        spawn_cmd = "sh /srv/singleuser/singleuser.sh --port={}".format(self.container_port)
        self.extra_create_kwargs.update({"command": spawn_cmd})
        # start the container
        ret = yield DockerSpawner.start(self, *args, **kwargs)
        return ret
I see. In that case, when using host-networking with docker-spawner, the Spawner picking the port prior to launching probably makes the most sense, as is done in the LocalSpawner case. This can be added to the launch command with '--port=%i' % port.
@minrk Do you know how we can add this config to config.yaml for kubernetes?
An alternative solution is below:
from jupyterhub.utils import random_port
from dockerspawner import DockerSpawner

class custom_spawner(DockerSpawner):
    @property
    def internal_hostname(self):
        # Set FQDN or localhost
        return 'localhost'

    def _port_default(self):
        # Do NOT set c.DockerSpawner.port in the config file
        return random_port()

c.JupyterHub.spawner_class = custom_spawner
c.DockerSpawner.extra_host_config = {'network_mode': 'host'}
c.DockerSpawner.use_internal_hostname = True
# Set host_ip to the public IP, e.g. when running Spark
c.DockerSpawner.host_ip = 'c.c.c.c'