at Twitter, our data science and machine learning teams are attempting to package up Jupyter notebook as a self-contained pex for easier distribution and compatibility with our internal build and execution environments.
presently, attempting to create a new notebook while running jupyter notebook from a pex results in a failure to launch the kernel:
[omerta show]$ wget -q https://github.com/pantsbuild/pex/releases/download/v1.2.7/pex27
[omerta show]$ chmod 700 pex27 && ./pex27 --version
pex27 1.2.7
[omerta show]$ pex "ipython<6.0" jupyter -e notebook.notebookapp:main -o ./jupyter_notebook.pex
[omerta show]$ ./jupyter_notebook.pex
[I 16:51:46.583 NotebookApp] Serving notebooks from local directory: /private/tmp/show
[I 16:51:46.583 NotebookApp] 0 active kernels
[I 16:51:46.583 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=11cd08f0df7e40e7bbe0cbf4b9fcfef57ef975f375ea8142
[I 16:51:46.583 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:51:46.584 NotebookApp]
[I 16:51:49.448 NotebookApp] 302 GET / (::1) 0.48ms
[I 16:51:53.755 NotebookApp] Creating new notebook in
[I 16:51:54.315 NotebookApp] Kernel started: ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[W 16:51:54.332 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20170705165146 (::1) 12.14ms referer=http://localhost:8888/notebooks/Untitled.ipynb?kernel_name=python2
[I 16:51:57.316 NotebookApp] KernelRestarter: restarting kernel (1/5)
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[I 16:52:00.323 NotebookApp] KernelRestarter: restarting kernel (2/5)
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[I 16:52:03.329 NotebookApp] KernelRestarter: restarting kernel (3/5)
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[W 16:52:04.342 NotebookApp] Timeout waiting for kernel_info reply from ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb
[I 16:52:06.339 NotebookApp] KernelRestarter: restarting kernel (4/5)
WARNING:root:kernel ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb restarted
[E 16:52:06.339 NotebookApp] KernelRestarter: restart callback <bound method ZMQChannelsHandler.on_kernel_restarted of ZMQChannelsHandler(ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb)> failed
Traceback (most recent call last):
File "/Users/kwilson/.pex/install/jupyter_client-5.1.0-py2.py3-none-any.whl.f35d5547733e40a744cea53c79345f75f659643d/jupyter_client-5.1.0-py2.py3-none-any.whl/jupyter_client/restarter.py", line 81, in _fire_callbacks
callback()
File "/Users/kwilson/.pex/install/notebook-5.0.0-py2.py3-none-any.whl.6e81571f8e672c859f4e9d322ebd477865a3f9b3/notebook-5.0.0-py2.py3-none-any.whl/notebook/services/kernels/handlers.py", line 435, in on_kernel_restarted
self._send_status_message('restarting')
File "/Users/kwilson/.pex/install/notebook-5.0.0-py2.py3-none-any.whl.6e81571f8e672c859f4e9d322ebd477865a3f9b3/notebook-5.0.0-py2.py3-none-any.whl/notebook/services/kernels/handlers.py", line 431, in _send_status_message
self.write_message(json.dumps(msg, default=date_default))
File "/Users/kwilson/.pex/install/tornado-4.5.1-cp27-cp27m-macosx_10_4_x86_64.whl.5c5ad8a4cbaf171bde97e76048ae70bd52a42971/tornado-4.5.1-cp27-cp27m-macosx_10_4_x86_64.whl/tornado/websocket.py", line 249, in write_message
raise WebSocketClosedError()
WebSocketClosedError
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
[W 16:52:09.347 NotebookApp] KernelRestarter: restart failed
[W 16:52:09.347 NotebookApp] Kernel ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb died, removing from map.
ERROR:root:kernel ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb restarted failed!
[E 16:52:09.348 NotebookApp] KernelRestarter: dead callback <bound method ZMQChannelsHandler.on_restart_failed of ZMQChannelsHandler(ee79965c-40c8-4f2b-ab3b-0493f3bbe2cb)> failed
Traceback (most recent call last):
File "/Users/kwilson/.pex/install/jupyter_client-5.1.0-py2.py3-none-any.whl.f35d5547733e40a744cea53c79345f75f659643d/jupyter_client-5.1.0-py2.py3-none-any.whl/jupyter_client/restarter.py", line 81, in _fire_callbacks
callback()
File "/Users/kwilson/.pex/install/notebook-5.0.0-py2.py3-none-any.whl.6e81571f8e672c859f4e9d322ebd477865a3f9b3/notebook-5.0.0-py2.py3-none-any.whl/notebook/services/kernels/handlers.py", line 439, in on_restart_failed
self._send_status_message('dead')
File "/Users/kwilson/.pex/install/notebook-5.0.0-py2.py3-none-any.whl.6e81571f8e672c859f4e9d322ebd477865a3f9b3/notebook-5.0.0-py2.py3-none-any.whl/notebook/services/kernels/handlers.py", line 431, in _send_status_message
self.write_message(json.dumps(msg, default=date_default))
File "/Users/kwilson/.pex/install/tornado-4.5.1-cp27-cp27m-macosx_10_4_x86_64.whl.5c5ad8a4cbaf171bde97e76048ae70bd52a42971/tornado-4.5.1-cp27-cp27m-macosx_10_4_x86_64.whl/tornado/websocket.py", line 249, in write_message
raise WebSocketClosedError()
WebSocketClosedError
^C[I 16:52:10.041 NotebookApp] interrupted
Serving notebooks from local directory: /private/tmp/show
0 active kernels
The Jupyter Notebook is running at: http://localhost:8888/?token=11cd08f0df7e40e7bbe0cbf4b9fcfef57ef975f375ea8142
Shutdown this notebook server (y/[n])? y
[C 16:52:11.840 NotebookApp] Shutdown confirmed
[I 16:52:11.841 NotebookApp] Shutting down kernels
the key output here being:
/opt/twitter_mde/package/python2.7/current/bin/python2.7: No module named ipykernel_launcher
which seems to indicate that jupyter is attempting to launch the kernel as the equivalent of python -m ipykernel_launcher ... using the bare system interpreter. this is confirmed by looking at the kernel.json for Python 2:
{
"display_name": "Python 2",
"language": "python",
"argv": [
"python",
"-m",
"ipykernel_launcher",
"-f",
"{connection_file}"
]
}
in the pex context, all transitive dependencies needed for execution are self-contained within the pex, as opposed to being sourced from a traditional python environment (e.g. the interpreter's site-packages or an outer venv). you can think of it as something like a zipped, executable virtualenv without any externalized environmental setup. so with the launch mode above, the kernel is started as a plain python subprocess outside the pex, the pex context is lost, and the vanilla interpreter fails to locate the ipykernel_launcher module on its own sys.path.
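to make the failure mode concrete, here's a minimal sketch (not pex itself) of the same effect: a module that is importable only because the parent process adjusted sys.path at runtime is invisible to a freshly spawned interpreter. the module name demo_kernel_launcher is a made-up stand-in for ipykernel_launcher, and the in-process sys.path tweak only loosely approximates what the pex bootstrap does.

```python
import importlib
import os
import subprocess
import sys
import tempfile

# Hypothetical module name standing in for ipykernel_launcher.
MODULE = "demo_kernel_launcher"


def parent_and_child_can_import():
    """Return (parent_ok, child_ok, child_stderr) for MODULE."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, MODULE + ".py"), "w") as fh:
            fh.write("VALUE = 1\n")

        # In-process sys.path change: only this process can see MODULE.
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(MODULE)
            parent_ok = True
        except ImportError:
            parent_ok = False
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE, None)

        # A fresh interpreter starts from its own default sys.path, so the
        # equivalent of "python -m <module>" cannot find MODULE.
        child = subprocess.run(
            [sys.executable, "-m", MODULE],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        return parent_ok, child.returncode == 0, child.stderr


if __name__ == "__main__":
    print(parent_and_child_can_import())
```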
so in order to properly launch an ipykernel from within a pex, we'd need to be self-referential: re-invoke the pex file itself with an environment variable that selects the kernel launcher as the entrypoint. from the CLI, that would look something like:
$ PEX_MODULE=ipykernel_launcher <sys.argv[0]> ...
FWICT, it seems like at least one way to accomplish this would be to overload/hijack the main notebook server entrypoint and spit out a custom "kernel spec" prior to server launch that would essentially look like:
{
"display_name": "Python 2/<path_to_the.pex>",
"env": {"PEX_MODULE": "ipykernel_launcher"},
"language": "python",
"argv": ["python2", "<path_to_the.pex>", "-f", "{connection_file}"]
}
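as a rough sketch of how that spec could be generated at runtime rather than written by hand: the helper name pex_kernel_spec and the display name format are made up for illustration, and it assumes sys.argv[0] is the path of the running pex and sys.executable is the interpreter running it.

```python
import json
import sys


def pex_kernel_spec(pex_path, python=None, display_name=None):
    """Build a kernel spec dict that re-invokes the pex as the kernel.

    pex_path is assumed to be the path of the running pex file (e.g.
    sys.argv[0]); python defaults to the current interpreter.
    """
    python = python or sys.executable
    return {
        "display_name": display_name or "Python (%s)" % pex_path,
        "language": "python",
        # PEX_MODULE switches the pex entrypoint to the kernel launcher.
        "env": {"PEX_MODULE": "ipykernel_launcher"},
        # {connection_file} is left as-is for jupyter to substitute.
        "argv": [python, pex_path, "-f", "{connection_file}"],
    }


if __name__ == "__main__":
    print(json.dumps(pex_kernel_spec(sys.argv[0]), indent=2))
```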
however, it'd be great to avoid hacks like this in favor of a more first class support model for pex.
if anyone has any better solutions or a high level strategy on how to go about adding better first class support for pex in Jupyter, I'd be all ears - and more than willing to contribute the necessary PRs to realize that. thanks in advance!
at least one semi-reasonable strategy here that I can see would be to compose a shim/surrogate entrypoint that wraps the notebook launcher in the pex context that would:
1) create a temporary dir
2) emit the kernel.json as described above to the tmp dir under kernels/<id>/kernel.json
3) add the temporary dir to an exported JUPYTER_PATH
4) invoke the notebook server runner
5) clean up the temporary dir in a finally block
this helps isolate the configuration to a per-run instance vs stashing keyed, per-run copies in e.g. ~/.jupyter.
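the steps above can be sketched roughly as follows. this is an illustration of the idea, not the implementation I ended up posting: the names temporary_kernel_spec and run_with_kernel_spec and the kernel name pex_python are made up, and the runner is passed in as a plain callable (in the real shim it would be something like notebook.notebookapp.main).

```python
import contextlib
import json
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_kernel_spec(spec, kernel_name="pex_python"):
    """Write spec to <tmp>/kernels/<kernel_name>/kernel.json and export
    <tmp> on JUPYTER_PATH for the duration of the block."""
    tmp = tempfile.mkdtemp(prefix="pex_jupyter_")   # step 1
    old = os.environ.get("JUPYTER_PATH")
    try:
        kernel_dir = os.path.join(tmp, "kernels", kernel_name)
        os.makedirs(kernel_dir)
        with open(os.path.join(kernel_dir, "kernel.json"), "w") as fh:
            json.dump(spec, fh, indent=2)           # step 2
        # step 3: prepend so this spec wins over any existing entries
        os.environ["JUPYTER_PATH"] = (
            tmp if not old else os.pathsep.join([tmp, old])
        )
        yield tmp
    finally:
        # step 5: restore the environment and remove the temp dir
        if old is None:
            os.environ.pop("JUPYTER_PATH", None)
        else:
            os.environ["JUPYTER_PATH"] = old
        shutil.rmtree(tmp, ignore_errors=True)


def run_with_kernel_spec(spec, runner):
    """Step 4: invoke the server runner with the spec in place."""
    with temporary_kernel_spec(spec):
        return runner()
```

since the cleanup lives in a finally block, the temp dir is removed even if the server exits with an exception or a KeyboardInterrupt.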
I'm planning to run with this model now for the purposes of experimentation, but open to better strategies here if anyone has ideas.
If I interpret this line in launcher.py (jupyter_client) correctly, the kernel will inherit the notebook server's environment, unless the kernel spec defines an environment. So, if your kernel specs don't set any environment variables, you could provide what you need to the notebook server, and it will be available to the kernels.
If your kernel specs do set some environment variables, you could customize the launcher to pass selected environment variables from the notebook server to the kernels. Or you could customize the kernel manager to always pass an environment definition to the launcher. If you get the list of environment variables to be propagated from the configuration, you could create a PR and maybe get your changes merged.
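As a small illustration of the general mechanism (plain subprocess behaviour, not jupyter_client's actual launcher code): a child process sees the parent's environment unless an explicit env mapping is passed. The variable name DEMO_PEX_MODULE is made up for this sketch.

```python
import os
import subprocess
import sys

# Child program: report whether the (hypothetical) variable is visible.
CHILD = "import os; print(os.environ.get('DEMO_PEX_MODULE', 'unset'))"


def child_sees(env=None):
    """Run a child interpreter and return what it sees for the variable."""
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return out.decode().strip()


# Set in the parent (the notebook server's role): the child inherits it.
os.environ["DEMO_PEX_MODULE"] = "ipykernel_launcher"
inherited = child_sees()

# Passing an explicit env replaces the inherited one, so the variable
# can be filtered out (or added) per launch.
filtered_env = {k: v for k, v in os.environ.items() if k != "DEMO_PEX_MODULE"}
filtered = child_sees(env=filtered_env)

del os.environ["DEMO_PEX_MODULE"]
```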
the env var that needs to be set would specify the entrypoint of the kernel launcher, so in terms of concerns it'd be part of the "kernel configuration" (i.e. something we set only at kernel launch time vs something we'd want as a static env var in the parent, which in theory could potentially leak into other non-desired contexts or kernel launches). tho it seems already possible to embed a static env var like this directly into a kernel.json - so really the remaining gap is the self-reference bit (i.e. understanding and being able to parameterize the values of sys.executable and sys.argv[0] from the running notebook server context).
so afaict, to make this all first class it seems like jupyter would need a way to specify kernel configuration in a plugin type model (i.e. executable python code vs json). it might also be cool to use a registry/discovery type pattern against the installed plugins so that just e.g. their presence in the python environment could enable them for use. this would make it as easy as a pip install to add new kernel types.
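one way the registry/discovery idea could look, purely as a sketch: plugins advertise themselves through packaging entry points and get picked up by their mere presence in the environment. the group name hypothetical.jupyter_kernel_providers is invented here and is not an existing jupyter API; this assumes Python 3.8+ for importlib.metadata.

```python
from importlib.metadata import entry_points

# Invented group name, for illustration only.
GROUP = "hypothetical.jupyter_kernel_providers"


def discover_kernel_providers(group=GROUP):
    """Return {name: entry_point} for every installed plugin in group.

    Callers would invoke entry_point.load() to import the provider lazily.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer importlib.metadata: selectable entry points.
        found = eps.select(group=group)
    else:
        # Older importlib.metadata: a dict keyed by group name.
        found = eps.get(group, [])
    return {ep.name: ep for ep in found}
```

with something like this, a pip install of a package declaring an entry point in that group would be enough to make its kernel type show up.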
fwiw, I've posted an initial implementation of the surrogate shim approach described above here which is working well for the moment.
> to make this all first class it seems like jupyter would need a way to specify kernel configuration
The KernelSpecManager and KernelManager classes are the implementations of finding and launching kernels, respectively. These are swappable for alternate implementations via the kernel_manager_class and kernel_spec_manager_class configurables on NotebookApp.
I just put together pexnb which provides a KernelSpecManager that works with PEX and tells the notebook server to use it by default.
You should be able to build a notebook env with pex via:
pex notebook pexnb -m pexnb -o ./jupyter_notebook.pex
$PWD/jupyter_notebook.pex
It has the assumptions:
Quick reminder: I'm planning a revamp of the kernel finding machinery, described here: https://github.com/jupyter/jupyter_client/pull/261
thanks for the pointers and reference implementation @minrk - very helpful!