I experience relatively frequent disconnections of my notebook from the kernel, resulting in a notebook that is unresponsive to input. This occurs most frequently when I, say, close my laptop and return to a session, but can also occur while my machine is active, and I merely return to the browser tab from another task.
Here is a movie of the behavior, in case my description is not clear.
Looking in the terminal, I notice a timeout message:
WebSocket ping timeout after 1408799 ms.
Typically, reloading the page will restore the connection.
I am running IPython 4.1.1 and Jupyter 4.0.6 on Python 3.5.1 and OS X 10.11.3.
There are really two bugs here: the connection being lost in the first place, and our failure to recover from it. We should probably have an explicit _Reconnect_ action for users, for when the connection has been lost and we fail to reconnect properly.
This happened again, and I noticed the following in the console:
[E 15:22:33.670 NotebookApp] Uncaught exception GET /api/kernels/ebd806b9-c8f8-47e3-aa93-4799128e4c07/channels?session_id=1B0F667ED0314126A4124BD9C6E6DCA3 (::1)
HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/kernels/ebd806b9-c8f8-47e3-aa93-4799128e4c07/channels?session_id=1B0F667ED0314126A4124BD9C6E6DCA3', version='HTTP/1.1', remote_ip='::1', headers={'Pragma': 'no-cache', 'Origin': 'http://localhost:8888', 'Connection': 'Upgrade', 'Upgrade': 'websocket', 'Cache-Control': 'no-cache', 'Sec-Websocket-Version': '13', 'Host': 'localhost:8888', 'Sec-Websocket-Extensions': 'x-webkit-deflate-frame', 'Sec-Websocket-Key': 'usZeo/+ZaCp/yQUNG63glg==', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/601.4.4 (KHTML, like Gecko) Version/9.0.3 Safari/601.4.4'})
Traceback (most recent call last):
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/tornado/web.py", line 1401, in _stack_context_handle_exception
raise_exc_info((type, value, traceback))
File "<string>", line 3, in raise_exc_info
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/notebook/base/zmqhandlers.py", line 253, in _on_zmq_reply
self.write_message(msg, binary=isinstance(msg, bytes))
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/tornado/websocket.py", line 215, in write_message
raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
@fonnesbeck Thanks for passing along the log message. This will help us track down the cause.
cc/@minrk
@minrk Looking back at the sources from the traceback, here are some thoughts. By the time execution reaches write_message, the system has either prematurely closed the websocket or believes the websocket is closed, so the error is thrown.
Here are a couple of PRs that may be related:
stream is set to None in the WebSocketMixin class. Further down, where one of the traceback frames lands in the zmqhandlers file and ZMQStreamHandler, it's possible that stream is still None when ZMQStreamHandler is used, which may cause the error handler to be called and the traceback emitted. My gut instinct is that this is where the problem is, but I don't know the code base well enough to be sure. Alternatively, this could be a context issue where we are not handling the context correctly in the event loop.
# From tornado/stack_context.py
`StackContext` shifts the burden of restoring that state
from each call site (e.g. wrapping each `.AsyncHTTPClient` callback
in ``async_callback``) to the mechanisms that transfer control from
one context to another (e.g. `.AsyncHTTPClient` itself, `.IOLoop`,
thread pools, etc).
...
Most applications shouldn't have to work with `StackContext` directly.
Here are a few rules of thumb for when it's necessary:
* If you're writing an asynchronous library that doesn't rely on a
stack_context-aware library like `tornado.ioloop` or `tornado.iostream`
(for example, if you're writing a thread pool), use
`.stack_context.wrap()` before any asynchronous operations to capture the
stack context from where the operation was started.
* If you're writing an asynchronous library that has some shared
resources (such as a connection pool), create those shared resources
within a ``with stack_context.NullContext():`` block. This will prevent
``StackContexts`` from leaking from one request to another.
* If you want to write something like an exception handler that will
persist across asynchronous calls, create a new `StackContext` (or
`ExceptionStackContext`), and make your asynchronous calls in a ``with``
block that references your `StackContext`.
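To make the suspected race concrete, here is a minimal, self-contained sketch of the failure mode and the kind of guard that would suppress it. `ZMQReplyHandler` is an illustrative stand-in, not the notebook's actual handler; in the real code, tornado's `write_message` raises `WebSocketClosedError` once the websocket's connection object has been set to `None`:

```python
class WebSocketClosedError(Exception):
    """Stand-in for tornado.websocket.WebSocketClosedError."""


class ZMQReplyHandler:
    """Illustrative stand-in for the notebook's ZMQ->websocket handler."""

    def __init__(self):
        # tornado sets the connection to None once the websocket closes
        self.ws_connection = None
        self.sent = []

    def write_message(self, msg, binary=False):
        # mirrors tornado's behavior: writing to a closed socket raises
        if self.ws_connection is None:
            raise WebSocketClosedError()
        self.sent.append(msg)

    def _on_zmq_reply(self, msg):
        # Guarded version: drop ZMQ messages that race with a closed
        # websocket instead of letting the error escape into the log.
        try:
            self.write_message(msg, binary=isinstance(msg, bytes))
        except WebSocketClosedError:
            pass  # connection already gone; nothing useful to do


handler = ZMQReplyHandler()
handler._on_zmq_reply(b"kernel output")  # socket closed: silently dropped
handler.ws_connection = object()         # pretend the socket is open
handler._on_zmq_reply(b"more output")    # delivered
print(len(handler.sent))  # 1
```

Whether silently dropping messages is the right policy is a separate design question; the sketch only shows where the guard would sit relative to the traceback above.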
@minrk @willingc is it realistic to fix this fairly soon for 4.3, or should it be bumped to a later release?
At the very least, we can do the explicit reconnect action, since it should be an easy escape hatch when we have gotten into the wrong state.
We've already got the Kernel > reconnect action in the menu that I was referring to as the minimum bar escape hatch for 4.3. Apparently we've had it for ages (https://github.com/jupyter/notebook/commit/59b54eba). Bumping this one to 4.4 for a real investigation of the disconnects / what state it's in.
I was getting this error repeatedly when running a long script that iterated through about 30k loops, each time printing out a completed message. When I commented out the print, I did not get the timeout error -- a potential temporary workaround.
I have the same issue, also while training neural networks with keras (which prints a lot) in Jupyter Notebooks. Not sure if it helps but here's the stack trace:
[W 23:01:11.586 NotebookApp] WebSocket ping timeout after 96750 ms.
[E 23:01:11.604 NotebookApp] Uncaught exception GET /api/kernels/83b72aa1-bce1-4f12-bda7-1e2229b06947/channels?session_id=6C9D4325C7194B9BB2AF77D5EB45065E (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/kernels/83b72aa1-bce1-4f12-bda7-1e2229b06947/channels?session_id=6C9D4325C7194B9BB2AF77D5EB45065E', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Sec-Websocket-Extensions': 'permessage-deflate; client_max_window_bits', 'Accept-Encoding': 'gzip, deflate, sdch, br', 'Accept-Language': 'en-US,en;q=0.8', 'Sec-Websocket-Key': 'EUwPR3Nt58mWj8789yl5dQ==', 'Origin': 'http://localhost:8888', 'Host': 'localhost:8888', 'Upgrade': 'websocket', 'Sec-Websocket-Version': '13', 'Cache-Control': 'no-cache', 'Connection': 'Upgrade', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36', 'Pragma': 'no-cache'})
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tornado/web.py", line 1425, in _stack_context_handle_exception
raise_exc_info((type, value, traceback))
File "<string>", line 3, in raise_exc_info
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/usr/lib/python3/dist-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/usr/local/lib/python3.5/dist-packages/notebook/services/kernels/handlers.py", line 373, in _on_zmq_reply
super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg)
File "/usr/local/lib/python3.5/dist-packages/notebook/base/zmqhandlers.py", line 260, in _on_zmq_reply
self.write_message(msg, binary=isinstance(msg, bytes))
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 210, in write_message
raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
Can we do more for this for 5.0, or do we bump it to 5.1?
@takluyver Not sure. Probably bump it, depending on the 5.0 deadline.
@minrk Looking at the second traceback, it's a bit different from Chris's, probably due to some changes in the code related to checking the iopub rate limit and message rate limit. https://github.com/jupyter/notebook/blame/master/notebook/services/kernels/handlers.py#L346
I'm going to do a bit of checking on how we test these. I think we may be in a state where we're waiting to resume, or just coming out of resume, when the socket times out, and we're somehow not handling it correctly.
I have a similar issue. When I log in to my JupyterHub from my example.com website, I get this websocket error, which prevents kernel connections. However, when I log in from my LAN using example.com:8000, the kernels connect successfully. My hub is unreachable from WAN connections to example.com:8000. I would be grateful for help debugging.
If LAN is okay and WAN is not, that suggests that there is a proxy/firewall on the WAN that is blocking websockets.
It is likely that proxies/firewalls on the WAN cause errors in some circumstances. But what would cause LAN websocket errors on example.com but not on example.com:8000, and how might I debug it?
At the very least, we can do the explicit reconnect action, since it should be an easy escape hatch when we have gotten into the wrong state.
@minrk Simple enough :) When exactly should we call that `notebook.kernel.reconnect()`?
I think the thing Min was talking about there was the menu entry that was already added. I'm not sure what else this issue is waiting for, so I'm going to bump it to backlog.
@minrk feel free to change the milestone back if there is something to do on this for 5.1.
I'm getting these messages training Keras/Tensorflow models in Jupyter.
Is this the same issue? If yes, is there a known workaround?
[W 16:48:58.146 NotebookApp] WebSocket ping timeout after 90000 ms.
[E 16:48:58.151 NotebookApp] Uncaught exception GET /api/kernels/3071efa5-0136-4c03-86e0-75e726d40144/channels?session_id=E3D0A943934F4339A6C589506C8F370C (75.140.157.194)
HTTPServerRequest(protocol='https', host='52.8.16.250:8888', method='GET', uri='/api/kernels/3071efa5-0136-4c03-86e0-75e726d40144/channels?session_id=E3D0A943934F4339A6C589506C8F370C', version='HTTP/1.1', remote_ip='75.140.157.194', headers={'Origin': 'https://52.8.16.250:8888', 'Upgrade': 'websocket', 'Sec-Websocket-Extensions': 'x-webkit-deflate-frame', 'Sec-Websocket-Version': '13', 'Connection': 'Upgrade', 'Sec-Websocket-Key': 'DXuOKL4oqhM160BU1OdqYQ==', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8', 'Host': '52.8.16.250:8888', 'Cookie': '_xsrf=2|75ec6b90|49ec26caa2d08b2381e5121833c57f87|1502997123; username-52-8-16-250-8888="2|1:0|10:1504628399|25:username-52-8-16-250-8888|44:MzljZTEzMGNiMTFhNGJiMGFmNWJiYjllODU0NTA1NGU=|cbbfacaf2d57802873e8361968259eba5afeaf018a53ecea8f9fd258cc7070f9"; username-52-8-16-250-8889="2|1:0|10:1503435884|25:username-52-8-16-250-8889|44:OTdkZWFkMjE0MWVkNDdmZGI4ZDgzMmZhOGU1YzJiMmY=|0db22d15f6392e710dd45299ee022b3750210f9fbd2df88c04f48810a27c1bc6"', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache'})
Traceback (most recent call last):
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tornado/web.py", line 1401, in _stack_context_handle_exception
raise_exc_info((type, value, traceback))
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/notebook/services/kernels/handlers.py", line 373, in _on_zmq_reply
super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/notebook/base/zmqhandlers.py", line 258, in _on_zmq_reply
self.write_message(msg, binary=isinstance(msg, bytes))
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tornado/websocket.py", line 215, in write_message
raise WebSocketClosedError()
WebSocketClosedError
[I 16:50:35.953 NotebookApp] Saving file at /Untitled1.ipynb
[I 16:51:39.129 NotebookApp] Adapting to protocol v5.0 for kernel 3071efa5-0136-4
I updated jupyter and it worked for me.
"conda update jupyter"
@hapaa, just out of interest, what version did you upgrade from/to?
In my experience, every time a WebSocket closes, JupyterLab becomes unresponsive, particularly the terminal. The logs only state "WebSocket Closed", and the Chrome console reports an "uncaught exception". Any input produces the repeating message "WebSocket Closed or in closing state".
Notebook version: 5.2.2
Tornado version: 4.5.3
Note that I have my notebook deployed on kubernetes behind a Nginx ingress controller serviced by a "Network Load Balancer".
This was a few weeks ago. I've done so many changes since then.
This said, I can't find the previous version of Jupyter, but the current version I'm using is 4.4.0.
I can confirm that upgrading the base image of my notebook to jupyter/datascience-notebook:265297f221de seems to have improved the experience.
Resulting versions:
Websockets close but Jupyter is able to reconnect __without__ the UI becoming unresponsive and I do not need to refresh.
@takluyver @gnestor Since our recommendation is that folks use notebook 5.6.0 or higher as that has an updated version of MathJax, can we close this issue now? I personally haven't seen disconnects in this version.
Sure. Let's close this and if anyone encounters this issue in notebook 5.6.0 or above, we can reopen.
Thanks @gnestor.
~~I'm still seeing this problem after trying to upgrade my base image:~~
~~FROM jupyter/datascience-notebook:177037d09156~~
~~notebook 5.6.0~~
~~tornado 5.1~~
~~[W 20:39:35.840 LabApp] WebSocket ping timeout after 90001 ms.~~
EDIT: Just realized it's labelled "LabApp" and not "NotebookApp". Maybe my problem is not specific to JupyterLab.
Sorry - I am seeing the Jupyter notebook kernel disconnect from the server while running a training loop using the fastai library and PyTorch (NOT Keras). I am running Jupyter notebook version 5.6.0.
Again, it seems to be related to a WebSocket ping timeout, as can be seen from my terminal messages. The kernel is still running, so if I re-run the command it sometimes finishes, but more often than not the same timeout error repeats and the command just never completes.
This happens frequently enough that I cannot run the training to completion.
[W 21:18:41.515 NotebookApp] WebSocket ping timeout after 119989 ms.
[I 21:18:46.517 NotebookApp] Starting buffering for 97bae0b8-150a-4f0b-bad9-e714b4a8f2cd:22c97cb8911a49f885225d53ff47f5ba
[I 21:19:08.344 NotebookApp] Adapting to protocol v5.1 for kernel 97bae0b8-150a-4f0b-bad9-e714b4a8f2cd
[I 21:19:08.345 NotebookApp] Restoring connection for 97bae0b8-150a-4f0b-bad9-e714b4a8f2cd:22c97cb8911a49f885225d53ff47f5ba
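If the ping timeout itself is what kills the connection, one possible workaround is to lengthen the ping window. This is a sketch, assuming the classic notebook still reads `ws_ping_interval` / `ws_ping_timeout` (in milliseconds) from the tornado settings in `zmqhandlers.py`; verify the key names against your notebook version:

```python
# jupyter_notebook_config.py
# Assumption: WebSocketMixin reads these keys from the tornado settings.
c.NotebookApp.tornado_settings = {
    "ws_ping_interval": 30000,   # send a ping every 30 s
    "ws_ping_timeout": 300000,   # allow 5 min before declaring the socket dead
}
```

This does not fix the underlying disconnect, but it may keep slow or flaky links from tripping the timeout during long-running cells.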
What happens if you try training this model in 'jupyter lab'?
I see the issue has been closed, but I did not catch any solution for it. For me, it repeats in JupyterLab. I updated my tornado to 5.1.1, updated my jupyter as suggested above, and my jupyter version is 4.4.0; I am using Anaconda 3.5.5 on an EC2 instance with a tmux session. But still, whenever I lose wifi connection or even switch between routers, I encounter this websocket timeout error, which eventually shuts down my jupyter kernel.
Is there an existing cure for this that I happened to miss, or is there none and the issue just got closed?
Thank you!
I can't answer why the issue was "closed".
However, I've switched to "jupyter lab" and haven't had issues.
I'm getting this same problem. The kernel stops responding during certain function calls, even when those calls aren't particularly lengthy or computationally expensive, and then I see the websocket timeout messages in the jupyter log. Any updates on how to fix this? Here is my version stack:
$ jupyter --version
jupyter core : 4.6.3
jupyter-notebook : 6.0.3
qtconsole : 4.7.1
ipython : 7.13.0
ipykernel : 5.2.0
jupyter client : 6.1.0
jupyter lab : not installed
nbconvert : 5.6.1
ipywidgets : 7.5.1
nbformat : 5.0.4
traitlets : 4.3.3
I switched to Jupyter lab and haven't had the issue.
Hi, I am using jupyter/datascience-notebook:1386e2046833 on JupyterHub (with EKS on AWS) and am still having this issue.
I'm getting kernel restarts, sometimes after a websocket timeout:
SingleUserLabApp zmqhandlers:182] WebSocket ping timeout after 90002 ms
SingleUserLabApp kernelmanager:217] Starting buffering for 533a90f9-e00f-4019-8044-59727faba7a5:de0312ca-a1f7-478b-a24a-1fe22593ec5f
and sometimes just kernel buffering and a restart:
kernelmanager:172] Kernel started: 1d7deac9-4b49-49c4-913c-490b6cb1d754