I experience relatively frequent disconnections of my notebook from the kernel, resulting in a notebook that is unresponsive to input. This occurs most frequently when I, say, close my laptop and return to a session, but can also occur while my machine is active, and I merely return to the browser tab from another task.
Here is a movie of the behavior, in case my description is not clear.
Looking in the terminal, I notice a timeout message:
WebSocket ping timeout after 1408799 ms.
Typically, reloading the page will restore the connection.
I am running IPython 4.1.1 and Jupyter 4.0.6 on Python 3.5.1 and OS X 10.11.3.
There are really two bugs here: the connection being lost in the first place, and our failure to recover from it. We should probably have an explicit _Reconnect_ action for users, for when the connection has been lost and we fail to reconnect properly.
This happened again, and I noticed the following in the console:
[E 15:22:33.670 NotebookApp] Uncaught exception GET /api/kernels/ebd806b9-c8f8-47e3-aa93-4799128e4c07/channels?session_id=1B0F667ED0314126A4124BD9C6E6DCA3 (::1)
HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/kernels/ebd806b9-c8f8-47e3-aa93-4799128e4c07/channels?session_id=1B0F667ED0314126A4124BD9C6E6DCA3', version='HTTP/1.1', remote_ip='::1', headers={'Pragma': 'no-cache', 'Origin': 'http://localhost:8888', 'Connection': 'Upgrade', 'Upgrade': 'websocket', 'Cache-Control': 'no-cache', 'Sec-Websocket-Version': '13', 'Host': 'localhost:8888', 'Sec-Websocket-Extensions': 'x-webkit-deflate-frame', 'Sec-Websocket-Key': 'usZeo/+ZaCp/yQUNG63glg==', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/601.4.4 (KHTML, like Gecko) Version/9.0.3 Safari/601.4.4'})
Traceback (most recent call last):
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/tornado/web.py", line 1401, in _stack_context_handle_exception
raise_exc_info((type, value, traceback))
File "<string>", line 3, in raise_exc_info
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/notebook/base/zmqhandlers.py", line 253, in _on_zmq_reply
self.write_message(msg, binary=isinstance(msg, bytes))
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/tornado/websocket.py", line 215, in write_message
raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
@fonnesbeck Thanks for passing along the log message. This will help us track down the cause.
cc/@minrk
@minrk Looking back at the sources from the traceback, here are some thoughts. By the time execution reaches write_message, the system has either prematurely closed the websocket or believes the websocket is closed, so the error is thrown.
Here are a couple of PRs that may be related:
stream is set to None in the WebSocketMixin class. Further down, where one of the traceback frames lands in the zmqhandlers file and ZMQStreamHandler, it's possible that stream is still None when ZMQStreamHandler is used, which may cause the error handler to be called and the traceback emitted. My gut instinct is that this is where the problem is, but I don't know the code base well enough to be sure. Alternatively, this could be a context issue where we are not handling the context correctly in the event loop.
# From tornado/stack_context.py
`StackContext` shifts the burden of restoring that state
from each call site (e.g. wrapping each `.AsyncHTTPClient` callback
in ``async_callback``) to the mechanisms that transfer control from
one context to another (e.g. `.AsyncHTTPClient` itself, `.IOLoop`,
thread pools, etc).
...
Most applications shouldn't have to work with `StackContext` directly.
Here are a few rules of thumb for when it's necessary:
* If you're writing an asynchronous library that doesn't rely on a
stack_context-aware library like `tornado.ioloop` or `tornado.iostream`
(for example, if you're writing a thread pool), use
`.stack_context.wrap()` before any asynchronous operations to capture the
stack context from where the operation was started.
* If you're writing an asynchronous library that has some shared
resources (such as a connection pool), create those shared resources
within a ``with stack_context.NullContext():`` block. This will prevent
``StackContexts`` from leaking from one request to another.
* If you want to write something like an exception handler that will
persist across asynchronous calls, create a new `StackContext` (or
`ExceptionStackContext`), and make your asynchronous calls in a ``with``
block that references your `StackContext`.
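To make the suspected race concrete, here is a minimal, self-contained sketch of the failure mode and the kind of guard that would suppress it. `ZMQReplyHandler` is an illustrative stand-in, not the notebook's actual handler; in the real code, tornado's `write_message` raises `WebSocketClosedError` once the websocket's connection object has been set to `None`:

```python
class WebSocketClosedError(Exception):
    """Stand-in for tornado.websocket.WebSocketClosedError."""


class ZMQReplyHandler:
    """Illustrative stand-in for the notebook's ZMQ->websocket handler."""

    def __init__(self):
        # tornado sets the connection to None once the websocket closes
        self.ws_connection = None
        self.sent = []

    def write_message(self, msg, binary=False):
        # mirrors tornado's behavior: writing to a closed socket raises
        if self.ws_connection is None:
            raise WebSocketClosedError()
        self.sent.append(msg)

    def _on_zmq_reply(self, msg):
        # Guarded version: drop ZMQ messages that race with a closed
        # websocket instead of letting the error escape into the log.
        try:
            self.write_message(msg, binary=isinstance(msg, bytes))
        except WebSocketClosedError:
            pass  # connection already gone; nothing useful to do


handler = ZMQReplyHandler()
handler._on_zmq_reply(b"kernel output")  # socket closed: silently dropped
handler.ws_connection = object()         # pretend the socket is open
handler._on_zmq_reply(b"more output")    # delivered
print(len(handler.sent))  # 1
```

Whether silently dropping messages is the right policy is a separate design question; the sketch only shows where the guard would sit relative to the traceback above.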
@minrk @willingc is it realistic to fix this fairly soon for 4.3, or should it be bumped to a later release?
At the very least, we can do the explicit reconnect action, since it should be an easy escape hatch when we have gotten into the wrong state.
We've already got the Kernel > reconnect action in the menu that I was referring to as the minimum bar escape hatch for 4.3. Apparently we've had it for ages (https://github.com/jupyter/notebook/commit/59b54eba). Bumping this one to 4.4 for a real investigation of the disconnects / what state it's in.
I was getting this error repeatedly when running a long script that iterated through about 30k loops, each time printing out a completed message. When I commented out the print, I did not get the timeout error -- a potential temporary workaround.
I have the same issue, also while training neural networks with keras (which prints a lot) in Jupyter Notebooks. Not sure if it helps but here's the stack trace:
[W 23:01:11.586 NotebookApp] WebSocket ping timeout after 96750 ms.
[E 23:01:11.604 NotebookApp] Uncaught exception GET /api/kernels/83b72aa1-bce1-4f12-bda7-1e2229b06947/channels?session_id=6C9D4325C7194B9BB2AF77D5EB45065E (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/kernels/83b72aa1-bce1-4f12-bda7-1e2229b06947/channels?session_id=6C9D4325C7194B9BB2AF77D5EB45065E', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Sec-Websocket-Extensions': 'permessage-deflate; client_max_window_bits', 'Accept-Encoding': 'gzip, deflate, sdch, br', 'Accept-Language': 'en-US,en;q=0.8', 'Sec-Websocket-Key': 'EUwPR3Nt58mWj8789yl5dQ==', 'Origin': 'http://localhost:8888', 'Host': 'localhost:8888', 'Upgrade': 'websocket', 'Sec-Websocket-Version': '13', 'Cache-Control': 'no-cache', 'Connection': 'Upgrade', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36', 'Pragma': 'no-cache'})
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tornado/web.py", line 1425, in _stack_context_handle_exception
raise_exc_info((type, value, traceback))
File "<string>", line 3, in raise_exc_info
File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/usr/lib/python3/dist-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/usr/local/lib/python3.5/dist-packages/notebook/services/kernels/handlers.py", line 373, in _on_zmq_reply
super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg)
File "/usr/local/lib/python3.5/dist-packages/notebook/base/zmqhandlers.py", line 260, in _on_zmq_reply
self.write_message(msg, binary=isinstance(msg, bytes))
File "/usr/local/lib/python3.5/dist-packages/tornado/websocket.py", line 210, in write_message
raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError
Can we do more for this for 5.0, or do we bump it to 5.1?
@takluyver Not sure. Probably bump it, depending on the 5.0 deadline.
@minrk Looking at the second traceback, it's a bit different from Chris's, probably due to some changes in the code related to checking the iopub rate limit and message rate limit. https://github.com/jupyter/notebook/blame/master/notebook/services/kernels/handlers.py#L346
I'm going to do a bit of checking on how we test these. I think we may be in a state where we're waiting to resume, or just coming out of resume, when the socket times out, and we're somehow not handling it correctly.
I have a similar issue. When I log in to my JupyterHub from my example.com website, I get this websocket error, which prevents kernel connections. However, when I log in from my LAN using example.com:8000, the kernels connect successfully. My hub is unreachable from WAN connections to example.com:8000. I would be grateful for help debugging.
If LAN is okay and WAN is not, that suggests that there is a proxy/firewall on the WAN that is blocking websockets.
It is likely that proxies/firewalls on the WAN cause errors in some circumstances. But what would cause LAN websocket errors on example.com but not on example.com:8000, and how might I debug it?
At the very least, we can do the explicit reconnect action, since it should be an easy escape hatch when we have gotten into the wrong state.
@minrk Simple enough :) When exactly should we call that `notebook.kernel.reconnect()`?
I think the thing Min was talking about there was the menu entry that was already added. I'm not sure what else this issue is waiting for, so I'm going to bump it to backlog.
@minrk feel free to change the milestone back if there is something to do on this for 5.1.
I'm getting these messages training Keras/Tensorflow models in Jupyter.
Is this the same issue? If yes, is there a known workaround?
[W 16:48:58.146 NotebookApp] WebSocket ping timeout after 90000 ms.
[E 16:48:58.151 NotebookApp] Uncaught exception GET /api/kernels/3071efa5-0136-4c03-86e0-75e726d40144/channels?session_id=E3D0A943934F4339A6C589506C8F370C (75.140.157.194)
HTTPServerRequest(protocol='https', host='52.8.16.250:8888', method='GET', uri='/api/kernels/3071efa5-0136-4c03-86e0-75e726d40144/channels?session_id=E3D0A943934F4339A6C589506C8F370C', version='HTTP/1.1', remote_ip='75.140.157.194', headers={'Origin': 'https://52.8.16.250:8888', 'Upgrade': 'websocket', 'Sec-Websocket-Extensions': 'x-webkit-deflate-frame', 'Sec-Websocket-Version': '13', 'Connection': 'Upgrade', 'Sec-Websocket-Key': 'DXuOKL4oqhM160BU1OdqYQ==', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8', 'Host': '52.8.16.250:8888', 'Cookie': '_xsrf=2|75ec6b90|49ec26caa2d08b2381e5121833c57f87|1502997123; username-52-8-16-250-8888="2|1:0|10:1504628399|25:username-52-8-16-250-8888|44:MzljZTEzMGNiMTFhNGJiMGFmNWJiYjllODU0NTA1NGU=|cbbfacaf2d57802873e8361968259eba5afeaf018a53ecea8f9fd258cc7070f9"; username-52-8-16-250-8889="2|1:0|10:1503435884|25:username-52-8-16-250-8889|44:OTdkZWFkMjE0MWVkNDdmZGI4ZDgzMmZhOGU1YzJiMmY=|0db22d15f6392e710dd45299ee022b3750210f9fbd2df88c04f48810a27c1bc6"', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache'})
Traceback (most recent call last):
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tornado/web.py", line 1401, in _stack_context_handle_exception
raise_exc_info((type, value, traceback))
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tornado/stack_context.py", line 314, in wrapped
ret = fn(*args, **kwargs)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 191, in <lambda>
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/notebook/services/kernels/handlers.py", line 373, in _on_zmq_reply
super(ZMQChannelsHandler, self)._on_zmq_reply(stream, msg)
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/notebook/base/zmqhandlers.py", line 258, in _on_zmq_reply
self.write_message(msg, binary=isinstance(msg, bytes))
File "/home/ubuntu/anaconda/lib/python2.7/site-packages/tornado/websocket.py", line 215, in write_message
raise WebSocketClosedError()
WebSocketClosedError
[I 16:50:35.953 NotebookApp] Saving file at /Untitled1.ipynb
[I 16:51:39.129 NotebookApp] Adapting to protocol v5.0 for kernel 3071efa5-0136-4
I updated jupyter and it worked for me.
"conda update jupyter"
@hapaa, just out of interest, what version did you upgrade from/to?
In my experience, every time a WebSocket closes, JupyterLab becomes unresponsive, particularly the terminal. The logs only state "WebSocket Closed", and the Chrome console reports an "uncaught exception". Any input produces the repeating message "WebSocket Closed or in closing state".
Notebook version: 5.2.2
Tornado version: 4.5.3
Note that I have my notebook deployed on kubernetes behind a Nginx ingress controller serviced by a "Network Load Balancer".
This was a few weeks ago. I've done so many changes since then.
This said, I can't find the previous version of Jupyter, but the current version I'm using is 4.4.0.
I can confirm that upgrading the base image of my notebook to jupyter/datascience-notebook:265297f221de seems to have improved the experience.
Resulting versions:
Websockets close but Jupyter is able to reconnect __without__ the UI becoming unresponsive and I do not need to refresh.
@takluyver @gnestor Since our recommendation is that folks use notebook 5.6.0 or higher as that has an updated version of MathJax, can we close this issue now? I personally haven't seen disconnects in this version.
Sure. Let's close this and if anyone encounters this issue in notebook 5.6.0 or above, we can reopen.
Thanks @gnestor.
~~I'm still seeing this problem after trying to upgrade my base image:~~
~~FROM jupyter/datascience-notebook:177037d09156~~
~~notebook 5.6.0~~
~~tornado 5.1~~
~~[W 20:39:35.840 LabApp] WebSocket ping timeout after 90001 ms.~~
EDIT: Just realized it's labelled "LabApp" and not "NotebookApp". Maybe my problem is not specific to JupyterLab.
Sorry - I am seeing the Jupyter notebook kernel disconnect from the server while running a training loop using the fastai library and PyTorch (NOT Keras). I am running Jupyter notebook version 5.6.0.
Again, it seems to be related to a WebSocket ping timeout, as can be seen from my terminal messages. The kernel is still running, so if I re-run the command it sometimes finishes, but more often than not the same timeout error repeats and the command just never completes.
This happens frequently enough that I cannot run the training to completion.
[W 21:18:41.515 NotebookApp] WebSocket ping timeout after 119989 ms.
[I 21:18:46.517 NotebookApp] Starting buffering for 97bae0b8-150a-4f0b-bad9-e714b4a8f2cd:22c97cb8911a49f885225d53ff47f5ba
[I 21:19:08.344 NotebookApp] Adapting to protocol v5.1 for kernel 97bae0b8-150a-4f0b-bad9-e714b4a8f2cd
[I 21:19:08.345 NotebookApp] Restoring connection for 97bae0b8-150a-4f0b-bad9-e714b4a8f2cd:22c97cb8911a49f885225d53ff47f5ba
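If the ping timeout itself is what kills the connection, one possible workaround is to lengthen the ping window. This is a sketch, assuming the classic notebook still reads `ws_ping_interval` / `ws_ping_timeout` (in milliseconds) from the tornado settings in `zmqhandlers.py`; verify the key names against your notebook version:

```python
# jupyter_notebook_config.py
# Assumption: WebSocketMixin reads these keys from the tornado settings.
c.NotebookApp.tornado_settings = {
    "ws_ping_interval": 30000,   # send a ping every 30 s
    "ws_ping_timeout": 300000,   # allow 5 min before declaring the socket dead
}
```

This does not fix the underlying disconnect, but it may keep slow or flaky links from tripping the timeout during long-running cells.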
What happens if you try training this model in 'jupyter lab'?
I see the issue has been closed, but I did not catch any solution for it. For me, it repeats in JupyterLab. I updated my tornado to 5.1.1, updated my jupyter as suggested above, and my jupyter version is 4.4.0; I am using Anaconda 3.5.5 on an EC2 instance with a tmux session. But still, whenever I lose wifi connection or even switch between routers, I encounter this websocket timeout error, which eventually shuts down my jupyter kernel.
Is there an existing cure for this that I happened to miss, or is there none and the issue just got closed?
Thank you!
I can't answer why the issue was "closed".
However, I've switched to "jupyter lab" and haven't had issues.
I'm getting this same problem. The kernel stops responding during certain function calls, even when those calls aren't particularly lengthy or computationally expensive, and then I see the websocket timeout messages in the jupyter log. Any updates on how to fix this? Here is my version stack:
$ jupyter --version
jupyter core : 4.6.3
jupyter-notebook : 6.0.3
qtconsole : 4.7.1
ipython : 7.13.0
ipykernel : 5.2.0
jupyter client : 6.1.0
jupyter lab : not installed
nbconvert : 5.6.1
ipywidgets : 7.5.1
nbformat : 5.0.4
traitlets : 4.3.3
I switched to Jupyter lab and haven't had the issue.
Hi, I am using jupyter/datascience-notebook:1386e2046833 on JupyterHub (with EKS on AWS) and am still having this issue.
I'm getting kernel restarts, sometimes after a websocket timeout:
SingleUserLabApp zmqhandlers:182] WebSocket ping timeout after 90002 ms
SingleUserLabApp kernelmanager:217] Starting buffering for 533a90f9-e00f-4019-8044-59727faba7a5:de0312ca-a1f7-478b-a24a-1fe22593ec5f
and sometimes just kernel buffering and a restart:
kernelmanager:172] Kernel started: 1d7deac9-4b49-49c4-913c-490b6cb1d754