Unfortunately I did not find a reliable way to reproduce this bug; I have only seen it a few times, maybe 2 times in a couple dozen prints …
I've got 1.4.1 installed, but I saw this before as well, likely with 1.4.0.
Since I have no way to reproduce it, I could not trigger it in safe mode either.
My setup:
The observed problem:
I observed the problem while printing, when the printer suddenly started to stutter. By stutter I mean that it seems not to receive the G-code fast enough, making tiny moves, then tiny pauses, and then continuing…
What I did (it might not be related, I try to be as verbose as possible):
Examining the journal for octoprint.service around the time I guess the problem started:
Aug 13 08:39:01 bananapi2 octoprint[580]: 2020-08-13 08:39:01,742 - octoprint.plugins.detailedprogress - INFO - Printing started. Detailed progress started.
Aug 13 08:39:01 bananapi2 octoprint[580]: 2020-08-13 08:39:01,767 - octoprint.plugins.stats - INFO - Printer Stats - on_event
Aug 13 08:39:03 bananapi2 octoprint[580]: 2020-08-13 08:39:03,279 - octoprint.plugins.stats - INFO - Printer Stats - on_event
Aug 13 08:39:08 bananapi2 octoprint[580]: 2020-08-13 08:39:08,542 - octoprint.server.util.flask.PreemptiveCache - INFO - Adding entry for / and {'query_string': u'l10n=de', 'path': u'/', u'_count': 1, u'_timestamp': 1597300748.541694, 'base_url': u'http://bananapi2:5000/'}
Aug 13 08:39:48 bananapi2 octoprint[580]: 2020-08-13 08:39:48,527 - octoprint.server.util.sockjs - INFO - New connection from client: 10.x.x.68
Aug 13 08:39:48 bananapi2 octoprint[580]: 2020-08-13 08:39:48,990 - octoprint.server.util.flask - INFO - Passively logging in user _admin from 10.x.x.68
Aug 13 08:39:49 bananapi2 octoprint[580]: 2020-08-13 08:39:48,998 - octoprint.access.users - INFO - Logged in user: _admin
Aug 13 08:39:57 bananapi2 octoprint[580]: 2020-08-13 08:39:57,700 - octoprint.filemanager.analysis - INFO - Analysis of entry local:smd-60mm-container-lid.gcode finished, needed 68.44s
Aug 13 08:39:58 bananapi2 octoprint[580]: 2020-08-13 08:39:58,113 - octoprint.server.util.sockjs - INFO - User _admin logged in on the socket from client 10.x.x.68
Aug 13 08:50:52 bananapi2 octoprint[580]: 2020-08-13 08:50:52,806 - octoprint.server.heartbeat - INFO - Server heartbeat <3
Aug 13 08:51:43 bananapi2 octoprint[580]: 2020-08-13 08:51:43,630 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 958}
Aug 13 09:05:52 bananapi2 octoprint[580]: 2020-08-13 09:05:52,808 - octoprint.server.heartbeat - INFO - Server heartbeat <3
Aug 13 09:06:43 bananapi2 octoprint[580]: 2020-08-13 09:06:43,596 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 1858}
Aug 13 09:20:52 bananapi2 octoprint[580]: 2020-08-13 09:20:52,817 - octoprint.server.heartbeat - INFO - Server heartbeat <3
Aug 13 09:21:43 bananapi2 octoprint[580]: 2020-08-13 09:21:43,716 - octoprint.plugins.tracking - INFO - Sent tracking event ping, payload: {'octoprint_uptime': 2758}
Aug 13 09:22:14 bananapi2 octoprint[580]: 2020-08-13 09:22:14,870 - tornado.general - WARNING - Write error on <socket._socketobject object at 0xa578b0d8>: [Errno 113] No route to host
Aug 13 09:22:16 bananapi2 octoprint[580]: 2020-08-13 09:22:16,721 - octoprint.server.util.sockjs - INFO - Client connection closed: 10.x.x.68
Aug 13 09:22:19 bananapi2 octoprint[580]: 2020-08-13 09:22:19,914 - tornado.application - ERROR - Future exception was never retrieved: Traceback (most recent call last):
Aug 13 09:22:19 bananapi2 octoprint[580]: File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/gen.py", line 1141, in run
Aug 13 09:22:19 bananapi2 octoprint[580]: yielded = self.gen.throw(*exc_info)
Aug 13 09:22:19 bananapi2 octoprint[580]: File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/websocket.py", line 876, in wrapper
Aug 13 09:22:19 bananapi2 octoprint[580]: raise WebSocketClosedError()
Aug 13 09:22:19 bananapi2 octoprint[580]: WebSocketClosedError
Aug 13 09:22:20 bananapi2 octoprint[580]: 2020-08-13 09:22:20,009 - tornado.application - ERROR - Future exception was never retrieved: Traceback (most recent call last):
Aug 13 09:22:20 bananapi2 octoprint[580]: File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/gen.py", line 1141, in run
Aug 13 09:22:20 bananapi2 octoprint[580]: yielded = self.gen.throw(*exc_info)
Aug 13 09:22:20 bananapi2 octoprint[580]: File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/websocket.py", line 876, in wrapper
Aug 13 09:22:20 bananapi2 octoprint[580]: raise WebSocketClosedError()
Aug 13 09:22:20 bananapi2 octoprint[580]: WebSocketClosedError
The tornado future exception is then repeated several times per second and drives the CPU utilization to 100%, as glances showed me.
This is around the point in time where the stuttering stopped. Again, this could be coincidence, but I believe it is related:
Aug 13 09:30:27 bananapi2 octoprint[580]: WebSocketClosedError
Aug 13 09:30:27 bananapi2 octoprint[580]: 2020-08-13 09:30:27,192 - tornado.application - ERROR - Future exception was never retrieved: Traceback (most recent call last):
Aug 13 09:30:27 bananapi2 octoprint[580]: File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/gen.py", line 1141, in run
Aug 13 09:30:27 bananapi2 octoprint[580]: yielded = self.gen.throw(*exc_info)
Aug 13 09:30:27 bananapi2 octoprint[580]: File "/home/octoprint/OctoPrint/venv/local/lib/python2.7/site-packages/tornado/websocket.py", line 876, in wrapper
Aug 13 09:30:27 bananapi2 octoprint[580]: raise WebSocketClosedError()
Aug 13 09:30:27 bananapi2 octoprint[580]: WebSocketClosedError
Aug 13 09:30:29 bananapi2 octoprint[580]: 2020-08-13 09:30:29,138 - octoprint.server.util.sockjs - INFO - New connection from client: 10.x.x.68
Aug 13 09:30:29 bananapi2 octoprint[580]: 2020-08-13 09:30:29,855 - octoprint.server.util.flask - INFO - Passively logging in user _admin from 10.x.x.68
Looking at the PC's journal, suspend-to-RAM (S2R) was entered there on:
Aug 13 09:06:00 isildor systemd-sleep[17216]: Suspending system...
and resumed on
Aug 13 09:25:37 isildor systemd-sleep[17216]: System resumed.
see above
see above
Due to the low occurrence, I unfortunately could not reproduce it in safe mode.
at least 1.4.0 and 1.4.1
(unknown if it happens on earlier versions, my logs are not old enough to check this)
Ultimaker clone, heavily modified; Marlin 2.0.x
OS: Debian (Testing/Unstable)
Browser: Firefox Quantum 68.11.0esr (64-Bit)
systemd's journal*: https://sviech.de/s/XtsCjWkfxCZFBNs/download
octoprint's logs directory: https://sviech.de/s/CP5zgwTbKCNeLXm/download (tar.gz)
(* Be aware that the systemd log starts way in the past… I didn't want to limit it to "today")
sorry, did not record that information.
sorry, did not record that information.
n/a
I have read the FAQ.
Thanks for reporting this! I think this might be a bug in the now vendored sockjs dependency. Due to a tornado update it had to be ported to asyncio(-ish) and this might be an issue with that implementation that I need to root out and fix.
After having spent the majority of the day trying to understand what might be going on here, I think I found at least the reason for the "Future exception was never retrieved" log entry, and possibly for the whole storm of them.
The actual result of writing to the web socket is async and a future, since write_message returns a future. The error handling inside send_pack however assumes synchronous behaviour: https://github.com/OctoPrint/OctoPrint/blob/1.4.2/src/octoprint/vendor/sockjs/tornado/transports/websocket.py#L94 (same for rawwebsocket.py). Thus the exception handler there will not trigger if there's an error while writing, and thus the connection will not have on_close called.
The above commit works around this through a done_callback on the future, which should hopefully solve it. I say should because even though I did my best in trying to simulate your error scenario, I was never able to trigger even the "retrieved" message, let alone a whole storm of them.
So... Just in case this isn't it after all (because I haven't been able to reproduce the actual error and just have been going by code), I'm going to throw a bunch more links in here to code sections possibly related to this, so I don't have to start again every time I try to wrap my head around Tornado internals in order to debug this...
exception clears traceback log: https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/concurrent.py#L275
write calls exception on write future: https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/iostream.py#L583
gen.coroutine around future resolving wrapper rewraps it into a future: https://github.com/tornadoweb/tornado/blob/v5.1.1/tornado/websocket.py#L871

Hey 👋
This issue was reproducible in 1.4.2 by:
If you don't reconnect, it seems it never gets logged.
With the current maintenance branch (e781e9c3e8) the issue is not reproducible after 5 cycles of the above procedure. So it seems fixed 👍
Perfect, thank you for reporting back on this! Considering this solved then and will close once 1.5.0 rolls out.