Octoprint: Running into some 'weird' websocket problem

Created on 11 Oct 2018 · 7 Comments · Source: OctoPrint/OctoPrint

What were you doing?

Running OctoPod against OctoPrint 1.3.9 (and also latest from master). OctoPod connects to 'raw websocket' (i.e. /sockjs/websocket) to get updates from OctoPrint. FYI, OctoPod uses Starscream (also open source) for reading from the raw websocket.
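For illustration, here is a minimal Python sketch of what OctoPod does conceptually: open the raw sockjs endpoint and read push messages frame by frame. It uses the websocket-client package instead of Starscream, and the host/port are placeholders.

    # Minimal sketch: read OctoPrint's raw sockjs websocket.
    # Assumes the websocket-client package (pip install websocket-client);
    # hostname and port are placeholders.
    import websocket

    OCTOPRINT_WS = "ws://octopi.local:5000/sockjs/websocket"

    ws = websocket.create_connection(OCTOPRINT_WS, timeout=30)
    try:
        while True:
            frame = ws.recv()  # each frame should be one JSON-encoded push message
            print(frame)
    finally:
        ws.close()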

What did you expect to happen?

OctoPod should be able to _always_ read whatever data is sent via the raw websocket.

What happened instead?

There are 2 things going on, and I wonder if they come from the same root issue. OctoPod randomly (i.e. no clear steps to reproduce) receives what I call 'garbage' data. Here is an example:

"tTimeLeft\": null, \"printTimeOriRecv: ok\", \"1 M19\"Send:11N8 M83*16\",rigin\": null}}}{\"current\": {\"logs\": [\"Recv: w452\"]820offsets\": {}, \"serverTime\": .......

When this happens I see this WARNING being printed in OctoPrint's log:

octoprint.server.util.sockjs - WARNING - Could not send message to client 192.168.252.3: Existing exports of data: object cannot be re-sized

I tried running OctoPrint in PyCharm to collect more data about this. I added some breakpoints that do not suspend execution but print the data being sent over websockets. OctoPrint was regularly sending {'current' ..} notifications just fine. Then, when an {'event'} notification and a {'current' ..} notification were both about to be sent, the above WARNING was printed. Here are the 2 messages that were being sent over the raw websocket:

{'event': {'type': 'Home', 'payload': None}}

{'current': {'logs': [u'Recv: ok', u'Send: N11 M190 S60*110', u'Recv: wait'], 'offsets':

I did a quick Google search on that WARNING and found this thread: https://github.com/tornadoweb/tornado/issues/2271. I wonder if OctoPrint has another thread, other than the original ioloop thread, writing to the websocket, as mentioned in that issue.
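To make that suspicion concrete, here is a small standalone Tornado sketch (not OctoPrint's actual code; the handler name and endpoint are made up) of a worker thread pushing messages to a websocket. Calling write_message() directly from the worker thread is the pattern tornado flags as unsafe; handing the write to the IOLoop via add_callback() is the safe variant.

    import threading
    import time

    from tornado.ioloop import IOLoop
    from tornado.web import Application
    from tornado.websocket import WebSocketHandler


    class PushHandler(WebSocketHandler):
        def open(self):
            # Remember the loop this connection belongs to, then start a producer thread.
            self.io_loop = IOLoop.current()
            worker = threading.Thread(target=self._worker)
            worker.daemon = True
            worker.start()

        def _worker(self):
            for i in range(100):
                msg = '{"current": {"tick": %d}}' % i
                # UNSAFE: calling self.write_message(msg) directly here runs off the IOLoop thread.
                # SAFE: marshal the write onto the IOLoop thread instead:
                self.io_loop.add_callback(self.write_message, msg)
                time.sleep(0.1)


    if __name__ == "__main__":
        Application([(r"/ws", PushHandler)]).listen(8888)
        IOLoop.current().start()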

The other problem that I see is that sometimes the client receives the following text in a single frame:

{\"timelapse\": null}{\"event\": {\"type\": \"ClientOpened\", \"payload\": {\"remoteAddress\": \"98.232.252.169\"}}}

When that notification was sent, I saw this in OctoPrint's log:

octoprint.server.util.sockjs - WARNING - Could not send message to client 98.232.252.169:

One last piece of info: I'm pretty sure that I saw these 2 problems with OctoPrint 1.3.8. However, I do not recall seeing them in 1.3.7. I would need to start an old version to confirm this if that would help.

Did the same happen when running OctoPrint in safe mode?

No idea. Running with vanilla OctoPrint. No added plugins except what comes out of the box.

Version of OctoPrint

1.3.9. Master from github. 1.3.8.

Operating System running OctoPrint

OctoPrint running on Raspbian. Also OctoPrint running on the latest version of macOS.

Printer model & used firmware incl. version

Happens with Prusa MK3. Happens with virtual printer.

Browser and version of browser, operating system running browser

N/A. Not using a browser to create this issue.

Link to octoprint.log

Pasted above.

Link to contents of terminal tab or serial.log

N/A

Link to contents of Javascript console in the browser

N/A

Screenshot(s)/video(s) showing the problem:

N/A

I have read the FAQ.

yes. :)

PS: Posted about this in the forums (https://discourse.octoprint.org/t/running-into-some-weird-websocket-problem/4530) but I think that this might be the proper place for this. Let me know if this is not the case.

Thanks a lot for your help.
Gaston

Labels: bug, done

All 7 comments

BIG DISCLAIMER: First time playing with Python and also first time reading OctoPrint's code.

Having said that, I made a local fix for this based on the previously mentioned tornado issue and this new tornado issue that I found. From my understanding, the new tornado issue includes a test that exactly replicates our problem (i.e. unsafe to call write_message from another thread) and also our symptom (i.e. it works most of the time until it doesn't).

I've been testing the fix for a couple of hours and the problem is now gone. I have no clue if this is the optimal way to solve the problem, but it does solve it based on my testing. I do not see any regressions so far. Will keep testing.

The fix was applied to transports/rawwebsocket.py and transports/websocket.py. I changed #send_pack(self, message, binary=False) to this new code:

    def send_pack(self, message, binary=False):
        if IOLoop.current(False) == self.server.io_loop:
            # Running in Main Thread
            # Send message
            try:
                self.write_message(message, binary)
            except IOError:
                self.server.io_loop.add_callback(self.on_close)
        else:
            # Not running in Main Thread so use proper thread to send message
            self.server.io_loop.add_callback(lambda: self.send_pack(message, binary))
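For context, Tornado documents IOLoop.add_callback as safe to call from other threads (most other IOLoop and handler methods are not), so the code above effectively funnels every write_message call back onto the loop's own thread. One assumption: this relies on IOLoop being importable in those modules (from tornado.ioloop import IOLoop); if it isn't imported there already, that import has to be added too.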

If this looks ok, I can create a PR for this.

Thanks,
Gaston

Quick update and good news.

Applied the fix to OctoPrint 1.3.9 running on OctoPi and also to OctoPrint (master) running on my local machine with PyCharm, and the error is no longer happening. Received data from the raw websocket is always good.

Both WARNINGs are also gone:

octoprint.server.util.sockjs - WARNING - Could not send message to client 192.168.252.3: Existing exports of data: object cannot be re-sized

octoprint.server.util.sockjs - WARNING - Could not send message to client 98.232.252.169:

FYI, this other WARNING is still present in the logs but is not causing any problem that I can tell:

WARNING - Could not send message to client 2601:1c0:4600:6080:dd18:c40c:daee:6eab: 'NoneType' object has no attribute 'send_pack'

Interesting... Looks good, please send a PR for that against the maintenance branch (so it will be included in 1.3.10)! Now that the sockjs-tornado dependency is bundled there's nothing preventing us from making it more resilient.

I added this commit a while ago to prevent concurrent sending from different threads, but I of course forgot that this would still not solve the issue 100%, considering that it didn't guarantee that messages would be sent from the main loop 🤦
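(For clarity: a lock like that only serializes concurrent writes; it doesn't move them onto the IOLoop's own thread, which is what Tornado actually requires, hence the add_callback-based fix above is still needed.)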

PR created: https://github.com/foosel/OctoPrint/pull/2858

Let me know if I can help with anything else.

Thanks,
Gaston

PR is merged to maintenance, soon devel, will be part of 1.3.10.

Closing since fix was already merged and locally tested.

Thanks @gdombiak
This fixed the issue for me too.
