Octoprint: Failure To Resume From Pause

Created on 14 Apr 2018  路  28Comments  路  Source: OctoPrint/OctoPrint

What were you doing?

  1. Start Print
  2. Click "Pause"

What did you expect to happen?

  • The print pauses
  • OctoPrint prints "Changing monitoring state from "Printing" to "Paused" in the Terminal tab
  • OctoPrint " renames the "Pause" button to "Resume" and enables the button

What happened instead?

  • The print pauses
  • OctoPrint prints "Changing monitoring state from "Printing" to Pausing in the Terminal tab
  • The Pause button is not renamed and remains disabled.
  • Can't even cancel print, only option seems to be to restart OctoPrint and lose print

Did the same happen when running OctoPrint in safe mode?

Yes, it also happens in safe mode. In reproducing it I noticed it worked the first two times, but got into the frozen state after the 3rd pause.

Version of OctoPrint

OctoPrint 1.3.8 running on OctoPi 0.14.0

Note: I've been using OctoPrint for a long time (have also been a patreon supporter since 2016 fwiw) and have never had this problem. I only just upgraded, and this seems to have just started.

Operating System running OctoPrint

OctoPi 0.14.0

Printer model & used firmware incl. version

Printrbot Simple Metal. Unsure about Firmware, but purchased 4/22/2015 and have never upgraded firmware afaik.

Browser and version of browser, operating system running browser

Chrome 65.0.3325.181 from Windows 10

Link to octoprint.log

https://pastebin.com/ycpUXe8H

Link to contents of terminal tab or serial.log

https://pastebin.com/g5zWceJA

Link to contents of Javascript console in the browser

https://pastebin.com/NPB8nkyW

Screenshot(s)/video(s) showing the problem:

octoprinterror

I have read the FAQ.

Yes

bug done

Most helpful comment

serial.log

log created with standard configuration (I did not change "Log position on pause" option)

I started printing and Paused/Resumed until it was stuck at Pausing. Each time a Octoprint was stuck I added "STUCK" to the serial log with echo "STUCK" > serial.log

As a Workaround for a stuck print the button "Fake Acknowledgement" in the Advanced options of theTerminal` tab kicks the print back into a normal state. I then can Resume the print again.

The log has 4 "STUCK" events and a few more pauses

It could be my imagination, but I've got the feeling that the print gets stuck more frequently if the printer immediately stops after the pause. I think all goes well when the printer finishes a few commands (sometimes even returning a Recv: echo:busy: processing line) and then change from Pausing to Paused

I hope the log helps

All 28 comments

Looks like the issue was introduced in 1.3.7. I believe that because:

  1. Downgraded to 1.3.7, still seeing same issue. Pause worked twice, then broke after 3rd try.
  2. Restarted Pi, tried again, it failed on the 1st pause.
  3. Downgraded to 1.3.6 and pause now works 100% reliably.

Canceling print is also broken in 1.3.7, I believe it's the same cause.

@lprichar what happens after this:

Send: N576 G1 X106.012 Y122.210 E77.36645*82
Recv: ok
Send: N577 G1 X104.380 Y124.763 E77.51762*91
Changing monitoring state from "Printing" to "Pausing"
Recv: ok

It should be sending an M400/M114 combo. This looks like it never sends that out, which is unusual. Past issues with cancelling/pausing have been caused by the firmware having an issue with M114 (either no support or wrong response format). I've never seen it not be sent in the first place.

See here where it worked correctly:

Send: N443 G1 X104.440 Y161.551 E59.58453*80
Recv: ok
Send: N444 G1 X105.951 Y164.261 E59.73933*88
Changing monitoring state from "Printing" to "Pausing"
Recv: ok
Send: N445 M400*34
Recv: ok
Send: N446 M114*33
Recv: X:105.95Y:164.26Z:0.30E:59.74 Count X: 105.95Y:164.26Z:0.57
Recv: ok
Changing monitoring state from "Pausing" to "Paused"
Send: M105
Recv: ok T:244.8 /245.0 B:89.1 /90.0 T0:244.8 /245.0 T1:92.4 /0.0 T2:96.1 /0.0 @:102 B@:127

What you could try is disabling Settings > Serial Connection > Behaviour > Event based position logging > Log position on pause (and also Log position on cancel if that also causes issues). That might work around things since OctoPrint will no longer attempt to log the pause position (downside: no smart pause/resume scripts possible that way). Doesn't fix the core issue however which I'd really like to narrow down on.

@isilcala Logs please. I can't fix bugs I can't reproduce, and cancelling and pausing worked flawlessly in all my tests.

Correct, when pausing succeeds it sends out something like "Send: N111 M400*38". When it fails the last thing printed to the console is

Send: N179 G1 X91.050 Y142.383 E23.62361*110
Changing monitoring state from "Printing" to "Pausing"
Recv: ok

Then it goes deadish.

Bad news: Unchecking "Log position on pause" does not resolve the issue. I could share more logs if you like, but they don't look any different afaik.

The lack of consistency is odd. Pausing works correctly between 0 and 2 times before failing to send M400/M114.

btw I did discover that disconnecting and reconnecting from the printer gets it back to a working state. That's a nice discovery since restarting octoprint on failure was getting old fast.

@lprichar can you prepare a full serial.log from connecting to the printer to the failure mode? It looks like the whole send loop goes dead, so I fear I will have to start count acknowledgements to get to the ground of this. I sadly still can't reproduce this.

I turned on serial logging and it started working. Then I turned off serial logging and it was still working. I guess there's a delay in applying "Log position on pause"? Odd.

It could be my imagination but the movement seems a bit janky after resuming from pause, is that expected with Log position turned off?

I'm happy to be on 1.3.7/1.3.8 again and am willing to call this good enough. But I'd also be happy to help debug it further. I'll take your lead.

Debugging it further would definitely be preferable since if one person encounters it, more might encounter it, and if this isn't figured out it will most likely remain in the code.

If we can figure out the firmware that's running on your Printrbot, I could try to reproduce against that exact one. But in any case, a serial.log of a reproduction might shed some light on things.

I've also looked into adding some fail safes in the code to prevent stalls due to M114 not being acknowledged/parseable, but sadly your case looks more like the communication stall happens even before that since the M400/M114 combo isn't even sent. So I'm currently a bit at a loss how best to address that to be honest. Might get more ideas once I have a full comms log to go with though.

Issues like this are incredibly hard to look into - you can reproduce it fairly reliable, I can't at all, and it isn't even something that I can imagine how it could happen in theory, so that makes things extra tricky. Hmpf.

serial.log

log created with standard configuration (I did not change "Log position on pause" option)

I started printing and Paused/Resumed until it was stuck at Pausing. Each time a Octoprint was stuck I added "STUCK" to the serial log with echo "STUCK" > serial.log

As a Workaround for a stuck print the button "Fake Acknowledgement" in the Advanced options of theTerminal` tab kicks the print back into a normal state. I then can Resume the print again.

The log has 4 "STUCK" events and a few more pauses

It could be my imagination, but I've got the feeling that the print gets stuck more frequently if the printer immediately stops after the pause. I think all goes well when the printer finishes a few commands (sometimes even returning a Recv: echo:busy: processing line) and then change from Pausing to Paused

I hope the log helps

@NeroBurner awesome, thanks! That log certainly helps. Looks like there might be a race condition at play here, and probably related to #2587 (for which I've got an idea for a fix).

Just a question: I see that it got "unstuck" in the log after each STUCK - was that you pressing the "Fake Acknowledgement" button as well or did it do that by itself? I'm asking since you also explicitly stated when you pressed the button in some locations in the log, but those are after the communication took off again.

@foosel yes, the workflow for generating the log was the following

  • press Pause
  • if not stuck

    • press Resume

  • if stuck

    • then call echo "STUCK" >> serial.log from bash

    • press the "Fake Acknowledgement" button

    • call echo "FAKE ACKNOWLEDGEMENT" >> serial.log

    • press Resume button

  • Repeat

Therefore the "FAKE ACKNOWLEDGEMENT" lines in the log file are shortly after the Communication took off again

I'm very happy to help :) the commit bf3da2a is already the fix? That is awesome! Thanks!

@NeroBurner That fits my assumption perfectly then. And bf3da2a should indeed be the fix for both this issue and also #2587. Until 1.3.9 is out with that in place (currently traveling, so can't push that due to lack of a proper test environment), the "Fake Acknowledgement" button is indeed a good workaround.

I'll install from this commit and see how it works. I cancel dozens of times a day while testing, so it won't take long :) Thanks for this and I hope your trip is going as well as possible!

Well, I'm getting an error in tcpserver.py around line 138. self.io_loop is undefined, and it doesn't look like __init__ is being called. Guessing this has to do with your debug configuration somehow. Please feel free to ignore this since you are traveling, but if it's easy I'd love to do some testing for you!

FWIW, I just experienced this same issue with a Octoprint 1.3.7 and a genuine prusa i3 mk2. Pressing Fake Acknowledgement in the terminal got me going again as well.

@FormerLurker Is this against that specific commit or against the full maintenance branch? The latter should definitely boot up without any errors and I'm a bit confused about the issues you are seeing there. Sounds like some tornado issue, maintenance has been upgraded to Tornado 5 (even though I'll probably need to downgrade that to 4.5 again due to memory leaks that were observed), and that change some stuff in IOLoop handling. But the code base on the maintenance branch should be adjusted to deal with this.

I did a checkout from origin/maintenance, then a pull. I decided against pulling from that individual commit because I figured I might be missing something important that was subsequently checked in.

I can test the commit if you want. Here is the full log.

Enjoy your evening and forget about this for a while :)

@FormerLurker when you got that error, did you also remember to do a pip install -e .? It looks like some mismatched dependencies and there were some changes recently due to some Tornado back and forth on maintenance

I may not have. I'll try again and will report back.

Looks like that worked. Thanks for the assistance! I'll test out cancel after work today.

Just to make this easier to find...

Workaround for this issue can be found here

I'm having a new issue now. When I try to start a print I get the following error:
AttributeError: 'Printer' object has no attribute '_setPrintingUser'
Here is a link to the log.

I just tried a pull from the maintenance branch, and it seems to be up-to-date.

By the way, I checked out my pycharm debug configuration and see the following external tool is executed before launch:

install -e .[develop,plugins]

Should that not always update all the prerequisites? I've only been using pycharm for a month or two now, so excuse my ignorance please.

@FormerLurker try again, I just fixed the above which was also reported as #2610

But yeah, that should always update the dependencies before start. Huh. I have no idea then. I certainly can't trigger this error (or another one reported by someone else after firing up maintenance). So far it's a mystery, I just hope that I will yet figure that out, stuff like that makes me nervous.

Not sure, but I wouldn't sweat it too much. I understand too well why things like this make you nervous, though. I was able to get the print to start after pulling the latest commit, so your update fixed my immediate problem. I'll let you know if I have new issues while testing, and will watch for new commits.

I have Just encountered
this same Pause behaviour.
It looks like you have a reasonable start on the issuse
I am on a r-pi 1 1.38 on octopi Version 0.13.0
Late est stable Marlin on a ramps if thats worth anything

attached is the Gcode I didn't capture a log file this time around as I didn't realize what I was looking for but I will run the gcode again after i get this print thats now running done
filamentchangegcode.txt
if you need more info please ask

This is an older issue but I want to confirm that I'm experiencing the same issue using the latest stable Marlin 1.1.x on both the Anet A8 and Mini RAMBo boards. For now disabling position capturing worked great for me.

One thing to note is that if you use the action command responses from configuration_adv.h to send the pause and resume from the M600 thats been run locally on the printer either by lcd or runout sensor, the state changes to pausing and the machine parks as expected. When you click resume on the LCD, the nozzle comes back to place and octoprint then changes to paused and completely misses the resume command because it is still in a pausing state when it arrives. You then need to resume octoprint manually. This worked fine in 1.3.7, but 1.3.8 broke it. Ive just been too busy to track down anything more to look for a fix, much bigger fish to fry! If someone is physically present to click resume on the lcd, clicking resume on octoprint next is a very small issue.

This issue has been mentioned on OctoPrint Community Forum. There might be relevant details there:

https://community.octoprint.org/t/flashforge-creator-pro-2016-stuck-in-cancelling-state/25026/1

Was this page helpful?
0 / 5 - 0 ratings