Sorry to create another issue... A few users have been reporting that my plugin Octolapse locked up while taking a snapshot. After a LOT of back and forth, debugging and the like, I finally have enough data to see what is happening. I'm not able to replicate it myself yet. The user who provided the logs has a Wanhao I3 with stock firmware. Fortunately this printer acknowledges commands with a line number, making it somewhat easier to see what's going on.
I believe the issue involves resend requests when the job lock has been acquired via set_job_on_hold(True).
Any insight at all would be appreciated. I am presently trying to write some code that will force the issue to appear and will amend the ticket if it works.
I guess I figured the resend would happen. I honestly hadn't thought of the possibility before.
The resend doesn't get executed until the job lock is released.
Unfortunately I can't test this because the plugin must be enabled.
1.3.7 rc2
OctoPi v 0.13.0
Wanhao I3 stock firmware
NA - Not a UI issue
I apologize in advance because the user truncated the logs somewhat, but I think pretty much everything useful is there. I will hound and beg (pay?) for more logs, and will attach more as they become available.
I believe plugin_octolapse.log will be most informative. You'll see on line 340 that the M114 is sent, but no response is generated.
I have 2 terminal logs. This one corresponds to the plugin_octolapse.log file above. You'll notice that line 242 is where the M114 was sent to the printer.
This log contains the sequence where I asked the user to manually enter an M114 into the terminal. I have annotated this log to clarify what happened when. Unfortunately the user didn't give me the entire terminal output and it was gone when I asked for the rest. I'm trying to get more logs for you as we speak, but the issue is transient so they are difficult to acquire from users.
NA - No UI issues at all
NA
I have read the FAQ.
It confuses me tremendously that the resend request isn't even received while you are holding the lock. That shouldn't be possible unless you are somehow blocking the monitoring thread that is responsible for reading from the serial port, and you don't appear to do that, because otherwise I don't see how you could receive the oks from the printer that you are receiving. And the printer not sending it until provoked would also be a new one, but to be honest, at this point that wouldn't surprise me anymore, too many printer vendors doing too weird sh... with their firmware ;)
While you are trying to create a minimal viable reproduction example (which btw would be greatly appreciated!) I'll also see if I can't trigger a resend request while holding a job lock against the virtual printer and see what's happening.
One question though, you write:
> Following the threading.Event pattern you suggested, I use an Event.wait until the position is received by another function through the standard OctoPrint event hook.
Where are you waiting? In the thread you spawned or someplace else?
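Just so we're talking about the same thing, the pattern you describe would boil down to something like this standalone sketch (all names here are illustrative stand-ins, not actual Octolapse code; the position callback is simulated with a timer):

```python
import threading

# Standalone sketch of the described pattern: a worker thread blocks on an
# Event until the position callback (here simulated) fires and sets it.
# All names are illustrative stand-ins, not actual Octolapse code.

position_event = threading.Event()
received = {}

def on_position_received(payload):
    # in the real plugin this would be the event hook delivering
    # the parsed M114 response
    received["position"] = payload
    position_event.set()

def wait_for_position(timeout=10.0):
    # runs in the worker thread spawned after set_job_on_hold(True)
    if position_event.wait(timeout):
        return received["position"]
    return None  # timed out without ever seeing a position response

# simulate the response arriving asynchronously from another thread
threading.Timer(0.1, on_position_received, args=({"x": 1.0, "y": 2.0},)).start()
print(wait_for_position())  # → {'x': 1.0, 'y': 2.0}
```

The important detail is which thread sits in that `wait` call, hence my question.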
Good news (or bad, depending on your point of view), I can reproduce this with a small plugin against the virtual printer:
```python
# coding=utf-8
from __future__ import absolute_import

import time
import threading

import octoprint.plugin


class TestJobOnHoldPlugin(octoprint.plugin.OctoPrintPlugin):
    def handle(self, comm_instance, phase, cmd, cmd_type, gcode, tags=None, *args, **kwargs):
        if gcode != "G80" or (tags and "plugin:test_job_on_hold" in tags):
            return

        def process():
            try:
                self._logger.info("Going to sleep for a bit")
                time.sleep(20)
                self._logger.info("... and back!")

                # now send the command
                self._printer.commands(cmd, tags=tags)
            finally:
                self._printer.set_job_on_hold(False)

        if self._printer.set_job_on_hold(True):
            thread = threading.Thread(target=process)
            thread.daemon = True
            thread.start()
            return "!!DEBUG:trigger_resend_checksum"


__plugin_implementation__ = TestJobOnHoldPlugin()
__plugin_hooks__ = {
    "octoprint.comm.protocol.gcode.queuing": __plugin_implementation__.handle
}
```
Repro steps: Install that, start a print, send a G80 through the terminal tab. The print will stop, a resend request trigger will be sent to the virtual printer, but for some reason the resend request won't be received until another command is sent. Which doesn't make a whole ton of sense to me right now, and I'll have to dig deeper.
Looks like we'll need a 1.3.7rc3 however since this is a bug in newly added functionality and that's a reason for rolling out a new RC. Meh ;)
I stand corrected. The reason I could reproduce this was an implementation detail of the virtual printer, not the job_on_hold flag. So no reproduction so far and still a huge question mark over my head.
edit For reference, new test plugin:
```python
# coding=utf-8
from __future__ import absolute_import

import time
import threading

import octoprint.plugin


class TestJobOnHoldPlugin(octoprint.plugin.OctoPrintPlugin):
    def handle(self, comm_instance, phase, cmd, cmd_type, gcode, tags=None, *args, **kwargs):
        if gcode != "G80" or (tags and "plugin:test_job_on_hold" in tags):
            return

        def process():
            try:
                self._logger.info("Going to sleep for a bit")
                time.sleep(20)
                self._logger.info("... and back!")

                # now send the command
                self._printer.commands(cmd, tags=tags)
            finally:
                self._printer.set_job_on_hold(False)

        if self._printer.set_job_on_hold(True):
            thread = threading.Thread(target=process)
            thread.daemon = True
            thread.start()
            return ["M117 sleeping...", "!!DEBUG:trigger_resend_checksum", "M400", "M114"]


__plugin_implementation__ = TestJobOnHoldPlugin()
__plugin_hooks__ = {
    "octoprint.comm.protocol.gcode.queuing": __plugin_implementation__.handle
}
```
And generated (annotated) output:
Send: N62 G1 F1500*126 <- last line from job
Recv: ok N62
Send: N63 M117 sleeping...*16 <- triggered by the G80, job is now on hold
Send: !!DEBUG:trigger_resend_checksum <- triggered by the G80
Recv: echo:sleeping...
Send: N64 M400*21 <- triggered by the G80
Recv: ok N63
Send: N65 M114*20 <- triggered by the G80
Recv: Error: Checksum mismatch <- response to N64
Recv: Resend:63
Recv: ok N62
Send: N63 M117 sleeping...*16 <- response to resend request
Recv: Error: expected line 63 got 65 <- caused by N65 sent before resend request for N63 triggered by N64 was received, ignored
Recv: Resend:63
Recv: ok N62
Send: N64 M400*21 <- continued resend chain
Recv: echo:sleeping... <- N63 output
Recv: ok N63 <- N63 output
Send: N65 M114*20 <- continued resend chain
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: ok N64 <- N64 output
Recv: ok N65 X:69.21 Y:69.21 Z:0.4 E0:20.1281 <- N65 output
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64 <- we are now sleeping due to the job being on hold
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Send: N66 G80*33 <- G80 sent, job no longer on hold
(Note to self: the resend request triggering in the virtual printer behaves very oddly with regards to what command it triggers for... need to improve that)
It confuses me too, believe me. I'm having a hard time believing it's a bug in OctoPrint, but I am also having a hard time believing this is a problem with Octolapse, lol! Ugh, and I'm sorry to keep bothering you with this stuff. It seems like this is the last kink to iron out, though, so that's good.
Here is the sequence of events for a snapshot, both before and after:
You can see step 2 on line 699 here: https://github.com/FormerLurker/Octolapse/blob/d28fe47f463293175ec39f22405a3abe17784588/octoprint_octolapse/timelapse.py
I'm still working on reproducing the problem. I'll let you know if/when I have some success.
I have also requested very specific logs from a user: serial.log, terminal output too just in case, Octolapse logs with everything turned on, and a screen recording of the terminal during and after the lockup just so that we are 100% sure what's going on.
Also, for the record, most users aren't having this issue. For a few folks it happens every print.
THANK YOU so much for looking at this for me!
Edit: I made a mistake in step 7 and swapped M114 and M400, fixed.
So I have gotten great logs from a really accommodating user.
octoprint.log - You can see a lot of resend requests, and they all appear to be happening inside of my snapshot routine.
serial.log - Look at line 2474, this is where the missing acknowledgement is (printer line 1073). Line 2611 is where the user sent M114 (sent as printer line 1075) through the terminal window.
plugin_octolapse.log - See line 8764 for the acknowledgement of printer line 1073.
Video of the terminal interaction - The user waited around a minute before entering M114 to make sure a reasonable amount of time elapsed.
I'm still trying to reproduce.
Edit: These logs came from the same kind soul who provided the previous logs.
Hm... taking a look at these logs, this looks more like communication issues being at fault here (possibly triggered, though, by OctoPrint sending too many lines, I'll have to take a closer look at that). See here:
2018-03-28 20:25:25,344 - Send: N1066 M117 [## ] 21%*21
2018-03-28 20:25:25,350 - Recv: ok 1066
2018-03-28 20:25:25,371 - Send: N1067 M83*40
2018-03-28 20:25:25,377 - Send: N1068 G1 F2400*119
2018-03-28 20:25:25,382 - Send: N1069 G91*47
2018-03-28 20:25:25,386 - Send: N1070 G1 Z0.500 F9000*32
2018-03-28 20:25:25,390 - Send: N1071 G90*39
2018-03-28 20:25:25,392 - Send: N1072 G1 X100.000 Y150.000*24
2018-03-28 20:25:25,396 - Send: N1073 M400*18
2018-03-28 20:25:25,398 - Send: N1074 M114*21 <- this whole block of lines confuses me, I'd expect a more staggered sending by the send loop
2018-03-28 20:25:25,552 - Recv: ok 1067
2018-03-28 20:25:25,554 - Recv: ok 1068
2018-03-28 20:25:25,556 - Recv: ok 1069
2018-03-28 20:25:25,557 - Recv: ok 1070
2018-03-28 20:25:25,559 - Recv: ok 1071
2018-03-28 20:25:25,561 - Recv: ok 1072
2018-03-28 20:25:25,563 - Recv: ok 1073 <- N1074 never gets acknowledged
2018-03-28 20:25:31,111 - Recv: wait <- This shows us that OctoPrint is receiving, so the read is not blocked, otherwise the wait wouldn't arrive. So it's probably not a block by the job_on_hold flag after all
2018-03-28 20:25:32,119 - Recv: wait
[...]
2018-03-28 20:27:50,345 - Recv: wait
2018-03-28 20:27:50,380 - Send: N1075 M114*20 <- THIS however the printer does receive...
2018-03-28 20:27:50,393 - Recv: Error:expected line 1074 got 1075 <- ... then notices that something's amiss since it never got N1074...
2018-03-28 20:27:50,395 - Recv:
2018-03-28 20:27:50,397 - Recv: Resend:1074 <- ... so it requests a resend of that
2018-03-28 20:27:50,401 - Recv: ok
2018-03-28 20:27:50,404 - Send: N1074 M114*21 <- resend of N1074
2018-03-28 20:27:50,408 - Recv: ok 1074 <- ok of N1074
2018-03-28 20:27:50,410 - Send: N1075 M114*20 <- resend of N1075
2018-03-28 20:27:50,424 - Recv: X:100.00 Y:150.00 Z:3.98 E:0.00 <- output of N1074 (Repetier sends output AFTER ok)
2018-03-28 20:27:50,428 - Recv: ok 1075 <- ok of N1075
2018-03-28 20:27:50,431 - Recv: X:100.00 Y:150.00 Z:3.98 E:0.00 <- output of N1075
This doesn't sound like receiving is blocked at all or a resend request that never gets received, it's more like N1074's first iteration simply never arrives at the printer.
The other variant:
2018-03-28 20:25:18,005 - Send: N1025 M83*46
2018-03-28 20:25:18,013 - Send: N1026 G1 F2400*125
2018-03-28 20:25:18,016 - Send: N1027 G91*37
2018-03-28 20:25:18,028 - Send: N1028 G1 Z0.500 F9000*45
2018-03-28 20:25:18,032 - Send: N1029 G90*42
2018-03-28 20:25:18,034 - Send: N1030 G1 X100.000 Y150.000*30
2018-03-28 20:25:18,036 - Recv: ok 1025
2018-03-28 20:25:18,041 - Send: N1031 M400*20
2018-03-28 20:25:18,045 - Send: N1032 M114*23
2018-03-28 20:25:18,161 - Recv: ok 1026
2018-03-28 20:25:18,163 - Recv: ok 1027
2018-03-28 20:25:18,165 - Recv: ok 1028
2018-03-28 20:25:18,167 - Recv: ok 1029
2018-03-28 20:25:18,193 - Recv: ok 1030
2018-03-28 20:25:18,195 - Recv: ok 1031
2018-03-28 20:25:18,735 - Recv:
2018-03-28 20:25:18,737 - Recv: Resend:1032
2018-03-28 20:25:18,741 - Recv: ok
I find it interesting that the firmware doesn't state WHY it's requesting a resend here. It's just sending an empty line, then the resend request. It's also interesting that the resend is always requested for the M114, and it always happens when we have this huge bulk send that I'm very suspicious of.
I wonder if this might be a buffer overflow on the printer's side caused by too many lines being sent all at once (again, right now I don't understand how this could even happen; OctoPrint's send loop should strictly limit the line sending based on received oks).
Are all affected printers running Repetier firmware or do you also have reports from other firmwares?
Short update:
The block sending doesn't appear to be caused by some general bug. I mirrored your plugin's behaviour (eat up the command in the queuing phase, but start a thread that enqueues a whole bunch of commands for sending while the job is held) and that behaves as expected:
Send: N155 G1 X68.76 Y56.89 E37.9556*101
Recv: ok N155
Send: N156 M117 command #0*82
Recv: echo:command #0
Recv: ok N156
Send: N157 M117 command #1*82
Recv: echo:command #1
Recv: ok N157
Send: N158 M117 command #2*94
Recv: echo:command #2
Recv: ok N158
Send: N159 M117 command #3*94
Recv: echo:command #3
Recv: ok N159
Send: N160 M117 command #4*83
Recv: echo:command #4
Recv: ok N160
Send: N161 M117 command #5*83
Recv: echo:command #5
Recv: ok N161
Send: N162 M117 command #6*83
Recv: echo:command #6
Recv: ok N162
Send: N163 M117 command #7*83
Recv: echo:command #7
Recv: ok N163
Send: N164 M117 command #8*91
Recv: echo:command #8
Recv: ok N164
Send: N165 M117 command #9*91
Recv: echo:command #9
Recv: ok N165
Send: N166 M117 command #10*96
Recv: echo:command #10
Recv: ok N166
Send: N167 M117 command #11*96
Recv: echo:command #11
Recv: ok N167
Send: N168 M117 command #12*108
Recv: echo:command #12
Recv: ok N168
Send: N169 M117 command #13*108
Recv: echo:command #13
Recv: ok N169
Send: N170 M117 command #14*99
Recv: echo:command #14
Recv: ok N170
Send: N171 M117 command #15*99
Recv: echo:command #15
Recv: ok N171
Send: N172 M117 command #16*99
Recv: echo:command #16
Recv: ok N172
Send: N173 M117 command #17*99
Recv: echo:command #17
Recv: ok N173
Send: N174 M117 command #18*107
Recv: echo:command #18
Recv: ok N174
Send: N175 M117 command #19*107
Recv: echo:command #19
Recv: ok N175
Send: N176 M117 command #20*98
Recv: echo:command #20
Recv: ok N176
Send: N177 M117 command #21*98
Recv: echo:command #21
Recv: ok N177
Send: N178 M117 command #22*110
Recv: echo:command #22
Recv: ok N178
Send: N179 M117 command #23*110
Recv: echo:command #23
Recv: ok N179
Send: N180 M117 command #24*111
Recv: echo:command #24
Recv: ok N180
Send: N181 M117 command #25*111
Recv: echo:command #25
Recv: ok N181
Send: N182 M117 command #26*111
Recv: echo:command #26
Recv: ok N182
Send: N183 M117 command #27*111
Recv: echo:command #27
Recv: ok N183
Send: N184 M117 command #28*103
Recv: echo:command #28
Recv: ok N184
Send: N185 M117 command #29*103
Recv: echo:command #29
Recv: ok N185
Send: N186 M117 command #30*108
Recv: echo:command #30
Recv: ok N186
Send: N187 M117 command #31*108
Recv: echo:command #31
Recv: ok N187
Send: N188 M117 command #32*96
Recv: echo:command #32
Recv: ok N188
Send: N189 M117 command #33*96
Recv: echo:command #33
Recv: ok N189
Send: N190 M117 command #34*111
Recv: echo:command #34
Recv: ok N190
Send: N191 M117 command #35*111
Recv: echo:command #35
Recv: ok N191
Send: N192 M117 command #36*111
Recv: echo:command #36
Recv: ok N192
Send: N193 M117 command #37*111
Recv: echo:command #37
Recv: ok N193
Send: N194 M117 command #38*103
Recv: echo:command #38
Recv: ok N194
Send: N195 M117 command #39*103
Recv: echo:command #39
Recv: ok N195
Send: N196 M117 command #40*106
Recv: echo:command #40
Recv: ok N196
Send: N197 M117 command #41*106
Recv: echo:command #41
Recv: ok N197
Send: N198 M117 command #42*102
Recv: echo:command #42
Recv: ok N198
Send: N199 M117 command #43*102
Recv: echo:command #43
Recv: ok N199
Send: N200 M117 command #44*98
Recv: echo:command #44
Recv: ok N200
Send: N201 M117 command #45*98
Recv: echo:command #45
Recv: ok N201
Send: N202 M117 command #46*98
Recv: echo:command #46
Recv: ok N202
Send: N203 M117 command #47*98
Recv: echo:command #47
Recv: ok N203
Send: N204 M117 command #48*106
Recv: echo:command #48
Recv: ok N204
Send: N205 M117 command #49*106
Recv: echo:command #49
Recv: ok N205
Send: N206 M400*35
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: ok N206
Send: N207 M114*34
Recv: ok N207 X:68.76 Y:56.89 Z:0.4 E0:37.9556
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Recv: T:21.30 /0.00 B:21.30 /0.00 @:64
Send: N208 G80*27
Recv: ok N208
Send: N209 G1 X68.76 Y56.19 E38.0094*107
Recv: ok N209
I took a closer look at the number of sent lines vs. number of received oks in the serial.log:
```
$ grep -c "Recv: ok" full_serial.log
1340
$ grep -c "Send:" full_serial.log
1345
```
So it looks like we sent 5 more lines than we received an ok for. So I went hunting for those, and I think I might have found them:
2018-03-28 20:22:27,184 - Send: N23 M400*22
2018-03-28 20:22:27,198 - Recv: ok 23
2018-03-28 20:22:27,200 - Send: N24 M114*17
2018-03-28 20:22:27,215 - Recv: ok 24
2018-03-28 20:22:35,888 - Recv: X:100.00 Y:150.00 Z:0.78 E:0.00
2018-03-28 20:22:35,896 - Recv: wait
2018-03-28 20:22:36,209 - Send: N25 G1 X90.037 Y88.742*18
2018-03-28 20:22:36,214 - Send: N26 G91*37
2018-03-28 20:22:36,232 - Recv: ok 25
2018-03-28 20:22:36,237 - Send: N27 G1 Z-0.500*97
2018-03-28 20:22:36,238 - Recv: ok 26
Note the wait sent right after the M114 output. What I think is happening here is that Repetier knows it already sent an ok for the M114 (before generating its output), and since we had an M400 right before, the planner buffer is now empty, so Repetier sends a wait. When OctoPrint receives a wait while in state "Printing", it assumes an ok must have gotten lost and hence reads that wait as an ok and sends the next enqueued line (N25 and N26 here). And while that is usually the correct approach, here it causes a problem. This exact scenario happens a bunch of times throughout the job (search for wait and you'll see it accumulate) and in the end leads to OctoPrint having clearance for way too many lines.
My current hypothesis is that this situation combined with the bunch of lines you enqueue on taking a snapshot causes an overflow in the firmware, which makes it always hiccup on the last line in your sequence, the M114, causing those excessive amounts of resend requests here.
Now, that is only a theory, and it could all be shot to bits if there are also occurrences of this issue with non-Repetier firmware, but at the very least it gives me something to take another look at in the code (wait handling when the job is on hold, which as far as I can see on first look is currently broken; but instead of blindly fixing it I'll first see if I can reproduce this scenario somehow with the virtual printer).
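To illustrate the theory, the clearance mechanics can be modelled with a toy send loop (a sketch of the failure mode only, not OctoPrint's actual implementation):

```python
# Toy model of ok-based flow control: the host tracks how many sent lines are
# still unacknowledged and only sends while that count is below the firmware's
# buffer size. Miscounting a "wait" as an "ok" makes the host think a buffer
# slot freed up that didn't, so the firmware buffer can eventually overrun.
# This is a sketch of the failure mode, not OctoPrint's real send loop.

BUFFER_SIZE = 4

def simulate(responses, treat_wait_as_ok):
    in_flight = 0       # host's belief about unacknowledged lines
    buffered = 0        # lines actually sitting in the firmware buffer
    overflowed = False
    for resp in responses:
        if resp == "ok" or (resp == "wait" and treat_wait_as_ok):
            in_flight = max(0, in_flight - 1)   # host thinks a slot freed up
        if resp == "ok":
            buffered = max(0, buffered - 1)     # firmware really freed a slot
        # host sends as long as it believes there is clearance
        while in_flight < BUFFER_SIZE:
            in_flight += 1
            buffered += 1
            if buffered > BUFFER_SIZE:
                overflowed = True
    return overflowed

responses = ["ok", "wait", "ok", "wait", "ok", "wait", "ok", "ok"]
print(simulate(responses, treat_wait_as_ok=False))  # → False
print(simulate(responses, treat_wait_as_ok=True))   # → True
```

Every wait counted as an ok bumps the host's clearance by one line the firmware never actually made room for, which matches the surplus of sends over oks in the log above.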
(Sorry for the monologue)
I have to correct my reasoning for the wait being generated there. After taking another look at Repetier, it looks like it sends a wait if it's been over 1s since the last command was received and no commands are left in the command buffer (I mentioned the planner buffer, that was wrong). So the reason for the wait here is simply that the M400 gets immediately acknowledged and starts executing in the background, so OctoPrint sends the M114, which also gets immediately acknowledged, but output for it only gets generated once the M400 finishes processing about 8s later. The last command received at that point was the M114 8s ago, so a wait is generated.
What I can do here is make OctoPrint ignore wait if the job is on hold (so make it behave the same as if the printer were idle). That would probably already solve the issue your user is seeing there. However, it would not be an actual fix of the general behaviour here, which is caused by the specific combination of a long running command (M400) + waiting for the output of another command sent after that (M114) on Repetier firmware. That will always generate an ok while printing even though it shouldn't, and I'm a bit unsure right now how best to tackle this in general. Somewhat unsatisfying, but it's something that will have to wait, considering that this behaviour has been in there for a while now and so far hasn't caused any apparent trouble; it only surfaced thanks to the specific circumstances of your plugin's operation.
In any case, RC3 it is after all. I'll see that I can get this out ASAP.
Ugh, I was hoping you were going to tell me it's an obvious bug in my code. That way I could fix it without postponing your release :(
I'm slowly starting to get the hang of reading these logs. Your thorough explanations and detailed notes really help.
If there is any way I can make this a special case within my plugin (detect repetier firmware, use alternative function/commands) and take the burden off of you, please let me know. I'm planning to add a timeout/retry loop for the M114 (currently only times out, doesn't retry) to catch any other undiscovered odd cases. The main issue with this is determining what exactly the timeout value should be.
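The timeout/retry loop I have in mind would look something like this rough sketch (`send_m114`, `request_position` and the timings are placeholders, not settled Octolapse code; here the fake `send_m114` pretends the printer only answers the second attempt):

```python
import threading

# Rough sketch of the timeout/retry idea: wait on an Event for the M114
# response and re-send the request a few times before giving up. send_m114
# is a placeholder; this fake one only answers on the second attempt.

position_event = threading.Event()
attempts_made = []

def send_m114():
    attempts_made.append("M114")
    if len(attempts_made) == 2:
        # simulate the position response arriving shortly after the resend
        threading.Timer(0.05, position_event.set).start()

def request_position(retries=3, timeout=0.5):
    for _ in range(retries):
        position_event.clear()
        send_m114()
        if position_event.wait(timeout):
            return True   # response arrived within this attempt's timeout
    return False          # every retry timed out

result = request_position()
print(result, attempts_made)  # → True ['M114', 'M114']
```

Picking the per-attempt timeout is exactly the open question: too short and a slow M400 ahead of the M114 triggers spurious resends, too long and a genuine lockup stalls the snapshot for ages.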
I've only seen this for Repetier so far, btw, so it seems specific (thankfully). Do you think this can happen with other long running commands too, other than M400?
Regarding the
huge bulk send that I'm very suspicious of
It seems very odd indeed that this behavior wasn't replicated during your testing. I'll print a few things and will check the logs from my Mk2 to see if it behaves differently (I believe it OKs after each send, but will double check).
I didn't realize how popular that printer was before this issue turned up. Wow, there are lots of those running around out there!
Thank you so much!
I was able to get something of a repro after I tuned my virtual printer a bit. I'm just running through the final update test for 1.3.7rc3 and then will push out the release that pretty much only consists of one actual commit. But hey, this definitely was a bug in OctoPrint, so it's good it was found now!
I thought a bit more about the actual reproduction scenario and I think usually this should actually never happen. The reason this arose is specifically setting the job on hold while printing on a Repetier firmware backed machine and then waiting for output triggered after a long running command. That is a very specific scenario that during normal printing shouldn't arise, and ignoring wait when the job is on hold should hopefully also fix it in your case.
I say hopefully because, while I could reproduce the issue, I'm not 100% sure there isn't something else I might have overlooked. We'll have to see.
You might also have been able to patch against this problem yourself btw by hooking into the receiving hook and rewriting any received waits while having the job on hold to empty lines. If push comes to shove and this patch doesn't solve this after all for you this is something you could still try.
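Sketched out, that workaround might look like this (the handler rewrites "wait" lines to empty lines via OctoPrint's gcode received hook; `job_on_hold` here is a plain flag standing in for however the plugin tracks its own hold state):

```python
# Sketch of the suggested workaround: a received-line handler that rewrites
# "wait" to an empty line while the job is on hold, registered for the
# octoprint.comm.protocol.gcode.received hook. job_on_hold is a stand-in
# flag, not real Octolapse state.

class WaitRewritingPlugin(object):
    job_on_hold = False  # the real plugin would flip this around its set_job_on_hold calls

    def on_gcode_received(self, comm_instance, line, *args, **kwargs):
        # returning a different string replaces the line OctoPrint goes on to process
        if self.job_on_hold and line.strip() == "wait":
            return ""
        return line

plugin = WaitRewritingPlugin()
print(repr(plugin.on_gcode_received(None, "wait")))  # → 'wait' (no hold, passed through)
plugin.job_on_hold = True
print(repr(plugin.on_gcode_received(None, "wait")))  # → '' (on hold, swallowed)

# Registration would then look like the usual hook dict:
# __plugin_hooks__ = {
#     "octoprint.comm.protocol.gcode.received": __plugin_implementation__.on_gcode_received
# }
```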
Ahh good suggestion. I actually already added the receiving hook maybe two days ago to assist in debugging this problem. I was thinking of getting rid of M114 entirely and waiting for an OK from M400, but that's easier said than done (though if you know a way, I'd be happy to try). I do occasionally need the M114 (after home), but I'm less worried about an issue that breaks a print at the START instead of towards the end :)
As we've seen with Repetier firmware here, waiting for the ok for the M400 wouldn't work, since the ok is generated on receipt of the command, not when execution of the command is finished. That's a Repetier quirk (which actually even makes sense for flow control, but makes things like this a bit more complicated). So you'd pretty much immediately see the ok and take your snapshot at the wrong time.
edit Just to clarify this, since my sentence above could be misread: it would work with anything but Repetier firmware (or older Marlin forks which don't send ok in the same order as the commands were sent in, and there are a lot of those out there).
You have to differentiate between buffered and unbuffered commands. Marlin (and Sprinter and Smoothie and...) send an ok for buffered commands (G0 through G4, G28 and some others I believe) when they are popped from the command queue and pushed into the planner buffer (which is the source for the motion planner). Well, or - in the case of the aforementioned bug - when they are read from serial, which causes issues. For unbuffered commands they send an ok when the command has fully finished processing after being retrieved from the command queue.
Repetier on the other hand sends the ok right after the command is popped from the internal command queue, so when it makes space for a new one to be read from serial. Only then it starts processing it. That behaviour is AFAIK identical regardless of whether the command is buffered or unbuffered.
That means for Marlin etc you get something like this:
Send: M114
Recv: X: ....
Recv: ok
whereas for Repetier you get something like this:
Send: M114
Recv: ok
Recv: X: ...
I just installed RC3 and am running a test print. I also notified the user who provided the logs and will report back ASAP regarding the efficacy of the fix.
So I've heard back now that 2 prints have completed without locking up on Repetier firmware, so if this isn't fixed it is GREATLY reduced in severity. I think we can consider it fixed and deal with any other issues as they arise. Thanks yet again!
That's great news!