PR "Encapsulate Stepper, Planner, Endstops in singleton classes" #3631 can potentially affect the ISR-runtime.

Blue-Marlin on 5 May 2016

@Sebastianv650
What host program do you use? What BAUDRATE do you use? Is the error rate significantly dropped when using a lower BAUDRATE?

If you are using RepetierHost in not ping-pong-mode, try ping-pong, or reducing 'Receive Cache Size' to about 75%.

If you have the suspicion your LIN_ADVANCE code takes much extra time like when the 'normal' stepperISR goes in dubblestep/quadstep-mode, try to put in:

      #ifndef USBCON
        customizedSerial.checkRx(); // Check for serial chars.
      #endif

AnHardt on 5 May 2016

I'm using pronterface with 250000 baud, changed it to 115200 with no sucess.
I'm not using speed ranges where double stepping would occur in the last days, I went up to 80% of single step mode.

Sebastianv650 on 5 May 2016

3631 can potentially affect the ISR-runtime

Not if the compiler is smart. Firstly, using myObject.functionCall() is not any more expensive than functionCall(). So that alone should not add any cycles. In "Embedded C++" we assiduously avoid virtual, ~~this->~~, objPointer->, etc. to ensure flatter compilation.

The compiler should also be smart about this:

ISR(TIMER1_COMPA_vect) { stepper.isr(); }

It should recognize that it should not do this…

ISR_HANDLER:
  JSR Stepper_isr
  RET

…but that it should do this…

ISR_HANDLER:
  JMP Stepper_isr

…and, frankly, it should recognize that:

ISR_HANDLER == Stepper_isr

Am I putting too much trust in the compiler?

thinkyhead on 5 May 2016

I just don't agree with this->method() being less efficient than method() when inside the class, the compiler will throw away the this reference and do a JMP func() in both cases, the advantage of using this-> is increased readability.

jbrazio on 6 May 2016

👍3

I'm using pronterface with 250000 baud, changed it to 115200 with no sucess.
I'm not using speed ranges where double stepping would occur in the last days, I went up to 80% of single step mode.

Using Promterface translates to me as using ping-pong, That makes overfilling the RX-buffer unlikely.
No improvement with 115200bd means doublet time between arriving chars. That makes missing to fetch a char from the register unlikely.
Errors on the USB between host and AT32U2 (used as the USB chip) is unlikely because of the USB block checksum.
Errors on the serial line between AT32U2 and AT2560 (hardware problem). That's unlikely to change with release versions. But more likely with increased frequency of high current lines near the serial lines.
RX-buffer corruption. Is unlikely with 8bit buffer indices and interrupt protection. Likelihood increases with the amount of buffer content - unlikely with ping-pong. 'Strange' pointer from somewhere? Wrong array indices in a near by array?

Hard to say what is going on.
Maybe @AnHardt's little input debug helpers can give a hint.
https://github.com/AnHardt/Marlin/pull/32

Blue-Marlin on 6 May 2016

I'm using pronterface with 250000 baud, changed it to 115200 with no sucess.

This for me somehow clears the ISR delay errors.. otherwise we would need to be really screwing it up to still affect 115.2bps.

Are we positive this is now hardware ? Could it be noise on the serial line ? Noise could increase with the higher stepping on the motors at higher speeds. Do you have a scope ?

jbrazio on 6 May 2016

Sorry, no scope available. I can't imagine it's hardware. I'm in the process of printing 70 identical parts, so all tests are done with the exact same gcode. RC4 runs "smooth" even with advance (maybe 4-5 errors per hour), RCBugFix on the other side is flooding the pronterface status window with error messages and 1-2 times per hour the printer is even doing wrong "print" moves to one edge of the print bed (only limited by software end limits I guess).

This for me somehow clears the ISR delay errors.. otherwise we would need to be really screwing it up to still affect 115.2bps.

I might be wrong, but do you think so? If we need to transfer 10 commands for a print, say each one takes 0.5s (I know, only for easy calculation) at 115200 baud and 0.25 at 250000 baud and the ISR is running at a fixed rate. Than it should be more likely that a ISR is "hitting" the transfer of the 0.5s long slow 115200 baud than it's hitting a only 0.25s long transfer?
For me as a hobby-programmer I would say higher rate should be better to not influence with stepper ISR.

I will build AnHardt#32 into my code and run some tests with it, but it will take some time to get results as my printer is blocked for the next 9h.

Sebastianv650 on 6 May 2016

Yes but lowering the baud rate you'd expect to see a difference in the number of errors, if no difference then one of two things could be true:

it's not the ISR
it's the ISR with a deep screw up

But the ISR changes were minimal.

So last good version was RC4 right ?

jbrazio on 6 May 2016

OK to be honest I can't say for sure if lowering the baud rate made it worse. In both cases, the errors come one after the other. If we want to know that precise, I have to do several runs with both baud rates and count an average value.

Yes I'm seeing errors also in RC4, but without advance maybe 1 in 10h print. Don't nail be down on the number, but it's a rate where I'm not worried about.

Sebastianv650 on 6 May 2016

RCBugFix on the other side is flooding the pronterface status window with error messages

@Sebastianv650 Are you talking about "host keepalive" messages like "busy:processing", or line number / checksum errors?

thinkyhead on 7 May 2016

All versions of error messages: Mostly checksum, some line number mismatch, some "no checksum with line number".

Sebastianv650 on 7 May 2016

I did @AnHardt debug changes, here is an example:

SENT: N803 G1 X145.808 Y146.311 E0.14351*107
RECV: ok N802 P0 B2 R128 "N802 G1 X147.472 Y147.976 E0.11133"
SENT: N804 G1 X144.915 Y146.058 E0.15620*108
RECV: ok N803 P0 B2 R128 "N803 G1 X145.808 Y146.311 E0.14351"
SENT: N805 G1 X147.099 Y148.241 E0.19841*98
RECV: ok N804 P0 B2 R128 "N804 G1 X144.915 Y146.058 E0.15620"
SENT: N806 G1 X146.764 Y148.544 E0.20460*98
RECV: ok N805 P0 B2 R128 "N805 G1 X147.099 Y148.241 E0.19841"
SENT: N807 G1 X143.891 Y145.671 E0.26015*107
RECV: Error:checksum mismatch, Last Line: 806 "N803 G1 X145.808 Y146.311 E0.14351"
[ERROR] Error:checksum mismatch, Last Line: 806 "N803 G1 X145.808 Y146.311 E0.14351"
RECV: Resend: 807
SENT: N807 G1 X143.891 Y145.671 E0.26015*107
RECV: ok N806 P0 B3 R128 "N806 G1 X146.764 Y148.544 E0.20460"
SENT: N808 G10*24
RECV: ok N806 P0 B1 R128 "N806 G1 X146.764 Y148.544 E0.20460"

Somebody has an idea? The line looks OK to me..

Sebastianv650 on 8 May 2016

The actual data exchange looks good, although we don't see all the checksums. But I am puzzled by the error message - is it 806 or 803 that has the error? And why does it then resync at 807? What really happened to 803 to 806... odd.

bobc on 8 May 2016

The line looks OK to me..

Looks bizarre to me. It's telling us that when it tries to interpret line "806" it is seeing the text of line "803."

Did I already ask which version of Arduino you are using to build with?

thinkyhead on 8 May 2016

I'm using 1.6.8.
I did a test, adding customizedSerial.checkRx(); also in the extruder ISR. This reduced the error rate significantly, now it's ~4 for a 6min print, without this second check its >> 12.

Sebastianv650 on 8 May 2016

Sorry.
https://github.com/AnHardt/Marlin/pull/32 is debug code that used to work before serial_line_buffer[] was introduced in get_serial_commands(). Now when the errors happen the relevant string is still in serial_line_buffer[] and not copied to command_queue[cmd_queue_index_w] yet.
The wrong line is printed.
Sorry.

AnHardt on 9 May 2016

@Sebastianv650 Well, that is interesting. I cannot (quite) imagine what would make the stepper isr _significantly_ slower between RC4 and RC6, but it sure seems to be the case. I'll do more research on optimizing C++ code and see if there's a way to keep the encapsulated Stepper::isr method but still maintain top performance.

thinkyhead on 9 May 2016

@AnHardt @Sebastianv650
If the gcode_line_error function is altered to take a string pointer, then you can pass serial_line_buffer to it from within get_serial_commands and print that string instead.

thinkyhead on 9 May 2016

Thanks for the hints, @AnHardt, @thinkyhead ! I did the modification and it's now printing the corrupted line, here one of the many ones I get:

SENT: N967 G1 X134.346 Y136.203 E2.96084*105
RECV: ok N967 P0 B3 R128 "N967 G1 X134.346 Y136.203 E2.96084"
SENT: N968 G1 X134.403 Y135.652 E2.96970*97
RECV: ok N968 P0 B3 R128 "N968 G1 X134.403 Y135.652 E2.96970"
SENT: N969 G1 X139.652 Y140.901 E3.08866*96
RECV: Error:checksum mismatch, Last Line: 968 "N969 G1 X19.652 Y140.901 E3.08866*96"
[ERROR] Error:checksum mismatch, Last Line: 968 "N969 G1 X19.652 Y140.901 E3.08866*96"

It's missing single chars in commands, in this example the "3" in the X coordinate. Now it makes also sense that my change yesterday (putting customizedSerial.checkRx() also in the stepper ISR) made it less worse: The "input" is checked more frequently and missing chars are not happening that often any more.

But I'm not smart enough to know the real answer: Why it is missing chars when the main stepper ISR takes more time?

Sebastianv650 on 9 May 2016

After thinking about what @AnHardt wrote:

Missing chars -> an ISR takes too long, or a CRITICAL_SECTION is not closed properly. The serials RX-interrupt is executed too late - the RX-registers content was replaced by the next char.

and looking how Marlin handles available chars, I would sum up as followed. Please correct me if I'm wrong:
.) At a baud rate of 250000, there is a new char every 40µs. If the char is not read, it gets replaced by the next one sent by the host.
.) There are two ways of storing a new char:
The interrupt triggered when a new char is recieved - but this one has nearly the lowest possible priority available on the Atmega so it may be "postponed" by timer0 interrupts used by the steppers, it get's even worse with the timer1 used by the advance stepper ISR.
That may be the reason why there is another checkpoint inside the stepper ISR, which tries to store a char at every loop. But if this loop takes more than 40µs, the char is gone - my error messages are triggered now.

So the answer why I get much more errors with enabled advance is clear: I'm doing a few more calculations inside stepper ISR. This may take the ISR close to the 40µs.
That there are even more errors with RC6 means that the stepper ISR is a tad slower than in RC4. Maybe not much, but 40µs are not much too..

To proof this, I set the baud rate to 115200 again. I did this already as I saw the errors when I started the advance coding, without success. But I optimized the code since theese day. With 115200 baud the Atmega has 2x the time to get a char from the buffer. With the latest advance code, this solved 100% of the error messages! :-) (Remember: I'm using the RC4 version. I have to see if that's also true for RC6.)

This leads two following options:
.) Live with "only" 115200 baud. Was sombody testing if this ever has an impact to printing? Maybe the impact due to the fast char impact at 250000 baud is worse than 115200 baud "slow" transfer rates?
.) Find some ways to improve the stepper ISR so it takes less cycles.
.) Test if it helps when the serial receive thing has it's own ISR, fired at a fixed rate depending on the selected baud rate. This way, no char should be lost. In worst case, this will brake down the rest of the code due to the high check rate, but in this case the current implementation shouldn't be better as than you get the missed chars..

Your opinion?

Sebastianv650 on 9 May 2016

Find some ways to improve the stepper ISR so it takes less cycles.

I'm considering that option, which basically means reverting all the Singletons back to (still better-encapsulated) flat C code with static linkage. But before I do that, I'm looking to exhaust every other option, for example finding out of the C++ compiler can be "smarter" and produce machine code that is just as efficient as the old code.

Honestly, I do not know how much the compiler output truly differs. But my expectation was that the compiler would be smart enough to see the Singletons as basically just being used as "namespaces" to encapsulate some functions. But linkage is something compilers are weirdly religious about sometimes.

I wish we had better profiling so we could see where all our cycles are being spent. That Marlin Simulator could probably help with this, since it would also be affected by the linkage.

That there are even more errors with RC6 means that the stepper ISR is a tad slower than in RC4.
Remember: I'm using the RC4 version. I have to see if that's also true for RC6.

If the performance issues and communication errors are manifest in RC4 then perhaps the current Singletons aren't the main culprit. Running an Advance ISR at 3-4 times the rate of the Stepper ISR could actually be more problematic, and perhaps mutliple-stepping in the Advance ISR is the better solution.

Again, proper _profiling_ would be most helpful!

thinkyhead on 9 May 2016

Something worth reading…

http://www.drdobbs.com/implementing-interrupt-service-routines/184401485

thinkyhead on 9 May 2016

It really points to starvation, that is the processor is running out of time and pretty much only working on the ISR. I`m not sure if what is being generated with c++ coding is slower than C, though I must say the probability of that is quite high.

It does depends on the compiler flags/settings. Note that arduino IDE uses a quite optimized version of these flags. It won`t get (much) better than that.

andrebstv on 9 May 2016

I love it, when a theory can be confirmed by a test.

250000 <-> 115200 baud. Receiving with 115200bd should be ok. Sending with 115200bd may be a problem. Marlin is busy waiting while writing. (https://github.com/MarlinFirmware/Marlin/blob/RC/Marlin/MarlinSerial.h#L122 https://github.com/MarlinFirmware/Marlin/blob/RC/Marlin/MarlinSerial.h#L152 ++)
Writing for 80 chars means about 3.2ms at 250000bd, more than 6ms at 115200bd. That could make a difference as soon we are not only sending 'ok's but lots of debug output. However the ISRs can interrup the busy waiting.

serial receive thing has it's own ISR

It already has. But no (software) interrupt can interrupt an other interrupt. The interrupt priority only schedules, who is the next, if more than one interrupt request is in the register, after the current interrupt has finished.

AnHardt on 9 May 2016

I love it, when a theory can be confirmed by a test.

Thank you testers, feedbackers! Without you we who know code well, but not the platform or the compiler, often feel a little lost in the dark. But through this kind of experience, much is learned. Deep gratitude to all of you!

thinkyhead on 10 May 2016

Or you could put transmition on a TxInterrupt routine....
On May 10, 2016 12:16 PM, "Roxy-3DPrintBoard" [email protected]
wrote:

Sending with 115200bd may be a problem. Marlin is busy waiting while
writing.

This is kind of a brute force way to work around it... But this would work
for either 115,000 or 250000 baud rate. And there isn't much penalty
because we are busy waiting anyway. As long as we are spinning waiting to
see if we can transmit the character, we probably should be checking to see
if a character came in the receive side of the UART. We would just add a
little bit of jitter to when we return:
FORCE_INLINE void write(uint8_t c) {
  while (!TEST(M_UCSRxA, M_UDREx))
    checkRx();
  M_UDRx = c;
}
—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/MarlinFirmware/Marlin/issues/3680#issuecomment-218190029

andrebstv on 10 May 2016

Or you could put transmition a Tx-Interrupt routine....

Yes. And if that happened, it might make sense to check the Rx status while in the Interrupt handler. Adding a few lines of code to it:

      if (TEST(M_UCSRxA, M_RXCx)) 
          checkRx();

Roxy-3D on 10 May 2016

I've always printed at 250kbps, I can confirm that with RC6 I'm also seeing random checksum errors.

Resend: 9399
Error:checksum mismatch, Last Line: 11315
[ERROR] Error:checksum mismatch, Last Line: 11315

Resend: 11316
Error:checksum mismatch, Last Line: 12547
[ERROR] Error:checksum mismatch, Last Line: 12547

Resend: 12548
Error:checksum mismatch, Last Line: 12580
[ERROR] Error:checksum mismatch, Last Line: 12580

Resend: 12581
Error:checksum mismatch, Last Line: 13270
[ERROR] Error:checksum mismatch, Last Line: 13270

Resend: 13271
Error:checksum mismatch, Last Line: 14503
[ERROR] Error:checksum mismatch, Last Line: 14503

What is the current status of this thread ?

jbrazio on 14 May 2016

I've always printed at 250kbps, I can confirm that with RC6 I'm also seeing random checksum errors.

I think AnHardt already did this, but it would be good to print out the line that has a checksum error. Most likely, some characters got lost. If that is the case, we would at least know that we need to do a better job keeping up with the incoming character stream.

Roxy-3D on 14 May 2016

OK I've been doing some testing..

Prointerface: Random checksum errors during a 5m print but prints successfully
Repetier Host (Pin-ping option enabled): Random checksum errors during a 5m print but prints successfully
Repetier Host (Pin-ping option disabled): Printing stutters, unable to finish printing due to massive checksum errors

All tests have been done with DISABLE_HOST_KEEPALIVE, RC6 without any modifications.

jbrazio on 14 May 2016

@jbrazio
Could you please try

Repetier Host (Pin-ping option disabled) Buffersize decreased to 50% (64)

Blue-Marlin on 16 May 2016

I bet you can also see missing chars with AnHardts mods..
By the way, I did some time measurements that proofes my assumption about the runtime of the main stepper ISR:

ISR runtime with enabled Advance, but K=0: 81µs
ISR runtime with enabled Advance with K<>0: 95µs
So it's clear it can't take 250000 baud with the current way of handling incomming chars..

But a calculation of a new segment (add it to the buffer and recalc all the existing segments) takes 3.15ms, so I would say it different: Running the serial connection at 250000 baud is useless (except for sending things) as the FW can't calculate all the new commands in time:
If you asume 30chars per command, you can send about 384 commands per second at 115200 baud. But with a planner needing 3.15ms per new line, you can handle only up to 317 lines (commands) per second. If we take into account that the FW has more things to do than only recalculating incomming line segments, figures get even worse..

I think there is a lot of potential for improvements in the planner calculation speed, I will have a look at them in the next days.

Sebastianv650 on 16 May 2016

Ahh. Numbers. :-)
How deep is your planner buffer?

Blue-Marlin on 16 May 2016

Could you please try
Repetier Host (Pin-ping option disabled) Buffersize decreased to 50% (64)

I'm currently SD printing, this evening I'll try and report back to you.

If you asume 30chars per command, you can send about 384 commands per second at 115200 baud. But with a planner needing 3.15ms per new line, you can handle only up to 317 lines (commands) per second

This maybe valid if you assume a sync state, that's why we have buffers.

jbrazio on 16 May 2016

I have not changed the buffers size, so it should be whatever it is inside the default RCBugFix code.
@jbrazio , thats true, it's a worst case calculation. But it's true even with buffer: If the buffer is full, you need no baud rate at all (no space to calculate new lines). If it is empty, more than 317 lines can't be handled / s, so 115200 is the max. usable baud rate. If the code could get all chars at 250000 baud, of course I would see no disadvantage of using 250000!

Sebastianv650 on 16 May 2016

The ultimate fix (or at least improvement) for comm errors is going to be re-writing Stepper (and Planner, Temperature, Endstops) to use static data and methods, because currently all these non-static method calls have the additional parameter this, and elements of the stepper ISR are slowed just enough to make it run too long. Any other clever optimizations we can think of will also help!

3730 is the topic for that bit of backtracking.

So far I am getting a lot of compiler complaints (link errors) as I make things static throughout Stepper, so looking for helpful guidance in perfecting this. I can post a link to my branch if anyone would like to offer code tips as comments on the commits.

thinkyhead on 17 May 2016

On the AVR gcc behaves like a retard or I'm not understanding why "this->" is a hogger when compared with a static var.

Static keyword hints the compiler that a variable is a class variable thus shared among all instances of a class, in the other hand a "standard" class var is only accessible by the instance itself (or other instances of the same class but that's another story).

"this" is nothing more than a pointer to the class instance.

AVR-gcc should optimize and do a direct address "jump" either is accessing a static var or a var being called like "this->foo".

jbrazio on 17 May 2016

If you want to investigate the results in depth - https://github.com/MarlinFirmware/Marlin/issues/3212

Blue-Marlin on 17 May 2016

👍1

I do believe static vars will be optimized away in interrupts. Unless they
are declared as volatile
On May 16, 2016 8:03 PM, "Makaira" [email protected] wrote:

If you want to investigate the results in depth - #3212
https://github.com/MarlinFirmware/Marlin/issues/3212

—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/MarlinFirmware/Marlin/issues/3680#issuecomment-219574226

andrebstv on 17 May 2016

@jbrazio The C++ compiler has no clue that when calling stepper.my_instance_method() that there is only ever going to be a single instance of that class. So, there is always a hidden this variable passed into the method, on the stack, as the first argument. This is just the normal calling convention for _instance_ methods. To get rid of the this hidden argument, you use static, and the method is now a _class_ method. And of course you have no this, but you can synthesize a "me" variable, for example, (a fairly common practice with singletons) to address the instance.

Sorry if this is all just a C++ review.

When it comes to the ISR, the ISR itself is (normally) a global static C-style function. For the Stepper ISR it currently calls stepper.isr(). This passes &stepper on the stack to the Stepper::isr() method, which receives it as the "this" variable. (Stepper::isr is not a FORCE_INLINE, but I suppose it could be, by moving the method body into the stepper.h header. That will likely save the one call to stepper.isr() while also not passing stepper as this on the stack, but it doesn't do anything for calls to the other singletons, all of whose _instance methods_ are taking that extra this parameter.

To get leaner C-function-style calling conventions and save over the instance method calls, the only option is to declare class data and methods as static. They can still be volatile or inline or even FORCE_INLINE with the static keyword.

Portions of a class that aren't performance-sensitive can remain instance data and methods, if that makes sense, but most of the class needs to be "flattened" into a single static "class instance" to get maximum performance via this compiler.

I have tried the -fwhole-program flag too, which can do extra magic static optimizations, but I have not been able to successfully compile with it. I was curious to see if it would affect performance, if it automated some of this stuff. But it looks like I have to do more "by hand" and sort out the compile errors.

thinkyhead on 17 May 2016

So I followed the ASM path and learned new things.

The test subject.

class aClass {
  public:
    char i;
    static char j;

  void foo() {
    this->i = 5;
  }

  static void bar() {
    aClass::j = 10;
  }
};

int main (void)
{
  aClass a;
  a.foo();
  a.bar();
}

Disassembly code for a.foo();.

void aClass::foo() { ; 34 clk, 2.125 uS
 132: cf 93         push  r28              ; 2 clk
 134: df 93         push  r29              ; 2 clk
 136: 1f 92         push  r1               ; 2 clk
 138: 1f 92         push  r1               ; 2 clk
 13a: cd b7         in  r28, 0x3d ; 61     ; 1 clk
 13c: de b7         in  r29, 0x3e ; 62     ; 1 clk
 13e: 9a 83         std Y+2, r25  ; 0x02   ; 2 clk
 140: 89 83         std Y+1, r24  ; 0x01   ; 2 clk
  this->i = 5;
 142: 89 81         ldd r24, Y+1  ; 0x01   ; 2 clk
 144: 9a 81         ldd r25, Y+2  ; 0x02   ; 2 clk
 146: 25 e0         ldi r18, 0x05 ; 5      ; 1 clk
 148: fc 01         movw  r30, r24         ; 1 clk
 14a: 20 83         st  Z, r18             ; 2 clk
}
 14c: 0f 90         pop r0                 ; 2 clk
 14e: 0f 90         pop r0                 ; 2 clk
 150: df 91         pop r29                ; 2 clk
 152: cf 91         pop r28                ; 2 clk
 154: 08 95         ret                    ; 4 clk

Disassembly code for a.bar();.

void aClass::bar() { ; 17 clk, 1.062 uS
 156: cf 93         push  r28             ; 2 clk
 158: df 93         push  r29             ; 2 clk
 15a: cd b7         in  r28, 0x3d ; 61    ; 1 clk
 15c: de b7         in  r29, 0x3e ; 62    ; 1 clk
  aClass::j = 10;
 15e: 8a e0         ldi r24, 0x0A ; 10    ; 1 clk
 160: 80 93 00 02   sts 0x0200, r24       ; 2 clk
}
 164: df 91         pop r29               ; 2 clk
 166: cf 91         pop r28               ; 2 clk
 168: 08 95         ret                   ; 4 clk

Conclusion: to update a class variable takes 2.125 uS (34 clock cycles), to update a static class variable takes 1.062 uS (17 clock cycles) which is exactly half the time.

The test subject.

class aClass {
  public:
    char i;

  void foo() {
    this->i = 5;
  }

  void baz() {
    i = 15;
  }
};

int main (void)
{
  aClass a;
  a.foo();
  a.baz();
}

Disassembly code for a.foo();.

void aClass::foo() {
 12c:   cf 93           push    r28
 12e:   df 93           push    r29
 130:   1f 92           push    r1
 132:   1f 92           push    r1
 134:   cd b7           in  r28, 0x3d   ; 61
 136:   de b7           in  r29, 0x3e   ; 62
 138:   9a 83           std Y+2, r25    ; 0x02
 13a:   89 83           std Y+1, r24    ; 0x01
  this->i = 5;
 13c:   89 81           ldd r24, Y+1    ; 0x01
 13e:   9a 81           ldd r25, Y+2    ; 0x02
 140:   25 e0           ldi r18, 0x05   ; 5
 142:   fc 01           movw    r30, r24
 144:   20 83           st  Z, r18
}
 146:   0f 90           pop r0
 148:   0f 90           pop r0
 14a:   df 91           pop r29
 14c:   cf 91           pop r28
 14e:   08 95           ret

Disassembly code for a.baz();.

void aClass::baz() {
 150:   cf 93           push    r28
 152:   df 93           push    r29
 154:   1f 92           push    r1
 156:   1f 92           push    r1
 158:   cd b7           in  r28, 0x3d   ; 61
 15a:   de b7           in  r29, 0x3e   ; 62
 15c:   9a 83           std Y+2, r25    ; 0x02
 15e:   89 83           std Y+1, r24    ; 0x01
  i = 15;
 160:   89 81           ldd r24, Y+1    ; 0x01
 162:   9a 81           ldd r25, Y+2    ; 0x02
 164:   2f e0           ldi r18, 0x0F   ; 15
 166:   fc 01           movw    r30, r24
 168:   20 83           st  Z, r18
}
 16a:   0f 90           pop r0
 16c:   0f 90           pop r0
 16e:   df 91           pop r29
 170:   cf 91           pop r28
 172:   08 95           ret

Conclusion: using this to increase the overall class readeability is irrelevant from a performance perspective because the compiler create exactly the same ASM code.

jbrazio on 19 May 2016

❤1

Yep, just as expected. Using this-> _inside of instance methods_ is perfectly harmless, but we take a hit by calling _instance methods_ from anywhere else. So, static methods are going to be our friend, especially where we're currently calling _instance methods_ from within ISRs.

thinkyhead on 19 May 2016

Could you please tell how you managed to disassemble with the C++
references?
Good job btw

On Wed, May 18, 2016 at 10:17 PM, Scott Lahteine [email protected]
wrote:

Yep, just as expected. Using this-> inside of _class methods_ is
perfectly harmless, but we take a hit by calling _instance methods_ from
anywhere else. So, static methods are going to be our friend, especially
where we're currently calling _instance methods_ from within ISRs.

—
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/MarlinFirmware/Marlin/issues/3680#issuecomment-220203062

andrebstv on 19 May 2016

Where would you use this without being inside the instance ?..

You know to use static class variables is basically the same thing as using global vars ? (Rhetoric question I know you know)

To put things into perspective if inside a ISR we update four class non static vars we are still inside the resolution margin of the micros() function on a 16Mhz board.

So.. there are some places and some very special objects where static var class maybe make sense so we can squeeze one micro seconds.

The dump was done with avr-objdump -F -S, when the binary is unstripped and you do it inside the source code folder it seems to inline the thing.

jbrazio on 19 May 2016

Where would you use this without being inside the instance?

I never intended to bring up "this" as a topic of concern. It's fine.

some places and some very special objects where static var

Improving performance by replacing instance methods with static class methods is the main task at hand. Vars marked as static are also going to help, for similar reasons. No need for dereferencing.

I just want to make sure that especially the stepper ISR is as fast as it can possibly be. But still keep the class structure, namespace, and access privileges.

thinkyhead on 19 May 2016

@jbrazio

You know to use static class variables is basically the same thing as using global vars ? (Rhetoric question I know you know)

In ASM yes, but the main advantage over global variables is encapsulation and the namespace context! So the class structure is indeed the right way...

CONSULitAS on 19 May 2016

👍1

@jbrazio
BTW: Great analysis. So now we know what we speak about!

CONSULitAS on 19 May 2016

👍2

Improving performance by replacing instance methods with static class methods is the main task at hand.

Let's measure the impact then.. I will reuse the previous assembly code because we can perfectly see the callings, by starting with the biggest asm block which is the code for a.foo();, the _non-static method_ we do the diff to the a.bar(); asm block, the _static method_, the ending result is the total overhead.

void aClass::foo() {
 (...)
 136: 1f 92         push  r1               ; 2 clk
 138: 1f 92         push  r1               ; 2 clk
 (...)
 13e: 9a 83         std Y+2, r25  ; 0x02   ; 2 clk
 140: 89 83         std Y+1, r24  ; 0x01   ; 2 clk
 [ignoring here the specific var type IO]
}
 (...)

This leaves us with an overhead of 8 clock cycles meaning 0.5 uS (yes.. half a micro second).

jbrazio on 19 May 2016

Some empirical data..

It's infeasible to try to measure the parts of the code we have been discussing [method calls/overhead], so what I have done is to measure the execution time of a whole method for a static and a non-static. This will not give us real numbers but at least will allows use to see the difference between one call type and another in a comparison. It's also important to note that with Arduino the minimum time difference between two micro() is _4 microseconds_, even if less time has passed between them, trying to even this out.. I call each method 10000 so I can calculate an average.

The text subject.

class aClass {
  public:

  void foo() {
    return;
  }

  static void bar() {
    return;
  }
};

void setup() {
  Serial.begin(115200);

  aClass a;
  uint32_t s, e, j;

  const int32_t cycles = 10000;

  for (uint8_t i = 0; i < 5; i++) {
    s = micros();

    for (j = 0; j < cycles; j++) {
      a.foo();
    }

    e = micros();
    Serial.print("a.foo(): ");
    Serial.print((float) (e-s) /j);
    Serial.println("uS");

    s = micros();

    for (j = 0; j < cycles; j++) {
      a.bar();
    }

    e = micros();
    Serial.print("a.bar(): ");
    Serial.print((float) (e-s) /j);
    Serial.println("uS");

    Serial.println();
  }
}

void loop() {}

Which outputs:

Welcome to minicom 2.7

OPTIONS: I18n 
Compiled on Feb  7 2016, 13:37:27.
Port /dev/ttyACM0, 17:36:00

Press CTRL-A Z for help on special keys

a.foo(): 4.42uS
a.bar(): 3.48uS

a.foo(): 4.43uS
a.bar(): 3.48uS

a.foo(): 4.43uS
a.bar(): 3.48uS

a.foo(): 4.43uS
a.bar(): 3.48uS

a.foo(): 4.43uS
a.bar(): 3.48uS

Conclusion: the empirical difference per method call is ~0.95 uS which leads me to conclude that my previous analytical assumption of 8 clock cycles by "diffing" is not 100% accurate.

jbrazio on 19 May 2016

I had a similar checksum error @ 60mm/sec and above. @ 50mm/sec is was almost gone. @ 40mm/sec it was under control. changing the baud rate made no diff. It went away, but is back in 8aa594c to a lesser extent. 60 is almost good now.

ruggb on 22 May 2016

I want to put my 2 cents in, with what I had suspected, confirmed by someone else with regards to the random over temp problem I had been seeing with using the MAX31855 and MAX6675 thermocouples.

https://github.com/Smoothieware/Smoothieware/issues/894

Problem could quite possibly be down to the high EMI from the stepper motors impacting the SPI lines.

Which would be nearly impossible to fix in firmware, as it is a hardware grounding/shielding issue.

Grogyan on 24 Jun 2016

@Grogyan this problem here is definitely not EMI, it's quite easy to reproduce it with different codebases and enabling/disabling LIN_ADVANCE. The CPU is simply overloaded and can't handle the incoming chars just in time depending on config and Marlin version.

Sebastianv650 on 24 Jun 2016

@Sebastianv650 I wonder if inserting the following into the advance_isr steps loop would make serial connection more reliable… or if it would only slow other things…

#ifndef USBCON
  customizedSerial.checkRx(); // Check for serial chars.
#endif

thinkyhead on 25 Jun 2016

my thought is still that the loops are too big and the software cannot service the serial stream fast enough. It isn't a matter of reliable serial communications.
When I run the printer at 40mm/sec or less I get no errors, When it runs above that, the faster it runs the more errors I get, especially doing honeycomb infill. The errors indicate it is skipping steps and therefore returning the wrong response. It needs to be tested with a command intensive stream at whatever max speed you want to specify Marlin capable of. WHICH BTW I don't see specified. S/W isn't all about features and I don't think it is the hardware that is limiting the speed - thought it could have something to do with Arduino. But I believe past releases worked just fine. It has been so long ago that they did, I can no longer remember for sure. Probably the current released build of 1.0.2

ruggb on 25 Jun 2016

the loops are too big and the software cannot service the serial stream fast enough

@ruggb Yes, we already have consensus on that. While I've been focusing on cleaning up other areas of the code, I am definitely open to any ideas to further optimize the stepper ISR. Please have a look at the code and contribute your 2¢ if you see any opportunities to reduce load.

I'd also be interested in any ideas on how we can collect timing information and get some idea just how much load there is, comparing different optimization flags.

When I run the printer at 40mm/sec or less I get no errors

Is that Cartesian performance?

whatever max speed you want to specify Marlin capable of. WHICH BTW I don't see specified

We need data collected on a variety of hardware, then we can easily specify that stuff.

S/W isn't all about features

When integrating new features, firstly they are disabled by default. Second, they must be non-disruptive. In other words, we make sure, wherever possible, not to rewrite the more general code just to accommodate them. So although there are some new features in the queue which affect how steppers are commanded, and we've wrapped the stepper functions in a class, the low-level code for stepping has been left very much the same. Here and there we have merged patches (such as an acceleration fix from Printrbot) that add a little more overhead to the Planner. But we want the low-level stepper functions to continue to be as performant as possible.

I hear that Grbl has been getting a lot of changes. Perhaps we can borrow some ideas from that project, if they have found good keys for tuning performance, and less intensive code to handle stepping.

past releases worked just fine

And the 1.1.0 release will be great too, I am sure. We haven't overloaded the stepper routines with "features" — but we do have more robust code in many places, and in some instances it may be slower.

thinkyhead on 25 Jun 2016

Is that Cartesian performance?

We had better be able able to do Cartesian at 40mm/sec!

Roxy-3D on 25 Jun 2016

@thinkyhead I was playing with additional checkRX position when this thread was started, with no real success.
But as I wrote before (https://github.com/MarlinFirmware/Marlin/issues/3680#issuecomment-219460929), I don't think it's important that Marlin can handle 250000 baud as long as the other calculations for a new line segment and stepper ISRs are not capable of handling that much imput. That's why I stoped experimenting with it and I'm running at 115200 baud now.

Up to now, I wasn't able to find any real improvement potential in the Planner::buffer_line, which is the heaviest command I think.

Sebastianv650 on 25 Jun 2016

which is the heaviest command I think

Functions outside of the stepper ISR are really non-critical and unrelated to movement speed. I'm sure we're getting enough "line segments per second" in the planner, especially on Cartesian machines, where lines are usually left their full length. The main issue is the the amount of time that is left over by the ISR functions, so that they don't collide with themselves, eat up all the main loop time, or get in each other's way.

We haven't exactly "ballooned" the stepper ISR, but I'm sure it's eating more CPU cycles than the stepper ISR in Marlin 1.0.2-1. So we really need to find as many strategies as we can to reduce the load here, and if possible make the stepper ISR even faster than the one in 1.0.x.

_"To make it go faster, make it do less."_

So I've been looking at the latest Grbl code… to see where they've been optimizing since Marlin first split off from it. But it's complicated! It might be simpler to optimize what we have, or just take each of their ideas bit by bit.

thinkyhead on 26 Jun 2016

@thinkyhead have a look at github/grbl the base for marlin which just released a new version. 0.9j

(x4)+ Faster Planner: Planning computations improved four-fold or more by optimizing >end-to-end operations, which included streamlining the computations and introducing >a planner pointer to locate un-improvable portions of the buffer and not waste cycles >recomputing them.

This code runs on an arduino r3 uno @1153200 baudrate which i use to run my chinese k40 laser cutter. I have dived into the code yet since i have been busy with my home projects inclusive of converting the k40. However when i have time i will compare and see what trick they use to achieve such huge improvements.

Sent from my iPad

On 26 Jun 2016, at 9:14 AM, Scott Lahteine [email protected] wrote:

which is the heaviest command I think

Functions outside of the stepper ISR are really non-critical and unrelated to movement speed. I'm sure we're getting enough "line segments per second" in the planner, especially on Cartesian machines, where lines are usually left their full length. The main issue is the the amount of time that is left over by the ISR functions, so that they don't collide with themselves, eat up all the main loop time, or get in each other's way.

We haven't exactly "ballooned" the stepper ISR, but I'm sure it's eating more CPU cycles than the stepper ISR in Marlin 1.0.2-1. So we really need to find as many strategies as we can to reduce the load here, and if possible make the stepper ISR even faster than the one in 1.0.2.

"To make it go faster, make it do less."

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

paulusjacobus on 26 Jun 2016

@thinkyhead My machine is a CoreXY homebrew with Arduino Mega 2560 and Ramps 1.4 - I have no idea of what it is capable of on its own. But is seems like it will run 120-150 OK maybe even 200.

Unfortunately I don't have enough coding knowledge to be helpful here.

MAYBE the wrong issue is being addressed re missed steps & errors????
I was under the impression that Marlin issued a pause when it couldn't accept anymore commands. If it did that it should stop the comm errors and missing steps since Repetier wouldn't be sending commands that are missed and waiting for the wrong response. If the command processing is slow and/or the hardware is slow that just limits the speed and doesn't mess up the model because of missed steps and produce error messages. If Marlin does not issue a timely pause, then steps are missed and errors produced. Maybe a slower max BAUD rate would help give Marlin enough time to respond - though not an ideal solution.

RE lots of data --
If the speed of Marlin is specified relative to a "STANDARD" board (whatever that may be), and you need not worry about getting data from lots of hardware. If my board or machine is slower, then I know where the bottleneck is. If it is faster, then I know it can handle anything Marlin can give it.

If a feature affects the print command processing time then you need to specify what the speed limitation is. IE., using this feature limits print speed to 50mm/sec.

I think you should first determine why the interface allows steps to be missed.

ruggb on 26 Jun 2016

@ruggb, its not that easy. Marlin can stop the transfer if the buffer is full, but that's not the problem. The errors happens within one transfer when the buffer isn't full. Lets take G1 Y20 F2000 as an example. Marlin says OK, one more command! Then the host will start to send G1 Y20 F2000, each char with 40µs time steps (=250000 baud) - Marlin can't stop this stream now. The serial interrupt has a very low priority, so if there are other interrupts that are triggered they will be called first. If it takes slightly more than 40µs until Marlin finds some time to listen to the serial port again, a char is lost. Now the string reads "G1 Y2 F2000" for example.
The only chance to solve this is to keep all ISR as small and fast as possible.

All boards that run Marlin should run at nearly exact the same speed, because all use the same ATMEGA family with 16MHz. Only things like the used display type and enabled options should change the execution speed significantly.

It's not possible to give a speed impact on a function I think. Some functions will interact themselves, so feature A may reduce the power to 99%, B might give 98%, but both may not give 99*98%.. Even if you want to give a number, it will not be mm/sec. The real number can only be a frequency combined with a constraint, like:

max. movement speed without new sended commands: 30.000 steps/second.
max. new commands per second, given a 30 char command length and no active stepper ISR: 200/s

Edit: But I think we can say that an CoreXY system will impact the max printing (command processing) speed due to the not small amount of needed extra calculations.

Sebastianv650 on 26 Jun 2016

👍1

CoreXY system will impact the max printing (command processing) speed

This is true. There are some extra calculations for COREXY at the stepper level. Also, COREXY machines exercise the stepper motors a lot more than Cartesian machines. The great thing about CoreXY is that you can use lower-torque steppers in many cases because CoreXY moves the steppers both faster and in combination. This does result in a lot more step pulses per second, and thus it does have an impact on the stepper ISR and thus additionally limits your max speed. You might even be able to get 50% - 100% faster printing on a Cartesian versus a CoreXY on the same CPU.

Deltas, by contrast, can actually reduce the amount of stepping required to do the same length moves, which slightly makes up for their extra processing requirements.

thinkyhead on 26 Jun 2016

@Sebastianv650 - thx, great explanation. So if I reduce my baud rate when I increase my SPS over 60mm/s that actually may help to reduce the missed steps. It may slow it down in some areas but how fast does the baud rate need to be, not considering the ISR, to support say 100mm/sec?

ruggb on 27 Jun 2016

You can print at 100mm/s with any baud rate. As soon as the command is transfered (even if that would take minutes), the printer will execute it at the speed given by the F parameter. The question is how many points you have to execute per second and that depends on your stl resolution, your slicer settings and your print speed. If you try to travel over 100 points due to a circle within 1s, you have to be sure you can transfer 100 commands per second over the serial connection. Than again it depends on your average command length, because the baud rate gives you "chars per second" and your command can have a varying amount of chars..

So you have two options if you want to know it precise: Do test prints and find your best values, or try to do an asumption based on excel calculations. The choice is yours, there is no easy answer ;-)

Sebastianv650 on 27 Jun 2016

Is this one fixed? have it been tested with RC7?

boelle on 11 Aug 2016

Hi everyone, i installed the RC7 and I think i'm hitting this issue on a PrintrBoard Rev F. I set it at 115k bauds, checksum (on Octoprint), and still, even a simple move of the bed (not during a print) leads to what sounds like a skipped step, but in reality it's something in the firmware. Any idea on what else I could do to debug this?

lucacri on 31 Aug 2016

@lucacri For looking into this, let's start with RCBugFix. There've been a lot of patches since RC7, so this will be a better place to begin. Give it a try and see how it behaves. Then we can look further.

If you find that the issue still exists, please post your Configuration.h and Configuration_adv.h files. You can just make a ZIP of them and drop them on your reply.

thinkyhead on 31 Aug 2016

Hi everyone, i installed the RC7 and I think i'm hitting this issue on a PrintrBoard Rev F. I set it at 115k bauds, checksum (on Octoprint), and still, even a simple move of the bed (not during a print) leads to what sounds like a skipped step, but in reality it's something in the firmware. Any idea on what else I could do to debug this?

@lucacri May I ask that you send a message to the PrintRboard people and tell them their lack of interest in letting the latest Arduino work with their board is causing causing customer dissatisfaction? You might want to point them at this message and ask them to respond to it. They are telling people to use Arduino V .22 and they think that qualifies as 'Arduino Support'.

The Marlin Team tries hard to support any hardware that wants to run Marlin. But a hardware vendor that refuses to update its compiler support more often than every 3 years makes it very difficult to take serious. You might suggest that the PrintRboard people contact us and ask what we need to support their customers better. We would be happy to have a dialog with them. But the sad truth is, they don't even want to spend a few days to get the latest version of Arduino running on their board. (And trust me... I know... I have a PrinteRboard that I'm going to replace with a RAMPS setup because that works with the latest Arduino!!!!)

Right now my advice to anybody that is willing to listen is: DON'T BUY A PrintRboard !!!! You will have problems because the company doesn't really care about supporting it. They only care about cashing your check.

Roxy-3D on 31 Aug 2016

@thinkyhead Thanks for helping! I uploaded my configuration.h and configuration_adv.h here.

I wasn't aware of the RCBugFix branch, so I just recompiled the firmware and uploaded it to the Play. I'm printing a Benchy with the same settings as before, hopefully I don't get "zits" on the straight lines. I gotta say tho, if this issue is fixed, I will have the best looking prints I have ever had, thanks to LIN_ADVANCE. Outstanding job, guys!

@Roxy-3D wow, i didn't know the situation was so complex! I thought that Printrbot was forking Marlin just so they could point users there to do an easier compilation.. Am I wrong in even trying vanilla Marlin on it?

lucacri on 31 Aug 2016

LIN_ADVANCE is a pet favorite of mine too. @Sebastianv650 did a bang-up job getting it together and testing. I just submitted a PR today to help optimize it a little bit, especially for mixing extruders.

thinkyhead on 31 Aug 2016

wow, i didn't know the situation was so complex! I thought that Printrbot was forking Marlin just so they could point users there to do an easier compilation.. Am I wrong in even trying vanilla Marlin on it?

@lucacri I think it is actually even more complex than what was just disclosed. Part of the problem is the Arduino group split into two competing halves, and both are selling hardware boards. They provide support for many different boards that they sell, but the vendors that sell competing boards are not directly supported. And as the compiler's IDE changes, those board support files need to be updated.

I think the PrinteRboard people got their board file to work 3 years ago with version .22 and never looked back. They just say to use Arduino v.22 but a lot of compiler bugs have been fixed since then. If you relax the 'Sanity Check' that stops Marlin from compiling with v.22 you will probably run into syntax issues that have to also be warmed over. I ran into some syntax issues with the pre-processor just changing from v. 1.6.8 to v. 1.6.9

Roxy-3D on 31 Aug 2016

@thinkyhead Ok I made two prints: one with RC7 and one with RCBugFix. Here are the results

As you can see, not much difference. I get a ton of "zits", and those are not on layer changes (the layer change is on the same spot on the back of the benchy, to remove a variable), but on the straight lines.

Any clue what it could be?

@Roxy-3D I know of the Arduino-war, but in this case I managed to compile the firmware on 1.6.0 without issues. Am I just very lucky??

lucacri on 31 Aug 2016

I know of the Arduino-war, but in this case I managed to compile the firmware on 1.6.0 without issues. Am I just very lucky??

My guess is you are using an external programmer. Because without the proper board files, Arduino can't flash the AVR chip. If it is working for you, that is great!

Roxy-3D on 31 Aug 2016

I'm using Arduino 1.6.9, Teensyduino 1.29, and the PrintrBot Firmware Update to send the file (you drag and drop the .hex on the icon and it will send it). Is that similar to your environment?

lucacri on 31 Aug 2016

@thinkyhead I tried your PR #4738 but I still get the same identical results. I even tried a different color and brand of PLA, and Benchy came out bumpy as before. Any other suggestion on what I could do? I'm a developer too, so if i need to dive down in the code, I will!

But seriously, this LIN_ADVANCE is a paradigm shift in 3D printing. If we nail it down, it's basically plug-and-play

lucacri on 1 Sep 2016

@lucacri I didn't expect much difference with #4738. It's more just a cleanup. Did you test with some different rolls of filament? Sometimes bumps are an indication of water or other impurities in the filament causing sudden jumps in pressure. Also, could you post a photo or two of your print results so we can better diagnose? Blobs on sharp corners or at the start/end of travel moves are most likely to be attributable to advance settings.

thinkyhead on 1 Sep 2016

I tried with two different rolls (red is the latest print). Both filaments worked perfectly with the previous firmware (printrbot version of May-ish).

You can see the results here: https://imgur.com/a/d1rjE

I removed the variable of the layer change/end/start of loop by placing it on the front of the ship. So any bump you see it's just from the nozzle moving, then jerking for a split second, and restarts. It happens even when I send a G0/G1 command, moving from one side of the bed to the other (X or Y axis, doesn't matter)

I don't know much of the code, but i'm assuming the issue is that the cpu is just not keeping up with the calculation, and it makes the motor skip a beat.

There are a ton of variables in the configure.h. Is there one that kind of makes the calculation easier on the cpu? Dunno, something like MIN_STEPS_PER_SEGMENT or SERVO_DELAY? Really, i'm spitballing it here haha

lucacri on 2 Sep 2016

Wow, that is very lumpy alright.

thinkyhead on 2 Sep 2016

when I had issues like that I found that accel & jerk settings along with temp solved the problem.
Though it was not with the RC7 f/w.

ruggb on 12 Sep 2016

@ruggb I lowered them both (to a crawling pace) but no change :(

lucacri on 14 Sep 2016

This is what I go by - as low as possible is not necessarily going to improve it.

The main cause of prints with blobs in the corners and other places is a low JERK setting. If you increase the JERK setting you will see the blobs go away. If it is set too high you will get ringing on the corners.

jerk = 20-30% of print speed >> 60mm/sec print speed = 15mm/sec jerk
M205 X## to change + M500 to save

Acceleration – set slow print speed == M203 X100 Y100
Set diff for moving bed – same for CoreXY == m201 X9001 Y9001
Test part with 100% rectangular infill == not losing steps, increase by 20% and try again
Leave safety margin of 20% from critical value.

Speed – will never get much above 200mm/sec especially with a short bed
F parameter is mm/min so to test use G1 X335 F12000

The E jerk and acceleration are used during Retracts so they are important and do effect the print in a way. For Tantillus, I have the Acceleration set to 10000 for E and the Jerk set to 100 because of the need for a 5.5mm retract on the Bowden cable and we want it instantaneous. This may not work for most printers since on a Bowden machine the filament springiness is enough to accelerate the extruder backwards really fast and prevent skipped steps while retracting.

ruggb on 14 Sep 2016

👍1

I believe this needs to be re-opened, as it is still a problem as of d63230d73e710b6ebc8b77fc99f9571687852f12. I connect to the printer at 115kbps, printing via Printrun. Mostly I just see "checksum" and "line number" errors (maybe a few dozen per hour of active printing), but occasionally I get the glitch where X or Y will suddenly just move out to one edge of the print volume (after which, the print resumes normally).

VanessaE on 17 Nov 2016

What features do you have enabled? As far as I know, this issue was never solved in terms of "it works now at any baud rate". But if you have problems even at 115kbps, I think you are using some extra features like bed leveling?

Sebastianv650 on 17 Nov 2016

Not much really, just LIN_ADVANCE and as you guessed, bed leveling (3-point).

However if I turn MINIMUM_STEPPER_PULSE up (to 8; normally I have it set to 4, as LIN_ADVANCE causes my extruder to lose steps otherwise), hoo boy the serial errors go through the roof, the printer stutters while trying to print, and basically can't do anything useful.

VanessaE on 17 Nov 2016

Maybe we should start a new thread - this one has been confused with other things..........
BUT I am still of the opinion that Marlin is trying to do too much - or it can't do what it needs to do in the time between commands. Lowering the baud rate may help but it will not cure the problem. Lowering the print speed worked for me - until more stuff was added for Marlin to do between commands. Maybe the only solution is more horsepower in the hardware and more RAM.
IMHO, the error is because the printer misses a command then responds to the next command while the computer is waiting for a response from the previous one. The model obviously showed the result of missing commands.

ruggb on 17 Nov 2016

@VanessaE I bet that's the reason, MINIMUM_STEPPER_PULSE will increase the duration of each stepper ISR and therefore it's much more likely to miss a char.

@ruggb I'm afraid if we create a new thread we will start from scratch while there is already a lot of investigations done here. Therefore I can also say that Marlin is missing single chars from a command that can't be received due to stepper ISR is executed at the specific time.
More CPU speed can reduce the error rate (as far as there is neraly never an error visible), but the flaw is inside the design. If the stepper ISR is blocking at the needed time, and it takes so much time that the next char is already incomming when the ISR is finished, the error is unavoidable..

Sebastianv650 on 17 Nov 2016

Chars can get lost if (E_step_loops * pulse_length > 1 / (baudrate / 10))

Maybe a

    #ifndef USBCON
      customizedSerial.checkRx(); // Check for serial chars.
    #endif

inside the loop helps. Ideally between starting and stopping the pulses.

Blue-Marlin on 17 Nov 2016

I think the obvious question is: Does the hardware have enough horsepower to run both the ISR and handle the i/o data stream simultaneously? If it does not, then there are three options:

Require higher performance H/W to run Marlin. - that probably won't go over big with users.
Eliminate features that bog down the ISR - That won't be any fun for users.
Make the code more efficient - that may be a BIG challenge for developers.
I believe @thinkyhead had indicated in a previous thread that he thought he had the ISR as fast as he could make it. If so #3 is impossible, not a challenge. I think he also indicated the i/o data stream was handled via interrupt - or he was striving to implement it that way. I don't know where that stands, but that goes to the first question on hp.
Is there any overhead in the code for features that are disabled?

ruggb on 17 Nov 2016

I don't see that as a main problem of processing power, @Blue-Marlin is bringing it to the point. If we could tell the ATMega that serial communication has a higher priority than the stepper ISR and that it should interrupt the stepper ISR, there would be no problem. But that's impossible. Putting some checkRX() at good positions inside the stepper ISR can work.

With Blue Marlins calculation, there is a limit of 87µs at 115200 baud and 40µs at 250000 baud.
At the moment, the stepper ISR takes about 50µs, therefore if 250kbaud is working or not depends more or less only on how lucky you are.

@ruggb only disabling features that do calculations inside the stepper ISR can minimize the error rate. There are not much: MIXING_EXTRUDER, LIN_ADVANCE, ADVANCE and MINIMUM_STEPPER_PULSE. Each of them will add a few µs if enabled.

Sebastianv650 on 17 Nov 2016

I was under the impression that there was something like a DSR in the comm so that the host waited until the printer was ready to accept data. Is that the response string and if so is Marlin sending it prematurely?

IE instead of sending it as soon as it receives the command it should wait till it is ready to rcv another command - and this has to do with filling up the buffer.

ruggb on 17 Nov 2016

In the old days, we used RTS/CTS for flow control on old 8-bitters with fast modems - does that exist here?

VanessaE on 17 Nov 2016

No.
Only two lines (RX, TX) at arduino. Hardware handshake would need another two.

Blue-Marlin on 17 Nov 2016

In one of these threads someone said that when the buffer was full a message was sent to the host to wait.
If that is true, there may be an issue in that process. Like it is sending a wait AFTER the buffer is full and the host is getting it AFTER the host has sent another command. It may be that it needs to send the message BEFORE the buffer is full and allow another command or two to be rcvd.
Just guessing.......

ruggb on 17 Nov 2016

@ruggb I think you are mixing up the buffers. A message is sent when the buffer for incoming commands (not single chars!) is full. Marlin isn't sending messages after every single char..

Sebastianv650 on 17 Nov 2016

@Sebastianv650 I understand that - I wasn't speaking about characters, I was implying messages or commands - I think.

ruggb on 17 Nov 2016

Uhh. That hurts. You couldn't be wronger.
Marlin is sending an 'ok' when it has executed or queued a command. That's the signal for to host to send the next command. Most hosts make some assumptions about how deep Marlins buffers are and send 'some' code ahead.

Please look that up.

The problem is, the only one byte buffer in the UART. If you do not pick the byte before the next arrived, it's lost. If a new byte arrived a interrupt is issued and queued. The (standard) interrupt scheme of the AVRs does not allow interrupts to be interrupted. So if an (stepper/advanced) interrupt takes to long, a char is lost. For that we already check for new chars in the stepper interrupt (code cited above). It could make sense to do the same in the extruder/advanced interrupt if that takes too long.

Blue-Marlin on 17 Nov 2016

The extruder ISR takes only a few single µs. In fact it is so short that I couldn't measure it.
I had an idea while I was reading your answer.. Maybe not a good one, but maybe we should think about that. What if we would do something like this in the beginning of the stepper ISR:

Disable all ISRs except the UART ISR
Enable global ISRs again: sei()
If a UART event happens now, it can interrupt the stepper.
At the end of the stepper ISR, enable all disabled ISRs again (restore the register saved before)

It wouldn't be as chaotik as a real non-blocking ISR, but no char should be lost..

Sebastianv650 on 17 Nov 2016

Marlin is sending an 'ok' when it has executed or queued a command.

@Blue-Marlin That is what my perception was. The question is, what happens if Marlin does not send an OK? - is there a timeout?

host sends cmd, marlin sends OK, host sends cmd, Marlin misses it, time out, host sends another cmd, Marlin sends OK, error generated because line number was for the missed one.
If not I can't see how this error is generated == Error:Line Number is not Last Line Number+1

If there is a timeout, then what I am saying is maybe just ensuring that Marlin can accept the next command b4 it sends the OK, and not sending the OK just because it has queued the command would solve the issue.

And if this doesn't make sense it is because I have no clue how this really works.

ruggb on 17 Nov 2016

I believe @thinkyhead had indicated in a previous thread that he thought he had the ISR as fast as he could make it

There always seems to be more room for optimizations. Pre-calculating values in the planner. Replacing division with multiplication in the ISR. Things like that. It's going to be helpful to keep profiling the stepper ISR to see where improvements can be made, and where it would be best to call into checkRx(). Maybe not at the start of the ISR but somewhere in the middle.

On that subject, I see that checkRx() is called for each loop in for (int8_t i = 0; i < step_loops; i++) { . . . }. And the call to checkRx() itself adds some overhead. Maybe it only needs to be called when i & 1 == 1?

Too bad it can't be called in between pulse start and pulse stop. It would eliminate the need for the MINIMUM_STEPPER_PULSE at that point. But I think a call to checkRx() there would actually make the pulses too long.

I can't see how this error is generated == Error:Line Number is not Last Line Number+1

If the character missed is within the line number, that will do it.
Otherwise I believe you'll get a checksum error instead.

thinkyhead on 18 Nov 2016

I had same Issue recently. due to bad USB cable i bought shielded one and all run nicely now.

marekduciuc on 2 Jul 2018

Hmm. I've been using Cura 4.0.0 beta and Slic3r 1.3.1-dev to send jobs over serial (USB) to an Ender 3. Maybe a few dozen prints, no errors. In Slic3r, noticed the "Line Number is not Last Line Number+1" message in the serial text, but it would only display it once upon connection to printer, and seemed to work fine after that. Never failed mid-print yet. So researching this error, it seemed like a big potential problem, so decided to build and upload the latest 1.1.9 Marlin bugfix fw. Now getting an endless supply of "Line Number is not Last Line Number+1" in Slic3r and it refuses to print anything. Prints fine from Cura though!

mj1911 on 18 Feb 2019

@mj1911
Code relevant for this has changed several time since this discussion - 2 years ago.
Please open a new issue for this. If possible with some logs around the error.

AnHardt on 18 Feb 2019

In case it helps, I see this error as continuous when it happens, every message fails at 115200. I find that closing the slicer and resetting the printer clears the issue. This seems to occur only when I start the slicer program under some as yet not determined situations. I have seen it start just after a new print starts, the head will be as some random spot on the surface, and every message sent by the slicer is failing. So far this has only happened on the first layer.

uchobby on 12 Apr 2019

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] on 12 Jul 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] on 10 Sep 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] on 10 Nov 2020

Marlin: Serial communication errors: checksum and line number not +1

Most helpful comment

All 107 comments

3631 can potentially affect the ISR-runtime

3730 is the topic for that bit of backtracking.

https://github.com/Smoothieware/Smoothieware/issues/894

Related issues