Marlin: bugfix-1.1.x endstop triggered message after G28 and after G29 J3 probing

Created on 24 May 2018 · 19Comments · Source: MarlinFirmware/Marlin

A quick heads up as I know the endstop code has changed recently. Using the latest bugfix-1.1.x with endstop interrupts enabled I am now getting a message displayed that says that the Z endstop has been triggered after every G28 command I also get this message after running a G29 J3 UBL probe. This was not the case with my prior bugfix-1.1.x build (from a couple of weeks ago).I have an inductive probe connected to the zmin endstop pin, endstop noise reduction is not enabled. I have also seen the G29 J3 command not raising the probe to the specified height between probes, which I suspect is related to the above endstop message (I think a "spurious" endstop trigger may be aborting the move). I am still investigating and will post config details etc. and a more complete bug report later.

Confirmed ! Patch

Source

gloomyandy

Most helpful comment

I agree that we should not call update within hit_on_purpose. That definitely negates the purpose of that method.

And the axis_did_move flag should definitely be cleared at both completion and abort of a block. The original endstop checking code got the current block's move-size in each axis at the moment of testing. The new code only needs to look at axis_is_moving, and that state only needs to be set once for each block, when it is fetched, just before it starts to process (and that's how it now is).

thinkyhead on 25 May 2018

❤1 👍1

All 19 comments

Thanks, @gloomyandy — The code changes recently applied to Marlin under some duress were not very well tested and are leading to some anomalous behaviors and (I am hearing) a reduction in print quality as well, so I apologize. Previously Marlin wasn't perfect, but what we had was at least free from this kind of spurious triggering. We may end up reverting the bulk of the planner, stepper, and endstop changes so we can have a less buggy 1.1.9 release, and keep them in a separate development branch until they get more thorough testing and debugging.

Meanwhile, does the "noise reduction" help at all? I believe that without it there is no redundancy or de-bounce of the endstops whatsoever.

thinkyhead on 24 May 2018

I think a "spurious" endstop trigger may be aborting the move

This should only occur if endstops are left on.
Please test after doing an M121 to see if you still get the issue.

thinkyhead on 24 May 2018

Hi I will be doing more testing and will try all that you have suggested, but won't be able to get to do that until later. I will also be testing with the endstop interrupt feature on and off. After that I'll try and get a better picture of exactly what is happening. I suspect it is just a timing related issue uncovered by the new way of doing things.

Just to clarify about the move that doers not always happen. I'm talking about a move that is mid way through a G29 J4 sequence, this is basically the move just after the probe triggers to raise the nozzle/probe up to the height specified by Z_CLEARANCE_BETWEEN_PROBES basically occasionally it just doesn't happen and the X/Y move between consecutive probes happens at whatever height the probe had just triggered at. I suspect that when this move is made then the endstops may still be active (as it is mid way through a series of probe operations), but I'm by no means 100% sure.

One thing I'm not sure about is if there is any "noise/bounce" associated with an inductive probe? I would have thought that such a device would be pretty well behaved compared to a mechanical switch and it is the probe that I'm getting the problems with!

BTW the changes seem to be a step forward in many areas, so I have no problem with them, hopefully stuff like this can be easily identified and cleaned up.

gloomyandy on 24 May 2018

👍1

@gloomyandy Inductive sensors have no bounce. But nevertheless, being normally open devices that use an open-drain/open-collector approach, wiring is still susceptible to noise pickup.
If you are running a Ramps board, without any filtering cap, adding an small one (100nF ceramic disc between ground and signal) could also help.
Later Ramps (1.4.2) boards already have that capacitor

ejtagle on 24 May 2018

BTW the changes seem to be a step forward in many areas, so I have no problem with them, hopefully stuff like this can be easily identified and cleaned up.

Definitely my preference, and if we can fix all the current issue that would be amazing! I keep hoping to get 1.1.9 out over every weekend for many weeks, and new untested code on top of the already weird layer shifting issues is killing my spirit. All I want to do is get 1.1.9 out so I can focus attention on 2.0 and stop doing twice as much work for each PR. If I seem extra grumpy it's because I have a headache that won't quit tonight.

thinkyhead on 24 May 2018

@thinkyhead : I´d personally would not worry about 1.1.9, if in your position. Let´s sort out all problems on 2.0.x, and then do the merge.

As @Roxy-3D suggested, use 1.1.9 as an stable branch.

When 2.0.x is stable enough, merging to 1.1.8 will not be as complex as you could think. Obviously, there are differences, but "an inspired day" is usually better than a collection of small pieces of attention. I am sure you get the idea 👍

ejtagle on 24 May 2018

OK so I've made a little progress on working out what is going on. I added some debug code into the endstops class and I've seen a few odd things. When homing (and probing) after the endstop is "hit" the homing code calls "hit_on_purpose" to clear the endstop state. The latest version of this code (and some of the other enable calls) now checks the state of the endstops in the call this is new and only done if the endstop detection via interrupt is being used (which it is in my case). In the case of doing a G28 after the probe has triggered, the call to hit_on_purpose is made and this clears "hit_state", however the call to "update" then immediately detects that the endstop is in a triggered state and sets hit_state true again. I think it is this that is causing the spurious "endstop hit" message after G28. I assume that in previous versions hit_on_purpose is clearing the state.

However what I don't understand is why update is flagging the endstop as triggered, obviously the actual probe/switch is in the triggered case. But my reading of the code is that the head needs to be moving towards the endstop for any notice to be taken of it. I don't know enough about when/how the various movement conditions are being set. Could it be that the system thinks that the axis is still moving when the call to hit_on_purpose is made? In previous versions this call to update would not have been made in this situation and so the triggered state would not have been checked until later.

I'm not sure if the above issue is also causing my G29 J3 problem or not. I suspect it may be related, but unlike the G28 problem I do not see the G29 movement being skipped every time, but I suspect that perhaps a call to update is now happening at a different time and may be causing the move to be aborted. But it could also be noise on the endstop lines, but that would also be strange as the move that is skipped is away from the Z endstop and so in theory should never cause a trigger. Either way I'd like to try and resolve the first issue if I can and then see if I still have the G29 problem.

Any suggestions as to what may be causing update to detect an endstop hit in the G28 case?

gloomyandy on 24 May 2018

👍1

OK I've not tested this very much but looking at the code it seems that movement and direction information is provided to the endstop update method by the stepper variables "axis_did_move" and "last_direction_bits". I'm a little concerned about what information these will continue to provide after an endstop is triggered. It seems to me that they will continue to say that the axis is in effect still moving, until a new block is started. It seems to me that when the code to abort the current block is executed then "axis_did_move" should be set to zero to indicate that movement has now stopped. It seems to me that without this the endstop code can incorrectly trigger further endstop conditions even though nothing is moving. I think that this may also in just the right circumstances result in the next block after an endstop is triggered being discarded (as the abort flag may have been set). I'm not totally sure about all of this and will actually test things tomorrow.

Oh and I'm not sure what should happen when a block has been processed and no other block is available. Should these variables be cleared? If not then again they could remain set for some time possibly leading to spurious endstop conditions being triggered?

I couldn't go to bed without at least trying it! I made the change to clear "axis_did_move" when a block is aborted and so far it looks good at least for G28. As far as I can tell this is only used for the endstop stuff, so it should be a safe change. But I'm sure you will have a better understanding of this code than me! Anyway, more testing tomorrow.

gloomyandy on 25 May 2018

@gloomyandy : Try the following: At stepper:

if (abort_current_block) {

Add, after that line,
axis_did_move = 0;

In fact, this seems like an small race condition, but i do agree that once the movement is aborted, motors are stopped.

(Although there is something strange, as the motor should be moving in the opposite direction to the endstop, so motion should not be aborted - But your proposed change is safe to implement, as at that point motors and all axis are stopped)

ejtagle on 25 May 2018

Yep that is exactly the change I've already made. Do you think that we should also add the same line when "all_steps_done" is true and current_block is set to null, or perhaps this should be done only if/when a new block is not available? Not sure if that is needed or not, I suspect things will be OK without it.

gloomyandy on 25 May 2018

I agree that we should not call update within hit_on_purpose. That definitely negates the purpose of that method.

thinkyhead on 25 May 2018

❤1 👍1

Ok, applied those patches. Keep us posted if any other endstop oddities arise.

thinkyhead on 25 May 2018

Yes, the call to update() on hit_on_purpose() was just in case... Nothing against removing it. axis_did_move should be cleared at both places, i do agree. (but to be honest, the previous code before the "superpatch" was simply skipping the update call if there was no movement (current_block == null).
I guess this new solution is cleaner and get exactly the same results.

ejtagle on 25 May 2018

I'd personally would not worry about 1.1.9, if in your position.
Let's sort out all problems on 2.0.x, and then do the merge.

Well, I do need to sort out layer shifting, and the refactor showed some promise that —if nothing else— it might "accidentally" fix that issue. And, if it somehow made things worse, we could probably sort it out or revert it. Now, with the complications to Linear Advance, I'm reserving the option of reverting the planner in bugfix-1.1.x (thus 1.1.9) to where it was at the end of January. But that's obviously a last resort.

Maybe you made a typo, but to be clear, Marlin 1.1.8 is done and will remain exactly as it is, even with the Neopixel compile error. (Maybe we'd consider a 1.1.8.1 patch to address that.) The bugfix-1.1.x branch will be published as 1.1.9 once we've resolved the Linear Advance and lost steps issues. And along the way, we're only going to patch other bugs we find, and any cleanups that are simple to share with bugfix-2.0.x. No new features from this point.

Marlin 2.0.x can get lots of attention in other areas, especially the HAL, but I'm withholding all new feature PRs to that branch too, just till we sort out the planner and Lin. Adv. and release 1.1.9, because it's simply easier to keep the core, module, and lcd files in sync if we don't introduce new features.

Let's hope we resolve these annoying last few issues soon!

thinkyhead on 25 May 2018

Yes, i was refering to 1.1.9
Maybe the most problematic thing is the lost steps issues... I know, i know... LA is also important, but it is not exactly in my top ten list of things to fix (because the idea was to rewrite it to make it compatible with Bezier)...
The lost steps, i actually don´t know... it is very hard to be sure if it is a sw issue or a hw issue that is being triggered by a sw issue... Errors are truly random, unfortunately

Reverting would suck: If would be like renaming an old release and posting it as a new one... Not much sense, unless there is some kind of compromise forcing to release every X weeks... ;)

ejtagle on 25 May 2018

Errors are truly random, unfortunately

I'm starting to sound like a broken record saying this, but since we know they don't exist before February, there is definitely something we can adjust to mitigate the issue. Once we locate the day the issue began to manifest, we can analyze the change. It may turn out to be something very simple.

It would be like renaming an old release and posting it as a new one

Well, we would only revert the problematic portions of the planner and stepper, and maybe only a small portion at that. 1.1.9 would still contain the hundreds of other bug-fixes, new features, improvements, and enhancements made since December.

thinkyhead on 25 May 2018

I will update my build with the latest changes and retest. Thanks for your help with this. Hopefully the layer shift problem can be identified and fixed as easily!

gloomyandy on 25 May 2018

OK with the latest bugfix-1.1.x I no longer get either the spurious endstop messages and after running loads of G29 J commands I've not seen any missed moves. So all looks good to me!

gloomyandy on 25 May 2018

🎉1

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.