Layer shifts are usually a mechanical issue. As far as we know there's no randomness in Marlin. If an issue occurs at _some point_ in a G-code file, it will _always_ occur at the same point in the file. Can you provide a G-code file that demonstrates this issue consistently?

thinkyhead on 18 Apr 2018

I don't think it's a G-code issue. The objects in the slicer don't show that behavior in the preview. Also, the height of the shift is always different. If I print the same file multiple times, sometimes it happens in the second layer, sometimes it works fine until 2h into the print. I printed an object at least 5 times and every time it was different, sometimes on X, sometimes on Y. That's why I don't think it is a mechanical issue, since the axis changes.
Then I flashed from 2.0 to the latest 1.1.x bugfix with the same issue. In both cases I reduced acceleration from 1600 to 800, raised and lowered Vref, and decreased the feedrate to 70%. Then I switched back to my January build, printed the same object (I had to slice it again; the only setting I changed was the K value in the start code, to a v1.0 value) and it printed absolutely fine.
Could it be that I somehow reached the CPU capacity of the Arduino? I have no idea what else it could be.
What I also did:
Added more cooling on the steppers, but they stay cold anyhow.
Powered my RAMPS with the diode deactivated, so it does not use the Arduino voltage regulator (I thought the shifts might be caused by the 5V dropping because of power consumption).
Loosened and tightened the belts.
Removed the belts and made sure all axes move extremely smoothly.
Checked the alignment of all axes with a micrometer.
Sliced the file multiple times.
Printed with different materials that require higher and lower bed temperatures.

I really have no idea what else to do. And since it works with the older Marlin version, I think it has something to do with the firmware.

Oh and btw, the shifts aren't huge. They range from around 0.5-2mm, but I never measured them.

viperchannel on 18 Apr 2018

Here is one of the G-code files I have used. The object had a shift in the X direction at around 2.1mm height, then another one in the opposite direction 1.63mm later, and after another approx. 3.3mm it had a shift in the Y direction.
Before that I printed 2.5kg of the same filament, same kind of objects, same slicer settings (except the K value, because of v1.0 vs. v1.5) on my January build with no problems at all.
failed object.zip

viperchannel on 18 Apr 2018

Could be your drivers overheating. You say you increased cooling for the steppers but not for the drivers. You also increased Vref, which increases heat output. Lowering it to half would likely lead to insufficient torque and skipped steps.

teemuatlut on 18 Apr 2018

Sorry, I meant the drivers, not the steppers. I have increased cooling on them, but that was not the reason; even with the smaller fan they did not get noticeably warm. I just didn't know what else to do and wanted to eliminate as many possible causes as I could. Adjusting Vref was just a try; it did not help at all. Motors never got above 45°C btw.

viperchannel on 18 Apr 2018

If you have Z fade set to a small value and a very uneven bed, your nozzle will rub on layers and cause shifting. You could try increasing the fade height with M420 to make it fade over a longer distance.

isaacfank on 18 Apr 2018

I have also already thought of that. But some shifts happened before reaching the fade height and some 20mm after. With a 0.2 layer height, that's a lot of layers. I have also tried to move the axes by hand while printing. That's nearly impossible, it has so much torque; I would rip my extruder off the mount before it skips a step.
My thought is that all the calculations like fade height, bed leveling, linear advance and babysteps somehow max out memory or a buffer during a print. Or it has something to do with linAdv 1.5. I have no idea if that is realistic at all.

viperchannel on 18 Apr 2018

I'm currently investigating a similar issue, trying to determine if it's firmware related or something else.
Had two 5 hour+ prints which suffered layer shifts in the Y direction.
The initial print was with ABS and had some warping, so I thought it might have been the nozzle hitting the print. For the 2nd attempt I resliced and used PLA: same result. Layer shifts at different heights, always in the Y direction.

I'm trying to replicate with simpler, quicker prints that I can observe. All OK so far.

Printer has been fine previously.

autonumous on 19 Apr 2018

Keep experimenting. Disable various features. Test various settings for acceleration and jerk. Etc. The most important thing will be to narrow it down to a single causative factor.

thinkyhead on 19 Apr 2018

I can already tell that it is 100% caused by the firmware or one of its functions.
I did not have a single print without shifts on the newer firmwares, and as soon as I switched back to the old one I didn't have a single shift.
I had it printing for 16h yesterday on various prints, including identical G-code files that had shifts with the newer versions, with the exact same mechanical, acceleration, jerk, Vref etc. settings, and did not get a single shift while using the old version.

viperchannel on 19 Apr 2018

I can confirm this with the 2.0.x bugfix branch. After updating from 1.1.5 there is sometimes a loud "thunk" and erratic moves in the middle of a print (my printer is normally very silent and these "thunks" didn't happen with 1.1.5), and I'm also experiencing layer shifts, even with print speeds of 50mm/s (1.1.5 is fine at 80mm/s). I'm sure there is no mechanical failure or problem with the stepper controller reference voltages, and I've played around with different accel & jerk values. I even tried the BEZIER jerk, and while it makes the overall printing smoother, the erratic moves are still there.

I'll try to experiment more during the weekend and, if possible, make some comparisons with different values.

Jartza on 19 Apr 2018

Finally an issue regarding the TMC2130 layer shifts which occur with the bugfix-branches!
I'm experiencing the very same behavior as you guys are describing.

LichtiMC on 19 Apr 2018

It's _possible_ that I'm seeing the same behavior.
I'll need to start eliminating commits and see which change introduced this. If any.

~~If you guys want to try something, turn off MONITOR_DRIVER_STATUS. That way there will be no communication to the driver after the initial setup.~~

teemuatlut on 19 Apr 2018

I am kind of happy I am not the only one who has these issues. I am currently running the TMC2208 on my printer btw, so I do not use SPI.

viperchannel on 19 Apr 2018

Also judging by the initial report, this is not isolated to TMC drivers.

teemuatlut on 19 Apr 2018

OK, maybe. I'm just saying I never had any of these problems with the A4988 drivers.
This could be because of the lower torque of stealthChop though...

I even changed to much oversized stepper motors (400mA to 1700mA, H-Bot) but I still get these layer shifts.

LichtiMC on 19 Apr 2018

> This could be because of the lower torque of stealthChop though...

It is not. I'm running TMC2100 on my printer and it has been configured to use spreadCycle mode.

Jartza on 19 Apr 2018

> Also judging by the initial report, this is not isolated to TMC drivers.

Correct, my other printer (Tevo Black Widow) has DRV8825 drivers and the same issue seems to occur there too: erratic-looking moves that are louder than normal printing, although no layer shifts (yet), as I'm driving the steppers with more force.

Jartza on 19 Apr 2018

Same gcode-file with 1.1.5 (except modified lin_advance value):

[image: img_20170412_160401]

Jartza on 19 Apr 2018

@Jartza: Could you post your DEFAULT_AXIS_STEPS_PER_UNIT and your DEFAULT_MAX_FEEDRATE?

A long time ago, when I switched to the bugfix2.0.0 branch, I found that on my printer, which needs STEPS_PER_UNIT = 4000 for the Z axis, setting MAX_FEEDRATE to 3 or more makes the Z axis not move at all; it just makes a horrible noise (brrrr, but with some discontinuities). I am pretty sure there is an overflow in the computations somewhere, but I never investigated it. Maybe that is also being triggered somehow in some strange case here.

The way to always reproduce it is:

#define DEFAULT_AXIS_STEPS_PER_UNIT { 80, 80, 4000, 97 }

#define DEFAULT_MAX_FEEDRATE { 300, 300, 4, 30 }

And try to move the Z axis with the LCD menu...

ejtagle on 19 Apr 2018

#define DEFAULT_AXIS_STEPS_PER_UNIT { 80, 80, 400, 96 }
#define DEFAULT_MAX_FEEDRATE { 150, 150, 8, 50 }

There is a limit to how many steps 8-bit boards can output.

I also have #define MINIMUM_STEPPER_PULSE 2
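
For reference, the step rate these settings ask for is just steps-per-unit times maximum feedrate. A minimal sketch in Python using the two configurations posted above; the helper name `peak_step_rates` is made up for illustration, and the sketch says nothing about what rate a particular board can actually sustain:

```python
# Peak step rate per axis = steps/mm * max feedrate (mm/s).
AXES = ("X", "Y", "Z", "E")


def peak_step_rates(steps_per_unit, max_feedrate):
    """Return {axis: steps per second} at the configured maximum feedrate."""
    return {axis: s * f for axis, s, f in zip(AXES, steps_per_unit, max_feedrate)}


# Values quoted in the two comments above.
ejtagle = peak_step_rates((80, 80, 4000, 97), (300, 300, 4, 30))
jartza = peak_step_rates((80, 80, 400, 96), (150, 150, 8, 50))

for name, rates in (("ejtagle", ejtagle), ("Jartza", jartza)):
    print(name, rates)
```

With ejtagle's values the Z axis needs 4000 × 4 = 16000 steps/s at full feedrate; with Jartza's, X and Y need 80 × 150 = 12000 steps/s.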

Jartza on 19 Apr 2018

Printing a regular 20x20x20 calibration cube, there's now almost always a loud "thunk" when the perimeter has been printed (there is just a very short move command). I tried to record it on video, but the "thunk" is so low-frequency that the cellphone mic didn't catch it, though it's very audible by ear :)

Jartza on 19 Apr 2018

I have my current printer on an old build of 2.0 and it's running fine, so I'll upgrade it to the latest and let you know if there is any clunking/shifting or whatnot. I'll look, but I might be able to figure out the date I downloaded it, and then I'll know what commit it's on.

isaacfank on 19 Apr 2018

My 2.0.x is less than a week old, but I will update later today to the latest bugfix-2.0.x to see if it helps at all.

Jartza on 19 Apr 2018

> I'll look, but I might be able to figure out the date I downloaded it, and then I'll know what commit it's on.

Try git log -n1 to see the last commit :)

Jartza on 19 Apr 2018

My 2.0 was April 11th, the 1.1.x bugfix was April 17th.

viperchannel on 19 Apr 2018

Looking at Jartza's pic, I noticed that my shift is different. It looks like his print shifted over several layers, as if the little house started to continuously move to the left. My shifts are just one layer. For example, it starts the print, and after several perfect layers it shifts in one direction and stays there for a while, until the next shift happens. I don't have shifts that stack up over several layers like in the pic.

viperchannel on 19 Apr 2018

@viperchannel that's the most rad example. I searched for mechanical failure and tested belts, wheels, bearings etc. from my printer because I had sudden layer shifts in 1-2 layers in some prints and first thought was of course mechanical/electrical issue. After checking everything, I reflashed 1.1.5 and redid the prints with same gcode without any issues (or clunking)

Jartza on 19 Apr 2018

Are you all using UBL? If so you might want to try the very latest (as of a couple of hours ago) bugfix version. There has just been a fix made to the way that UBL handles moves that may have some impact on this (but may not).

gloomyandy on 19 Apr 2018

I was using #define AUTO_BED_LEVELING_BILINEAR on all versions.

viperchannel on 19 Apr 2018

@Jartza I would agree with you, but I am using an Arduino Due (32 bits) and the problem is exactly the same. If I get some free time, I'll start debugging...

ejtagle on 20 Apr 2018

> I can already tell that it is 100% caused by the firmware or one of its functions.
>
> My 2.0 was April 11th, the 1.1.x bugfix was April 17th.

I'm unable to reproduce the issue, unfortunately. I now have a small pile of perfect Benchy boats. I'll see if adding enabled features from your configs causes the issue to appear and if I'm able to get the issue to manifest then I'll be able to start narrowing it down to some point in time, and maybe a specific commit.

> bugfix version from January it works fine
>
> I reflashed 1.1.5

Many changes since January (and since 1.1.5), but I'll do a comparison and see if anything stands out. If you can determine the most recent version or commit where the problem is not present, that will be super helpful.

thinkyhead on 20 Apr 2018

@viperchannel — Have you tested with LIN_ADVANCE disabled to see if that feature is contributing?

@autonumous — Please zip and attach your configurations.

thinkyhead on 20 Apr 2018

Not yet tested without Lin adv. I will give it a try over the weekend if I find some time.

viperchannel on 20 Apr 2018

@thinkyhead by disabling LIN_ADVANCE do you mean completely removing the define, or just setting the value to 0?

Jartza on 20 Apr 2018

Try both, as time allows.

thinkyhead on 20 Apr 2018

This is definitely a firmware issue: the machine is randomly going to full speed, ignoring jerk in tight spaces. It just goes to full speed in the Y direction only. I can tell when it happens from the loud "thunk" that penetrates my walls and vibrates the floor. This started happening when testing non-linear prints. It won't do it on prints that simply move in straight lines. If the print has a curve, like a pipe or tunnel whose diameter is in the horizontal plane, it will do it. I can visually see that the machine itself is moving faster in certain places than I set it to.

[image: 20180420_204304 1]

I can send you the file I am trying to print, or the G-code and my configuration files, if you need. Something in this firmware is not obeying speed and jerk limits, and its torque is ripping the Y axis out of step.

Let me know what you need. I think I can say this is 100% repeatable for me at this point. I even switched out stepper drivers; I've got tons of them lying around since it's cheaper to buy them in lots of 4. Standard A4988.

I am really going to hate downgrading firmware for now; all these new features are nifty to have.

Grimshadows on 21 Apr 2018

That explains why I'm not able to reproduce with simple models. Simple cubes came out fine. I just tried a benchy and had a very slight layer shift, whereas the "curvy" model I was previously printing had many, many layer shifts.

autonumous on 21 Apr 2018

Yep simple cubes come out 100% fine, add curves in tight spaces with walls it goes bonkers.
All this prints just fine on 1.1.8:
[image: 20180420_224537 1]

Grimshadows on 21 Apr 2018

> I can send you the file I am trying to print or gCode

@Grimshadows — Please do! I would like to test with the G-code that gives you the most trouble — that cylinder, for example.

thinkyhead on 21 Apr 2018

Some reported they are using the 2130 tmc. Wouldn't they stop the print when overheating and report a driver error?
I have also already done the cooling experiment and switched from a 40mm fan to a 120mm fan and still had the shifts.

viperchannel on 21 Apr 2018

(comment deleted after reviewing the thread)

thinkyhead on 21 Apr 2018

@thinkyhead
Skimmer Intake Venturi.txt

One thing that I forgot to mention earlier, because I hadn't noticed it yet, is that this is also making my extruder overshoot, to the point that it feeds enough filament to snap it inside the extruder just after the gear. My extruder is set up to have a ton of torque when needed, enough to break the filament if it can't pass through (I have only ever seen that with a clogged head, and this head is not clogged). The Y axis torque rip and the extruder overshooting seem to happen at the same time.

I am going to try printing using another engine, giving Simplify a swing to see if the behavior is Cura related.

Grimshadows on 21 Apr 2018

Thanks! I will test it later today and see if I can reproduce the issue. I hope that I can, really, because then we'll all be on the same page!

thinkyhead on 21 Apr 2018

@Grimshadows I "dry printed" the G-code (without filament and temps at 0) and I think that's a very good model for finding the problem, as the printer really shakes violently with that G-code. Maybe I should test print with filament too 😆

Jartza on 21 Apr 2018

@thinkyhead looks like disabling LIN_ADVANCE either by setting it to 0.00 or undefining LIN_ADVANCE has no effect on this...

Jartza on 21 Apr 2018

@Jartza Those speed settings are really, really toned down from the speeds this thing normally runs at. I toned them down thinking it was a speed issue. Doing a run with Simplify3D now at my normal speeds; let's see if it's the slicing engine. I'll give an update in about 35 min. Violent shaking is OK as long as I'm not hitting the chassis harmonic frequency.

Grimshadows on 21 Apr 2018

I am pretty sure my motors were much warmer on the bugfix versions than they normally are. If you are switching between versions, take a look at that. I cannot play around this weekend; I have some prints I need to get finished first.

viperchannel on 21 Apr 2018

Skimmer intake-S3D.txt

OK, I ran the print in Simplify3D... and it may not be the firmware after all: not one skip. I don't even have S3D set up for fine tuning or calibrated for this level of detail; I mainly have it set up for speed. So there must be something that Cura is doing differently than S3D when slicing. I think I am going to calibrate S3D at this point and run with it. I don't want to give up the features gained in 1.1.8. This still doesn't explain why linear shapes print fine with Cura but curved objects don't.

Video link if you're curious:
https://youtu.be/FyQbR1NfjtM
Yes... my tripod is a 3.5 inch vise.

Grimshadows on 21 Apr 2018

I did all my files in Simplify3D and had the skips. So it's not Cura that causes it.

viperchannel on 21 Apr 2018

I did notice that the big difference is that Cura is using G0 commands every few lines, while S3D uses all G1 commands. I remember that in the firmware G0 was being treated like a G1 for some time; did that change? I see why Cura is trying to do it: to conserve momentum in the direction it has to go next, instead of coming to a full stop and hitting the accel and decel constraints. I am wondering if that could be part of the problem. (Sometimes it's faster to follow a curve than a straight line.)

Grimshadows on 21 Apr 2018

All my simple test prints seem to be fine. The kids have more benchys and cubes than they know what to do with! :-)

Just tried to print a whistle from Thingiverse, and suffered several shifts.

As always, the slicer was S3D. The only other big difference is that I've just printed at 0.2 layer height, while my tests have been at 0.3. So I will continue to use this same profile/process for other tests and try to reliably reproduce the issue.

autonumous on 21 Apr 2018

My first print noticing layer shifts.

(These prints were fished from the bin; the support had already been removed.)

I changed the material, thinking the warping of ABS may have knocked the print head off. Also changed the orientation of the part. Still the same result.

This was today. Two distinct layer shifts, all in the Y direction.

I've swapped the X and Y drivers (A4988) over and restored the same print to see if I get shifts in the X direction, or if it continues with Y, or if it comes out OK.

Will report back once complete. Depending on the results, I will then try different firmware options/versions.

autonumous on 21 Apr 2018

Print completed. Initially it was looking good, but it got a very slight layer shift at a higher point than the previous one. Still in the Y direction, despite swapping the drivers round. Everything else the same. Same G-code file (sliced in S3D).

I wasn't around at the time it occurred, but in the 100s of prints I've done on this printer over the last 2 years, I've never ever had layer shift issues.

Next I will try with a previous firmware version.

autonumous on 21 Apr 2018

Print completed with no layer shift at all when using a previously compiled 1.1.8

A process of elimination now.....

autonumous on 21 Apr 2018

@autonumous I can see from the pictures you are getting the same over-extrusion on skips that I am seeing on my prints.

Grimshadows on 21 Apr 2018

@autonumous Is your G-code flipping between G0 and G1 commands like mine? Could you post it?

Grimshadows on 22 Apr 2018

I've reverted certain portions of the planner back to the way they were in 1.1.8 in this branch:

https://github.com/thinkyhead/Marlin/archive/bf1_revert_planner_test.zip

Test with one of the troublesome objects and see if it causes failure. Stop the print as soon as you get any layer shifting to save on material. You can also use M111 S8 to set "dry run" mode, where it won't print, but will just do all the other moves. If you don't see the issue in "dry run" mode but you do in normal printing, that would be another clue.

thinkyhead on 22 Apr 2018

Dry run won't work, as the skips are sometimes so minute they are not audible. I did a quick print without any real re-calibration. It seems to have fixed the big skips; there were two minor shifts (could be user error, so I will have to retest). I am going to re-calibrate the machine when I get back home from work and try again with this build. The filament no longer attempts to curl up in the extruder either. I will put the part under the magnifier later and count the layers to see what line of G-code that relates to. I am still thinking it has to do with those G0 to G1 commands the Cura slicer is putting out.

Video of the print
https://youtu.be/LJD0gmED4gQ?t=1729
Link is at the first skip
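
Rather than counting layers by eye, a measured shift height can be mapped back to the G-code. A rough sketch in Python, assuming absolute positioning (G90) and no Z-hop, neither of which is confirmed for the files in this thread; `first_line_at_height` is a hypothetical helper and the sample lines are made up:

```python
import re

# G0/G1 moves, also written G00/G01; rejects G10, G11, G17 and so on.
MOVE = re.compile(r"^G0*[01](?![\d.])")
Z_WORD = re.compile(r"Z(-?\d*\.?\d+)")


def first_line_at_height(lines, height_mm, tolerance=1e-6):
    """Return (line_number, z) of the first move whose Z reaches height_mm."""
    for number, raw in enumerate(lines, start=1):
        line = raw.split(";", 1)[0].strip().upper()  # drop comments
        if not MOVE.match(line):
            continue
        z = Z_WORD.search(line)
        if z and float(z.group(1)) + tolerance >= height_mm:
            return number, float(z.group(1))
    return None  # the file never reaches that height


lines = [
    "G28 ; home",
    "G1 Z0.2 F600",
    "G1 X10 Y10 E1",
    "G1 Z0.4",
    "G1 X20 Y10 E2",
    "G1 Z0.6",
]
print(first_line_at_height(lines, 0.4))
```

If the slicer uses Z-hop on travel moves, the first match may be a hop from the layer below, so the result is only a starting point for reading the file.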

Grimshadows on 22 Apr 2018

> Dry run won't work, as the skips are sometimes so minute they are not audible.

I see. So you need the object to see that the axis has gotten shifted.

> I am still thinking it has to do with those G0 to G1 commands Cura slicer is putting out.

On Cartesians, G0 and G1 are synonymous in Marlin and will produce exactly the same movement. But there might be differences in what the slicer outputs for each one. For example, do the G0 commands in the G-code have a different Fxxx feedrate than the G1 commands?
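
One way to answer that question is to collect the feedrates in effect for each command type. A minimal Python sketch, assuming plain G0/G1 lines with `;` comments; the function name and the sample G-code are illustrative only, not taken from the files posted here:

```python
import re
from collections import defaultdict

# G0/G1 moves, also written G00/G01; rejects G10, G11 and so on.
MOVE = re.compile(r"^G0*([01])(?![\d.])")
FEED = re.compile(r"F(\d*\.?\d+)")


def feedrates_by_command(lines):
    """Map 'G0'/'G1' to the set of feedrates (mm/min) in effect for those moves."""
    seen = defaultdict(set)
    current_f = None  # F is modal: it persists until a later move changes it
    for raw in lines:
        line = raw.split(";", 1)[0].strip().upper()  # drop comments
        move = MOVE.match(line)
        if not move:
            continue
        feed = FEED.search(line)
        if feed:
            current_f = float(feed.group(1))
        if current_f is not None:
            seen["G" + move.group(1)].add(current_f)
    return dict(seen)


sample = [
    "G0 F7200 X10 Y10 ; travel",
    "G1 F1800 X20 Y10 E0.5",
    "G1 X20 Y20 E1.0",
    "G0 X30 Y30",
]
result = feedrates_by_command(sample)
for command in sorted(result):
    print(command, sorted(result[command]))
```

In the made-up sample the last G0 carries no F word, so it inherits 1800 from the preceding G1. Running this over a Cura file and an S3D file of the same model would show whether travel and print moves really use different feedrates.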

thinkyhead on 22 Apr 2018

I may have found a way to provoke this loud "thunk" and the skipped steps associated with it:

In the OctoPrint Control tab, when using the following settings and pressing the arrow keys rapidly in a row, the motors lose steps:

If I quickly press Y+ twice, then go back to 0, and repeat this process 4-5 times, my hotend will crash into Y0.
This never happens if I press Y+ more slowly. It also won't happen if I alternately press other direction keys, no matter how fast. The problem only occurs if the same direction key is pressed quickly multiple times in a row.

The feedrate is set to 6000 in OctoPrint.

Could be completely unrelated, but maybe it helps.

LichtiMC on 22 Apr 2018

Has anyone tested it with stealthChop off? I've heard several stories that the layer shifting then stops. I also get layer shifts. I cranked up the current, lowered the speed and things like that, but it won't help. The little 20 by 20 cube is good now after adjusting some things. I have an 80 mm fan cooling the steppers and I have the Chinese steppers, but I get the same results as you guys, using Marlin bugfix 1.1.8, I'm guessing about a one and a half month old version. I slice with S3D. I also get some weird movements between printing.

felenna on 22 Apr 2018

It's not a TMC specific issue.

teemuatlut on 22 Apr 2018

Is it the same with normal Pololus? Well, I used the same Marlin with normal steppers and it just printed.

felenna on 22 Apr 2018

OK, I've been testing even with a threshold of 0 so it only runs in spreadCycle; that doesn't work either.

felenna on 22 Apr 2018

Do you people know of any bugfix release that does work OK?

felenna on 22 Apr 2018

I suspect the issue is somewhere near the high-speed threshold, such as jerk/acceleration/feedrate being slightly different between firmware versions. The video posted above runs at what I would estimate to be the mechanical limit of the machine, and people above that also seem to be running quite fast.

If it really happens only with complex shapes, maybe the AVR CPU / gcode sender isn't able to keep up and the motion becomes less smooth, causing more vibration and back EMF, overloading the drivers/motors.

Does it also happen when you print really slowly (10mm/s) and with low acceleration (e.g. 500mm/s²)?

I've been running several machines on anything from 1.1.5, 1.1.8 and latest 1.1.x and 2.0.x bugfixes without ever seeing the issue, though I print at 10-20mm/s and with low acceleration.

comps on 22 Apr 2018

It happens already on the first layer, and that's printing quite slow. It's the tiny movements between print parts and infill.

felenna on 22 Apr 2018

[image: 20180422_173857 y shift]

Been experimenting for a while with the same part that causes the issues; I ran through a set of prints at different speeds. The leftmost is S3D on super fast settings (bordering on harmonic resonance): no skips, but print quality is basically down the sewer. The next is Cura set to mildly ridiculous speeds; the last is dead slow with all jerk and acceleration down to near minimums. The slower I set the printer, the more pronounced the problem gets.

Grimshadows on 22 Apr 2018

> Do you people know of any bugfix release that does work OK?

So far they work great for me, but I haven't used Cura for a while.

I posted a branch in a comment above for testing if anyone else wants to try it out.

I would suggest also testing with an older version of Cura, for example from last year. And start with its default settings.

thinkyhead on 23 Apr 2018

Just chiming in that I've just finished a TMC2130 build (Einsy), everything tuned mechanically. I get the loud "thunk" as well. I'm running bugfix-1.1.x and seeing these layer skips. I'll recompile with 1.1.8 or 1.1.5 and report back. Slicing with Simplify3D.

Attached is a file I modeled up to test this; seems to cause this issue immediately.

test_fail_skip_layer.zip

dammitcoetzee on 23 Apr 2018

@thinkyhead I can't get S3D to exhibit this behavior, so I have been running with that to get some things done. I prefer Cura, so I will do some more testing with Cura at complete bone stock and work up to my settings to see if it's a settings issue. I've got tons of filament, enough to last me till next year, so I don't mind tossing filament at this. I would like to work with Cura for the quick and easy settings. It also seems to do a better job with ABS than S3D most of the time. First I am going to switch over to a flying extruder to eliminate Z distance as a possible issue, and if the problem persists I will go to direct drive to see if that is an issue. No skipping in S3D, only Cura.

@dammitcoetzee I will run your file in a few to see if I get that with S3D.

Grimshadows on 23 Apr 2018

I am sorry for opening this bug report and then not having the time to test different things. It has been a busy weekend and I have some prints I need to get finished.
The only new information I can provide is that I don't think it is related to the S3D version. Some of my prints with shifts are from files a buddy created, and he is still using S3D 4.0.0.

Are you guys printing via OctoPrint, or is someone printing via USB or SD card? Around the same time these problems started, a new OctoPrint version came out. Maybe the error is there.

We already figured out that it is not a TMC issue, but we could not narrow it down timewise. What are the latest bugfix releases you are using that do NOT have this problem? Is anyone using something newer than my 1.1.8 bugfix from January?

Did someone try the new bugfix branch that thinkyhead posted here? Any results on that?

viperchannel on 23 Apr 2018

Hi!

I got this "new bug" notification because I was subscribed to https://github.com/MarlinFirmware/Marlin/issues/9368 . I temporarily solved my "shifting layer problem" by disabling stealthChop and the 256 steps interpolation. That made my printer (Anet AM8) as loud as it was with the 4988 drivers, but I don't lose steps for now. :|

At the moment I am running bugfix-1.1.x checked out on 5 April ( 98684952e33287008128a551ae6139103740b056 )

ikarisan on 23 Apr 2018

I'm wondering if we are seeing a problem with the Stealth-Chop. I'm wondering if when the printer loses position, if it is losing a full step. Can we catch the printer losing 'just a little' bit of position? And then measure how far the layer shifted?

What I'm wondering is if we are losing a full step of the motors. And it might be caused by the micro-steps not being powerful enough to hold position?
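
To put numbers on that, a measured shift can be converted to steps. A small sketch in Python, assuming 80 steps/mm on X/Y (as in the configurations posted earlier) and 16 microsteps per full step, which is an assumption and not something stated in the thread:

```python
def shift_in_steps(shift_mm, steps_per_mm=80.0, microsteps=16):
    """Convert a measured layer shift to (microsteps, full steps)."""
    micro = shift_mm * steps_per_mm
    return micro, micro / microsteps


# Range of shifts reported earlier in the thread (never precisely measured).
for shift in (0.5, 2.0):
    micro, full = shift_in_steps(shift)
    print(f"{shift} mm = {micro:.0f} microsteps = {full:.1f} full steps")
```

Under these assumptions one full step is 16 / 80 = 0.2 mm, so the reported 0.5-2mm shifts would correspond to several full steps rather than a single one. A caliper measurement of an actual shift would be needed to draw any conclusion.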

Roxy-3D on 23 Apr 2018

I don't think so. Someone reported here that he uses A4988s, which do not support 256 microstep resolution and do not have stealthChop or other fancy stuff.

viperchannel on 23 Apr 2018

It's definitely not limited to TMC2130 or StealthChop. My bot uses A4988's, and I get the layer shift issue if I go too much over 1000 mm/s² on acceleration (the printer's fastest "acceptable quality" acceleration is some 3000 mm/s², its mechanical limit is some 6000 mm/s²).

VanessaE on 24 Apr 2018

> It's definitely not limited to TMC2130 or StealthChop.

OK! But... Are we losing full steps of the stepper motor? If so... That implies the firmware logic is OK, but for some reason we are sending the micro-steps too fast, or somehow the stepper motors don't have enough power to keep up. And they lose a step?

Roxy-3D on 24 Apr 2018

Hard to say in my case, but the rattle I hear is very similar to what used to happen before the advent of Linear Advance 1.5, just not as severe. The cause at the time was LA causing the motor to exceed the elsewhere-configured E jerk speed, but LA is turned off on my bot for now while I work on something else. Given that the planner still has a bug or two in how it deals with jerk (see #9917), I guess that's what's happening here and in #10272 and the others with similar probs.

VanessaE on 24 Apr 2018

@Grimshadows

> I can see from the pictures you are getting the same over-extrusion on skips that I am seeing on my prints.

I assume you mean the little lumpy bits. These are quite consistent across prints, in the same places. I've not got round to totally removing them; any suggestions?

> Is your G-code flipping between G0 and G1 commands like mine? Could you post it?

Nope. I just took a look at a file: G1s throughout.

Unfortunately, I've not got round to performing any further tests. I needed a print doing for a project, so I stuck with 1.1.8. I hope to be able to test a few things during the week. @thinkyhead, I will test your branch as soon as I can and report back.

I'm running A4988s. As mentioned above, I tried swapping the drivers round: same issue on the same axis (Y).

@viperchannel, I'm also printing via OctoPrint; I will try via SD as well.

autonumous on 24 Apr 2018

So far I have looked at one of the Cura-sliced files, and I noticed that it was changing the acceleration on the fly, first to 2500, then to a very low value — _using the deprecated M204 S parameter_ which sets both travel and default acceleration at the same time.

I'd like to see the same STL file sliced with S3D (that doesn't exhibit the issue) and then sliced with Cura (which does) to see what they are doing differently.
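
To compare two slicers' output for the same STL, it may help to list every acceleration/jerk command in each file. A minimal Python sketch; `find_accel_changes` is a hypothetical helper and the sample lines are invented for illustration:

```python
import re

ACCEL_CMD = re.compile(r"^(M204|M205)\b(.*)$")
PARAM = re.compile(r"([A-Z])(-?\d*\.?\d+)")


def find_accel_changes(lines):
    """Return [(line_number, command, {param: value})] for every M204/M205."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        line = raw.split(";", 1)[0].strip().upper()  # drop comments
        match = ACCEL_CMD.match(line)
        if match:
            params = {k: float(v) for k, v in PARAM.findall(match.group(2))}
            hits.append((number, match.group(1), params))
    return hits


sample = [
    "M204 S2500",
    "G1 X10 Y10 E1 F1800",
    "M204 S500 ; slow section",
    "M205 X10 Y10",
]
changes = find_accel_changes(sample)
for number, command, params in changes:
    print(number, command, params)
```

An empty result for one slicer and a long list for the other would be one concrete difference worth following up.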

Meanwhile, here's something you can try. Please insert this line and see if it helps:

  inline void gcode_M204() {
+   stepper.synchronize();
    bool report = true;

…and…

  inline void gcode_M205() {
+   stepper.synchronize();
    if (parser.seen('S')) planner.min_feedrate_mm_s = parser.value_linear_units();

Here's a branch with these changes already applied:
https://github.com/thinkyhead/Marlin/archive/bf1_synchronize_M204_M205.zip

thinkyhead on 24 Apr 2018

@Roxy-3D your theory is good, but like I said, the 4988 do not support microstepping, and the step losses don't happen on previous versions with the same motor current. Since it happens at both high and low printing speeds, it could be the data speed or the processing speed, but I have no idea how to check that.

viperchannel on 24 Apr 2018

To add a new data point, I've been happily printing with my other printer and seen no skipped steps.
It's a coreXY frame with Archim2 board. So 32bit SAM and TMC2130 drivers in stealthChop and 960mA_rms.
X/Y max acceleration values are set at 2000mm/s².
X/Y jerk is at 10.
UBL is enabled with fade at 2.0mm.
Linear advance disabled.
Slic3r-dev from a week or two old build.

The branch is (re)based on 5a064d0e94c823de5787105c31c536d5fed7c089 from April 16th.

I'll see if I have time later today to test the cylinder model with my AVR board.

teemuatlut on 24 Apr 2018

> I'd like to see the same STL file sliced with S3D (that doesn't exhibit the issue) and then sliced with Cura (which does) to see what they are doing differently.

@thinkyhead I've only been using Simplify3D, and while the G-code it creates does not use M204 or M205, I still experience the "clunks" and problems.

I'm now trying different models to create a test print that triggers the problem most often.

Jartza on 24 Apr 2018

like I said, the 4988 do not support microstepping

To the best of my understanding, the 4988 DOES SUPPORT µstepping… see https://www.pololu.com/product/1182 ... I read “Five different step resolutions: full-step, half-step, quarter-step, eighth-step, and sixteenth-step” in the overview…. So @Roxy-3D made a point worth verifying….

lrpirlet on 24 Apr 2018

@autonumous
I would take a look at the Z seam settings and the speed/distance of retraction. Also double-check that you're not printing at some wonky half step or other fraction of your actual Z resolution. The calculators on the RepRap website will help you find realistic layer heights. For example, my real Z resolution is 0.04 mm, so usable layer heights are multiples of that: a layer height of 0.1 would not be realistic for me, and I would need to use 0.08 or 0.12 instead.
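The multiple-of-resolution check can be written down in a few lines. This is only a sketch: the helper names are made up for illustration, and 0.04 mm is just the example resolution from above, not a value that applies to every printer.

```cpp
#include <cmath>

// Hypothetical helper: true when layer_height is (within a small tolerance)
// a whole multiple of the machine's Z resolution. Both values are in mm.
bool is_multiple_of_z_resolution(double layer_height, double z_resolution) {
    if (z_resolution <= 0.0 || layer_height <= 0.0) return false;
    const double steps = layer_height / z_resolution;
    return std::fabs(steps - std::round(steps)) < 1e-6;
}

// Hypothetical helper: the largest usable layer height that does not exceed
// the requested one. The small epsilon absorbs floating-point rounding.
double nearest_lower_layer_height(double requested, double z_resolution) {
    return std::floor(requested / z_resolution + 1e-6) * z_resolution;
}
```

With a 0.04 mm resolution, 0.1 mm is rejected while 0.08 mm and 0.12 mm pass, which matches the numbers above.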

As far as the issue goes, I haven't had it creep up on me using S3D, so I have been getting some work done sans Cura. I bit the bullet and tuned S3D for the detail I needed. I am going to revisit Cura after these prints are done.

Someone a few posts back mentioned EMI as a possibility, or host/client lag. I will be changing a few things to see if maybe Cura is getting behind or overloading the Arduino. I am on a ten-foot USB cable, so the EMI thing sounds like a good thing to look at. Then again... S3D, no issues.
I am running the branch thinky posted.

One thing I did not check in Cura was Z fade.

Grimshadows on 24 Apr 2018

Sorry, I meant the 4988s don't interpolate the steps up to 256 like the TMCs do.

viperchannel on 24 Apr 2018

Someone a few posts back mentioned EMI as a possibility, or host/client lag. I will be changing a few things to see if maybe Cura is getting behind or overloading the Arduino. I am on a ten-foot USB cable, so the EMI thing sounds like a good thing to look at. Then again... S3D, no issues.

@Grimshadows I don't even use USB connection currently, my printer is in the other end of the room, I print from SD-card. And I don't use cura, only Simplify3D...

Jartza on 24 Apr 2018

@thinkyhead I've just tried the latest bugfix-1.1.x branch d429d5a but with your suggested updates to M204 & M205 (adding stepper.synchronize();).

The print was better, but still had layer shifts.

autonumous on 26 Apr 2018

The print was better, but still had layer shifts.

That's an interesting clue, but still leaves many possibilities open.

The fact that this doesn't occur with 1.1.8, combined with the fact that it tends to occur on curved sections, implicates any one of the changes made to the planner in the last 5 months. So I'll pick a few of those changes and post some test branches with isolated parts reverted. From that process we can narrow down which part of the code needs to be examined more closely.

thinkyhead on 26 Apr 2018

There are very few differences between 1.1.8 and the current bugfix-1.1.x in the planner/stepper code, but there are one or two that might have some effect. So for our first test, this branch reverts a part of the planner that deals with block chaining:

https://github.com/thinkyhead/Marlin/archive/bf1_revert_planner_test.zip

Please test this branch and report whether the behavior changes for better or worse.

If possible, also check whether the stepper drivers run any hotter when comparing the current code to 1.1.8. Sudden stops and lost steps at random points are a strong indicator that the stepper drivers are going into thermal protection mode.

thinkyhead on 26 Apr 2018

I'm willing to bet those shifts are 1 full step on your motors. Can we crank up the power to your stepper motor drivers? Or give the stepper motors more voltage?

Roxy-3D on 26 Apr 2018

So I ran the tests I promised.

1.1.5 and 1.1.8, I actually couldn't get running well. Overall the steppers behaved horribly on them. Not sure why. In fact, at the same current settings I couldn't get them to move well (lots of shuddering and loud noises). I think I may be missing a setting or have my pins.h wrong, but I don't have more time to invest to dig deeper. Today I also tested with the 1.1.x-bugfix and the latest version of the TMC driver, no change.

I installed heatsinks on my Trinamics and the steppers and bumped the current up to 1150 (a number someone had thrown out in other threads); this seemed to fix it. So for me, I think I have to admit that this problem was mechanical. At least that's what my current round of tests showed. I suppose I need to work on my bearings. However, if I note anything like this cropping up in longer prints I'll report back.

dammitcoetzee on 26 Apr 2018

@Roxy-3D The shifts vary, from around 0.4 mm to a couple of mm, so it is not always just one step. I tried cranking up the steppers until my motors were at around 60°C, with the same result. It didn't help.

@thinkyhead It is not only in curved sections, but maybe it shows up more often there. I printed a part which only had straight lines parallel to the X and Y axes, some lines at a 45° angle, and some honeycomb infill, and got shifts, but the part had no curves at all.

I don't use them anymore. But if it is a thermal protection problem, shouldn't that be shown as an error message and stop the print when using SPI-connected drivers? Some reported they are using the 2130s and have shifts. But the 2130s definitely stop the print when they go into thermal protection.

viperchannel on 26 Apr 2018

I have no idea if it has anything to do with that, but I just remembered that I have had an additional problem since I played around with the different versions. I can't tell exactly which version it happened on, but when cancelling a print via OctoPrint, I got stuck in "cancelling": the printer stopped printing, but kept holding the temperature. I had to disconnect and reconnect to continue working. Just an additional clue; maybe it has something to do with all of this.

viperchannel on 26 Apr 2018

Does anybody have a file that can reliably reproduce the problem?
Because I need to change my position to Can't reproduce. For me it was a hardware problem.
I just printed a 40x40mm cylinder and it came out better than ever. The printer was cartesian with AVR and TMC steppers on stealthChop.

teemuatlut on 26 Apr 2018

In my case, the only work around is to not make my belts too tight, to reduce the mechanical load on the motors, but of course that negatively affects print quality if taken too far. The correct tension on belts is such that you should be able to "pluck" the longer side like a bass guitar string, and hear a "note" in that instrument's range. I can't draw mine that tight without causing stalls, even with the motors driven at relatively high current.

Jerk/accel settings that are good for light load probably won't be good for the moderate load that comes with otherwise-properly-functioning printer hardware, and if the firmware's not applying them correctly to begin with, then even light load may still skip.

Increasing drive current helps for some bots, but that makes the machine louder, and isn't good for the driver modules if you don't have a fan on them.

My guess is that @teemuatlut and @dammitcoetzee are experiencing something along these lines.

VanessaE on 26 Apr 2018

Oh also, @teemuatlut I was able to reproduce the layer shift with https://www.thingiverse.com/thing:1363023 by the way (it's one of the models I use when recalibrating or dialing in a new filament). My response was to loosen the belt on the affected axis (X), as described above.

VanessaE on 26 Apr 2018

I was able to reproduce the layer shift with the "Y motor mount" (https://www.thingiverse.com/download:3662264) of this project https://www.thingiverse.com/download:3662264 .

If you print it so that the large hole stands vertically, the layer shift always occurs at the height where the large hole closes and the two small holes begin.

ikarisan on 26 Apr 2018

Minor correction: even loosening the belt isn't enough. I've got mine pretty much on the edge of acceptable, but still getting occasional layer shifts.

VanessaE on 26 Apr 2018

Having the same issue with Simplify3D and a Prusa i3 clone. Skipping started with the new bugfix. I currently have active cooling as well, so it's not overheating.

thjubeck on 27 Apr 2018

Sorry to come back with bad news. Did manage to create this issue the other night. Unfortunately I can't share the parts since they are for work. I could print a plate of parts separately no problem. Every time I put them together I'd get this layer skipping. Simplify 3d. One thing I'll note in these models is that the skipping really seemed to reliably happen after 10-20 layers.

dammitcoetzee on 28 Apr 2018

There are very few differences between 1.1.8 and the current bugfix-1.1.x in the planner/stepper code, but there are one or two that might have some effect. So for our first test, this branch reverts a part of the planner that deals with block chaining:

https://github.com/thinkyhead/Marlin/archive/bf1_revert_planner_test.zip

Please test this branch and report whether the behavior changes for better or worse.

If possible, also check whether the stepper drivers run any hotter when comparing the current code to 1.1.8. Sudden stops and lost steps at random points are a strong indicator that the stepper drivers are going into thermal protection mode.

thinkyhead on 28 Apr 2018

(and I'm going to keep posting that until someone tests it and gives a report.)

thinkyhead on 28 Apr 2018

thanks @thinkyhead for your branch, I’m going to try this today, just diff’ing the configs now. Will report back with progress.

autonumous on 29 Apr 2018

I look forward to your report, whether the reversions to 1.1.8 code help or not. The planner / stepper methods are just the first thing to rule out. It could certainly be due to code changes elsewhere, such as in the Temperature or LCD code, so if the lost steps still occur we will want to look at those changes next.

thinkyhead on 29 Apr 2018

Tried @thinkyhead's "bf1_revert_planner_test" firmware, disabled LIN_ADVANCE, and used the same gcode as previously. I still have a very slight layer shift in the same direction, but nothing like I had seen in the earlier prints. I can’t really say if this is better or worse than the previous test (adding stepper.synchronize();).

Complete lineup of prints, oldest (left) to today's (right).

The trouble is, it just seems too intermittent. I’ve been printing other things with the most recent bugfix, and I’ve not seen anything like I was seeing originally last week. I’m seeing very minor layer shifts now in the Y direction. Previously (<1.1.8) I swear I never had any layer shift issues, but maybe I just never printed anything that would have ‘triggered’ the issue.

I originally assumed it was something on my machine, or something I had caused (my own tweaks etc.), but it was a strange coincidence that somebody else had experienced the same issue and logged a bug report around the same time.

I just wish I could find a small, quick print that had the issue.

Looking back over the thread, one thing I’ve realised: my shifts have not been so dramatic since swapping the driver over. Maybe I’ll swap back again.

autonumous on 29 Apr 2018

I’m assuming that all these layer shifts are due to lost steps, so they will always be in the direction away from the moves that trigger them. In other words, if the bed is moving backwards when the glitch occurs the layers will be shifted towards the back of the bed.

The thing about spurious lost steps is that we would expect them to occur at random, so there would be equal amounts of shifting in both directions, at least with symmetrical objects like cylinders. But even with cylinders the layer shifting is mostly in one direction, and more concentrated on a single axis. With a perfect cylinder (vase mode being ideal) we should expect to see all the glitches averaging out if the issue is purely software-caused. But even with a perfect cylinder, shifts still seem to be concentrated on a single axis and in a single direction. I’m not sure what that adds up to, but I’m sure it is relevant.

For the most problematic objects, try rotating them 90 degrees on the bed as part of the testing, to see if the error moves to the other axis. Then try rotating them 180 degrees to see if the layer shifting switches its direction. Finally, see if there’s some way to get the slicer to do perimeters counter-clockwise instead of clockwise, to see if that has an effect.

thinkyhead on 30 Apr 2018

Hey troubleshooters…. Can I assume everyone has tested both _with_ and _without_ bed leveling, to determine whether that is involved? And, you know, have you also tried turning off _other_ features to see if the problem goes away (or doesn't). It would be very helpful to know what minimum set of options is required to trigger step loss.

thinkyhead on 30 Apr 2018

Hey troubleshooters…. Can I assume everyone has tested both with and without bed leveling, to determine whether that was involved?

@thinkyhead ahh, fair point, I only tested with bed leveling (bilinear) enabled. I have some time tomorrow to run test prints, I'll test without leveling tomorrow. For now, I'm back to 1.1.5 because that's least problematic for me (and zero layer shifts, even if I go with higher acc/jerk values, only ghosting increases).

Jartza on 30 Apr 2018

The last time I got a layer shift, bed leveling was disabled (not even compiled in).

VanessaE on 30 Apr 2018

And… SD printing or host?

thinkyhead on 30 Apr 2018

And… SD printing or host?

I'm always printing from SD, printer is too far to use USB 😄

Jartza on 30 Apr 2018

Host. My bot lacks LCD/SD.

VanessaE on 30 Apr 2018

Using OctoPi. SD support was not enabled, but LCD was.

viperchannel on 30 Apr 2018

Not been able to do any further testing, but to answer the above questions,
All my prints have been via OctoPi, but with SD support compiled in. Also using UBL bed levelling. I’ve only used S3D. I have not tried printing objects in different orientations; I will as soon as I can.

autonumous on 1 May 2018

Cool. I downloaded and sliced the valve with S3D and Cura, and if I can reproduce the issue, I should be able to zero in on the day that the issue first started with no more than 7 tests, since 1.1.8 was published just over 120 days ago.
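The "no more than 7 tests" figure comes from halving the date range on every test: 2^7 = 128 covers just over 120 days. A minimal sketch of that arithmetic and of the search itself is below. The function names are invented for illustration, and `shows_shift` stands in for "flash that day's build and run a test print".

```cpp
#include <cstdint>
#include <functional>

// Worst-case number of tests needed to isolate one bad day out of `days`
// candidates when each test halves the remaining range: ceil(log2(days)).
int bisection_tests_needed(std::uint32_t days) {
    int tests = 0;
    std::uint64_t range = 1;
    while (range < days) { range *= 2; ++tests; }
    return tests;
}

// Returns the index of the first bad build, given that build `good` is known
// to work and build `bad` is known to show the shift (good < bad).
int first_bad_build(int good, int bad, const std::function<bool(int)>& shows_shift) {
    while (bad - good > 1) {
        const int mid = good + (bad - good) / 2;
        if (shows_shift(mid)) bad = mid;   // the problem was introduced at or before mid
        else good = mid;                   // the problem was introduced after mid
    }
    return bad;
}
```

One caveat: this assumes each test gives a reliable answer. With an intermittent problem, a clean print does not prove a build is good, so each step may need more than one print.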

thinkyhead on 1 May 2018

Host. My bot lacks LCD/SD.

Thank You!

You know what we need to do??? We need the configuration.h files from everybody that is seeing layer shifts. We should be able to see where the configured options overlap.

Roxy-3D on 1 May 2018

Here's mine: marlin-configs-20180430.zip

VanessaE on 1 May 2018

Here's mine: marlin-configs-20180430.zip

@autonumous @Jartza @dammitcoetzee @Grimshadows (we have @viperchannel's config!)

Roxy-3D on 1 May 2018

@ikarisan I printed the AM8 Y motor mount on the very latest bugfix-2.0.x commit and saw no skipped steps.
@VanessaE I've also printed the CtrlV V3 test piece without an issue.

teemuatlut on 1 May 2018

I have printed only round objects the last few days and had shifted layers in every print. Please find attached my configs from bugfix-1.1.x.
I do not have bed leveling at all; printing is always through USB from S3D.

I am using 1.1.6 now where everything works fine.

Here are my configs from 1.1.x:
Archiv.zip

nudelpapst on 1 May 2018

Apologies for delay, attached are my configs

Also, I’m currently printing in a different orientation, will report results

Autonumous_Archive.zip

autonumous on 4 May 2018

I just got a shift in the X axis while printing. I did notice two warnings when I compiled bugfix-2.0.x. One was comparing an unsigned integer to a signed integer in stepper.cpp. The other was a pointer initialization type error in one of the libraries. An improperly initialized pointer could cause strange things...

rgw78 on 4 May 2018

So, following on from above: the print has finished. It was all looking good, then I went out of the room and came back to find a layer shift. Again in the Y direction, despite the model being rotated by 90 degrees.

Interestingly I did swap the drivers (A4988) back, (X<->Y). I would say the layer shift is certainly more pronounced in the current configuration.

autonumous on 5 May 2018

@autonumous, do you have a rough idea of which layer the shift occurred at? Mine is in the X axis and is suspiciously close to layer 256.

rgw78 on 5 May 2018

I've printed the very same whistle as @autonumous and tried to push the speed a bit to check whether this was a driver temperature issue (sliced with Simplify3D). "Unfortunately" there was no shift on my print. I use bugfix-1.1.x with most options enabled (SD, proprietary screen code, Power resume, Emergency parser, Advanced pause ....) and my printer is an Anycubic i3 Mega with TMC2208. One HW mod I've made is driver cooling with better fans. Not sure if this helps.

What I do know is that my modified firmware version has been used by another i3 mega owner and he did see shifts after hours of print (like 8 hours of a 10 hours print). The randomness of the issue looks really related to SW/HW mix.

Is there an easy way for us to compare temperatures of steppers/motors between two firmware versions when printing the same gcode?

systemik on 5 May 2018

An IR Thermometer does the job pretty well for the stepper motors. But it is not reliable for the hotend.

viperchannel on 5 May 2018

@rgw78 - To be honest, it really does seem to vary. I’d say the lowest I’ve seen is around ~5mm / layer 25. The most recent was about 14mm / layer 70.

I’ll try to measure the heat of the motors next

autonumous on 5 May 2018

On my prints it varies. Same object printed several times: once it happened between the first and second layer, on another it was perfect until around layer 250. So it is totally random.

viperchannel on 6 May 2018

To everyone who is experiencing this issue: please test the following versions. Neither I nor thinkyhead can currently reproduce the issue, which makes it very hard to debug.
It doesn't help us much if all we know is that things work in 1.1.5 or 1.1.6, as those were released a long time ago and there have been many changes since. We need you to help us narrow down the commit(s) that introduced the issue.

Planner changes reverted

April 1st Commit 885ad70c8b7c6e53948624f0c41bd9ad842f4c96
March 1st Commit b86125c6d6e2c7b1a9c5120a625647c0e3a4a99d
February 2nd Commit 68cff5f2451ef10314face0c73bfcb3d4285425e
Jan 1st Commit 949191215ba9fbbd6b2fd0d45a3e01c348235483

teemuatlut on 6 May 2018

I tried the revert planner firmware and can report the issue is still present. Before I used 1.1.8 with A4988 drivers. With this combination I had no issues. The issues started when I installed the TMC2130 drivers. At the moment I am running them at 875mA, CoreXY setup.

stefan85 on 6 May 2018

I’ve read through this thread again and again, really seems odd, the symptoms are similar for all (layer shift) but no consistent way to reproduce it. As I’ve said before, I had not experienced a single layer shift issue on this printer, and I have printed much larger (30cm high) prints in the past with no issue. I’m sure that’s the same for others. For me the issues just seem to be random, not at any particular layer or time into the print. I can’t even say when the issues started; I only noticed them within a day or so of the start of this thread, and hadn’t printed anything for a while beforehand.

The other day, when swapping my drivers back to their original positions, the results were more pronounced: shifts of multiple mm, more like the prints originally reported. I’d not seen anything quite that bad since swapping the first time around, maybe a mm or even just a couple of steps.

Today I just checked the voltage of the drivers (A4988); all 5 drivers were set to ~0.36V. I recall setting these when I built the printer in 2016. I compared these to the online build docs, which state 0.55-0.6V! Not sure if these were later revised; I’ll look through the forums. Otherwise I’m not sure why I set them lower than the recommended values.

Anyway, I’ve tried setting them to a higher value; for now I’m just trying ~0.500V and replicating the above print. I’ll also keep an eye on the motor temps (starting temp 25 degrees).

How hot is too hot with the steppers? I normally go for "too hot to hold".

autonumous on 6 May 2018

I’ve read through this thread again and again, really seems odd, the symptoms are similar for all (layer shift) but no consistent way to reproduce it.

Today I just checked the voltage of the drivers (A4988); all 5 drivers were set to ~0.36V.

I think you can tell we are not claiming the firmware does not have a bug in it that is causing the layer shift problem. And further complicating the diagnosis is the fact that Marlin supports so many different hardware configurations that are user controlled.

I haven't seen the problem. But with that said, my configuration is different from anybody else's in this thread. And I do something most people do not do: I power my stepper motors as hot as I can. I crank the adjustment voltage on the stepper drivers as high as I can while still being able to hold the stepper motors with my hand. (I am saying _IT IS_ uncomfortable to hold my hands on the stepper motors.)

The reason I run my stepper motors so hot is that I've never had one fail (yet), and for me... it helps when an edge curls and the nozzle runs into it. I've lost position too many times because a curled edge caught on the nozzle. That still happens... but not very often.

I think it would be helpful for a few people in this thread (that can do it) to crank their stepper motor power way up. Let's see if that helps....

Roxy-3D on 6 May 2018

Cranking up did not help. I tried it until my motors were at 65C. With my January firmware they run with 45C and lower voltage without shifts.

viperchannel on 6 May 2018

OK! Thank You for confirming that!

Roxy-3D on 6 May 2018

Yesterday I flashed version 156bd28160384c351830c6a3c1ae096cedb13ca3, reset my EEPROM and reconfigured it to the values I used before the flash. Using the same values for acceleration, jerk and stepper current I got this result. :( :(

img_20180507_085551_small

And the motors are rattling like they lose dozens of steps: https://youtu.be/f19ILbUh5SQ

ikarisan on 7 May 2018

@ikarisan can you try the commits teemuatlut linked? February 2nd seems like a good starting point. I don't think there is a way out of bisecting the issue.

HenningJW on 8 May 2018

Oops, thought I had posted this yesterday.

So, after upping the driver voltage to 0.5V...
Initially the print looked better, but on closer inspection there is very slight layer shifting. I’ve checked previous prints just to confirm I’m not being too critical; I’m not.

I’ll try the above commits as soon as I can.

autonumous on 9 May 2018

What kind of NEMA motors are you using? I have my steppers at 875, with Chinese NEMA motors rated at 1.7 amps, not the stock Anet ones.
Those are 1.2.

felenna on 9 May 2018

I do admit I've modified my Chinese steppers
by grinding off the top PCB paint to get to the heatpipes for a cooling block.

felenna on 9 May 2018

I made an observation today on this topic with my Duplicator 6 running on bugfix-V2.0.x from May 11th (commit 7d78f34) which could be of relevance. The printer was printing fine for about 86 layers, but then I could hear the "thonk" sound, so I checked the print and saw a layer shift on the X axis (about +1.5mm). I immediately paused the print in OctoPrint and re-homed XY, as I had had success with this method before to save a print when I once accidentally blocked the printhead and it missed steps.
But then when I resumed the print today after re-homing XY, it did not continue in the correct position, but continued at the wrong, layer-shifted position instead. I guess that is not how it should behave?

AcHub on 12 May 2018

I started getting this too (occasional random Y axis shift) after upgrading to latest 1.1.x-bugfix and it happens regardless of stepper current. There are no mechanical issues either.

I'm now testing with a replacement driver on that axis just to exclude that as a possible source.

orcinus on 13 May 2018

_Everyone please test these._ Once we know roughly when the issue starts, we'll post 4 more. Then 4 more after that, until we know the precise day, hour, and millisecond when the issue began.

April 1st Commit 885ad70c8b7c6e53948624f0c41bd9ad842f4c96
March 1st Commit b86125c6d6e2c7b1a9c5120a625647c0e3a4a99d
February 2nd Commit 68cff5f2451ef10314face0c73bfcb3d4285425e
Jan 1st Commit 949191215ba9fbbd6b2fd0d45a3e01c348235483

thinkyhead on 13 May 2018

I am currently not at home, so I cannot tell you the exact date. But I am running a January Bugfix without any shifts for several days and several kilos of filament. The problem does not exist in my Jan build. I will tell you the exact download date when I am back home. It must be from around January 23rd.

viperchannel on 13 May 2018

Currently experiencing similar issues on my CoreXY printer running 2.0.x on an AZSMZ board. Latest 2.0.x shows the issue, so do versions going back to Feb. Seems like the printer makes clunking noises on sharp curves with many short segments, and sometimes this is enough to cause layer skipping. Enabling bezier jerk control from the newer versions makes the clunking worse. Very noticeable at high speeds (150mm/s), but the clunking persists even at slower speeds (70-80). Changing accel seems to help, but jerk has no effect. I've attached a zip with a section of a print that shows the issue, as well as my configuration.

I'm printing from SD, but have noticed that in the sections making the clunking noise, the printer sometimes moves slower than normal - is the buffer not being filled fast enough?
config and gcode

alexyu132 on 14 May 2018

Hmm... Just a question... Does anyone here have skips using a printer based on a 32-bit controller? Or are the skips exclusively related to AVR-based motion controllers? The "cl cl cl cl cl" noises could be caused by the planner buffer being empty. I have seen this issue myself when debugging on debug builds, but never on production builds of Marlin, though I run a 32-bit controller...
Maybe something is eating too many cycles...
Wild guess... Maybe it is UBL or ABL related?

Please, to help @thinkyhead , test the builds he posted, post the results, and also mention _the controller being used, and if you were using ABL/UBL , linear advance and its settings_.

ejtagle on 14 May 2018

This issue is one of those Heisenbugs for me, so I have to bow out of any tests. :confused:

(you know, the kind of glitch that only happens when you're not watching for it and/or not ready to log/backtrace/etc.)

VanessaE on 14 May 2018

@ejtagle The AZSMZ board I'm using is a 32-bit board, originally designed for Smoothieware. I'm not using any form of software bed leveling, which is why I'm confused why the much faster 32 bit board would be struggling.

alexyu132 on 14 May 2018

@alexyu132 It could be a bug. Marlin uses a "cooperative" approach between the multiple tasks that it has to run to fill the Planner queue, so if anything is taking too long to compute, or if anything is waiting too much time before yielding to the next task, the planner queue could become empty. And that is bad, because the motors could suddenly stop, leading to lost steps.
I have never seen this problem happen to me (yet). There is a very strong ongoing effort to optimize the code to make it run faster, but it is not ready for primetime yet (https://github.com/MarlinFirmware/Marlin/pull/10688)... Of course, anyone here who wants to try it is welcome to. Maybe it solves the problem, maybe not, maybe it has other problems. But it would be great to find out the cause of the layer shifts, to be absolutely sure we have addressed and fixed them.
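To make the starvation scenario concrete, here is a toy model of it. This is not Marlin code: the function, the tick-based timing, and the one-block-per-tick rates are all simplifications invented for illustration. Each tick the "stepper" consumes one queued block, and the "main loop" adds one block unless a slow task is hogging the CPU on that tick.

```cpp
#include <cstddef>
#include <vector>

// Toy model of planner queue starvation (not Marlin code).
// busy_ticks[i] == true means the main loop could not refill the queue on tick i.
// Returns the number of ticks on which the stepper found the queue empty.
int count_underruns(std::size_t queue_capacity, std::size_t initial_fill,
                    const std::vector<bool>& busy_ticks) {
    std::size_t queued = initial_fill < queue_capacity ? initial_fill : queue_capacity;
    int underruns = 0;
    for (bool busy : busy_ticks) {
        if (!busy && queued < queue_capacity) ++queued;  // main loop adds a block
        if (queued == 0) ++underruns;                    // stepper has nothing to run
        else --queued;                                   // stepper consumes a block
    }
    return underruns;
}
```

In this model a full 16-block queue rides out a 10-tick stall with no underruns, but a 20-tick stall leaves the stepper with nothing to execute for 4 ticks. That is the kind of abrupt stop that could cost steps.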

ejtagle on 14 May 2018

Here's a video of the rattling vibration issues I'm having.

Given that enabling bezier jerk makes the issue worse in my case, it does make sense that something could be taking too much CPU time. @ejtagle I'll test your optimized branch to see if that fixes the issue.

alexyu132 on 14 May 2018

@ejtagle The optimized branch seems to run very erratically on my board, as if something is causing the planner to freeze for extended periods of time. Some sections run normally, but others are extremely slowed down. At times it looks like the print head is snapping to a position rather than smoothly accelerating. Do you know what could be causing this behavior?

Video of the issue

I also noticed that it hangs before printing when configured to home Y before X, but switching to normal homing order solved this.

alexyu132 on 14 May 2018

As promised:
The build from Jan 24th works,
using ABL, on a RAMPS 1.4, with and without Lin Adv, without any shifts.
Going to test the later ones now.

viperchannel on 14 May 2018

@alexyu132: I have also seen this behaviour. I'm not sure if it is related to the junction deviation algorithm (it can be disabled by making sure JUNCTION_DEVIATION is commented out in configuration_adv.h), a bug in the planner itself, or maybe a "feature" of the planner changes.
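For anyone unsure what "commented out" looks like, the relevant line would read roughly as follows. This is a sketch of a config fragment only; the exact surrounding lines depend on which build you are on.

```cpp
// Configuration_adv.h (fragment)
// Leaving this line commented out keeps junction deviation disabled,
// so the planner falls back to the normal jerk settings.
//#define JUNCTION_DEVIATION
```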
There were changes in the planner that restrict the acceleration when the moves are composed by a lot of consecutive small movements, trying to keep centripetal forces small.
I found out that many slicers use some kind of "vibration" mechanism to fill small walls, and the new code would probably think they are 180-degree zig-zag movements, so they are slowed down quite a bit (the older algorithm does allow such movements without slowing down)... As you see, there are some things still pending to be tuned or fixed (but you gave me valuable information on the homing order... ;)

What I fear with the current branch (not the PR) is that using UBL + all the other goodies is overloading the processor and there is simply not enough processing power to keep the planner queue full sometimes.... That new PR you tried has several speed optimizations, so there is hope to fix this problem.

ejtagle on 14 May 2018

I'll try disabling junction deviation and see if that fixes the issues.

My printer doesn't use any bed leveling features so those aren't compiled in on my build. With them disabled in the config, would they still cause a speed penalty?

alexyu132 on 14 May 2018

@alexyu132 No, a feature disabled in the config should not cause a penalty. Features disabled at runtime (using G-code) may cause penalties, though.

ejtagle on 14 May 2018

Behavior seems unchanged after reverting to normal jerk.

alexyu132 on 14 May 2018

Having the same issues described in this thread on a CoreXY with TMC2130 drivers on bugfix-2.0.x, no "fancy" features enabled (lin advance, bed leveling, junction deviation, etc.). Trying older builds at the moment, starting from https://github.com/MarlinFirmware/Marlin/commit/f8227abf1cbb87231eef2412ddf35d94a4379dd1, though struggling to get the configuration working at the moment (X and Y don't work properly for some reason). Will update if I get the configuration working.

grownseed on 16 May 2018

A few days ago, someone posted a new 1.1.8 Bug Fix version specifically for the CR-10 3D printers. I managed to get this version to work, but experienced the same exact layer shift issues as everyone else here. I am currently using the high-torque motors from E3D, with the vref pots on my MKS Base V1.5 board set to 1.12v (I still don't know if this is correct) on my multimeter. Motors run fine, and they seem warm but not hot to the touch when printing.

I have a version of Marlin 1.1.8 (no idea which dev version this is) that runs perfectly fine with no layer shifts. Once I switched back to this version, my layers shifts were gone. With the brand new 1.1.8 Bug Fix for the CR-10, the layer shifts only seemed to happen on the Y-axis, for whatever that's worth.

Ceemo on 16 May 2018

Could not get my configuration to work for https://github.com/MarlinFirmware/Marlin/commit/f8227abf1cbb87231eef2412ddf35d94a4379dd1, so decided to try https://github.com/MarlinFirmware/Marlin/commit/0945674ba237747756b85b96ffc05ffc7eb63dce (2018-04-15) instead.

Configuration changes to make it work:

had to disable SD support (SD init error)
decreased current to X and Y to 800mA (from 1000mA as I had it on latest), otherwise sensorless homing would kick in right away in one direction, drivers (with cooling) would immediately overheat in the other, movements would be erratic

The stepper motors also ended up being significantly cooler (from ~75C to ~40C) and the estimated print time (sliced in Cura 3.3.1 and printed through Repetier-Host) was considerably more accurate.

Unfortunately still ended up with a random layer shift. Trying at https://github.com/MarlinFirmware/Marlin/commit/8922b56b588bc9c54c3526986b8a056564828abd now.

grownseed on 17 May 2018

Tried https://github.com/MarlinFirmware/Marlin/commit/8922b56b588bc9c54c3526986b8a056564828abd (2018-03-15), had to switch X_MIN_ENDSTOP_INVERTING/Y_MIN_ENDSTOP_INVERTING to false on top of above changes. Also failed with a random layer shift, moving to https://github.com/MarlinFirmware/Marlin/commit/c49844df66213dfab5dbcc12d772f0a4279f2141 (2018-02-15).

grownseed on 17 May 2018

I can also report that layer shifts only occur on the y axis. Never on x

stefan85 on 17 May 2018

Also had a layer shift towards the end with https://github.com/MarlinFirmware/Marlin/commit/c49844df66213dfab5dbcc12d772f0a4279f2141, moving to https://github.com/MarlinFirmware/Marlin/commit/571ca728246447d99663e0f23f71cdb2765b69b6 since that seemed to work for other people.

edit: for some reason with https://github.com/MarlinFirmware/Marlin/commit/571ca728246447d99663e0f23f71cdb2765b69b6 (or commits around it), homing works but nothing else... going to give this a little break as I'm starting to lose my patience.

grownseed on 17 May 2018

I have a 2.x version from 15th February. It has no issues. I was using it for several months without any problems. After updating a few weeks ago I got this issue.
@thinkyhead maybe this helps

smoki3 on 17 May 2018

@smoki3 do you know if your version was before or after https://github.com/MarlinFirmware/Marlin/commit/c49844df66213dfab5dbcc12d772f0a4279f2141 ?

grownseed on 17 May 2018

It looks like my build is from before this commit. This commit is from the 16th; my working build was downloaded on the 15th.

smoki3 on 17 May 2018

Can confirm that February builds seem to work more smoothly than the latest. Currently running a build from Feb 9 (not sure of the exact commit) and the roughness is much less, though still present occasionally when there are lots of tiny segments.

alexyu132 on 17 May 2018

Just remembered that I have a board running an old bug fix that has the shifting issues: downloaded feb 17 @ 6:05PM, if the folder info is correct. I had swapped that board out because I thought it might be a bad MEGA at the time.

AletheianAlex on 17 May 2018

Tried https://github.com/MarlinFirmware/Marlin/commit/d6e29e95974dd3368abc638a87b79ab9e8a41e78 which ended up in another layer shift towards the beginning.

I then tried https://github.com/MarlinFirmware/Marlin/commit/34160806c0e7c199d11c06b56722ee500d97011d and happy to announce it doesn't seem to have the issue!

Both of these commits are within a day of each other (2018-02-01): https://github.com/MarlinFirmware/Marlin/compare/34160806c0e7c199d11c06b56722ee500d97011d...d6e29e95974dd3368abc638a87b79ab9e8a41e78

This is obviously not entirely conclusive as I've only been doing one test per build, but hopefully someone else can confirm.

I'm going to take a little break from this now, feed my trash can and enjoy my printer for a little bit ^^

grownseed on 18 May 2018

👍2

@grownseed Thank you very much for your debugging! If someone on 2.0 experiencing the problem can confirm that 3416080 is working this would be really helpful. I haven't really noticed anything that could cause shifts in the commits, but I am not really familiar with the code base.

Has anyone tried if the layer shifts are present when compiling without LCD support?

HenningJW on 19 May 2018

Looks like the latest commits of ejtagle's Marlin fork (with the planner optimizations) no longer have any layer shifting for me. Movement is much smoother even in areas with very small segments.

alexyu132 on 20 May 2018

@alexyu132
Which version are you referring to? Daily Bug-Fix? Where can I download this version to give it a try?

Ceemo on 20 May 2018

Sounds like…

The issue arose some time in February, probably between Feb 1 and Feb 15.
The planner refactoring work from @ejtagle (#10688) is fixing the issue.

We hope to have most of #10688 merged in the next day or two. Meanwhile, feel free to test that branch, which is posted here: https://github.com/ejtagle/Marlin/archive/bugfix-2.0.x.zip

thinkyhead on 20 May 2018

I tested 16f92dc, which includes #10688, with three prints today. Two came out well, but with the other one I was getting massive layer shifts (3 shifts in just 5 or 6 layers).
I won't be around for about a week so I can't do more testprints right now to verify the effectiveness of 16f92dc.
One thing I noticed was that when printing, the printer sometimes seems to stall for a brief moment in the middle of the line, which also resulted in small blobs on the print surface. I especially noted this when printing surface areas with layers in a 45° angle/diagonally to XY, but I also see those blobs on the outside walls of my prints in X or Y direction, so I suppose it stalled in random areas during the print.
I have Linear Pressure Control and Bilinear Auto Bed Leveling with 4x4 gridpoints activated in my config. I could do some more testing in about a week with those features disabled if that helps with narrowing down the source of the layer shifts.

AcHub on 21 May 2018

I am also experiencing that brief stalling, which causes vibration and blobs.
I had reported it at #10688 and it still happens now that that PR was merged to bugfix-2.0.
It happens (at least with AVR boards) if LIN_ADVANCE is enabled, even with no ABL, bezier jerk control disabled, ENDSTOP_INTERRUPTS_FEATURE enabled... I have been playing with the configurations, and for as long as LIN_ADVANCE is enabled (even if set to 0), there is brief stalling.

FiCacador on 21 May 2018

@FiCacador : LinAdvance will be fixed ... ! ... But probably there will be a reimplementation to make it compatible with Bézier curves. Just allow us a bit of time, so things settle a bit ... That thread nearly ended in a disaster: Luckily, everything was explained, the merge took place. We know there will probably be some things to be fixed. But trust me: The planner and stepper are at least 10x improved in performance.. ;)

ejtagle on 21 May 2018

👍3

I am very new to printing and I am facing this layer shift issue too.

I am on 1.1.x-bugfix. How can I solve this issue?

bentech4u on 21 May 2018

Checked out fresh 1.1.x-bugfix, which - from what i can tell - includes the planner refactor/optimizations. The blobs are still present, without LIN_ADVANCE. But more importantly, there are sudden super-jerky Y moves on short segments, that sound as if they'll tear my belts to shreds.

Bezier jerk disabled, 8-bit AVR.

Edit: tried with 2.0.x-bugfix too, with that one, i was occasionally getting odd stutters and jerks even when homing. Not quite sure if that was due to my config, though, as i've never used 2.0.x before.

orcinus on 22 May 2018

@orcinus — Yeah, I merged the equivalent changes from #10688 into the bugfix-1.1.x branch about 11 hours ago (1pm US central time) and I did my best to get the exact equivalent to that PR, but it's always possible I let a typo fall in.

Since that PR didn't get extensive testing, it's also possible there are unforeseen side-effects of those changes, which are quite sweeping. But right now it's vital for us to keep bugfix-1.1.x and bugfix-2.0.x as close as possible until we get 1.1.9 released, and we very much want to know if it helps with the layer shifting, so I couldn't let the changes wait once I allowed them into bugfix-2.0.x.

Since we're making you and other users the "canary in the coal mine" on these changes, please proceed with caution and stick close to the power-switch to avert potential damage!

What we really need now is to get in-depth analysis of the behavior following these changes and find out where the edge-cases are. If you're seeing better behavior in bugfix-2.0.x versus bugfix-1.1.x with the same configuration then I'll look closely again and make sure they have perfect parity. I basically did a full paste of stepper.* and planner.* so they should match almost exactly.

And then also compare with yesterday's bugfix-1.1.x to see where it behaves better and where it behaves worse. Though I have a client breathing down my neck, I will do my utmost to examine the acceleration and jerk behavior and make sure anything that got broken by the refactor is fixed as soon as possible.

thinkyhead on 22 May 2018

As an aside, do make sure that ENDSTOPS_ALWAYS_ON_DEFAULT is turned off and that M120 is not being used anywhere, just in case endstop noise could cause an axis to stop. I suspect no one has this problem, since it would cause _massive_ layer shifts — often several centimeters at once. But best to eliminate it as a potential contributor.

thinkyhead on 22 May 2018

👍2

I don't know if it could help, but I use this version of Marlin 1.1.8 bugfix: commit e596931aac0c8204f3e42847081a4fece644b8e4 Date: Fri Apr 6 00:30:50 2018 -0500
On an AM8, with TMC2130 (SPI, 24V, spreadcycle only) on X/Y/E, RAMPS 1.4, babystepping, filament change, FULL LCD, SD SUPPORT. I print between 80-120 (infill 150), accel 1500/travel 2200, jerk 15. bltouch with AUTO_BED_LEVELING_LINEAR (before that, AUTO_BED_LEVELING_BILINEAR, also without shifts)

And I don't have any layer shifting with this version (so I will wait before updating).
I can provide my configuration*.h if it could help

kakou-fr on 22 May 2018

I still got a layer shift on latest 2.x! I have bezier jerk enabled, LIN_ADVANCE disabled.
Anyone else?

smoki3 on 22 May 2018

Everyone please ZIP up and post your configurations. We would like to compare them to see what you all have in common. You can also try disabling optional features to see if the layer shifting goes away. Maybe someone will discover that a certain feature is the cause in that process. We also posted links above for branches that you should test for us so we can figure out when the issue started. If you know anyone else with this issue, pass on those test branches. We think the issue started in the first half of February, and are reviewing commits from that period, but if we can narrow it down to a specific day with your help then we can solve this thing a lot faster. Thanks!

thinkyhead on 22 May 2018

The other cause could be a specific driver... So please, also post the driver model you are using for each axis (TMC23xx, TMC21xx, TMC22xx, DRV88xx, A4988 ... )

ejtagle on 22 May 2018

configs.zip
I am using TMC2130 on the X and Y axes

smoki3 on 22 May 2018

No more layer shifts for me on the latest 2.0.x with the planner refactor merged. Bed levelling, linear advance and bezier jerk are disabled:
marlin-conf.zip.

I have tried this build multiple times, pushing the speed up and no layer shifts (I do have some blobs but I'm ruling other issues out first).

I have TMC2130 drivers on all axes, all stealthchop (hybrid mode disabled), sensorless homing. Do let me know if you need any other information.

grownseed on 22 May 2018

Since layer shifting is mostly on the Y axis with moving beds (the heaviest load) we are beginning to suspect underpowered or poorly calibrated TMC2130 drivers. However since this isn’t affecting 1.1.8 we think it could be due to changes in the TMC2130 code or configuration defaults too.

thinkyhead on 22 May 2018

Marlin.zip
x and y tmc2130 standard marlin

felenna on 22 May 2018

@thinkyhead There have been no changes to TMC2130 startup parameters between 1.1.8 and bugfix-1.1.x. We've also had people with Allegro drivers reporting the issue. So while certainly not impossible, I just don't know what we would have changed with TMCs that might have introduced this.

teemuatlut on 22 May 2018

Maybe I can add a little to what @thinkyhead is saying. Now I remember I also started to have layer shifts in the Y axis, probably in January or February. They were very suspicious. But at that time I was experimenting with increasing the printing speed, so my first suspect was the driver. At that time I was using A4988 drivers for all axes except the extruder (that was a DRV88xx).
After a lot of investigation, and following several online guides with no apparent success, I began suspecting the driver was not reaching the calibrated maximum current, so I took one of the drivers, reverse engineered the circuit and compared it with the datasheet.
I found out (and this applies to all drivers) that every driver uses a sense resistor to measure motor current. In the ones I had, that resistor did not have the value used in the original Pololu design.
That changed value resulted in a complete change of current scale.
Chinese manufacturers, in an attempt to make it easier to adjust driver current with the preset potentiometer, changed the scale so that full clockwise rotation of the calibration potentiometer resulted in just 1A of peak current, instead of the 2A peak motor current of the original Pololu design.
And that makes an incredible difference.
Knowing that, I just set the potentiometer to the maximum value, and the layer shifts disappeared.
My rule of thumb: If the motor does not heat, raise current!!
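
A minimal sketch of the scaling effect described above, assuming the A4988 datasheet relation `I_trip = Vref / (8 * Rs)`. The function names and both resistor values used in the usage note are illustrative assumptions, not measurements from this post.

```cpp
// A4988 current limit as a function of the reference voltage and the
// current-sense resistor (datasheet relation: I_trip = Vref / (8 * Rs)).
// vref_volts: voltage set with the trim pot; rs_ohms: sense resistor value.
double a4988_current_limit(double vref_volts, double rs_ohms) {
    return vref_volts / (8.0 * rs_ohms);
}

// Inverse relation: the Vref needed to reach a target current on a board
// with a given sense resistor.
double a4988_vref_for_current(double amps, double rs_ohms) {
    return amps * 8.0 * rs_ohms;
}
```

For example, `a4988_current_limit(0.8, 0.05)` gives about 2 A, while the same 0.8 V on a board fitted with a hypothetical 0.1 ohm resistor gives about 1 A. That is why the same pot position can mean very different currents on different boards.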

ejtagle on 22 May 2018

👍1

If you have A4988 or DRV8825 stepper drivers that you can test, we would be interested to see if it fixes the issue for anyone.

After reviewing commits in February, the most likely change I can find so far seems to be from February 3. Please test Marlin from Feb 1 and then Feb 4 to see if one has the issue and the other doesn’t.

I’m offering a US$100 bounty to whoever can determine the exact change that raised this issue.

thinkyhead on 22 May 2018

@thinkyhead I'm on a coreXY and I was getting layer shifts on my TMC2130 set to 1000mA on both X and Y. In more recent builds (April and May, before the planner refactor), I also noticed that the drivers were heating up considerably more than with previous builds (getting a lot of temperature warnings, almost only on Y), and so did the motors. Builds from before April (sorry haven't had time to narrow that one down) and the current build don't have that problem. I'm wondering if the layer shifts people are reporting are sometimes caused by issues with the drivers themselves, and sometimes by this bug (which in my case I'd narrowed down to a few commits on Feb 1st), compounded with recent changes which are making the drivers and motors work a lot more.

grownseed on 22 May 2018

As I said previously, this could be a hardware issue. As you stated, the Y axis is the heaviest one, so it requires quite a bit of driver current. In my case, layer shifts happened not at the beginning but in the middle of the print. The reason was that at that height the tip crashes into a big lever formed by the piece that was printed, and a lot more torque is required to overcome that extra resistance. Just raising the current solved all the problems (I know, it's very hard to measure...)

ejtagle on 22 May 2018

My motors were not even warm before; I raised their current (from ~750 mA to ~1.2A), making them quite warm (around 54°C as measured with a thermocouple stuck into the divot in the center of the exposed back end of the rotor spindle). I have not seen any more layer shifts, at least not below 120 mm/s. A4988 drivers all around.

VanessaE on 22 May 2018

👍1

@ejtagle I use it on a CoreXY. The X and Y axes have the same load. I also tried increasing the motor current, but this doesn't help. I also printed ultra slow as a test.

Just my thoughts: Is it possible that MINIMUM_STEPPER_PULSE has an influence? I know for example that DRV8825 doesn't work with MINIMUM_STEPPER_PULSE 0 on 32-bit boards.

I also noticed that I had to update the TMC binaries with the update.

smoki3 on 22 May 2018

I think a hardware issue was the first thing suggested but what makes it suspicious is users can go back and forth in commits replicating the issue.
We just need to reliably narrow down on the specific commit(s).

teemuatlut on 22 May 2018

Just thinking along: has anyone also gotten the layer shifts on 24 volts?

felenna on 22 May 2018

I’m going to postulate that it’s a combination of underpowered steppers with something we did that increased torque at the jerk threshold speeds during direction changes. That would explain why we see this effect multiplied in curves. The reports of extra hot motors is interesting, and maybe that could also be related to some change in our acceleration and jerk code that causes increased torque and thus increased back-current.

So far I haven’t seen any changes that would directly affect acceleration and jerk, but the Feb 3 changes that I’m asking you to rule out did introduce a new variable passed to the planner: the millimeter length of the move. And this value is then used in place of a calculated length. It’s really very minor, so if it can be ruled out that would be helpful.

Again, $100 to whomever discovers the exact point in time where the issue began, as a small token of gratitude from the RepRap community. So far our best guess is “some time in February.”

thinkyhead on 22 May 2018

Well, as far as I know, many users (Anet users, for example) get really hot motors because theirs are lighter than the standard 1.7 A NEMA motors. I hear that a lot; they try to cool them, etc.

felenna on 22 May 2018

Whereas I have a home build that uses the 1.7 A NEMA motors, and they only get hand-warm.

felenna on 22 May 2018

Where can I find the February builds? I will try to narrow it down to an exact date. I have all my prints done that have to be done and have some spare filament to test and finally I have some days off from work.

viperchannel on 22 May 2018

@thinkyhead ... maybe something related to the pulse width driving the motor drivers? ... Yes, I also think this is a combination of several different causes. Some were caused by underpowered drivers, maybe others by narrower pulses on the step lines (DRV88xx require at least 2 µs to work reliably), and maybe some are also related to planner starvation caused by ... IDK.
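
A rough sketch of the pulse-width check implied above. The 2 µs figure for the DRV8825 follows the post; the 1 µs figure for the A4988 is taken from its datasheet as far as I recall. The struct and function names are hypothetical and are not Marlin identifiers.

```cpp
#include <cstdint>

// Minimum STEP pulse width a driver needs, in nanoseconds.
struct DriverTiming {
    const char* name;
    uint32_t min_pulse_ns;
};

constexpr DriverTiming kA4988{"A4988", 1000};     // assumed from the datasheet
constexpr DriverTiming kDRV8825{"DRV8825", 2000}; // "at least 2 µs" per the post

// True if the firmware's STEP pulse is long enough for this driver.
constexpr bool pulse_is_safe(const DriverTiming& d, uint32_t pulse_ns) {
    return pulse_ns >= d.min_pulse_ns;
}

// Highest step rate (steps/s) if both the high and the low phase of each
// pulse last exactly the driver's minimum.
constexpr uint32_t max_step_rate_hz(const DriverTiming& d) {
    return 1000000000u / (2u * d.min_pulse_ns);
}
```

Under these assumptions a 1 µs pulse would be accepted by the A4988 but could be missed by a DRV8825, which would show up as lost steps on one driver type only.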

ejtagle on 22 May 2018

@viperchannel — Any commit from any point in time can be downloaded in full from the Commits tab once you’ve selected a branch on the main Marlin GitHub page.

thinkyhead on 22 May 2018

Thank you. I will start testing in a few hours

viperchannel on 22 May 2018

Let's say I start at Feb 28 and work backwards. Can I use the config.h that I created for Feb 28 on the earlier February builds, or should I modify the new one that comes with each build every time?

viperchannel on 23 May 2018

@viperchannel based on my tests, you should be fine re-using your config as is for the whole of February. There is a change at some point in March I believe that requires switching X_MIN_ENDSTOP_INVERTING/Y_MIN_ENDSTOP_INVERTING around if you use sensorless homing.

grownseed on 23 May 2018

Ok, thanks. Switching firmware and starting a print is easy to do, but getting the configs right over and over takes enormous amounts of time. In a couple of hours I will start printing. Let's hope I get the layer shifts in my first test, so I can go backwards from there.

viperchannel on 23 May 2018

My setup + observations, in case it helps (as additional data point)...

24V
TMC2100
the bed isn't my Y axis (the Y axis is still the heaviest, since it's driving an independent dual X gantry, but less so than with a moving bed)
there's no rhyme or reason to the layer shifts - they can happen in the middle of a print consisting of mostly the same or very similar layers
increasing the Vref setpoint from 0.6V all the way up to 1V didn't seem to change anything
never had two skips per print, oddly enough, just one - maybe they weren't tall enough?
the skips always seem to happen around Z=5-15mm - might be coincidence

orcinus on 23 May 2018

I tried https://github.com/MarlinFirmware/Marlin/commit/d6e29e95974dd3368abc638a87b79ab9e8a41e78 again, this time bumping the current up to 925mA from the 800mA I had when I last tried that commit. The print came unstuck towards the end, but I didn't see any layer shifts this time around. The other commit I tried from earlier that day, https://github.com/MarlinFirmware/Marlin/commit/34160806c0e7c199d11c06b56722ee500d97011d, went through fine at 800mA. I'm still certain there's a bug causing the layer shifts, as it happened on later commits with the drivers at 1A; it's just probably not here, which means back to digging.

grownseed on 23 May 2018

Tried https://github.com/MarlinFirmware/Marlin/commit/0945674ba237747756b85b96ffc05ffc7eb63dce (2018-04-15) again just to make sure at 925mA on the drivers, and sure enough, I got a layer shift towards the end. Looking at all my failed prints through these tests, it seems the shifts always happen around curved moves. I was watching this last print when it happened, I heard a grinding noise as it was curving (fast perimeter of a curved wall), then a thunk noise, and noticed the layer shift.

grownseed on 23 May 2018

Hello All

I have the same issue and run 5 TMC2130 drivers. I tested some things in the firmware, and the shifting only began when I activated StealthChop mode.

12V
NEMA17 Stepper Motors

7sevenx7 on 23 May 2018

@7sevenx7 that has nothing to do with the firmware issue we are having here. StealthChop1 is so terribly weak that it produces step loss all the time unless you cut your acceleration down drastically. I was printing in spreadCycle with 2200 ACC and had to go down to 700 ACC in StealthChop1. Now I have moved on to the 2208s with StealthChop2, which is much better.

viperchannel on 23 May 2018

That's not quite correct either. Read Trinamic's Application Note 21 for comparison data between the two.
I'm personally running stealthChop with ~2000 acceleration values on a cartesian frame.

teemuatlut on 23 May 2018

It is possible, but that depends on many factors. I just wanted to point out the difference it can make. And that does not depend on the firmware version.

viperchannel on 23 May 2018

I prefer to use TMC2130 with spreadcycle and 24V. Quiet as Stealthchop without any layer skip. accel 1500-2200, jerk 15, print speed 80-150.

kakou-fr on 23 May 2018

I also noticed that there is just one skip per print, always at least 5mm but not more than 10-20, just as @orcinus said.

The skipping is usually only on the Y axis, but once it skipped in both directions.

I use the 2.0 bugfix branch.

Gen-L
Dual Z (one driver with two motors)
XYZE all TMC2130, low acceleration, max 60mm/s at 750 current with stealthchop.
FYSETC clone with passive + active cooling with 120mm fan built into the bottom enclosure (board is next to the fan at the best cooling position to get the most airflow)

X's and Y's heatsink is pretty hot during the print (can only touch for 2-3 seconds because it's almost burning my fingers)
Z and E is cool.

Maybe it's happening because the drivers are so hot?
Maybe Marlin does something that causes the drivers to overheat when they work continuously?
Maybe... too many maybes... well, at least I hope this info helps you find the problem.

I always used the 2.0 bugfix branch from the day I bought the printer.
Gonna try the regular 1.1.8

Trontastic on 23 May 2018

1.1.8 has a bug regarding the initial communication with TMC2130.

teemuatlut on 23 May 2018

My setup is a CoreXY printer driven by an AZSMZ board (32-bit LPC1768 board) with A4988 drivers running 2.0.x. Using a 24V power supply. Usually running 3500-4000 accel with 10-15 jerk. For me the layer skipping usually happened on sharp curves with short segments. During many of these sharp corners it seemed like the firmware would ignore jerk limits and "snap" the printhead in place, causing occasional missed steps. Enabling bezier jerk made the problem worse in the problematic builds (maybe too much CPU overhead?). The build from Feb 09 seems to work better than later and earlier builds (Jan/March), but still exhibited some roughness. ejtagle's fork with the latest planner optimizations works the best for me, with smoother stepping and no layer shifting as tested even with bezier jerk enabled.

alexyu132 on 23 May 2018

@teemuatlut ; Maybe that is the answer... The default programming of the TMC drivers works the best ? ... :D ... ;)

ejtagle on 23 May 2018

👍1

I hope not :D The bug resulted in one of the driver registers being filled with ones, and then we had reports of axes going twice as far and twice as fast. What had happened was that one of the bits enabled stepping on both the falling and rising edges of the signal. On top of that, a bunch of other settings were set that were not supposed to be.
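
To illustrate why an all-ones register doubles the travel, here is a small sketch. It assumes the affected register is the TMC2130 `CHOPCONF` and that the double-edge flag (`dedge`) is bit 29, which is how I remember the datasheet; the post itself does not name the register, so treat both as assumptions. The helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed position of the double-edge step flag in CHOPCONF (bit 29).
constexpr uint32_t kDedgeBit = 1u << 29;

// True if the driver would step on both rising and falling STEP edges.
constexpr bool steps_on_both_edges(uint32_t chopconf) {
    return (chopconf & kDedgeBit) != 0;
}

// Microsteps taken per full STEP pulse (one rising plus one falling edge).
constexpr int steps_per_pulse(uint32_t chopconf) {
    return steps_on_both_edges(chopconf) ? 2 : 1;
}
```

A register read back as `0xFFFFFFFF` has every flag set, including this one, so each pulse the firmware sends would move the axis two microsteps instead of one.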

teemuatlut on 23 May 2018

❤1

> For me the layer skipping usually happened on sharp curves with short segments. During many of these sharp corners it seemed like the firmware would ignore jerk limits and "snap" the printhead in place, causing occasional missed steps.

@alexyu132 this is exactly how I would describe this problem too, and the "snap" causes pretty low and loud "thunk" in my printer. I'm using TMC2100 and A4982 drivers in my printers and both have the same issue. Unfortunately I've been extremely busy lately and haven't had any time to test different branches, I hope to have time next weekend...

Jartza on 23 May 2018

Note that 1.1.9 will be released this weekend regardless of whether this issue is solved.

thinkyhead on 23 May 2018

Could anyone verify if pausing a print, homing XY and resuming the print after a layer shift happened leads to the printer printing back at the correct position or printing at the shifted position? I didn't have a chance to test this a second time by myself yet, but the last time I did a rehome on XY, my printer continued printing at the shifted position which doesn't make sense to me if this would be caused by lost steps.

AcHub on 23 May 2018

👍1

> @alexyu132 this is exactly how I would describe this problem too, and the "snap" causes pretty low and loud "thunk" in my printer. I'm using TMC2100 and A4982 drivers in my printers and both have the same issue. Unfortunately I've been extremely busy lately and haven't had any time to test different branches, I hope to have time next weekend...

For me it would happen on smooth sections as well at times.
I even had it happen once during homing (!). Just once.

It sounds exactly as described - a loud low thunk.

orcinus on 24 May 2018

Any chance this is some kind of an overflow?
One that, when it happens, causes multiple steps to be sent to the drivers instead of one?

orcinus on 24 May 2018

Just fwiw, "thunk" is a good description of the sound my bot made when it got a bad shift before as well. Here are my Marlin configs, just for completeness.

Marlin-configs-20180523.zip

(bugfix-1.1.x, commit de5f69b2852e0e2fadd02f3c5a8d19f41ab2194e)

VanessaE on 24 May 2018

> Any chance this is some kind of an overflow?

Anything is possible. If someone with the issue is able to track down the exact change to Marlin that led to this issue then we can all stop speculating and fix it.

thinkyhead on 24 May 2018

> Anything is possible. If someone with the issue is able to track down the exact change to Marlin that led to this issue then we can all stop speculating and fix it.

I'd be all over it, but i'm up to my ears in GDPR crap (the dev side of it) 😶

orcinus on 24 May 2018

🎉1 👍1

If someone experiences the problem and needs help bisecting please ask, I am very happy to help. thinkyhead assumes it's between Feb 1 and Feb 4. ~~So in order to confirm that, could someone test that 36a1d122 isn't affected and ad70d76f is affected, I believe that should help a lot. Once that is confirmed please test if 64458592 is affected.~~ So could someone please test if 64458592 is affected? If it is affected please confirm that 36a1d122 isn't affected, if 64458592 is not affected please confirm that ad70d76f is affected.

For bugfix-2.0.x the commits would be unknown b13099de, good? 51d080d2, bad? 2ea4e74e.

Once someone confirms and tests this only 3 additional tests are needed and we have the exact commit it happens, once that is done it should be rather easy to fix.

edit: changed the order of tests.
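
The "only 3 additional tests" estimate follows from binary search over the commit range. A minimal sketch, assuming commits are indexed from oldest to newest and that a hypothetical `is_bad` callback stands in for "flash this build, print, and check for shifts":

```cpp
#include <cstddef>
#include <functional>

struct BisectResult {
    std::size_t first_bad; // index of the first commit that shows the bug
    int tests_run;         // number of builds that had to be tested
};

// good: index of a commit known to work; bad: index of a commit known to fail.
// Halves the range until the first bad commit is isolated.
BisectResult bisect_first_bad(std::size_t good, std::size_t bad,
                              const std::function<bool(std::size_t)>& is_bad) {
    int tests = 0;
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        ++tests;
        if (is_bad(mid)) bad = mid;  // bug already present: look earlier
        else good = mid;             // still fine: look later
    }
    return {bad, tests};
}
```

With a known-good commit at index 0 and a known-bad one at index 8, three tests are enough to pin down the culprit. This only holds if each test gives a reliable answer, which is the hard part with an intermittent fault.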

HenningJW on 24 May 2018

I can only confirm that 68cff5f2451ef10314face0c73bfcb3d4285425e is unaffected (that's where i stopped). Or, at least, haven't had any skips in 2 prints. I did notice a few stutters, but no thunks / skips / layer shifts.

orcinus on 24 May 2018

@orcinus Can you test 36a1d12 then? It is very likely that this version is not affected by the bug as well.

HenningJW on 24 May 2018

I haven't had any layer shift despite having tried to, but I can hear that "thunk" noise on fast curves.
I am trying to find out when this started to happen, so far I can confirm 3 things:

Current bugfix even after the planner optimization still has these thunks, unless JUNCTION_DEVIATION is enabled. (Maybe it should now be enabled as default on all configurations?)
The build from January 29 already suffers from these thunks.
1.1.8 has no thunks.

Any suggestions on builds to test before January 29?

FiCacador on 24 May 2018

@FiCacador please test e2871f0d

henning@Henning:~/Marlin$ git bisect good 99f98890c
Bisecting: 96 revisions left to test after this (roughly 7 steps)
[e2871f0dcd927e36707cf016933f132ea0b777be] [1.1.x] Ensure smooth printer movements (#9149)

HenningJW on 24 May 2018

@HenningJW the build from that commit https://github.com/MarlinFirmware/Marlin/commit/e2871f0dcd927e36707cf016933f132ea0b777be from January 12 has thunks AND the build from the day before HAS NO THUNKS! You have just found the commit WHERE THE BUG IS!

FiCacador on 24 May 2018

I have been running for months on a build that I downloaded on Jan 25th without shifts. As soon as I switched to one from April I had the shifts. After switching back to Jan 25th they were gone again. So it must be later than that.

viperchannel on 24 May 2018

It seems like there isn't a single commit that introduces the issue on its own, but certain commits may make the problem worse/better. In my testing I did notice that some builds were rougher than others, with the planner optimized version being the smoothest (but with very infrequent roughness at times, though this may be hardware related).

alexyu132 on 24 May 2018

The fix done at https://github.com/MarlinFirmware/Marlin/commit/e2871f0dcd927e36707cf016933f132ea0b777be was (is) incorrect: The next to be executed movement block was not being planned at all. That happened because block indices of the forward pass were (are) still wrong.

There is an incorrect assumption that the block following the one currently being executed has a dependency on the previous one. The dependency undoubtedly exists, but not the implications that are mentioned.
This was supposedly fixed with latest commits to 2.0.x, but, as there are still issues, maybe something is still remaining...

ejtagle on 24 May 2018

@alexyu132 : The junction deviation algorithm still has "approximation" bugs (it tries to avoid computing an acos() function there). But for most use cases, they shouldn't have any impact at all. If anybody is able to isolate a G-code path that causes those issues, I'll take a look 👍
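
For context, a sketch of how a junction speed limit can be computed without calling `acos()`, using the half-angle identity `sin(θ/2) = sqrt((1 - cos θ) / 2)`. This follows the Grbl-style formulation as I understand it and is not copied from Marlin's `planner.cpp`; the `Vec2` type, the function name, and the thresholds are assumptions for illustration.

```cpp
#include <cmath>
#include <limits>

struct Vec2 { double x, y; };

// prev_unit / next_unit: unit direction vectors of the two moves.
// accel in mm/s^2, deviation in mm. Returns the squared junction speed limit.
double junction_speed_sqr(Vec2 prev_unit, Vec2 next_unit,
                          double accel, double deviation) {
    // Negated dot product: -1 for a straight line, +1 for a full reversal.
    const double cos_theta =
        -(prev_unit.x * next_unit.x + prev_unit.y * next_unit.y);
    if (cos_theta > 0.999999) return 0.0;  // reversal: come to a stop
    if (cos_theta < -0.999999)             // straight line: no junction limit
        return std::numeric_limits<double>::infinity();
    const double sin_half = std::sqrt(0.5 * (1.0 - cos_theta));
    return accel * deviation * sin_half / (1.0 - sin_half);
}
```

For a 90° corner with 1000 mm/s² and a 0.02 mm deviation this gives roughly 48.3 mm²/s², about 7 mm/s. The identity itself is exact, so any approximation error would have to come from further shortcuts in the real implementation.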

ejtagle on 24 May 2018

@ejtagle the problem is that the layer shifts happen randomly and not at exactly the same position. Today I printed the same part 3 times. Twice everything was fine, then on the third part I got a layer shift.

That is also the reason why it is really difficult to find a single commit. You have to do more than one print on every test build.
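
A back-of-the-envelope sketch of how many clean prints a build needs before it can be called "good". It assumes each print on an affected build shifts independently with a fixed probability, which is a simplification; the one-in-three rate in the usage note is only an illustration based on the three prints mentioned above.

```cpp
#include <cmath>

// Chance that an affected build (shift probability p_shift per print)
// survives `prints` consecutive prints without a single shift.
double chance_bad_build_passes(double p_shift, int prints) {
    return std::pow(1.0 - p_shift, prints);
}

// Smallest number of clean prints needed so that the chance of wrongly
// calling an affected build "good" is at most max_miss.
// Returns -1 if the inputs make the question unanswerable.
int prints_needed(double p_shift, double max_miss) {
    if (p_shift <= 0.0 || p_shift > 1.0 || max_miss <= 0.0) return -1;
    int k = 0;
    double pass = 1.0;
    while (pass > max_miss) {
        pass *= (1.0 - p_shift);
        ++k;
    }
    return k;
}
```

With a shift rate of one in three, `prints_needed(1.0 / 3.0, 0.05)` returns 8: a single clean print says very little, which fits the contradictory reports in this thread.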

smoki3 on 24 May 2018

I can perfectly understand @smoki3 . We are all of us looking closely at layer shifts, so every layer shift is suspected as a software bug. Previously, maybe there were layer shifts, but they were "ignored", taken as part of the risk of printing things.

I wish there was a way to simulate this problem with an emulator. In fact there are ways, but it means we would have to write an AVR emulator: for example, emulate the Marlin execution at 100x its real-time speed (2000 MHz on a PC vs 16 MHz on an AVR), collect all stepper pulse times, and inspect them for possible glitches and delays. Believe it or not, some attempts were made: Ultimaker used avr-sim to do that, and @p3p also tackled the problem with a different approach.

ejtagle on 24 May 2018

@ejtagle how much do you think the acos() approximations deviate from actual, and would the faster 32-bit boards be able to calculate them on the fly? Currently running the old jerk since it seems to step a tiny bit smoother on my machine.

alexyu132 on 24 May 2018

@ejtagle I'm pretty sure the approach from my PR will be somewhat useful for algorithm debugging, but it obviously hides all the hardware interactions. I still need to go over things, make sure it's actually working as intended (it's probably not), and parse the output data.

Just one homing cycle home_gpoutput_log.txt

p3p on 24 May 2018

@ejtagle that bug from https://github.com/MarlinFirmware/Marlin/commit/e2871f0dcd927e36707cf016933f132ea0b777be is remaining (at least) on lines 2346 and 2356 of the current bugfix's planner.cpp:

block->entry_speed_sqr = !split_move ? sq(MINIMUM_PLANNER_SPEED) : MIN(vmax_junction_sqr, v_allowable_sqr);
block->flag |= block->nominal_speed_sqr <= v_allowable_sqr ? BLOCK_FLAG_RECALCULATE | BLOCK_FLAG_NOMINAL_LENGTH : BLOCK_FLAG_RECALCULATE;

FiCacador on 24 May 2018

@FiCacador: What those lines were intended to do is the following:

When the movement queue is empty, the first movement was split in half. The idea was that while the steppers were executing the first part of the move, the planner would have time to chain successive movements to the 2nd part of the move.

But, it soon was found out that that 2nd part would start from 0 speed (the cause was not investigated, i assume), thus causing a big jump in speed.

So, a flag was added (now split_move, but previously all 2 movements were queued with interrupts disabled, so that flag was used) to force the 2nd block to start from maximum possible speed...
As far as I understand it, even if it worked (?), that was not the main problem. The main problem was that the planner did not plan anything if the number of queued movements was less than 3 (don't ask me why that condition was there... no idea... I cannot understand it... in fact, 1.1.8 still has that condition).

Now the first movement is no longer split. It was reworked to just delay execution of the first move a bit (50 milliseconds), which allows the planner to properly append movements to it.

Now the planner is fixed, except for the starting speed. There have been some discussions on why the planner does not start planning from 0 speed. The rationale behind that seems to be that the AVR timer is unable to produce fewer than 30 steps per second.

But that speed limit (fixed at 0,05mm/s) is incorrect to my eyes. The planner should plan from 0 speed, and the timer limitation should be taken into account at calculate_trapezoid_for_block() instead.

If you want to try such modification, I can create a patch ...

If you are using 96 or 192 pulses per mm, that gives a minimum possible speed of 0.15 mm/s for 192 pulses/mm and 0.31 mm/s for 96 pulses/mm.

Yes, this could mean there is a calculation overflow if for some reason, speed comes close to a halt
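
For anyone who wants to check the arithmetic, here is a minimal sketch. It only divides the 30 steps/s floor quoted above by the axis resolution; `kMinStepRateHz` and `min_speed_mm_s` are hypothetical names for illustration, not Marlin identifiers.

```cpp
#include <cassert>

// Assumption: lowest step rate the AVR step timer can produce, as quoted above.
const double kMinStepRateHz = 30.0;

// Hypothetical helper: slowest feedrate (mm/s) the timer can represent
// for an axis with the given resolution.
double min_speed_mm_s(double steps_per_mm) {
  assert(steps_per_mm > 0.0);
  return kMinStepRateHz / steps_per_mm;
}
```

With 192 steps/mm this gives about 0.156 mm/s, and with 96 steps/mm about 0.313 mm/s, which agrees with the rounded figures above.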

ejtagle on 24 May 2018

If you are using 96 or 192 pulses per mm, that gives a minimum possible speed of 0.15 mm/s for 192 pulses/mm and 0.31 mm/s for 96 pulses/mm.

@ejtagle
Compared to the commonly used (Marlin) jerk values, these jumps in speed are very low, so they are probably not critical.

Even the "BEZIER jerk" shaped accelerations are not jerk free. There is no way to accelerate stepper motors continuously. They are moving in steps. So the breaks in between of the step pulses have to be different. So the speed jumps (jerks) at every step - when accelerating or decelerating.

@Roxy-3D

I'm willing to bet those shifts are 1 full step on your motors.

You can't win that bet against physics. If you displace the rotor of a powered (two coil) stepper motor far enough to lose position it will snap in again at the 4*n th full step position in that direction. Because of the magnetic field the rotor is forced away from all positions in between.
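
To put a number on that, here is a rough sketch of how far the axis moves when the rotor slips by n groups of 4 full steps. The 16x microstepping and 80 steps/mm used in the usage note are assumptions for a typical belt-driven axis, not values reported in this thread, and `snap_shift_mm` is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical helper: axis displacement (mm) when the rotor re-locks
// n groups of 4 full steps away from where it should be.
// steps_per_mm is counted in microsteps, as in the firmware configuration.
double snap_shift_mm(int n, int microsteps, double steps_per_mm) {
  assert(microsteps > 0 && steps_per_mm > 0.0);
  const int full_steps = 4 * n;
  return full_steps * microsteps / steps_per_mm;
}
```

Under those assumptions, `snap_shift_mm(1, 16, 80.0)` is 0.8 mm, so a single slip would already show up as a shift of a bit under a millimetre.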

AnHardt on 25 May 2018

@ejtagle I tried reverting those two lines and at first it seemed to eliminate the rough movements on fast corners, but later I realized it didn't change anything, so never mind on that.

I would like to help more, but my programming skills and my understanding of how boards and Marlin work are very limited. I do mostly understand your explanation, and if you need someone to test on AVR I can give it a try.

FiCacador on 25 May 2018

If someone experiences the problem and needs help bisecting please ask

You bisect, we'll dissect. I also figured that with about 120 days since 1.1.8 we'd need 7 bisections to locate the exact day that it started.
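
The count of 7 can be reproduced with a small sketch. It assumes one candidate build per day and that each test cleanly halves the remaining range; `bisection_steps` is a hypothetical helper, not part of any tool.

```cpp
#include <cassert>

// Number of builds that must be tested to narrow `candidates` consecutive
// days (or commits) down to a single one by repeated halving.
int bisection_steps(int candidates) {
  assert(candidates >= 1);
  int steps = 0;
  int remaining = candidates;
  while (remaining > 1) {
    remaining = (remaining + 1) / 2;  // worst case: the larger half remains
    ++steps;
  }
  return steps;
}
```

For 120 days this returns 7. Since the shifts are intermittent, each of those builds still needs more than one print before it can be called good.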

I'll look at the current prime suspects and see if anything jumps out. It seems pretty clear that the issue began after the latter part of January, and certainly before the 3rd week of February. If there's some period where it went away and then returned again, that would certainly be important to know, since it would point to something being tweaked one way that fizzed it and then another that borked it again.

But, it soon was found out that that 2nd part would start from 0 speed (the cause was not investigated, i assume), thus causing a big jump in speed.

I never found that in _any_ of my testing after the move-splitting was added. The first move was always nice and smooth, even very short ones. You can test that yourself using 1.1.8. It was _before_ the move-splitting addition that the first move added to the planner always went down to the minimum speed.

_Anyway, the first move seems to be working just as well with the new method, so all is good._

the layer shifts happen more randomly and not exactly on the same position

And that is what leads me to think it's not a logical or mathematical error. Marlin's Planner is entirely deterministic. I think it's more likely that some other change, perhaps nowhere near the Planner or Stepper classes, is responsible. Though I wouldn't entirely rule out some kind of error in the Stepper ISR timing, or the planner getting interrupted by the stepper ISR which ends up invalidating some data the planner is working with.

The "thunk" is the most interesting aspect of this whole thing. After all this is over, I'm having T-shirts made with "THUNK!" on them so we never forget this historic time.

thinkyhead on 25 May 2018

You bisect, we'll dissect. I also figured that with about 120 days since 1.1.8 we'd need 7 bisections to locate the exact day that it started.

I don't experience the problem so I can't locate the commit myself. Pointing out the bisection and helping users with doing so is my way of trying to help, as I think it's still not clear what commit caused the problem. If this isn't helpful let me know so I can stop.

HenningJW on 25 May 2018

There is no way to accelerate stepper motors continuously. They are moving in steps. So the breaks in between of the step pulses have to be different. So the speed jumps (jerks) at every step - when accelerating or decelerating.

Technically incorrect, but practically correct in most cases.
This varies with speed, number of microsteps and stepper construction.

For every stepper there is a speed band in which acceleration (and speed) can be linear for all intents and purposes.

orcinus on 25 May 2018

If this isn't helpful let me know so I can stop.

It's super helpful, so I encourage the effort! The hard part is to get these busy and clever folks to move on from their me-too'ing and theorizing and actually download and test the code. The reason I finally posted a US$100 bounty is because no one paid attention to the posted test branches and feedback requests, so clearly some extra incentive was needed!

But all it really takes is one or two with the issue to track it down, so I'm hopeful.

thinkyhead on 25 May 2018

@thinkyhead sorry I can't do much to help, but regarding T-shirts... if you make 'em, I'll buy one! :laughing:

VanessaE on 25 May 2018

I also have that problem. Yesterday I even had it on the first layer's borders :( I'm running the most up-to-date commit from bugfix-1.1. Disabling StealthChop resolves the issue for me. It also seems to happen only on curved parts.
My main issue is that I have to print something for a birthday on the weekend and work keeps me super busy this week, but I'm on holiday next week so I'd like to try the fixes then. As this thread is becoming a bit long and confusing, which commit/branch should be checked?

warhog on 25 May 2018

the pattern I'm seeing in all these posts is "layer shifts happen if there's not much torque available, but only after commit xxxx"

VanessaE on 25 May 2018

Last commit I tried in this thread was 6445859 on 2 prints, and I got a few thumps, so I'll have to change some slicer settings and lower the driver current to see if I can get it to shift... tomorrow, because it's 3:30 a.m. here.

I reverted when the problem was really bad a few months back, but I have tried versions since then and some are better, some worse. TMC2130s in SPI mode seem most affected, but other drivers shifted too. I have enough torque to pull out a tree stump: 64 and 84oz steppers run at 80%, 24v, 2:1 reduction on X and Y axis, with double heatsinks and active cooling.

I tried a few of the others but they were already mentioned by the time I got a result. One note: e2871f0 made my X axis (a 2208 driver in UART mode) shoot all over the place and print everything at the wrong aspect ratio (as if the X axis steps/mm were entered wrong), if that means anything to anyone. I re-downloaded several times and dragged my setup files in, like in all the other tests, but it acted the same.

AletheianAlex on 25 May 2018

If I understand correctly, the planner is deterministic, so if it skips, it will always skip at the same place. But if a non-deterministic factor is involved, the skips would be random. Could it be overheating of the steppers or stepper drivers? When I played with the TMC2130, I had some skipping problems, so I increased the current, and it looked good, but on long prints they skipped again due to the temperature (60-70° for the stepper). So I switched to 24V, lowered the current to 500mA (for 0.9A), and added active cooling and a heatsink for the driver, and a heatsink for the stepper. I always verify the temperature with an IR thermometer; the stepper doesn't go above 50° now (even after a 12h print). And I use this commit without any skips. I currently print at 120-150mm/s, accel 3000-4000, jerk 30.
I use this commit :
Marlin 1.1.8 bugfix : commit e596931 Date: Fri Apr 6 00:30:50 2018 -0500
Just printed this thing without a problem: https://www.thingiverse.com/thing:1519726

Could it be an overheat of the stepper introduced with the last commit that cause this random skip ?

kakou-fr on 25 May 2018

I use this commit : e596931 (Fri Apr 6)
Could it be an overheat of the stepper … that cause this random skip ?

I'm almost certain it has to do with overheating. The "thunk" is a clear indication that the stepper driver is hard-stopping, then starting up again at whatever the current speed happens to be, and the most likely cause for this randomized behavior is thermal shutdown of the driver.

There is definitely a need to ensure good torque with these stepper drivers, and these drivers heat up a lot more than other drivers. Thus, they absolutely need active cooling, and they have to be very carefully tuned. Apparently this is not an easy thing!

But we keep coming back to the fact that

the issue goes away when 1.1.8 is used.

Thus we are pretty certain that

there must be _something_ we can do to mitigate the issue.

By tracking down _the starting point of the issue_ we hope to figure out a solution.

That doesn't mean that your solution isn't the best one for your setup. It just means that you've found a way to mitigate the thermal shutdown through careful tuning and effective active cooling. For those who are stuck on this, even after attempting many things, we hope we can at least find a way to help from the firmware side.

A key question for us is: Why are the stepper drivers running hotter now than before?

thinkyhead on 25 May 2018

What I can say so far is that the whole issue will appear more often with mechanical issues or too low current. So for successful testing, first get a version you know is working. Decrease the current until you find the sweet spot where you don't have any layer shifts, and then switch to newer builds. I have changed my bearings from igus to misumi and they run much smoother. I still get the shifts on the newer builds, but less often than before (with my old January build I still have no shifts at all). So finding the last working version really takes a lot of time if you want to be absolutely sure.

viperchannel on 25 May 2018

What I can say so far is that the whole issue will appear more often with mechanical issues or too low current.

Too low current or mechanical issues will produce layer shifting at high loads, but it will not be accompanied by the characteristic loud "thunk!" that is being reported. The culprit in these instances has got to be thermal shutdown, caused by a combination of increased current and insufficient cooling.

My understanding is that the Watterott stepsticks are not sufficiently sinked, and their sinking might even be implemented wrong. So overheating is an all-too-common problem, even with relatively low loads. Since the Y axis is the heaviest one on Mendel-style machines, it's no surprise that this is the most problematic axis.

Then, there are some joining us in this thread who use other drivers, A4988 and DRV8825. Certainly their problem is the same — either too low current, or, with sufficient current, overheating leading to thermal shutdown. Do they also find the problem goes away when using 1.1.8? If so, we come back to the key question again. Why hotter now?

Tracking down the code change that exacerbates thermal shutdown could tell us something about stepper drivers and the relationship of pulse timing to wattage that we didn't know before.

thinkyhead on 25 May 2018

I have never heard that sound, but I was never watching my prints. So maybe it made that sound, maybe not. But what I can say is that my motors got warmer with the newer versions, while the stepper drivers (2208s) were always cool. And again, if it were a thermal shutdown, the 2130s that a lot of people here are using would report that through SPI and the whole print would stop. I had that several times while using 2130s when I forgot to plug my fan back in. I didn't read anything here that the drivers reported overheating.

viperchannel on 25 May 2018

Would it be an option to address the topic to TMC? Maybe they could add some ideas. @teemuatlut might even know someone there.

Sineos on 25 May 2018

I didn't read anything here that the drivers reported overheating.

And round and round it goes. Which is why we posted bisect branches for testing.

thinkyhead on 25 May 2018

Well I'll ask my contacts at Trinamic what they think of the issue but I don't actually think it's a thermal issue necessarily.
When the TMC drivers overheat, they don't just shut down and start back up again a few milliseconds later. The driver needs to heat all the way to 150C (OverTemp) and then cool back down to 120C (OT PreWarn) before it can continue.
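
As a minimal sketch of that hysteresis (a simplified model of the behaviour described above, using the two thresholds quoted here rather than anything read from a datasheet; `OverTempModel` is a hypothetical name):

```cpp
#include <cassert>

// Simplified two-threshold model: the driver switches off at trip_c and
// only resumes once the temperature has fallen below resume_c.
struct OverTempModel {
  double trip_c;
  double resume_c;
  bool shut_down;

  OverTempModel(double trip, double resume)
    : trip_c(trip), resume_c(resume), shut_down(false) {
    assert(resume < trip);  // hysteresis needs a gap between the two levels
  }

  // Feed one temperature sample (deg C); returns true while the driver is off.
  bool update(double temp_c) {
    if (!shut_down && temp_c >= trip_c) shut_down = true;
    else if (shut_down && temp_c < resume_c) shut_down = false;
    return shut_down;
  }
};
```

With `OverTempModel m(150.0, 120.0)`, a driver that trips stays off through the whole 30 degree cool-down, which is hard to square with a stop that lasts only a few milliseconds.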
To me it sounds more like for some reason there is a quick burst of step signals pushed to the driver and it can't keep up, therefore losing steps. This would also be supported by any and all driver chips being affected.

Would it be possible to add a check to stepper ISR where it would set a flag if the last step was performed sooner than t microseconds ago.
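
Something along these lines is what I have in mind. This is only a sketch, not existing Marlin code: `StepIntervalMonitor` and its members are hypothetical, and the timestamp is passed in rather than read from a hardware timer so the logic can be tried off-target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical diagnostic: remembers when the previous step pulse was issued
// and latches a flag if two pulses arrive closer together than the threshold.
struct StepIntervalMonitor {
  std::uint32_t min_interval_us;  // threshold "t" from the question above
  std::uint32_t last_step_us;
  bool has_last;
  bool too_fast;                  // latched once a violation is seen

  explicit StepIntervalMonitor(std::uint32_t min_us)
    : min_interval_us(min_us), last_step_us(0), has_last(false), too_fast(false) {
    assert(min_us > 0);
  }

  // Call once per step pulse with the current time in microseconds.
  void on_step(std::uint32_t now_us) {
    // Unsigned subtraction stays correct across a 32-bit counter wrap.
    if (has_last && static_cast<std::uint32_t>(now_us - last_step_us) < min_interval_us)
      too_fast = true;
    last_step_us = now_us;
    has_last = true;
  }
};
```

The flag is latched on purpose, so the main loop can report it later over serial instead of doing any printing inside the ISR.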

teemuatlut on 25 May 2018

Would it be possible to add a check to stepper ISR where it would set a flag if the last step was performed sooner than t microseconds ago.

I think it would be best to get the oscilloscope out and examine the signals.

thinkyhead on 25 May 2018

True, but the nature of the issue has been that it happens at a somewhat random point in time. I don't think oscilloscopes can trigger based on pulse frequency...

teemuatlut on 25 May 2018

The physical symptom appears at random, but if there is some general irregularity with stepper pulses, especially pronounced in curves, and especially on the Y axis, then it should be evident from a sample of pulses over a period of a minute or two while drawing circles. Scanning the pulses recorded by the DSO should reveal if there's something weird going on in the pulse duration during those kinds of moves.
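
Scanning an exported capture for that kind of irregularity could look roughly like the sketch below. It assumes the DSO data has already been reduced to a list of rising-edge timestamps in microseconds; `find_sudden_bursts` and the `ratio` parameter are hypothetical, and a relative test is used because the step rate legitimately changes during acceleration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of edges whose interval to the previous edge is shorter
// than `ratio` times the interval before it, i.e. a sudden jump in step rate.
std::vector<std::size_t> find_sudden_bursts(const std::vector<double>& edge_times_us,
                                            double ratio) {
  assert(ratio > 0.0 && ratio < 1.0);
  std::vector<std::size_t> hits;
  for (std::size_t i = 2; i < edge_times_us.size(); ++i) {
    const double prev = edge_times_us[i - 1] - edge_times_us[i - 2];
    const double cur = edge_times_us[i] - edge_times_us[i - 1];
    if (prev > 0.0 && cur < ratio * prev) hits.push_back(i);
  }
  return hits;
}
```

A smooth ramp produces no hits with a ratio such as 0.5, while a pulse that follows its predecessor after a tenth of the usual gap is reported.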

thinkyhead on 25 May 2018

Would it be an option to address the topic to TMC? Maybe they could add some ideas. @teemuatlut might even know someone there.

Why does this keep coming back to TMC over and over again?

let me know what you need, I think I can say this is 100% repeatable at this point for me. Even switched out stepper drivers, got tons of them laying around since it's cheaper to buy them in lots of 4. Standard A4988.

I’m running A4988, as mentioned above, tried swapping drivers round, same issue on same axis (Y)

I’ve swapped X and Y drivers (A4988) over and restored the same print to see if i get shifts in the X direction or continues with Y, or if it comes out ok. Print completed, initially was looking good, but got a very slight layer shift at a higher point than the previous. Still in the Y direction, despite swapping drivers round.

Correct, my other printer (Tevo Black widow) has drv8825 and the same issue seems to bother there too, erratic looking moves that are louder than normal printing, although no layer shifts (yet)

orcinus on 25 May 2018

I think the 2130 issues may be a red herring, or at least combined with a hardware/pin assignment issue.

Here is an example from an issue I had that could be mistaken for this issue: With a wonky CS pin (wiring or pin assignment), but perfectly tuned setup with a ton of torque: Eventually the driver would trigger an overtemp warning and behave as if the driver were hot, changing settings until shifts would occur (max/rms current went nuts, and tOFF, tBLANK and hysteresis would move all over the place). The toff/blank/hyst settings would recover if the CS signal returned, but the current would stay at its lower setting (if it dropped the current).

This can be reproduced by starting a print and pulling the CS cable off, etc. (I took one for the team at the time and fried a few 2130s figuring out which intermittent pins would cause the issue). The conclusion was that bad wiring or bad pin assignment caused the issue, not temperature or torque.

AletheianAlex on 25 May 2018

Would it be an option to address the topic to TMC? Maybe they could add some ideas. @teemuatlut might even know someone there.

Why does this keep coming back to TMC over and over again?

This was actually not aimed at the TMCs as the source of the problem, but at the TMC company, which surely has a hell of a lot of experience with driver testing, as a potential source of additional information or input. I should have made this clearer.

Sineos on 25 May 2018

The conclusion was that bad wiring or bad pin assignment caused the issue, not temperature or torque.

And yet…

Reverting to 1.1.8 makes the issue go away.

While all these theories are very interesting, none of them is really conclusive.
So we really need those bisection tests more than ever.

thinkyhead on 26 May 2018

Here is a brief summary of reports from this thread. The exact date when the issue began and the exact cause are both hard to determine. There are conflicting reports in both respects.

1.1.x testing

commit id|date|user|status
-|-|-|-
e2871f0|Jan 11|@AletheianAlex|Crazy results. (#9149)
e2871f0|Jan 11|@FiCacador|Before this: good. After: problem.
68cff5f|Jan 24|@orcinus|No problem.
---|Jan 24|@viperchannel|No problem.
6445859|Feb 2|@AletheianAlex|Problem exists: "Thunk!"
e596931|Apr 2|@VanessaE|Problem exists: "Thunk!"
e596931|Apr 6|@kakou-fr|No problem (st.diag1_active_high(1) was removed)
d429d5a|Apr 25|@autonumous|Problem exists.
156bd28|May 5|@ikarisan|Problem exists.

2.0.x testing

Sample Reports

Apr 20 - @GrimShadows: "the machine is randomly going full speed ignoring jerk in tight spaces"
Apr 21 - @GrimShadows: "it has to do with those G0 to G1 commands Cura slicer is putting out."
Apr 22 - @GrimShadows: "The slower I set the printer the more pronounced it happens."
Apr 23 - @ikarisan: "solved by disabling stealthChop and the 256 steps interpolation"
Apr 26 - @dammitcoetzee: "I installed heatsinks … bumped the current to 1150… this seemed to fix it."
Apr 31 - @autonumous: "my shifts have not been so dramatic since swapping the driver over."
May 1 - @nudelpapst: "I am using 1.1.6 now where everything works fine."
May 4 - @autonumous: "I did swap the drivers (A4988) back… the layer shift is … more pronounced"
May 8 - @autonumous: "after upping the driver voltage to 0.5v… looked better… very slight… shifting."
May 12 - @viperchannel: "The problem does not exist in my Jan [23] build."
May 12 - @AcHub: "resumed the print… after re-homing… it did not continue in the correct position"
May 13 - @alexyu132: "clunking noises on sharp curves with many short segments"
May 22 - @ejtagle: "Just raising the current solved all the problems"
May 22 - @VanessaE: "I raised their current… I have not seen any more layer shifts"

thinkyhead on 26 May 2018

While for sure the layer shifting is a combination of many factors, something must have been introduced since 1.1.8 that triggers it to happen. The famous "thunk" noise is a good suspect!
I can reproduce that vibration/thunk by printing something with a 10 mm radius arc after a long enough straight line to get there at 100 mm/s printing speed.

I am more and more sure that this thunk noise was introduced by https://github.com/MarlinFirmware/Marlin/commit/e2871f0dcd927e36707cf016933f132ea0b777be. I tested the build from April 17 https://github.com/MarlinFirmware/Marlin/commit/8cf6ef841113d3a211985a4f02e3fdf67452a250 that we all can agree is affected by this mysterious bug, and the thunk is there. Then I modified planner.cpp and planner.h to undo https://github.com/MarlinFirmware/Marlin/commit/e2871f0dcd927e36707cf016933f132ea0b777be, tested again and no vibration, no thunk, curves went smooth.
If anyone with layer shifts can test if the build from April 17 https://github.com/MarlinFirmware/Marlin/commit/8cf6ef841113d3a211985a4f02e3fdf67452a250 with this revert still prints with layer shifts, that would be more conclusive to know if it's best to look elsewhere, or #followthethunk
planner revert.zip

FiCacador on 26 May 2018

(apologies for all the edits, I got too excited thinking things had worked)

I am also having the layer shifting issue. Seeing as some are suggesting that it works in 1.1.8, I went from an earlier build (not sure exactly which version, it was between February and April) to 1.1.8 earlier today and upgraded from only layer shifting and small thunking to layer shifting, strange printing noise/vibrations (not severe, but definitely abnormal), and huge thunking. Then I tried @FiCacador's proposed change of 8cf6ef8 and his planner revert. At first I thought it had fixed things while testing on short prints that quickly displayed shifting before.

I went from this with 1.1.8:

to this with the planner revert:

the 1.1.8 print and two planner revert prints successfully printed next to each other:

but it was a false alarm. On a longer print, 30 minutes instead of 6 (achieved by multiplying the part 5 times), it looks like I still have layer shifting and printing noise/vibration. However, the THUNK has gone.

The results of the 30 minute print look like this:

The leftmost print is what the part should look like. The 5 prints to the right are all layer shifted. I can provide more details about my setup if needed and test more in the next two days, just let me know how I can help.

calvinstence on 26 May 2018

@calvinstence you're looking pretty over extruded there, are you sure it isn't jagging a deposit on a previous layer?

Squid116 on 26 May 2018

I just want to give a short status report:
I increased the stepper high delay to 4 and I removed the solder resist on my Watterott TMC2130, so I can put my heatsinks directly on the copper. The last two prints are fine for now. But I need more testing.

smoki3 on 26 May 2018

While all these theories are very interesting, none of them is really conclusive.
So we really need those bisection tests more than ever.

Not tossing out theories, just trying to put a finer point on it and do actual testing to eliminate hardware variables with some more data points. Sorry to clog up the thread, but I want this issue solved as well, and I thought it might help clear out some false positives when peeling through past builds if we could separate out the TMC issues from the firmware issues. Not conclusive, but at least a data point. The TMC shifts I was having were different from those with other drivers. Here is what I tried (this is for the sake of people getting shifts with 2130 so they don't mistake one issue for another):

NO: I pulled the heatsinks off my drivers, propped a heatgun up next to the board and printed 3 different models with reasonable current settings: I could not reproduce the TMC shifting behavior. I got OT warnings and a shutdown, but not a single shift. NO, pure heat had no effect other than shutting the chip down.

YES: I cranked the current up to 1200mA like a maniac and printed an oval in stealthChop: the layers shifted dramatically (cm or more) with high current and certain movements. Stepper heatsink was hot to the touch. I held my finger against the platform and applied pressure: could not stall the axis under reasonable pressure. Reproducible.

NO: I cranked the current up to 1200mA (then 1300 halfway through) like a maniac and printed an oval in spreadCYCLE: NO, The layers did not shift. Stepper heatsink was hot to the touch.

NOT REALLY: I stepped the current down to very low levels and, as expected, got dramatically reduced torque and regular old missed-step shifts, but not the same behavior.

--If this function from tmc_util is supposed to report hitting temp thresholds from DRV_STATUS, it did not, but it would be very useful if it did because it would definitively either confirm or deny whether temperature was the culprit in a particular case... Maybe it does work, but had no 'X' in my terminal even after roasting the drivers with these particular builds:

static void tmc_parse_drv_status(TMC2208Stepper &st, const TMC_drv_status_enum i) {
  switch (i) {
    case TMC_T157: if (st.t157()) SERIAL_CHAR('X'); break;
    case TMC_T150: if (st.t150()) SERIAL_CHAR('X'); break;
    case TMC_T143: if (st.t143()) SERIAL_CHAR('X'); break;
    case TMC_T120: if (st.t120()) SERIAL_CHAR('X'); break;
    default: break;
  }
}

SOME BUILDS: MONITOR_DRIVER_STATUS => CURRENT_STEP_DOWN ON while plugging and unplugging the CS cable DID reproduce the TMC shifting behavior with some builds (of course if serial monitor is open, you can see the current dropping with no signal and stabilizing when the cable is plugged back in). Bad signal flags overtemp and drops current, plugging back in resumes. YES: reproduced TMC shifting.

SOME BUILDS: MONITOR_DRIVER_STATUS OFF while plugging and unplugging CS cable DID reproduce the TMC shifting behavior with some builds. I have no explanation for this. There was no indication of a problem visually unless a serial monitor was open. As far as I know, unless configured that way, there should be no connection to current/torque and just an OTPW flag set by the cable disconnect, but SOME builds did behave that way, to the point that I re-read all the trinamic documents to see if there is an internal auto-current regulator default mode that I did not know about which could be triggering. YES&NO: some builds reproduced shifting.

-- did not report 0xFFFFFFFF register

KINDOF: Wiring the CS to pin 49 with a display exhibited erratic behavior... sometimes shifts, sometimes the axis just would not move, other times it moved too far.

I have to duck out of this thread and work on other things because I have chased this problem for days, but I hope that was helpful.

AletheianAlex on 26 May 2018

As you can see in the function parameter type, the different levels are for TMC2208 which does have the necessary registers.

teemuatlut on 26 May 2018

So after more test prints, it actually seems that increasing 'MINIMUM_STEPPER_PULSE' fixes the problem.

My theory about this: is it possible that several improvements sped up the whole firmware? Then the drivers would get the high signals too fast in some cases and we would lose steps.

So if this is the case maybe we can't find a single commit that fixes the problems.

Update: Again a layer shift :(

smoki3 on 26 May 2018

It is probably best to ignore the thunks, which were probably introduced by #9149, for now. They might contribute to the shifts, but are not their root cause. All the reports about playing around with the latest builds and different settings aren't nearly as helpful as a clear

[commit id] from [before March] is (/is not) affected by layer shifts

HenningJW on 26 May 2018

It's also important to stress NOT to conclude a commit is/isn't affected based on a single print.

orcinus on 26 May 2018

Please give the latest code a try, as there have been some changes to the timing of the stepper ISR to prevent sudden changes in feedrate.

thinkyhead on 26 May 2018

@AletheianAlex — Not a problem! I appreciate your collating data into a single comment. Putting data-points in close proximity helps us to discern patterns. And it's vital to know whether a thermal shutdown can occur without any notification.

@orcinus — Absolutely right. A single test is definitely not enough to determine that a particular commit is responsible. Likewise, it's not even clear that reverting a single commit would be enough. There may be two or more changes that each contributes to the probability of a layer shift.

Increasing current, adding better heat-sinking, and doing more cooling has solved the issue for some. Which strongly implies that thermal issues are a root of the problem.
We tentatively concluded that thermal shutdown of the driver cannot be responsible, since with the TMC2130s this would actually throw an error and stop the print. But as @AletheianAlex discovered by intentionally pushing the current (thus the heat) to high levels, overheating doesn't necessarily trigger an error in the firmware.
In fact, I have seldom seen mention of an OT warning. In testing of TMC2130 drivers on a SCARA, we heard "thunk!" and got bad positioning, and this was eventually cured, first by disabling StealthChop, and then by also improving the cooling.
Movement at slower speeds showed more layer shifting in one report. Slower speeds can actually lead to hotter drivers, because they spend more time generating a high-frequency "holding" current. Even keeping a stepper powered while standing still will generate more heat than moving a stepper at high speed. So again, overheating is implicated.
There's a "thunk!" sound for many, which cannot simply be the nozzle getting stuck on an overhang, or being slowed down by over-extrusion (which sounds like "kgkgkgkg…"). What is the source of this sound? @HenningJW states that it may be a red herring. Regardless, it does imply that a move is re-starting at high speed from a dead stop, and this is something we want to eliminate.
A single lost step can only cause 0.0125mm of layer-shifting, at most. To get 1mm or more would require 80 or more lost steps in a row. It seems doubtful that bad pulses alone could lead to the amount of layer-shifting described, and they would be unlikely to lead to a dead stop.
While the Jan 11 change (#9149 / e2871f0) certainly piques my interest (it removes a trapezoid calculation and adds logic directly testing stepper-ISR-enabled) we have many reports that everything is ok between Jan 11 and at least Feb 2.
And… I've actually posted a complete reversion of the planner to 1.1.8 that was tested by some, and surprisingly it didn't seem to help. It didn't include any reversion of the Stepper code, so changes to stepper remain open to suspicion. Also, any changes that could suspend the stepper ISR remain open to suspicion.
There's one report where re-homing XY didn't resume at the right place, implying that the planner coordinates were wrong (even though G28 zeroes them). While being far from conclusive (since it could still relate to lost steps or overheating) it would imply a major high-level problem in the planner. But no one has reproduced that test.
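
The lost-step arithmetic in this list can be checked with a few lines. This is only a sketch: it assumes 80 steps/mm (the figure implied by 0.0125mm per step), and the function name is illustrative, not anything from Marlin.

```python
import math

# Assumption: 80 steps/mm, the value implied by 0.0125 mm per step.
STEPS_PER_MM = 80.0

def steps_for_shift(shift_mm, steps_per_mm=STEPS_PER_MM):
    """Return the number of whole lost steps needed to produce a given shift."""
    # round() guards against float noise before taking the ceiling.
    return math.ceil(round(shift_mm * steps_per_mm, 9))

print(f"one step = {1.0 / STEPS_PER_MM} mm")
for shift in (0.5, 1.0, 2.0):
    print(f"{shift} mm shift = {steps_for_shift(shift)} lost steps")
```

With a different steps/mm value the numbers scale accordingly, but the conclusion is the same: shifts of 0.5mm or more correspond to dozens of steps, not one or two.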

To conclude, we still don't have a "smoking gun," but the most common factor does seem to be overheating. The fact that better sinking and cooling fixes the issue for some implies that the issue is not caused by bad logic. However, the fact that reverting to 1.1.8 fixes the issue for some implies that differences in stepper driver initialization may play a role.

So another thing to test would be to revert TMC2130 initialization to reproduce the initial state given by 1.1.8 (basically leaving them in a default state) and see if it helps.

Finally, if you're using A4988 or DRV8825 drivers and still experiencing layer-shifting, we want to hear more from you, to know whether you see worse layer-shifting at lower speeds, and find out whether reverting to 1.1.8 also eliminates the issue for you.

thinkyhead on 26 May 2018

Here is something I need help understanding. If thermal shut down is turning off the stepper motor drivers for a small amount of time, wouldn't they stay turned off long enough that we see a huge shift on the affected axis? And when the affected axis turns back on, wouldn't we expect it to heat back up very quickly and slip some more? Wouldn't it just keep slipping more and more?

Roxy-3D on 26 May 2018

Correct. Thermal shutdown does not typically cause sub-step (or even single-step) layer shifts; the effects are far more pronounced. Anecdotally, I've certainly experienced layer shifts with drivers cool enough to be touched.

orcinus on 26 May 2018

And yet…

Some find that better cooling solves the issue.

And…

Increased current also solves the issue for some.

thinkyhead on 27 May 2018

I should probably add that I've never experienced shifts as pronounced as a lot of the recent examples in this thread. Mine look more like this:

orcinus on 27 May 2018

I should probably add that I've never experienced shifts as pronounced as a lot of the recent examples in this thread.

And of course, some of us have never been able to reproduce the issue at all. It's quite mysterious what common factors are involved.

thinkyhead on 27 May 2018

And yet…

Some find that better cooling solves the issue.

Most are saying that increasing current and improving cooling helps. In my mind that would only compensate for the symptom, allowing the steppers to brute-force through the pulse train irregularities (if they exist). As mentioned above, testing should be carried out at the minimum current / maximum rate config that functions correctly on the working builds.

p3p on 27 May 2018

Quite mysterious indeed. Especially considering it's seemingly always (or most often) the Y axis. I know the accepted explanation is "moving bed", but i don't have a moving bed, yet i also only got shifts on Y.

4/5 prints with newer commits had shifts, and all 4 of those shifted in Y.
6/6 prints with old commits (pre-Feb) had no shifts.

To make things even more interesting - i'm running Y in spread cycle mode, and X in stealthchop.
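
One rough way to judge whether counts like these could be chance is Fisher's exact test on the 2x2 table (newer/older commits versus shifted/clean prints). The sketch below uses only the standard library; the function name is illustrative. With so few prints the result is a hint at best, and it says nothing about which commit is responsible.

```python
from math import comb

def fisher_one_sided(shift_new, clean_new, shift_old, clean_old):
    """Probability that at least shift_new shifted prints fall in the 'new' group by chance."""
    n_new = shift_new + clean_new
    n_old = shift_old + clean_old
    total_shift = shift_new + shift_old
    denom = comb(n_new + n_old, total_shift)
    p = 0.0
    # Sum hypergeometric probabilities of this table and any more extreme ones.
    for k in range(shift_new, min(n_new, total_shift) + 1):
        p += comb(n_new, k) * comb(n_old, total_shift - k) / denom
    return p

# Counts reported above: 4 of 5 prints shifted on newer commits, 0 of 6 on older ones.
p = fisher_one_sided(4, 1, 0, 6)
print(f"one-sided p = {p:.4f}")
```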

orcinus on 27 May 2018

@orcinus as you are getting repeatable results, are you able to run through the bisect branches in order to narrow down where things changed?

Squid116 on 27 May 2018

Will do my best. That's what i've intended to do this weekend, but my day job decided to interfere.

orcinus on 27 May 2018

Quick update: With 498a328 (May 27) I did not get layer shifts with a print, which failed 4 times in a row in the same area before, running on 16f92dc (May 20). However I did only one print with the latest firmware, so it's too early to tell if layer shifts are fixed.

May 12 - @AcHub: "resumed the print… after re-homing… it did not continue in the correct position"

In terms of the resume print after rehoming XY I'm actually a little bit confused.
My printer did not resume in the correct position running on 7d78f34 (May 11). It was continuing at the shifted position. However I didn't have time to test further and verify with more prints.
Then when I updated to 16f92dc (May 20), when I pause, rehome XY and resume, it seems to print at the correct position in terms of XY (so good news: I wasn't able to reproduce my first observation anymore), but instead it resumed at the wrong height. So the XY position was correct, but Z was maybe 2mm too high. I'm not sure if this is related to the layer shifts or if it's another bug. I believe what happens is that when I pause the print for recalibration, the printhead would move up ~2mm before G28 is executed, but this raise is not reverted when the print is resumed. This error is also present in the latest commit 498a328 from today. Should I open an extra issue for this?

AcHub on 27 May 2018

Should I open an extra issue for this?

Yes, please do. Be sure to fill out the steps required to reproduce, and all that. There are a lot of ways to pause/resume a print, and we want to make sure we follow the right code-path.

thinkyhead on 27 May 2018

Quick update: With 498a328 (May 27) I did not get layer shifts with a print

That's encouraging. I won't get my hopes too high, but there have been some good improvements made to the planner and stepper code in recent days, and I wouldn't be surprised if those are helping in some cases.

thinkyhead on 27 May 2018

Quick update: With 498a328 (May 27) I did not get layer shifts with a print, which failed 4 times in a row in the same area before, running on 16f92dc (May 20). However I did only one print with the latest firmware, so it's too early to tell if layer shifts are fixed.

Do more tests. I also thought that it was fixed because I had 7 prints without a layer shift. Then, after 10 hours of printing, it happened again...
So for me it doesn't look like a thermal issue

smoki3 on 27 May 2018

I'm running on an Anet A8 board, Marlin 1.1.8, with linear advance enabled. I'm experiencing various layer shifting issues on the Y axis. At first I thought it was my settings as I've modified my acceleration settings (but not the max feedrate). I'm running bilinear bed levelling only with extrapolate beyond grid enabled (can't run full UBL on an A8 board due to memory issues). Dropping down my acceleration settings didn't help though, neither did reducing my top speeds.

I have tried a couple of different prints and the problem seems to kick in after a certain Z height is reached. I've double checked my Z Axis and bed are level.

Here are two prints I tried of Einstein. Each shows the same layer shifting problems. The first one actually shifted back to the proper alignment after about 5mm worth of Z printing! Note that the shifted parts have broken away since the print; I've put them back on the model as a sort of reference. The second one was cancelled shortly after a check revealed the layer shift had not gone away. As you can see, the second print began its layer shift a little higher than the first:

evis03 on 29 May 2018

@thinkyhead Here's what Trinamic's CTO had to say in general about layer shifts:

From what I have seen, a layer shift often results from a high start-velocity, where the motor cannot follow. Maybe you can output the sequence of velocities and trace, where it happens, to narrow down the failure position? Also, when a high start velocity directly follows a motor stop, and direction has changed, this will even double required motor torque.
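
As a starting point for tracing the sequence of velocities, the following sketch walks a G-code file and reports the nominal per-axis speed change at each junction between XY moves. It assumes absolute positioning and feedrates in mm/min, and it only sees the commanded feedrates: the planner's acceleration and jerk limiting will change the real step rates. All names here are illustrative.

```python
import math
import re

WORD = re.compile(r"([A-Za-z])\s*(-?\d*\.?\d+)")

def move_velocities(gcode_lines):
    """Yield (line_no, vx, vy) in mm/s for each G0/G1 move with XY travel."""
    x = y = 0.0
    feed = 0.0
    for n, raw in enumerate(gcode_lines, 1):
        line = raw.split(";", 1)[0].strip()  # drop comments
        words = {k.upper(): float(v) for k, v in WORD.findall(line)}
        if words.get("G") not in (0.0, 1.0):
            continue
        feed = words.get("F", feed)
        nx, ny = words.get("X", x), words.get("Y", y)
        dx, dy = nx - x, ny - y
        dist = math.hypot(dx, dy)
        x, y = nx, ny
        if dist == 0 or feed == 0:
            continue
        speed = feed / 60.0  # mm/min to mm/s
        yield n, speed * dx / dist, speed * dy / dist

def junction_report(gcode_lines):
    """Return (line_no, axis, speed_change, reversed) for each junction."""
    report = []
    prev = None
    for n, vx, vy in move_velocities(gcode_lines):
        if prev is not None:
            for axis, before, after in (("X", prev[0], vx), ("Y", prev[1], vy)):
                report.append((n, axis, after - before, before * after < 0))
        prev = (vx, vy)
    return report

sample = ["G1 X10 Y0 F6000", "G1 X10 Y10", "G1 X0 Y10", "G1 X10 Y10"]
for entry in junction_report(sample):
    print(entry)
```

A direction reversal shows up as a speed change of twice the feedrate on that axis, which is the case described above as doubling the required torque.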

teemuatlut on 29 May 2018

I was wondering, which slicer do you use?

Last couple of days I was trying to print this file https://www.thingiverse.com/thing:1817180. I sliced it with Cura, and it failed 4 times in a row. Yesterday I tried the same thing with Simplify3D. This resulted in a perfect print. Maybe S3D does something different than Cura.

stefan85 on 29 May 2018

@stefan85 Same here. I have been printing one model since last week, I guess, and it has failed nearly 6 times. I am also using the latest Cura 3.3.1.

bentech4u on 29 May 2018

I use Cura. I'll try re slicing in slic3r to see if that solves the issue, but I can't afford S3D!

evis03 on 29 May 2018

I have been printing 8 prints / about 25h with the 498a328 firmware now and had zero layer shifts. But then I disabled Auto Bed Leveling (Bilinear) with M420 S0 for a test related to stuttering (see #10847) and immediately layer shifts were back. Can't tell whether this is related to ABL or if the underlying bug is still present though.

I was wondering, which slicer do you use? [...]
Yesterday I tried the same thing with Simplify3D. This resulted in a perfect print.

I can't confirm that.

AcHub on 29 May 2018

I'm running on an Anet A8 board, Marlin 1.1.8, with linear advance enabled. I'm experiencing various layer shifting issues on the Y axis.

Should I add to the matrix that some experience layer shifts with 1.1.8…?

There are so many reasons that layer shifting may occur that there are bound to be some reports here that have nothing to do with a flaw in the firmware. The fact that some have repaired the issue by tuning their voltages indicates that there may not be any firmware flaw, per se.

The issue is most pronounced with Trinamic stepper drivers, and we may just have to say that Marlin is simply not "certified" to work with Trinamic drivers. Unless they are very carefully tuned, extremely well heat-sinked and cooled, and paired with well-specified stepper motors, Trinamic stepper drivers simply may not be suitable as drop-in replacements for A4988/DRV8825 stepper drivers for 3D printing applications.

Is anyone in this thread using an Einsy RAMBo board? We'd be curious to know if such issues arise not just for stepstick boards, but also for tightly-integrated TMC2130 drivers.

thinkyhead on 29 May 2018

Nothing has changed for the TMC drivers since v1.1.8, and a just-as-likely reason for the number of reports is their popularity among the more enthusiast-level users, who are also more inclined to join the discussion here.

I'm using Archim2 on a coreXY and it's printing just fine on a fairly recent bf-2.0.x.

teemuatlut on 29 May 2018

OK, the layer shifts also occur on release 1.1.8, on the Y axis with Trinamics, and also randomly. I sent my config a while ago. I think it has something to do with how fast it accelerates and then decelerates again within a very short time. I was printing a wing: the lower part goes well, but once it gets small on one side it starts to shift. I'll put some more juice into the Y axis tomorrow to test. It's not the heat, because during the earlier test it was rather cold here and now it's freaking hot, yet the layer shift happened later on in the print. So it has nothing to do with the Trinamics heating up. I use modified Chinese Trinamics; I removed the paint from the cooling part to get better cooling, and I have 3 fans cooling the board and the Trinamics (one 70mm fan and two 50mm fans in a wind tunnel). So yes, I'm beginning to believe it's about the settings in Marlin and tuning the Trinamics a bit more.

felenna on 29 May 2018

Perhaps to investigate some more: is there an easy way to read out the temperature of the Trinamics live, from time to time while printing?
Just an idea, but I doubt it is the problem.

felenna on 29 May 2018

At this time, in spite of all the interesting reports and data-points in this thread, we have not determined even a single change we could make in the firmware to reduce the incidence of lost steps for these cases. Nothing here has helped to even narrow down a contributing factor, with the notable exception of stepper driver tuning. Unless we can develop something “actionable” from all this, I’m inclined to just retire the thread.

thinkyhead on 29 May 2018

(Sorry to pop back in... If my tenaciousness has worn out my welcome, I understand.) That's the red herring I mentioned earlier that I want to test, I don't know if you can COMPLETELY eliminate that particular 2130 problem via firmware or operating conditions (will test), which means it may fall outside the scope of this issue (or not), so I didn't want it confusing this thread and infuriating the hard-working devs:

Regardless of Marlin version, PrusaFW, etc (Smoothieware as well, but the threads were locked and issue blamed on 3rd party hardware), there are definite moves+settings that cause tmc21xx to shift regardless of OC, temp, or firmware tmc init settings -- the chips lose regulation (I confirmed on a scope) and the axis shifts. There has been chatter all over the internet for months, and the Prusa/reddit/reprap/G+/facebook forums indicate that Prusa and others are looking for a workaround as well. I don't have a Prusa to test, but I read their code commits. If I dig anything up, I'll just provide a link so as not to muddy the bug report threads more with my over-enthusiastic verbosity.

Possibly a new-ish Marlin commit exacerbates the underlying issue, possibly through a tangentially-related motion or thermal mechanism, but that one particular behavior SEEMS to originate ultimately with Trinamic's implementation (and there were prints that I absolutely had to switch drivers to complete), which led me to bother them about it months back and got the reply to work around it via slicer settings.

(The advent of the complaints about shifting also happen to coincide roughly with the release of the Chinese clone steppers and their availability on Amazon and eBay -- first few months of this year -- which led to more widespread use. And obviously: weak/poorly-matched motors, low current, overtemp, bad wiring, and high mass made it worse -- and add normal missed steps to the mix).

TMC22xx with StealthChop2 fare somewhat better. Slicer settings can also be tweaked as a workaround, somewhat (lower speed/speed differential between different move and layer/move type has the greatest effect, as well as jerk, and accel to a lesser extent). The issue SEEMS to compound throughout the print through a cycle of losing regulation, and storing then reloading the setting repeatedly, but I have to confirm and chart that before I believe it whole-heartedly.

When I get more 2130s in -- as well as some of the clones -- I'll be trying to put a finer point on the conditions, because at the time I just wrote it off completely as a driver-specific behavior that would be fixed with a hardware iteration, or worked around with more refined motion control profiles, and I didn't give it another thought until this thread (which piqued my interest because my 2130/2100 have mostly just been gathering dust in a parts bin since FEB).

(sorry for the wall of text... I get passionate about troubleshooting)

AletheianAlex on 30 May 2018

That's actually pretty helpful, A8 boards are not known for their quality. Guess it might be time to consider upgrading to ramps.

evis03 on 30 May 2018

Hi, in my experience these TMC2100 and TMC2130 drivers are problematic (more so for me the TMC2100). The silence is nice, but not knowing whether your 10 hour print is going to layer shift and fail is no good for anyone.

I should mention at this stage that I use the Chinese Fysetc drivers with heat sinks and set at 1.15 volts and cooled with two 40mm fans

I have Mendel-style Dual X Carriage printers where the extra weight of the Y axis/carriage is problematic for these drivers. I have not gone to the complex stage of using SPI on my rumba and RADDS boards with these drivers, I just run them in SpreadCycle mode.

Layer shift for me happened with both Repetier and Marlin Firmware (but I have not used the latest bugfix version on my RUMBA boards; my latest installed bugfix-1.1.x dates back to 3rd April 2018).

On the Due/RADDS I have Repetier installed and suffered from the same layer shift.
My solution to the problem was to slow down all printer settings and, lastly, to make the printers reliable I installed MKS LV8729 drivers on the Y axis. These are not as quiet as TMC21xx but much better than using DRV8825 drivers which are powerful enough but are noisy and produce diagonal patterns on a print if installed for direct drive extruders (they are much better if a geared NEMA is used).

Obviously I cannot fine tune my TMC2130 drivers, but the above is my solution, use a different more powerful driver on your problematic axes.

bruce356 on 30 May 2018

What does any of this have to do with this issue, if i might ask?

Unless you've confirmed the problems you're having do not occur with older commits, you're just talking about the problems your particular mechanical setup is having with this particular driver. And if you've confirmed the problems you're having DO occur with older commits as well, you've proven your particular problem is not Marlin. In both cases, the problems are tangential to this issue.

There are people in this thread using other drivers.

There are people in this thread running them at much lower currents and with no overheating.

orcinus on 30 May 2018

@orcinus, just trying to point out that Marlin might not be the problem here

bruce356 on 30 May 2018

Ok, I have some additional information. Lately I could not reproduce the layer-shifts, even with the versions I used before which had the shifts (used the same hex file). Why is that? Because I did some mechanical modifications on my printer, I swapped out the bearings for example. Now the X and Y axes move with less force than before. Just to make it clear, I definitely had shifts running later versions and did not have them with the exact same mechanical settings on earlier versions, so it was not only a mechanical issue. I switched back and forth several times to confirm that before creating this bug report. In fact I played around 2 weeks before I decided to create this. This does not help to determine the exact reason for the shifts, but it explains why not everybody has those shifts and others get rid of them by increasing the stepper voltage. And it tells us that the firmware doesn't lose a step or do a double-step; it is more likely that the right commands get to the steppers but for some reason the drivers don't send enough power to the motor (I cannot imagine how that could work). So I think it is more likely that the commands get too fast to the steppers or Acc and Jerk values get ignored or overwritten with too-high values, which would also explain why most occur on the Y axis which is much heavier and needs more force on Prusa clone printers. That makes more sense in my mind. Hope this information helps.

viperchannel on 1 Jun 2018

If your theory is true viperchannel, the best way to investigate would probably be to do a dry print on just a board with a logic analyzer attached and then let a script in "your favourite language" analyze if accelerations and jerks do exceed the limits or not. The logging should be possible with sigrok(+fx2lafw) and a CY7C68013A board (or you can go the Saleae route).
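
A minimal sketch of what such an analysis script might look like, assuming the logic analyzer capture has already been exported to a list of (time in seconds, direction) pairs for one axis, with direction being +1 or -1. The function name and the check itself are illustrative: it flags any speed change between consecutive step intervals that is larger than the jerk allowance plus what the acceleration limit permits over that time. That is a necessary condition for respecting the limits, not a full re-implementation of the planner.

```python
import math

def find_violations(steps, steps_per_mm, max_accel, max_jerk):
    """Return (time, speed_before, speed_after) for each suspicious speed change.

    steps: list of (time_s, direction) pairs, one per step pulse, sorted by time.
    max_accel is in mm/s^2; max_jerk is the allowed instantaneous change in mm/s.
    """
    if len(steps) < 2:
        return []
    mm_per_step = 1.0 / steps_per_mm
    violations = []
    prev_v, prev_t = 0.0, steps[0][0]  # assume the axis starts at rest
    for (t0, _), (t1, direction) in zip(steps, steps[1:]):
        v = direction * mm_per_step / (t1 - t0)  # mean speed over this interval
        t_mid = 0.5 * (t0 + t1)
        allowed = max_jerk + max_accel * (t_mid - prev_t)
        if abs(v - prev_v) > allowed:
            violations.append((t_mid, prev_v, v))
        prev_v, prev_t = v, t_mid
    return violations

# Synthetic data: a clean 1000 mm/s^2 ramp, and a dead start at 100 mm/s.
ramp = [(math.sqrt(2 * (k / 80.0) / 1000.0), 1) for k in range(200)]
abrupt = [(k * 125e-6, 1) for k in range(50)]
print(len(find_violations(ramp, 80, 1000, 10)))
print(find_violations(abrupt, 80, 1000, 10))
```

Real captures will have timing jitter, so some tolerance on the limits would likely be needed before treating a flagged interval as a firmware fault.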

HenningJW on 1 Jun 2018

There is another alternative to the logic analyzer... https://github.com/buserror/simreprap ;)

ejtagle on 1 Jun 2018

I just want to throw my feedback into the mix here.

A while back, I changed out my CR-10 Melzi board for an MKS Base V1.5 Ramps 1.4 board I had purchased from Amazon. I had been using this board for quite a while with no issues using an older version of the 1.1.8 firmware, until I loaded the latest bug-fix firmware. Someone in the CR-10 FB group posted a new firmware specifically for the CR-10, and I'm assuming this firmware was based on the newest bug-fix. Once I loaded that firmware, I started having layer shift issues. Tried the latest firmware from here (cutting edge), and still got layer shifts. Loaded the 2.0 firmware, and got layer shifts with that, as well.

A few days ago, I switched back to the Melzi board (newer version 2.1), and loaded the firmware posted in that FB group once again. No layer shifts. I loaded the other latest firmware (generic 1.1.8 bug-fix), and no layer shifts. Loaded the 2.0 firmware, and no layer shifts.

In my particular case, I'm going to have to assume it was a hardware issue, and not a firmware issue. I just wanted to mention this here, just in case some of you have the same board (or a variant of the MKS Ramps 1.4 board) and are experiencing layer shifts.

This isn't a conclusive test, as it's just my experience, and I'm not an expert. However, in my particular case, changing out the board resolved my layer shift issues.

EDIT: I also wanted to mention that I do not have specific commit versions to list here. I don't know which commits I used, so please keep that in mind.

Ceemo on 1 Jun 2018

For whatever it's worth, I seem to have narrowed the problem down to the version of the TMC2130 Stepper Library, though I'm not 100% sure. If I compile Marlin 1.1.6 with anything higher (that is supported) than 2.1.5, I get skipped steps as soon as I turn the printer on and move the X or Y axis. I've tried 1.1.8, but since 2.1.5 is not supported, I have to use at least 2.2.1, which produces the skips. I also tried 2.0 with the marlin-config tool, but I get similar results (bad skipping). Pardon my ignorance - how do I know what version of any library I'm using when compiling with platformio?

The interesting thing is that the drivers don't seem to exceed 33°C when measured with an infrared thermometer. No active cooling - just the heatsinks. Also, drivers are Fysetc.

I've checked all the settings in both config and config_adv to make sure they are the same with each version I'm testing and they are.

Is it possible that (at least for me) the problem is with the stepper library or could it be that it is hiding a different problem?

Garyr14 on 1 Jun 2018

There's really no significant changes between v2.1.5 and v2.3.0 that would/should cause the problems.
The only thing that could technically do it is the extended register reset when initiating the drivers. But that wasn't yet implemented in v2.2.1.

For TMC users who experience skipped steps, you can try disabling interpolation. You can read this on the Raise3D forums to learn more about why this might help.
But that does not explain differences between 1.1.8 and bugfix branches as interpolation has been on by default since the very beginning.

teemuatlut on 1 Jun 2018

@teemuatlut I disabled interpolation but my printer produces strange (funny) noises on arcs (especially at circles)

Is there a fix for it?
Acc: 1000, jerk 10 on XY, 50mm/s

Init script:

M201 X1000 Y1000 Z50 E10000; Set Max Acceleration
M203 X100 Y100 Z3 E60; Set Max Feedrate
M204 P1000 R3000 T1000; Set Acceleration
M205 X10.00 Y10.00 Z0.20 E10; Set Max Jerk

Trontastic on 9 Jun 2018

Chiming in here.

Tested with S_Curve off and Junction on and off in combination with s_curve. Still exhibited step loss. I even swapped out for A4988 drivers and it still had step loss at random. I reinstalled the DRV8825s with the smoothers that were fitted as well. Rolled back to the 1.1.8 build and ran the same GCode. No step loss. Succeeded on the first print.

I have step loss on my custom i3 machine that runs fine on 1.1.8 but gets step loss at random on the latest bugfix. It is running a RAMPS board and tried with DRV8825 with and without smoothers as well as A4988 drivers. Neither driver setup fixed the issue. I have the drivers actively cooled and VREFS set high so the motors are getting plenty of torque. Extruder is a "MK8 style" with the A4988 driver at 1/16 step mode 95steps/mm.

However I have a Tevo Tornado running the stock board with A4982 drivers and it has yet to lose steps with the latest bugfix. This has a Titan extruder running at 463 steps/mm.

I'm going to continue to figure out what's going on and if I make any progress I'll post here. Both machines are printing over USB through a Pi 3B running OctoPrint. It also lost steps with SD.

houseofbugs on 10 Jun 2018

@houseofbugs have you tried ejtagle's latest bugfix branch (https://github.com/ejtagle/marlin-bugfix/commits/bugfix-2.0.x)? With the latest updates to the stepper code it seems like any stuttering or roughness in curves is completely gone from my printer - I've even been able to turn accel up to 5000 and jerk to 20. There's a noticeable reduction in frame vibration and as a whole it seems to be smoother.

alexyu132 on 10 Jun 2018

All the stepper / planner fixes from @ejtagle's branch have been merged to both bugfix branches, so this would be a good time to have a look and see if there's any improvement.
There seems to be a new issue with too-high acceleration on homing bump moves, and this may affect other endstop/probe moves done after an interruption, so keep an eye out for that. Otherwise, the movement is generally improved.
You can try the new ADAPTIVE_STEP_SMOOTHING option which is meant to reduce aliasing between axes, especially on slower moves. But this is not specified to address step loss.

@houseofbugs: If you still experience layer shifting even with the latest bugfix code then it may be a good idea to go back to bisecting and try to figure out the exact date and time that the issue started to show up. Given what we've gathered so far, it still hasn't been determined that the planner/stepper code is the definite cause, so it would be good to track down the exact commit after 1.1.8 that boosted the probability of step loss.

thinkyhead on 11 Jun 2018

Just a short note, i'm currently running 3b06a8e with JUNCTION_DEVIATION enabled and no layer shifting or hot motors (running stealthchop). The other settings are still the same. I did about 20 prints with this so far and it worked just fine.

warhog on 11 Jun 2018

I tried yesterday's commit. I started a print and found out that I had sliced with top layers set to 0. So I stopped the print and tried again with the top layers on, but now the nozzle touched the bed, which it didn't before.
After that I tried to increase the Z offset, store it with M500 and I ran G28 again, but nothing changed. With M503, it reports the new setting but it doesn't seem to be applied.

Were there some changes with the last modification?

stefan85 on 11 Jun 2018

Hi all. The latest updates to the stepper code have improved the timing a bit more, and there have been improvements to the core serial code as well. Please give the bugfix branches a trial run to see if they're working any better.

@stefan85 — It sounds like you have an issue distinct from this one. If you're still having problems with G28 today please open a new issue about it.

thinkyhead on 13 Jun 2018

So I had a layer shift again today.

The sensorless homing is now fixed.

smoki3 on 13 Jun 2018

Me too. (-ing ;-)

This is still true though; maybe someone can try and check if similar behavior occurs. Should be pretty easy/fast to check...

I maybe found out how to provoke this "loud thunk" and, associated therewith, the skipped steps:

In OctoPrint Control Tab, if using following settings and pressing the arrow keys fast in a row, the motors are "thunking" and losing steps:

If I fast press 2 times Y+ then go back to 0 and repeat this process 4-5 times, my hotend will crash into Y0.
This won't ever happen if I press Y+ slower. It also won't happen if I alternately press other direction keys, no matter how fast (for example left-right-left). The problem occurs only if the same direction key is pressed fast multiple times in a row (with only very little time in between the keypresses).

The feedrate is set to 6000 in OctoPrint.

LichtiMC on 13 Jun 2018

@LichtiMC — Thanks. What is your Y acceleration set to? It definitely helps to have something we can reproduce easily. I wonder if we can change it into a G-code sample and accomplish the same thing. Maybe this would be enough…?

G28
G1 Y10 F6000
M400
G1 Y20
M400
G1 Y30
M400
G1 Y0
M400
G1 Y10
M400
G1 Y20
M400
G1 Y30
M400
G1 Y0
M400
G1 Y10
M400
G1 Y20
M400
G1 Y30
M400
G1 Y0
M400
G1 Y10
M400
G1 Y20
M400
G1 Y30
M400
G1 Y0
M400
G1 Y10
M400
G1 Y20
M400
G1 Y30
M400
G1 Y0
M400
G1 Y10
M400
G1 Y20
M400
G1 Y30
M400
G1 Y0
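
The sample above is repetitive enough that it is easier to generate than to edit by hand when trying other distances or feedrates. A small sketch (the function name and parameters are made up for illustration) that reproduces the listing exactly with its default arguments:

```python
def jog_test(positions=(10, 20, 30, 0), repeats=6, feedrate=6000, axis="Y"):
    """Build an absolute-move jog test with M400 between moves."""
    lines = ["G28"]
    first = True
    for _ in range(repeats):
        for pos in positions:
            move = f"G1 {axis}{pos}"
            if first:
                # The feedrate is only given once; later moves inherit it.
                move += f" F{feedrate}"
                first = False
            lines.append(move)
            lines.append("M400")  # wait for the move to finish
    lines.pop()  # the listing has no M400 after the final move
    return lines

print("\n".join(jog_test()))
```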

thinkyhead on 13 Jun 2018

Acceleration on xy is set to 1000.
I will try the gcode.

It "feels" like the motors can't manage fast stop-go cycles but only if both motors (corexy) are turning.

GCODE file works perfectly fine. What is octoprint doing differently?

LichtiMC on 13 Jun 2018

I also can reproduce a similar situation with octoprint. I also use acceleration 1000 and also a corexy

smoki3 on 13 Jun 2018

octoprint is using
G91
G1 Y10

Is it possible that we create an error in a buffer when we send a command too often ?

smoki3 on 13 Jun 2018

I can reproduce motor stalls with Pronterface by jogging around too frequently, regardless of the actual speed or acceleration, especially on Y.

In normal prints, it doesn't seem to be a problem.

VanessaE on 13 Jun 2018

It seems like octoprint doesn't wait for the next move... (M400)
Tried with "G91" and "G4 P1" in between the moves, but it's always running very smoothly.

LichtiMC on 13 Jun 2018

If some variant of the G-code produces shifting of the Y bed reliably, then we'll be able to analyze the acceleration and jerk behavior —even on a bare board— to see what's happening. If M400 doesn't produce the effect we're looking for, we should try G4 Pnnn with some different delay lengths.
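
To try several delay lengths without hand-editing, a generator along these lines could emit one test file per delay. It is a sketch under assumptions: it uses relative moves (G91) like the OctoPrint jog commands reported earlier, it mimics "press Y+ twice, go back to 0, repeat five times", and the delay values are arbitrary. The function name and parameters are hypothetical.

```python
def relative_jog_test(delay_ms, jogs=2, step=10, cycles=5, feedrate=6000, axis="Y"):
    """Build a relative-move jog test with a G4 dwell after every move."""
    lines = ["G28", "G91"]  # home, then switch to relative positioning
    for _ in range(cycles):
        for _ in range(jogs):
            lines.append(f"G1 {axis}{step} F{feedrate}")
            lines.append(f"G4 P{delay_ms}")
        # Return to the start of the cycle in a single move.
        lines.append(f"G1 {axis}{-step * jogs} F{feedrate}")
        lines.append(f"G4 P{delay_ms}")
    lines.append("G90")  # restore absolute positioning
    return lines

for delay in (5, 50, 500):  # arbitrary example delays in milliseconds
    with open(f"jog_test_P{delay}.gcode", "w") as fh:
        fh.write("\n".join(relative_jog_test(delay)) + "\n")
```

Note that a dwell sent this way still runs only after the preceding move has completed, so it may not reproduce the timing of rapid keypresses from a host.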

thinkyhead on 13 Jun 2018

Already tried G4 P1, P10, P100

LichtiMC on 13 Jun 2018

If that still doesn't give us a reliable shifting, we might try removing planner.synchronize() from gcode_G4 so we can control the interval even while blocks are still processing.

thinkyhead on 13 Jun 2018

Same performance here. Moving the X and Y axes with the keyboard in OctoPrint produces skips. If I drop back to 1.1.6, no problems at all.

Garyr14 on 13 Jun 2018

In general my print experience has been much better with the latest code changes. I didn't have layer shifts any more.
However with 4c8d6df I can see and hear, that printing circles leads to stuttering and stops/slowdowns midprint. It seems like the printhead is taking a break two or three times while going around the circle on every turn.

To investigate further I created and printed a test pattern containing circles in different sizes and printed with 5 outside perimeters (STL File: Test Circles.zip). I didn't notice stops when printing the smaller circles (1/2/3mm diameter) but with 5/6/7 mm diameter it happens every now and then. However I cannot reproduce it reliably. Sometimes it would stutter and pause/slow down three times on the same line of a 6mm diameter circle, while in the next layer the exact same circle is no problem at all.

AcHub on 16 Jun 2018

Hi to all,

Good to see that I'm not the only one with this problem. Unfortunately, I have exactly the same problem and behavior as viperchannel. I have a RAMPS 1.4 in my printer and have printed around 2-3kg of material per month with this setup, so the mechanical system and the reliability of the machine were great. When I changed my old stepper drivers to the TMC2208, I also switched to the new Marlin FW 1.1.8 so that I could use Lin_adv. That is exactly when the problems started.

After some calibration-cube prints, I always have the problem that one axis stops working. It seems that the stepper turns off, because I can easily move the "stopped" axis by hand. At first I thought that I might have the wrong Vref settings, or that the acc/jerk settings were too high, but after changing these in all directions it is clear that they do not cause the problem (I tried 0.4 up to 1.5V Vref and everything works great; drivers and motors run at good temperatures with an active fan and heat sinks).

I tried these things:

Marlin 1.1.8
changed to Marlin bugfix-1.1.x
switched from OctoPrint to printing directly over USB
changed Vref (0.4-1.5 V)
measured the stepper temperature (never any heat problems)
changed jerk/acc settings
swapped the stepper drivers between axes (the failure happens again at the same position and axis)
turned Lin_adv on and off

So I tried many things, but nothing works reliably. I also have the same problem when I press the manual jog controls in quick succession: the stepper begins to lose steps and makes a cracking noise. When I press the jog button only once there is no problem; even if I increase the move speed to 400 mm/s, the axis moves just fine without losing steps. One more thing: when I change back to my old Marlin version with the same jerk/acc settings as in the bugfix version where the stepper stops working, everything is fine with the same G-code and settings, so I think the problem is firmware related.

I have the feeling that the TMC drivers shut down after they get too many commands, so that they internally go into an emergency stop.

Thanks and best regards,

Patrick

GasM0nkey on 19 Jun 2018

Hey @GasM0nkey
I had similar problems with the TMC2208 drivers. A workaround for me currently is to reduce the maximum feed rates of the X and Y axes. I had my machine set to allow 150 mm/s max feed rates. Now I'm at 90 mm/s max (set in the EEPROM) and the printer seems to be stable.
Anything higher than that would lead to step losses and humming noises when jogging via any host application.
Layer shifts seem to be gone as well with the adjusted feed rates.

BTW: I'm currently running the latest Bugfix-2.0.x on a ramps + arduino mega 2560 setup

Zwaubel on 20 Jun 2018

👍1

I don't know what modes you're running but stealthChop is meant for slower speeds and our default 100mm/s or 500 fullsteps/s might be stretching it.
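As a rough check of the "100mm/s or 500 fullsteps/s" figure: it works out if one assumes a belt axis at 80 steps/mm with 16 microsteps (both values are assumptions here, not stated in the comment). A minimal sketch of the conversion, where `fullsteps_per_second` is a hypothetical helper and not a Marlin function:

```cpp
#include <cassert>

// Hypothetical helper (not part of Marlin): convert a feed rate into the
// full-step rate the motor sees.
// steps_per_mm is the microstep count per mm, i.e. the kind of value that
// goes into DEFAULT_AXIS_STEPS_PER_UNIT.
double fullsteps_per_second(double feedrate_mm_s, double steps_per_mm, int microsteps) {
  assert(microsteps > 0);
  return feedrate_mm_s * steps_per_mm / microsteps;
}
```

Under those assumptions, 100 mm/s gives 500 full steps/s, the 150 mm/s mentioned earlier in the thread gives 750, and the 90 mm/s workaround gives 450.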

teemuatlut on 20 Jun 2018

In fact I'm running them in stealthChop (all three jumper wires in place). What's interesting, though, is that other firmwares, such as Klipper, are able to run at 150 mm/s without any issues. I even did a test print at 80 mm/s print speed and 200 mm/s travel speed, and while my extruder had problems extruding material fast enough, everything else worked.

Zwaubel on 20 Jun 2018

Did anyone try going back to an older version of the firmware to see which version introduced the issue?

Tailslide on 20 Jun 2018

👎1 👍1

To clarify is the octoprint generated G-code by @LichtiMC @Garyr14 still the current avenue of exploration?

Bostwickenator on 20 Jun 2018

@Tailslide yeah, the devs tried to get users to provide this information; thinkyhead even put up a US$100 bounty for whoever can pinpoint the exact version/commit that introduced the problems. But this thread got cluttered up by users me-too-ing, so it looks like the devs left it, and I can't blame them. I don't know why @Bostwickenator reacted with a thumbs down; your post is way more helpful than all the me-toos here.

HenningJW on 20 Jun 2018

👍1

Hi all,

@Zwaubel does this help only when you move the axis with the manual jog button, or also during normal print jobs? What travel speed do you use? For example, I can move the axis at up to 400 mm/s without losing steps, but then I can't press the button again until the move has finished, otherwise the stepper begins to clip. But I will give that a try :-) !

@Tailslide with the old Marlin FW which I used before, I don't have these problems. Can you tell me how I can check which version this build is?

@teemuatlut my print speed was always under 100 mm/s; only my travel speed was higher, ~150 mm/s.

these are my speed settings:

#define DEFAULT_MAX_FEEDRATE          { 500, 500, 25, 25 }
#define DEFAULT_MAX_ACCELERATION      { 1200, 1200, 100, 600 }
#define DEFAULT_ACCELERATION          1200    // X, Y, Z and E acceleration for printing moves
#define DEFAULT_RETRACT_ACCELERATION  3000    // E acceleration for retracts
#define DEFAULT_TRAVEL_ACCELERATION   1200    // X, Y, Z acceleration for travel (non printing) 
#define DEFAULT_XJERK                 5.0
#define DEFAULT_YJERK                 5.0
#define DEFAULT_ZJERK                  0.4
#define DEFAULT_EJERK                  5

Is it possible that some combination of acc and jerk settings can cause these problems?

GasM0nkey on 20 Jun 2018

@GasM0nkey there is a command I think M115 but I don't know if it's reliable.

If you want to try older versions you can go to https://github.com/MarlinFirmware/Marlin/commits/1.1.x
Then click the <> beside the version you want to try then click Clone or Download-> Download ZIP.

It would be helpful to know about which date range the problem started happening since there is a large number of versions.

Tailslide on 20 Jun 2018

@GasM0nkey
It did fix both: I'm now able to move manually via any host software as well as print without any layer shifts. This is how my default max feed rates look currently:

#define DEFAULT_MAX_FEEDRATE { 90, 90, 4, 85 }

For example, I can move the axis at up to 400 mm/s without losing steps, but then I can't press the button again until the move has finished, otherwise the stepper begins to clip.

Yeah, I noticed that, too. I was able to move at 150 mm/s without any issues when putting only one move command in the queue at a time. Spamming the motion queue with move commands led to a complete lock-up of the motion and caused humming noises instead. As mentioned before, lowering the maximum feed rates did fix the problem in my case. But looking at your print speeds, this probably isn't a satisfactory solution for you...

Zwaubel on 20 Jun 2018

👍1

@HenningJW because if he read the thread the answer is right there, people had done that.

Disclaimer: I haven't been into this bit of source yet, but I'll spend my evening tonight going through it. @thinkyhead If you want some fresh eyes on anything in particular, I'm sure there are a few people (like me) who could look over any suspect areas you highlight. Pointing at known good and bad changeset numbers would help. Additionally, would it help if I went through this thread message by message and collated all the reports into a spreadsheet listing who actually tried which firmware versions and which stepper drivers?

Bostwickenator on 20 Jun 2018

@Tailslide thanks, M115 works fine. My old Marlin version is unfortunately really "old" (Marlin 1.1.0-RC), but as I said it works without problems; I think that version is too old to compare, though. But I tried the solution from @Zwaubel and that seems to work fine so far. I also changed my max feedrate to 90 and was able to print 5 test cubes without losing steps or an axis (thanks for that :-) !). The current FW is Marlin bugfix-1.1.x. I will run some more tests to check the reliability of the change, and then I'll raise the feedrate a little to see if the problem comes back.

GasM0nkey on 20 Jun 2018

👍1

@GasM0nkey if you can spare the time could you try some intermediate versions to see where the bug appears?

Bostwickenator on 20 Jun 2018

[…] Pointing at known good and bad changeset numbers would help […]

@Bostwickenator try https://github.com/MarlinFirmware/Marlin/issues/10446#issuecomment-391768301

HenningJW on 20 Jun 2018

❤1

[…] Pointing at known good and bad changeset numbers would help […]

Here is the comment where I collated all available reports:

https://github.com/MarlinFirmware/Marlin/issues/10446#issuecomment-392224568

thinkyhead on 21 Jun 2018

👍3

Hi again,

@Bostwickenator I've now checked the issue again with some older Marlin versions. I went back as far as version 1.1.6, where I don't have the problem. With the newer versions, it looks like the TMC can't handle feed rates higher than ~80-90 mm/s. When I use 250 mm/s, which is more or less the default setting, the stepper begins to stall and chatter if the jog button is pressed a second time before the move has finished. It also looks like my geared extruder (5:1) has the same problem: with the default feedrates it stops working after the slow first layer (with a general speed of about 50 mm/s). When I turn it down to around 5 mm/s it works, but then the extruder is so slow that the print is no longer smooth. Vref changes don't affect the problem; Lin_Adv is deactivated. I have now printed some parts and the printer works like it did before upgrading.

Marlin 1.1.0 RC3 (my default FW): TMC works without problems
Marlin 1.1.6: works fine also with default feedrates like 1.1.0
Marlin 1.1.7: problem occurs, max feedrate 80-90mm/s
Marlin 1.1.8: problem occurs, max feedrate 80-90mm/s
Marlin 1.1.8 bugfix: problem occurs, max feedrate 80-90mm/s

Best regards,

Patrick

GasM0nkey on 24 Jun 2018

👍1

@GasM0nkey can you test dcc9b0d4 then? This is right in the middle of 1.1.6 and 1.1.7. Thank you!

HenningJW on 25 Jun 2018

Well, I am still wondering about the cause of all the problems. I have done an extensive investigation of the Planner and Stepper classes and found several problems there, so if anyone wants to try the fixes, then please do! (PR #11098) ... The title is somewhat misleading, as it fixes quite a bit more than described.

But, to be honest, while testing it, I got a thunk sound and a layer shift on the X axis. My test print is pretty small, so I thought ... "Hey, maybe I will be able to reproduce it!..." ... Nevertheless, I did a second print with no issues and no shifts at all, probably the smoothest print I have done, not a single strange noise!...

What I hope to fix with that PR is the lost steps when jogging with Pronterface (I don't have it installed, so please report on this!).

I will take @AcHub's test file and try to reproduce the thunk sound, trying to figure out the cause. I have a 32-bit board, so I am absolutely sure it is not caused by a lack of processing power. I print from SD, so I am absolutely sure it is not related to serial or USB communications...

ejtagle on 25 Jun 2018

👍1

I will take @AcHub's test file and try to reproduce the thunk sound, trying to figure out the cause.

Just to clarify, with my circles-test file I did not get the thunk sound and I also had no layer shifts! What I observed was that the printhead would stutter, slow down and even stop completely every now and then, which was introduced around the PR. As a layman I would interpret it as if the print buffer was empty and the printer didn't know any more what to do next. I am printing via Octopi, don't know if that is part of the issue, but still I didn't observe this behaviour before and I can clearly say, that for my printer it has been introduced around the PR or at least within the changes in the last weeks.
I wanted to print the circles test from SD card to keep Octopi out of the equation, but unfortunately my controller board with the SD card reader on it died a few days ago, so it will take a few days/weeks until I have repaired it and can repeat this test.

AcHub on 25 Jun 2018

I'm just testing #11098 from ejtagle and for me it seems to be working perfectly fine! As much as I try to provoke a step loss via OctoPrint controls or prints, I can't... 👍

Edit: Just trying to print with stealthchop and marlin standard movement settings... ;-)

/**
 * Default Max Acceleration (change/s) change = mm/s
 * (Maximum start speed for accelerated moves)
 * Override with M201
 *                                      X, Y, Z, E0 [, E1[, E2[, E3[, E4]]]]
 */
#define DEFAULT_MAX_ACCELERATION      { 3000, 3000, 100, 10000 }

/**
 * Default Acceleration (change/s) change = mm/s
 * Override with M204
 *
 *   M204 P    Acceleration
 *   M204 R    Retract Acceleration
 *   M204 T    Travel Acceleration
 */
#define DEFAULT_ACCELERATION          3000    // X, Y, Z and E acceleration for printing moves
#define DEFAULT_RETRACT_ACCELERATION  3000    // E acceleration for retracts
#define DEFAULT_TRAVEL_ACCELERATION   3000    // X, Y, Z acceleration for travel (non printing) moves

/**
 * Default Jerk (mm/s)
 * Override with M205 X Y Z E
 *
 * "Jerk" specifies the minimum speed change that requires acceleration.
 * When changing speed and direction, if the difference is less than the
 * value set here, it may happen instantaneously.
 */
#define DEFAULT_XJERK                 10.0
#define DEFAULT_YJERK                 10.0
#define DEFAULT_ZJERK                  0.3
#define DEFAULT_EJERK                  5.0

~~I also see that my stepper motors are cooler!~~ 🥇

LichtiMC on 27 Jun 2018

So let's try. Everyone should do a lot of test prints to see if there are any layer shifts.

smoki3 on 27 Jun 2018

👍1

Everything's merged into the bugfix branches now. Eduardo has put in a huge amount of testing and chasing down subtle race conditions to solve the issue. So let's get some confirmation! If layer shifts are finally eliminated, then I presume we can all agree he has more than earned the offered bounty!

thinkyhead on 28 Jun 2018

👍4

Everything's merged into the bugfix branches now. Eduardo has put in a huge amount of testing and chasing down subtle race conditions to solve the issue. So let's get some confirmation! If layer shifts are finally eliminated, then I presume we can all agree he has more than earned the offered bounty!

I tested with #11098 and the overall movements seem much smoother, and in the 3 test prints I made yesterday there were no skips (I had to go back to 1.1.5 before, because I had a lot of prints to finish and couldn't test as much as I would've wanted). The "thunk" sound is also gone, which is a pretty good sign. I will make a few more test prints later today with this version, but I already agree that Eduardo has earned the bounty if we get a few confirmations that the issue is gone 👍

BTW. I'm using Trinamic TMC2100, and the layer skip was also happening with those.

Jartza on 28 Jun 2018

🎉2

Thanks so much for addressing the skipping issues. Using the latest merged 2.0 bugfix, I can confirm that I cannot induce the skips by jogging the keyboard arrows with OctoPrint. I did a few test prints: No problem with 20x20x20 cube. Benchy almost finished, but experienced a y axis skip a few layers from the top. A large print skipped after the first layer. An 80 or so mm vase (using circular printing) came out perfectly. I'm using TMC2130's. I'm hoping that the shifting is either mechanical or something with my acceleration and jerk settings.

Garyr14 on 28 Jun 2018

🎉1 👍1

I was glad too early...
I'm still having layer shift issues, how about you?

I don't think it's a mechanical issue in my case. Everything moves really smoothly.
While printing I tried to provoke step loss by holding back the hotend to check whether the 17HS4401 stepper motors are too weak: I really can't stop them... before they stop, my whole printer tilts. Anyway, ~~there is no thunk sound anymore~~, and the layers shift only a little but reproducibly, sometimes sooner, sometimes later.

M122 after the shift:

Recv:       X   Y   Z   E0
Recv: Enabled       false   false   false   false
Recv: Set current   800 800 350 800
Recv: RMS current   795 795 336 795
Recv: MAX current   1121    1121    474 1121
Recv: Run current   25/31   25/31   10/31   25/31
Recv: Hold current  12/31   12/31   5/31    12/31
Recv: CS actual     12/31   12/31   5/31    12/31
Recv: PWM scale     43  47  21  41
Recv: vsense        1=.18   1=.18   1=.18   1=.18
Recv: stealthChop   true    true    true    true
Recv: msteps        16  16  16  16
Recv: tstep     1048575 1048575 1048575 1048575
Recv: pwm
Recv: threshold     0   0   0   0
Recv: [mm/s]        -   -   -   -
Recv: OT prewarn    false   false   false   false
Recv: OT prewarn has
Recv: been triggered    false   false   false   false
Recv: off time      5   5   5   5
Recv: blank time    24  24  24  24
Recv: hysteresis
Recv: -end      2   2   2   2
Recv: -start        3   3   3   3
Recv: Stallguard thrs   0   0   0   0
Recv: DRVSTATUS X   Y   Z   E0
Recv: stallguard
Recv: sg_result     0   0   0   0
Recv: fsactive
Recv: stst      X   X   X   X
Recv: olb
Recv: ola
Recv: s2gb
Recv: s2ga
Recv: otpw
Recv: ot
Recv: Driver registers: X = 0x80:0C:00:00
Recv:   Y = 0x80:0C:00:00
Recv:   Z = 0x80:05:00:00
Recv:   E0 = 0x80:0C:00:00

No temperature problem, drivers and steppers only handwarm.

#define DEFAULT_MAX_ACCELERATION      { 1000, 1000, 100, 10000 }
#define DEFAULT_ACCELERATION          1000    // X, Y, Z and E acceleration for printing moves
#define DEFAULT_RETRACT_ACCELERATION  1000    // E acceleration for retracts
#define DEFAULT_TRAVEL_ACCELERATION   1000    // X, Y, Z acceleration for travel (non printing) moves
#define DEFAULT_XJERK                 10.0
#define DEFAULT_YJERK                 10.0
#define DEFAULT_ZJERK                  0.3
#define DEFAULT_EJERK                  5.0

I'm trying to do further tests and report back if I discover something.

LichtiMC on 30 Jun 2018

@LichtiMC : Please, try to increase a little bit MINIMUM_STEPPER_PULSE (set it to 2uS or even 3uS). Even if it should not be required..

Are you using ABL/UBL ?

If you have a 32bit board, try increasing BLOCK_BUFFER_SIZE from 16 to 32 (on AVR there is not enough RAM usually to increase it, unfortunately)

ejtagle on 30 Jun 2018

No, for testing I'm not using any sort of UBL.
Still shifts with:

#define MINIMUM_STEPPER_PULSE 3

With this gcode file my print consistently starts shifting at the 5th layer (I think it's the 5th, and I think it always shifts ~~before going to print~~ after printing the honeycomb pattern and jumping to the next layer's starting point, as it thunks, though more quietly):
SN04_without_Dial_Gauge.zip

On the 5th layer it starts with honeycomb pattern (infill 40%) and shifts. Afterwards it often (not always) shifts after or before the honeycomb pattern...

I will test BLOCK_BUFFER_SIZE on Sunday as it's already too late now... (5am)

LichtiMC on 30 Jun 2018

Hi, I am far from being an expert, but I find my TMC2130 drivers perform much better in spreadCycle mode, a little louder but not much. It's a Cartesian printer, and in stealthChop mode I used to get layer shifts.

bruce356 on 30 Jun 2018

I can now confirm that my last set of layer shifting (post everything being merged into bugfix branch) problems was due to mechanical issues. My only problem now, and I'm not seeing it all the time, is y-axis "grinding" during auto-leveling. Does not seem to be happening anytime else, and not all the time. Anything specific I can check?

Garyr14 on 30 Jun 2018

I am running 2018-01-20.. Same problem with tmc2130 on Y. I will test the latest version too.

jarnoburger on 30 Jun 2018

I am running 2018-01-20

We have confirmed 100% that there are lost step issues from that period of time. Currently all emphasis is on testing the current bugfix code, which we know is much better.

thinkyhead on 30 Jun 2018

👍1

@LichtiMC — "I think it always shifts after printing the honeycomb pattern and jumping to the next layer starting point…"

If there's a Z move at this point, then we should consider that in the investigation. If it's merely going from a fast (infill speed) move to a lower (perimeter speed) move then we should consider that in the investigation. I doubt that there is any kind of planner starvation to worry about in such a situation. However, if there's a layer change and the E axis is being reset to 0 then we should also have a look at that. Depending on your slicer, you should be able to turn off the option to reset the E position on every layer. Please try that and see if it has any effect.

thinkyhead on 30 Jun 2018

I did 3 more prints with latest 2.0 bugfix. Zero issues with it. Two printers (Tevo Tarantula), MKS Gen and MKS Gen L boards, TMC2100 stepper drivers.

Jartza on 30 Jun 2018

❤1 🎉1

Could it possibly be that only CoreXY and H-Bot printers are affected by these weird layer shifts? As I said before, my stepper motors are really oversized for my printer. They don't have to work hard at all...

@Jartza: Would you be so kind as to share your values for the following settings:
(These are my settings now.)

#define DEFAULT_MAX_ACCELERATION      { 1000, 1000, 100, 10000 }
#define DEFAULT_ACCELERATION          1000    // X, Y, Z and E acceleration for printing moves
#define DEFAULT_RETRACT_ACCELERATION  1000    // E acceleration for retracts
#define DEFAULT_TRAVEL_ACCELERATION   1000    // X, Y, Z acceleration for travel (non printing) moves
#define DEFAULT_XJERK                 10.0
#define DEFAULT_YJERK                 10.0
#define DEFAULT_ZJERK                  0.3
#define DEFAULT_EJERK                  5.0
#define MINIMUM_STEPPER_PULSE 3
#define MAXIMUM_STEPPER_RATE 400000
#define BLOCK_BUFFER_SIZE 32

@ejtagle: BLOCK_BUFFER_SIZE 32 doesn't change any behaviour:

// The number of linear motions that can be in the plan at any given time.
// THE BLOCK_BUFFER_SIZE NEEDS TO BE A POWER OF 2 (e.g. 8, 16, 32) because shifts and ors are used to do the ring-buffering.
//#if ENABLED(SDSUPPORT)
  #define BLOCK_BUFFER_SIZE 32 // SD,LCD,Buttons take more memory, block buffer needs to be smaller
//#else
//  #define BLOCK_BUFFER_SIZE 16 // maximize block buffer
//#endif
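The power-of-two requirement in the comment above follows from wrapping a ring-buffer index with a bit mask instead of a modulo. A minimal sketch of that technique, using hypothetical names (`kBlockBufferSize`, `next_block_index`, `blocks_queued`) rather than Marlin's actual planner code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for BLOCK_BUFFER_SIZE; must be a power of two.
constexpr uint8_t kBlockBufferSize = 16;
static_assert((kBlockBufferSize & (kBlockBufferSize - 1)) == 0,
              "buffer size must be a power of two for mask-based wrapping");

// Advance a ring-buffer index; the mask replaces a slower modulo operation.
inline uint8_t next_block_index(uint8_t index) {
  assert(index < kBlockBufferSize);
  return static_cast<uint8_t>((index + 1) & (kBlockBufferSize - 1));
}

// Number of queued blocks between tail (oldest block) and head (next free slot).
inline uint8_t blocks_queued(uint8_t head, uint8_t tail) {
  return static_cast<uint8_t>((head - tail) & (kBlockBufferSize - 1));
}
```

With a size that is not a power of two, the mask would skip or repeat slots, which is why 8, 16 and 32 are the values suggested in the config comment.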

@thinkyhead: I found a setting "Allow zeroing of extrusion distances (i.e G92 E0)" and unticked it: No improvement.

By watching the print very closely I think I can say when the step loss is being triggered:
While executing the last travel movement to the honeycomb pattern starting point.
It skips when this last travel movement stops... it looks as if the acceleration settings are ignored between the travel movement and the first honeycomb pattern movement. (It also makes this thunk noise again.)

LichtiMC on 1 Jul 2018

By watching the print very closely I think I can say when the step loss is being triggered:
While executing the last travel movement to the honeycomb pattern starting point.
It skips when this last travel movement stops... it looks as if the acceleration settings are ignored between the travel movement and the first honeycomb pattern movement. (It also makes this thunk noise again.)

If @LichtiMC can pinpoint the exact spot, wouldn't it be possible to add debug output like in https://github.com/MarlinFirmware/Marlin/pull/11098#issuecomment-400136577 to see what's going on?

Sineos on 1 Jul 2018

@LichtiMC my settings are: acceleration 900, x-jerk 8, y-jerk 5, e-jerk 5 and z-jerk 0.2. Minimum stepper pulse 2.

Jartza on 1 Jul 2018

I don't use honeycomb and also have some layer shifts, so it's not only honeycomb.

felenna on 1 Jul 2018

I get it most often when I print a 3dlabs wing.

felenna on 1 Jul 2018

@Sineos ... There are at least 2 ways to catch it, assuming we can reliably reproduce it. One of them is logging all the movement blocks, as they are queued and as they are planned. The other one is with a logic analyzer.

The first approach, if the print is long, is hard to do, because logging alters the timing. The 2nd one could confirm if the bug is a mechanical one or a firmware one...
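For the logic analyzer route, one possible way to look at a capture is sketched below. It assumes the rising-edge timestamps of a single STEP pin have been exported as a list of seconds; the function name and the input format are hypothetical and not tied to any particular analyzer software.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical analysis helper: 'edges' holds the timestamps (in seconds) of
// rising edges captured on one STEP pin, in ascending order.
// Returns the largest change in step rate (steps/s) between two consecutive
// step intervals, or 0 if fewer than three edges were captured.
double max_step_rate_jump(const std::vector<double>& edges) {
  double worst = 0.0;
  for (std::size_t i = 2; i < edges.size(); ++i) {
    const double prev_interval = edges[i - 1] - edges[i - 2];
    const double this_interval = edges[i] - edges[i - 1];
    assert(prev_interval > 0.0 && this_interval > 0.0);
    const double jump = std::fabs(1.0 / this_interval - 1.0 / prev_interval);
    if (jump > worst) worst = jump;
  }
  return worst;
}
```

A jump far larger than the configured jerk (in mm/s, multiplied by the axis steps/mm) would point toward the firmware, while a smooth trace combined with a shifted print would point toward the mechanics. This ignores the DIR pin, so direction reversals would need separate handling.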

ejtagle on 1 Jul 2018

I have a logic analyzer (Saleae Logic Pro 16).
But I don't know what to analyse, how to do it, or how to interpret the data. Can you show me?
I would love to eliminate the possibility of a mechanical bug...

Thank you all!

LichtiMC on 1 Jul 2018

I tested the latest bugfix branch with many Benchys and with big and small prints. The issue where the steppers/motors begin to struggle when the manual jog is pressed in quick succession is solved in my case for X/Y. But now I have the problem that as soon as I turn on Lin_adv, the extruder stepper stops working. If I extrude manually there is no problem and everything works, but as soon as I turn on Lin_adv the stepper stops on the first retract. With the same settings and Lin_adv = 0, the print runs without problems. I tested many different Vref settings and measured the temperature, with no change in the problem (temperature always around 35-40°C). Also, when the E stepper stops working, I am able to extrude manually with the jog button directly after canceling the print, so I don't think the stepper is getting too hot at that moment.

After this I swapped the extruder's TMC2208 for an old A4988 driver with the same settings, and everything works without problems, even with Lin_adv activated. I will try switching the TMC2208 to spreadCycle mode to see whether an overload can be ruled out.

GasM0nkey on 2 Jul 2018

@GasM0nkey — Your issue sounds a lot like #11024.
Let's discuss LIN_ADVANCE + Trinamic over on that thread.

thinkyhead on 3 Jul 2018

👍1

OK, just a weird idea. I work with elevators a lot, and the door operating system is somewhat similar to a 3D printer. What I see a lot is that software endstops, like the ones the Trinamics use, drift over time, so we have to run a new teach-in for the car door, similar to homing in Marlin. Is it possible that this value changes and causes layer shifts?

its just an idea ....

felenna on 10 Jul 2018

So I tried the latest build today. I can confirm that the layer shifts are still there.

smoki3 on 12 Jul 2018

https://github.com/MarlinFirmware/Marlin/issues/11024#issuecomment-404200927

LichtiMC on 12 Jul 2018

For what it's worth, when I had shifting after everything was merged into the bugfix branches (2.0 for me) about two weeks ago, I found that in every case, it was due to my carriage or bed binding on the rails due to mis-aligned bearings for one reason or another. Make absolutely sure that your carriage and bed (if applicable) slide freely without the belts. If there is any binding whatsoever, that's your problem (at least that was the case for me).

Garyr14 on 12 Jul 2018

No more layer shifts with latest bugfix version.
I disabled BEZIER_CURVE_SUPPORT, ADAPTIVE_STEP_SMOOTHING and JUNCTION_DEVIATION though.

Maybe that's the reason. I'm simply glad that it's finally printing without problems. (At least for approx. 20h now...)

LichtiMC on 23 Jul 2018

No changes were merged that could affect this, so I think you were just lucky for the last 20h.

smoki3 on 23 Jul 2018

I didn't ever try with BEZIER_CURVE_SUPPORT, ADAPTIVE_STEP_SMOOTHING and JUNCTION_DEVIATION disabled (all three had been enabled all the time).

Also there were some changes regarding stepper timing (new options).

I could perfectly reproduce the skipped steps as I described in previous posts with the gcode file (which I posted as well), and now the same piece prints without problems. I won't say more for now... I'll let you know if there is a shift again...

LichtiMC on 23 Jul 2018

Aaaha @LichtiMC what you say is very interesting…

The 'jerk' value should be a fixed function of what your mechanics can handle… In reaction to the action induced by the jerk, you have things such as inertia. In a core device, the number of elements to 'jerk' implies a high inertia…

Now the jerk value is defined in the config file when NOT using JUNCTION_DEVIATION

#define DEFAULT_XJERK                 10.0
#define DEFAULT_YJERK                 10.0

When using JUNCTION_DEVIATION, the jerk value is calculated using that formula

max_ejerk [mm/s] = SQRT((SQRT(0.5) * max_acceleration_mm_per_s2[E_AXIS] * junction_deviation_mm) / (1 - SQRT(0.5)))

What is your calculated Y and/or X jerk when using JUNCTION_DEVIATION ?

I suspect that you will have layer shift again if you use the JUNCTION_DEVIATION calculated jerk values even when not using JUNCTION_DEVIATION...

lrpirlet on 24 Jul 2018

sqrt((sqrt(0.5)*1000*0.02)/(1-sqrt(0.5)))=6.948688455202313
would be lower...!?
What is meant with "max_acceleration_mm_per_s2[E_AXIS]"? Why would x,y jerk be calculated using the Extruder-Axis?
My x and y axis have set a default acceleration of 1000 mm/s². I guess they are meant.
My E_AXIS is on Marlin default of 10000. This then would be too high:
sqrt((sqrt(0.5)*10000*0.02)/(1-sqrt(0.5)))=21.973682269356203
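The two calculations above can be reproduced with a small helper that applies the quoted formula to any axis. This is only a sketch: `jerk_from_junction_deviation` is a hypothetical name, and 0.02 mm is the junction deviation assumed in the numbers above.

```cpp
#include <cassert>
#include <cmath>

// Equivalent jerk (mm/s) from the junction-deviation formula quoted in this thread.
// max_accel_mm_s2: maximum acceleration of the axis (the M201 value), in mm/s^2.
// junction_deviation_mm: 0.02 is the value assumed in the calculations above.
double jerk_from_junction_deviation(double max_accel_mm_s2, double junction_deviation_mm) {
  assert(max_accel_mm_s2 >= 0.0 && junction_deviation_mm >= 0.0);
  const double s = std::sqrt(0.5);
  return std::sqrt(s * max_accel_mm_s2 * junction_deviation_mm / (1.0 - s));
}
```

With 0.02 mm, a maximum acceleration of 1000 mm/s² gives about 6.95 mm/s and 10000 mm/s² gives about 21.97 mm/s, matching the values above.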

LichtiMC on 25 Jul 2018

@LichtiMC
Sorry, I don't really have a lot of time… and I hate this weather (30+°C in the evening when one expects around 20°C), which blew my electronics again (OK, I was stupid to try…)

What is meant with "max_acceleration_mm_per_s2[E_AXIS]"? Why would x,y jerk be calculated using the Extruder-Axis?

Everything you ask is defined in planner… In particular see line 757

    #if ENABLED(JUNCTION_DEVIATION)
      FORCE_INLINE static void recalculate_max_e_jerk() {
        #define GET_MAX_E_JERK(N) SQRT(SQRT(0.5) * junction_deviation_mm * (N) * RECIPROCAL(1.0 - SQRT(0.5)))
        #if ENABLED(LIN_ADVANCE)
          #if ENABLED(DISTINCT_E_FACTORS)
            for (uint8_t i = 0; i < EXTRUDERS; i++)
              max_e_jerk[i] = GET_MAX_E_JERK(max_acceleration_mm_per_s2[E_AXIS + i]);
          #else
            max_e_jerk = GET_MAX_E_JERK(max_acceleration_mm_per_s2[E_AXIS]);
          #endif
        #endif
      }
    #endif

Note that the formula is ONLY available for the E axis because it was needed for linear advance… BUT each and every axis can use the formula, provided one replaces E by the corresponding axis…

Please note that, in the formula, you want the MAXIMUM acceleration, NOT the working acceleration…

max_acceleration_mm_per_s2 is the 'max acceleration in units/s^2 for print moves'. See planner.cpp line 114, you can change that value using M201...

uint32_t Planner::max_acceleration_mm_per_s2[XYZE_N],    // (mm/s^2) M201 XYZE

lrpirlet on 26 Jul 2018

Hi @LichtiMC

On June 27th you wrote that your max acceleration was
#define DEFAULT_MAX_ACCELERATION { 3000, 3000, 100, 10000 }.
Hopefully this has NOT changed and you still have this value active…

This X and Y DEFAULT_MAX_ACCELERATION of 3000 would lead to a jerk of about 12... Could you set your X and Y jerk to 12? If you have layer shift, it would confirm that this is the problem…

Even better, it would explain why it looks intermittent (only 20% higher), or solved when mechanical 'resistance' is lowered…

lrpirlet on 30 Jul 2018

Of course, after getting shifts, I reverted Accel. to 1000.
I can set jerk to 12, but not at the moment, as my printer is currently disassembled.

What I haven't understood since the beginning of this issue is this: if I get layer shifts with totally oversized steppers and everything moving really smoothly (when moved by hand), there should be many more people experiencing layer shifts with the weak standard steppers...

LichtiMC on 30 Jul 2018

Hello,
I'm also experiencing the same weird layer shifts, that could be understandable as my X-carriage could use some improvements, but Y is bound to steel frame and also experiences similar shifts.

What i noticed just now while printing, is that when the extruder was shifted out of the print at around 8th layer, the coordinates of the printer place the head still somewhere in the middle of the bed, which is of course impossible.

I didn't hear any weird noises from the print while printing (it's a very short and simple print, and i was trying to test if i still get those shifts after compiling newest version - 86d9af1108f9747eeca6462926e69a0a4a741ec6) but i noticed the layer shift started at the very beginning of the layer, and the x carriage just moved more to X+ out of the blue.

What may be worth noting is that I use TMC2130s for all stepper motors WITH softSPI (that was my reason for using the bugfix branch). I didn't notice any similar issues with 1.1.8, but since I really don't want to solder additional pins on my RAMPS, I'd like to help fix this in any way I can (or maybe backport this to 1.1.9 so we can check? Just an idea).

BEZIER_CURVE_SUPPORT, LIN_ADVANCE, ADAPTIVE_STEP_SMOOTHING and JUNCTION_DEVIATION are all disabled in my setup too...

I'm gonna restart the print from the same file and FW in a bit and see if the same behaviour can be replicated, or if it'll be randomised.

EDIT:
Just finished printing the same part from the same file and settings - this time i got only ~0.4 mm shift in Y- dir, everything else is spot on...

Scorcerer on 5 Aug 2018

@LichtiMC

No more layer shifts with latest bugfix version.
I disabled BEZIER_CURVE_SUPPORT, ADAPTIVE_STEP_SMOOTHING and JUNCTION_DEVIATION though.

Maybe that's the reason. I'm simply glad that it's finally printing without problems. (At least for approx. 20h now...)

I could perfectly reproduce the skipped steps as I described in previous posts with the gcode file (which I posted as well), and now the same piece prints without problems. I won't say more for now... I'll let you know if there is a shift again...

Lucky you to have a file which allows you to perfectly reproduce the problem on your machine. All the more a pity that you didn't test it with the new FW version and with BEZIER_CURVE_SUPPORT, ADAPTIVE_STEP_SMOOTHING and JUNCTION_DEVIATION enabled.

If you can identify the layer where the problem occurs, you could try to start printing from just a few layers below it: remove the g-code for the earlier layers and insert a G92 Zxxx to tell the printer what Z you are starting at. Here xxx is the Z coordinate of the new first layer, plus the nozzle's distance to the bed right before that layer's g-code is executed, minus the layer height. Be careful not to set xxx too high, because that would cause the nozzle to kiss the bed. After printing, reset the printer or do a Z homing.
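To make the G92 arithmetic above concrete, here is a minimal sketch. The helper names are hypothetical (they are not part of Marlin or any slicer), and the numbers in the usage note are made-up example values.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not Marlin code: the Z value to pass to G92 when
// restarting a print from a later layer on an empty bed.
//   new_first_layer_z : Z coordinate of the first layer kept in the file (mm)
//   nozzle_gap        : current physical nozzle-to-bed distance (mm)
//   layer_height      : layer height used by the slicer (mm)
double restart_g92_z(double new_first_layer_z, double nozzle_gap, double layer_height) {
    assert(layer_height > 0.0 && nozzle_gap >= 0.0);
    return new_first_layer_z + nozzle_gap - layer_height;
}

// Formats the value as a G92 line with three decimals.
std::string restart_g92_command(double new_first_layer_z, double nozzle_gap, double layer_height) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "G92 Z%.3f",
                  restart_g92_z(new_first_layer_z, nozzle_gap, layer_height));
    return std::string(buf);
}
```

For example, with the first remaining layer at Z=2.1, the nozzle 5 mm above the bed and 0.2 mm layers, `restart_g92_command(2.1, 5.0, 0.2)` gives `G92 Z6.900`. The subsequent move to Z2.1 then lowers the nozzle by 4.8 mm, leaving it one layer height (0.2 mm) above the bed.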

qx1147 on 7 Aug 2018

Hi guys. I have the same problem, but only on X. It happens even with acceleration as low as 400. Drivers are cold. Drives are about 45C-50C. But the really strange thing is that on a 20x20x20 test cube I get shifts in the same place: 2 cubes, 2 shifts at the same Z height.
TMC2130 on x and y only.
Marlin 1.1.9.
I'm not sure this is the same issue, but it looks like it.
BED_LEVELING enabled
LIN_ADVANCE disabled

The photo shows the first cube and the unfinished third cube:
img_20180808_222549

stavinsky on 8 Aug 2018

Oh my! I spent the better part of the day chasing down the crucial commit and I made some progress, but I am afraid that we might be looking at two firmware-caused problems when it comes to TMC-drivers. I am testing with a Prusa i3-Mk3, so it is TMC2130 for all axes, and I restricted my testing to the 1.1.x branch.

The last good commit for me is https://github.com/MarlinFirmware/Marlin/commit/f089bbbc9305ee2396f53c151eebb2cd0c0025e3 (Feb 7), the first bad one (one later) is https://github.com/MarlinFirmware/Marlin/commit/a471cd26e1e6a0df78b44b3fbc0885465632b26f (Feb 7, "Enable Z axis and delta sensorless homing"). But then there is also https://github.com/MarlinFirmware/Marlin/commit/bc08ce86be128c35a6ba474b45afbe961af5feb2 (March 5, "Fix broken reverse planner") which makes things even worse (more frequent Thonks). With the all-good version I can print fine with jerk=30, with the bad versions even 10 is too high.

I had a quick look at the crucial Feb7 commit, but nothing obvious there, especially not regarding non-TMC drivers. So it might well be that for non-TMC drivers only the March 5 commit is crucial, whereas for TMC-drivers it is Feb 7 and March 5. EDIT: But see my comment below.

I am committed to continue testing, but now it is more dissection instead of bisection. I'll focus on the Feb7 commit first, but maybe someone who uses non-TMC drivers could test whether for him commit https://github.com/MarlinFirmware/Marlin/commit/081ab35e82a47886504324c30c1984aa5ec83086 is the last good one (this is the commit right before the potentially crucial March 5 commit cited above).

Here are my config files: i3Mk3_Einsy_BLtouch_Config.zip

Acceleration and jerk are changed via g-code:

M201 X900 Y900
M204 S900
M205 X20 Y20

Test object: Skip test.gcode.txt

Most crucial loop segment:
mostcrucialsegment

qx1147 on 9 Aug 2018

@qx1147 That's an interesting find. Do you personally use SENSORLESS_HOMING?

teemuatlut on 9 Aug 2018

@teemuatlut

@qx1147 That's an interesting find. Do you personally use SENSORLESS_HOMING?

Yes, I do, but not for Z.
EDIT: But you are right, the Feb7 commit might only be relevant to those who use sensorless homing.

qx1147 on 9 Aug 2018

Would actually be an interesting statistic to see which features people enable when they use the Trinamic steppers.

But that particular commit doesn't actually change anything TMC related apart from the homing procedure. The only thing I can think of is some of the registers wouldn't get restored the same after the homing procedure. Maybe if the spreadCycle-stealthChop switching speed is at the wrong setting, the driver would somehow get confused and try to rapidly change from one to another and back again and that would cause an issue.
I'll see if I have the time next week to verify your findings.

teemuatlut on 9 Aug 2018

Sigh! False alarm regarding the Feb7 commit (see my earlier comment). The Feb7 commit indirectly changed the homing position a tiny bit (with Trinamics in stealth mode, and sensorless homing with bump), which makes all the difference for my test part. I can make the part print ok with the Feb7 commit by just shifting it a few microsteps (about 20, using 16µsteps/step). I can also make the part print badly with the pre-Feb7 version by shifting it in the opposite direction. Instead of then testing the potentially critical March commit, I tested a newer FW version from ~~July 8~~ Aug 7 (bugfix-1.1.x), but I couldn't find any microstep offset that would avoid layer shifts. All the testing was done without any extras enabled, so no linear advance, junction deviation, adaptive step smoothing, or anything else. I am using microstep interpolation though, which might have a modulating effect, but disabling it doesn't seem to make much of a difference (except for the noise).
This is some progress, but not as much as I was hoping for. Right now, I am even not so much concerned about the potential bug that newer FW versions might still have, I am more concerned about the huge jerk margin I would currently need for safely avoiding layer shifts, even if I would use a pre-Feb7 FW version. However, this might well "just" be a matter of finding better (printer-specific) parameters for the Trinamic drivers.

qx1147 on 13 Aug 2018

Maybe I should explain a bit further how the Feb7 commit changes homing and how this might affect the layer shifting. At least four things have to come together: using a Trinamic driver, using StealthChop, using sensorless homing, and using home bumping. Note that the comments in the config file advise against home bumping if sensorless homing is used, however, without providing further reasoning or forcing home bumping to be disabled.
For sensorless homing, the Trinamic driver has to be switched from StealthChop to SpreadCycle, which used to be done once before executing the homing (1st home move, moving away for bumping, 2nd and usually slower home move), and the driver was switched back to StealthChop after homing was complete (per axis). Since the Feb7 commit, switching back and forth between StealthChop and SpreadCycle is done for each of the moves. Obviously, in the case of home bumping being disabled, there is only one move per axis and the Feb7 commit would not make a difference.
The first problem with bumping is that it didn't work in the first place (pre-Feb7), because the Trinamic StallGuard flag (the end stop signal) appears to be only reset by switching back and forth between SpreadCycle and StealthChop. So in the pre-Feb7 versions, the end stop signal was already set when starting the 2nd home move and, therefore, the 2nd home move was stopped basically instantly, leaving the motor at the bumping position. With post-Feb7 versions the StallGuard signal is correctly reset (when using StealthChop), but since the 2nd home move is slower than the first, StallGuard might trip early (as it does in my setup), because the chosen StallGuard threshold might only work well for the initial homing speed. So, in my case, my motors end up at slightly different homing positions depending on the FW version.
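The difference between the two homing sequences can be illustrated with a toy model. This is only a sketch of the behavior described above, not Marlin or Trinamic library code: all names are hypothetical, the flag is assumed to clear on every mode switch, and the move away from the stop is assumed not to check the stall flag.

```cpp
#include <cassert>

// Toy model only; names are hypothetical.
enum class ChopMode { StealthChop, SpreadCycle };

struct ToyAxis {
    ChopMode mode = ChopMode::StealthChop;
    bool stall_flag = false;  // latched StallGuard "endstop" signal
    double pos = 10.0;        // distance from the mechanical stop (mm)

    // As observed in the thread, the flag is reset only by switching modes.
    void set_mode(ChopMode m) {
        if (m != mode) { mode = m; stall_flag = false; }
    }
    // Move toward the stop. A flag that is still latched ends the move at once.
    // early_trip: how far before the stop StallGuard fires (0 = at the stop).
    void home_move(double early_trip) {
        if (stall_flag) return;
        if (early_trip < pos) pos = early_trip;
        stall_flag = true;
    }
    // Move away from the stop (assumed not to check the stall flag).
    void back_off(double bump) { assert(bump >= 0.0); pos += bump; }
};

// Pre-Feb7: one mode switch around the whole homing sequence.
double pre_feb7_home(double bump, double early_trip_slow) {
    ToyAxis a;
    a.set_mode(ChopMode::SpreadCycle);
    a.home_move(0.0);
    a.back_off(bump);
    a.home_move(early_trip_slow);  // flag still latched: no motion
    a.set_mode(ChopMode::StealthChop);
    return a.pos;
}

// Post-Feb7: mode is switched back and forth around each individual move.
double post_feb7_home(double bump, double early_trip_slow) {
    ToyAxis a;
    a.set_mode(ChopMode::SpreadCycle); a.home_move(0.0);             a.set_mode(ChopMode::StealthChop);
    a.set_mode(ChopMode::SpreadCycle); a.back_off(bump);             a.set_mode(ChopMode::StealthChop);
    a.set_mode(ChopMode::SpreadCycle); a.home_move(early_trip_slow); a.set_mode(ChopMode::StealthChop);
    return a.pos;
}
```

In this model `pre_feb7_home` leaves the axis at the bump distance, while `post_feb7_home` leaves it wherever the slower second move trips StallGuard, so the physical home position depends on the firmware version.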
Now why would this affect layer shifting? Layer shifting occurs when the motor cannot provide enough torque. Besides many other factors, the available torque also depends on the microstep position within the (electrical) motor cycle, i.e. the exact position within the cycle of 4 fullsteps. So a rather critical move might cause a layer shift only if the motor happens to be at a "weak" microstep position when the move has to be made. If the print object happens to have only a few spots which might be just critical enough to provoke layer shifts, it is a matter of luck (aka microstep position at homing) whether one actually gets layer shifts. That being said, I find it quite surprising that I stumbled across a case which allows me to switch between "all good" and "pretty bad" just by changing the microstep position.
The microstep position can be read back from the Trinamic driver (MSCNT register), and in my case MSCNT differed by around 20 microsteps between the pre- and post-Feb7 commits (FW microsteps, not internal Trinamic microsteps, so at 100µsteps/mm this is 0.2mm, or about 112° in terms of phase angle within the motor cycle). Unfortunately, the MSCNT value read back after homing is less consistent for post-Feb7 versions, which I blame on the StallGuard threshold being just on the brink during the 2nd home move, making the trigger point less well defined.
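The numbers quoted above can be checked with a short sketch. It assumes the electrical cycle spans 4 full steps, as stated earlier in this comment, and uses firmware microsteps (16 per full step in this setup). The function names are hypothetical.

```cpp
#include <cassert>

// One electrical cycle of the motor spans 4 full steps.
const int kFullStepsPerCycle = 4;

// Phase angle (degrees) within the electrical cycle for a microstep offset.
double phase_angle_deg(int offset_microsteps, int microsteps_per_fullstep) {
    assert(microsteps_per_fullstep > 0);
    const int cycle = kFullStepsPerCycle * microsteps_per_fullstep;
    // Wrap into [0, cycle) so that negative offsets work too.
    const int wrapped = ((offset_microsteps % cycle) + cycle) % cycle;
    return 360.0 * wrapped / cycle;
}

// Linear distance (mm) corresponding to a microstep offset.
double offset_mm(int offset_microsteps, double microsteps_per_mm) {
    assert(microsteps_per_mm > 0.0);
    return offset_microsteps / microsteps_per_mm;
}
```

With 16 microsteps per full step there are 64 microsteps per electrical cycle, so `phase_angle_deg(20, 16)` is 360 × 20 / 64 = 112.5°, and `offset_mm(20, 100.0)` is 0.2 mm, matching the figures in the comment.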
BTW, in my setup, the non-working home bumping has the positive side effect of Z homing being done while the motors are not pulled against the mechanical end stop anymore, which otherwise makes my bed lift and rotate a bit (during Y homing). Z homing appears more consistent with home bumping enabled (and home bumping not working). Other than that, home bumping does indeed not make too much sense when using sensorless homing and has to be disabled for further testing anyway in order to make the layer shifting behavior more consistent. It possibly also makes sense to add new parameters to the homing command (G28) which allow adjusting the per-axis home positions to specific microstep positions (or have per-axis config switches for telling the FW whether to adjust the homing position to, say, microstep position 0, which is also useful for most accurate re-homing after a crash).

qx1147 on 13 Aug 2018

only reset by switching back and forth between SpreadCycle and StealthChop

I remember when we got the memo about that, and we did make _some_ changes to adapt. Either they weren't perfect in the first place, or the fix got regressed.

the chosen StallGuard threshold might only work well for the initial homing speed

I see, although it opposes my intuition. I would expect fast moves to hit the threshold more quickly than slow ones.

It could help if we provided homing and probing acceleration settings. Or, it may help to add a small delay (or at the very least a planner.synchronize) after the "move away" to give the axis time to settle before we start a move in the opposite direction. The direction-change itself is a likely culprit, but the jerk setting could also play a role.

home bumping does indeed not make too much sense when using sensorless homing

Also true. No harm in disabling it, if you aren't concerned about repeatability (which can help alignment on a resume after power-loss).

adjusting the per-axis home positions to specific microstep positions

That would only be a workaround, and relies on the weird behaviors observed. It's better if we can simply tune the motion, acceleration, jerk, and timing to avoid false-positives.

thinkyhead on 13 Aug 2018

…it may help to add a small delay…

As an experiment along those lines, try making this change to Marlin_main.cpp, around line 3126…

    safe_delay(100);
    do_homing_move(axis, 2 * bump, get_homing_bump_feedrate(axis));

See if certain values in the call to safe_delay have a beneficial effect.

thinkyhead on 13 Aug 2018

@thinkyhead I spent way too much time already with testing different timings (after having added code that allowed me to add pauses here and there via G-code), but to no avail. For a long time I suspected that, if one wouldn't wait long enough before or after switching, the Trinamic driver would get into an internal state that would make the motor perform somehow differently, without seeing any other indication of that than the motor losing steps (e.g., the Trinamic register values as shown by M122 always looked basically the same).
Microstep-adjusted homing (at least for X and Y on Cartesian printers) is only relevant for further debugging and for crash recovery, because it allows repeatable homing results at microstep accuracy. Physical homing position might vary by as much as ± half the electrical motor cycle and it would still be possible to home to the exact same absolute microstep position (see point 1 in this comment regarding a related Prusa commit).
Microstep-accurate homing doesn't help with layer shifting in general though, because each and every object has its critical spots somewhere else. But it helps with reproducibility, which is key for further chasing this layer shift issue. For my very specific object I can tune the microstep homing position to always get a good print, and then see which commit further downstream breaks it again. If I don't do that, I would always have to be suspicious about an observed layer shift being just the result of a glitch during homing. It is bad enough as it is already.

qx1147 on 13 Aug 2018

If there are "critical spots" your safety margin is too small.

AnHardt on 13 Aug 2018

Retiring this issue, since the core problem it addressed has been fixed, and now the topic is too general. If anyone is experiencing layer shifting with 1.1.9 and 2.0.x please open a new issue and we'll look into the cause, which is likely something entirely new.

thinkyhead on 15 Aug 2018

Which commit fixed the layer shifts?

smoki3 on 15 Aug 2018

No specific commit. The planner has been overhauled. Whatever was causing the common issue discussed here has been addressed. When users experience lost steps now, it is due to a new as-yet-undetermined cause and we need to start the investigation afresh.

thinkyhead on 15 Aug 2018

okay! I will switch back to marlin tomorrow and will test :)
With a build that is one and a half weeks old, the layer shifts still happened.

smoki3 on 15 Aug 2018

Can anyone confirm this? I'm using Slic3r and getting the same issue, but the Z axis appears to be moving up one layer at approx 10 mm from the build plate. The rest prints 100% but I have a split part at 10mm. I switched to Cura for slicing and my prints work 100% with Marlin 1.9 bugfix, but if I slice with Slic3r 1.3.0 it skips again. I'm not sure if Slic3r is the cause or a mix of Marlin and Slic3r. But mixing Marlin 1.9.x bugfix and Slic3r 1.3.0 appears to be the issue for me. I tried switching fading off with no difference.

diybuc on 26 Feb 2019

It sounds like a Slic3r issue. Please drop your G-code file on your next reply. You will need to change the file extension to ".txt" for Github to accept the file drop.

thinkyhead on 3 Mar 2019

Any fix for this yet? I have been using the old firmware and just switched my Arduino on the RAMPS and flashed the newest release version and got my layer shifts back. I had been printing with the older firmware for almost a year without shifts. The newest one immediately had a shift in the first benchy.

viperchannel on 26 Apr 2019

Here is a brief summary of reports from this thread. The exact date when the issue began and the exact cause are both hard to determine. There are conflicting reports in both respects.

1.1.x testing

| commit id | date | user | status |
| --- | --- | --- | --- |
| e2871f0 | Jan 11 | @AletheianAlex | Crazy results. (#9149) |
| e2871f0 | Jan 11 | @FiCacador | Before this: good. After: problem. |
| 68cff5f | Jan 24 | @orcinus | No problem. |
| --- | Jan 24 | @viperchannel | No problem. |
| 6445859 | Feb 2 | @AletheianAlex | Problem exists: "Thunk!" |
| e596931 | Apr 2 | @VanessaE | Problem exists: "Thunk!" |
| e596931 | Apr 6 | @kakou-fr | No problem (st.diag1_active_high(1) was removed) |
| d429d5a | Apr 25 | @autonumous | Problem exists. |
| 156bd28 | May 5 | @ikarisan | Problem exists. |

2.0.x testing

| commit id | date | user | status |
| --- | --- | --- | --- |
| 3416080 | Feb 1 | @grownseed | No problem. |
| d6e29e9 | Feb 2 | @grownseed | Problem exists. |
| 6445859 | Feb 15 | @smoki3 | No Problem. |
| 0945674 | Apr 15 | @grownseed | Problem exists. |
| 0945674 | May 11 | @AcHub | Problem exists: "Thunk!" |

Sample Reports

Apr 20 - @Grimshadows: "the machine is randomly going full speed ignoring jerk in tight spaces"

Apr 21 - @Grimshadows: "it has to do with those G0 to G1 commands Cura slicer is putting out."

Apr 22 - @Grimshadows: "The slower I set the printer the more pronounced it happens."

Apr 23 - @ikarisan: "solved by disabling stealthChop and the 256 steps interpolation"

Apr 26 - @dammitcoetzee: "I installed heatsinks … bumped the current to 1150… this seemed to fix it."

Apr 31 - @autonumous: "my shifts have not been so dramatic since swapping the driver over."

May 1 - @nudelpapst: "I am using 1.1.6 now where everything works fine."

May 4 - @autonumous: "I did swap the drivers (A4988) back… the layer shift is … more pronounced"

May 8 - @autonumous: "after upping the driver voltage to 0.5v… looked better… very slight… shifting."

May 12 - @viperchannel: "The problem does not exist in my Jan [23] build."

May 12 - @AcHub: "resumed the print… after re-homing… it did not continue in the correct position"

May 13 - @alexyu132: "clunking noises on sharp curves with many short segments"

May 22 - @ejtagle: "Just raising the current solved all the problems"

May 22 - @VanessaE: "I raised their current… I have not seen any more layer shifts"

Ah, late, but just to be clear mine was both the current and the incorrect R_SENSE value. I was able to run at a much more reasonable high current after the R_SENSE too, which makes sense :)

dammitcoetzee on 14 Aug 2019

For what it's worth (probably not very helpful) my layer shifts have been resolved by moving to 2.0.x. I had a brief resurgence of very slight layer shifts on the other axis, but those turned out to be due to one of my stepper mounts softening during longer prints.

Things of note that have changed that had nothing to do with 2.0.x:

i've disabled S curve
i've disabled SD card support
i've disabled shutting down steppers after inactivity period
i've increased DEFAULT_MINSEGMENTTIME to 50000
i've decreased MINIMUM_PLANNER_SPEED a tad

orcinus on 14 Aug 2019

S_CURVE_ACCELERATION is fairly well contained. I don't know if what you are seeing is real or not... But the thing is, S_CURVE_ACCELERATION shouldn't be affecting the positions of the nozzle (to the point where it is in the wrong location).

It would be really good for some people to dig in and examine every

#ifdef S_CURVE_ACCELERATION

code block in the code. And if we don't find anything... Maybe we should move on and do the same thing for the Classic Jerk code. There is something wrong with the Planner:: code and we need to find it.

Roxy-3D on 30 Oct 2019

I have also been wrestling with this issue recently. I have been able to find some working settings, but it requires keeping my speeds low (<75mm/s) and it does seem touchy. Here is a summary of my setup.

psu: generic 24v AC-DC (turned up to 24.6v)
control board: Einsy RAMBo 1.1b
motor drivers: TMC2130 - Powerful Fan for Cooling - Heat Sinks Soon
steppers: LDO-42STH47-1684AC

homing: sensorless
probe: bltouch

TMC2130 settings (X and Y)
r_sense: 0.22
current: 1000 mA
StallGuard threshold: 8
stealthchop enabled
hybrid threshold enabled: 60

thillRobot on 26 May 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] on 25 Jul 2020
