In Mesh_Bed_Leveling.h we have this declaration:
class mesh_bed_leveling {
public:
bool active;
float z_offset;
float z_values[MESH_NUM_Y_POINTS][MESH_NUM_X_POINTS];
mesh_bed_leveling();
Please note the size of the z_values[][] array as well as the order of the subscripts. In Mesh_Bed_Leveling.cpp we have this code:
mesh_bed_leveling::mesh_bed_leveling() { reset(); }
void mesh_bed_leveling::reset() {
active = 0;
z_offset = 0;
for (int8_t x = MESH_NUM_X_POINTS; x--;)
for (int8_t y = MESH_NUM_Y_POINTS; y--;)
z_values[x][y] = 0;
}
We are clearly indexing off the end of the z_value[][] array (and off of the [x] subscript of the array.)
Also, it isn't clear to me why we would want the [0] indexes to not be cleared. Lastly, the subscripts are flipped from the way they are declared. [x] should be on the right of [y]. It would make more sense to me to have this written as:
mesh_bed_leveling::mesh_bed_leveling() { reset(); }
void mesh_bed_leveling::reset() {
active = 0;
z_offset = 0;
for (int8_t x=0; x<MESH_NUM_X_POINTS; x++)
for (int8_t y=0; y<MESH_NUM_Y_POINTS; y++)
z_values[y][x] = 0.0;
}
The original code was actually just like you are suggesting, got changed in this commit https://github.com/MarlinFirmware/Marlin/commit/bc5a547d55722a87d41d2e4ee6ad32ed9db736ba#diff-6419b880662c5d1c95104b9a0515c7b8
Don't see where that [y][x] got swapped though.
But the for loop actually seem to be ok, though hard to read.
for (int i=10; i--; /*empty here*/ ) printf("%d ", i);
will output 9 8 7 6 5 4 3 2 1 0
crucial piece is the post decrement in the mid part in the for statement...hard to read
will output 9 8 7 6 5 4 3 2 1 0
crucial piece is the post decrement in the mid part in the for statement...hard to read
I'll play with that in the morning. When I was looking at it I was thinking i gets to zero and the loop stops. But that isn't true because the auto-increment (decrement) happens after the evaluation (and usage). My mistake on that!
Don't see where that [y][x] got swapped though.
Some time after that would be my guess... @epatel I was typing in some release notes for the Unified Bed Leveling:
* Release Notes: Z-Probe Sleds are not currently fully supported. There were too many complications caused
* by them to support them in the Unified Bed Leveling code. Support for them will be handled
* better in the upcoming Z-Probe Object that will happen during the Code Clean Up phase. (That
* is what they really are: A special case of the Z-Probe.)
*
* My apologies to Epatel! I tried to preserve and leave his Mesh Bed Leveling as unaltered as
* possible. I wanted to show what a good technological foundation he had put in place by not
* making any changes to it. (And a big thanks to the authors of 3-Point and Grid leveling too!
* Excellent work!) But I ended up making some subtle tweaks to his code just because I could not
* keep things straight in my head. I changed the indexes on his z_values[y][x] array to be z_values[x][y].
* (His order was unnatural to me and I kept getting bugs in the code because of it.) I also changed the
* names of his member functions just because I was having to think too hard to use them. For example,
* cel_index_x() and cel_index_y() were changed to get_cell_index_x() and get_cell_index_y(). Those names
* were more natural for me to hold in my head.
@Roxy-3DPrintBoard The reason for having [y][x] order is that the numbers get layed out in memory each x value after each other per y so to speak. My graphical background makes me do things like that, necessary? Maybe not here (lack of caches and pipelines), but can be confusing when doing low level debugging (with [x][y]). The name changes to cel_index_x/y(), wasn't actually mine and I actually do not approve of the changes in them (as I think they can and will introduce numerical differences compared to other parts in the math for MBL, see https://github.com/MarlinFirmware/Marlin/issues/3811#issuecomment-221498086). If one want speed I imagine making an array (for get_x/y()) would be easiest and very fast.
And thanks for the mentioning! :)
Otherwise, I wrote the core for MBL a year ago and at the time focused at getting the method and math right. Never got around to clean other parts up (ints vs bytes etc) and fix needed stuff like the homing of single axises. So hard to get through here. Well, fixed the single axis homing recently (hope you get that in, https://github.com/MarlinFirmware/Marlin/pull/3835/commits/d49322a10c94790322d5ab713198a36715793870) which makes some people happy (which makes me happy). Think home to max-endstop could be a future thing to look at, but you might have that covered already?
as I think they can and will introduce numerical differences compared to other parts in the math for MBL
The maths appear to be fine now. Apply science, if you don't know. Run the old functions and new functions through a loop, giving them different coordinate inputs and check the output. The new code should in fact be _more correct_ than the old code. I believe I already pointed this out.
The name changes to cel_index_x/y(), wasn't actually mine and I actually do not approve of the changes in them
Nevertheless, some kind of distinct naming was needed to distinguish a probe index from a cel index, and getting the nearest index in each case is a different operation. After all, when you specify 3x3 probe points, you get 3 probe X indices and 3 probe Y indices, but you only get 2 cel X indices and 2 cel Y indices.
Allow me to illustrate for the casual reader. Let's say you have a simple grid, 3x3, on an idealized 200x200 bed:
For a given cel, the probe index to its left is the same as the cel index. The probe index to its right is the cel index plus one. These two indices are used to get the Z values for interpolation within a given grid cel.
Is a cel the same as a cell ? I'm wondering how that name got chosen.
Ok, lets see if I can explain this. And bear with me, this was very deliberatly implemented, for speed and simplicity. The methods get_x/y() looks like this
float get_x(int8_t i) { return MESH_MIN_X + (MESH_X_DIST) * i; }
and what is special about that? Well, all macro numbers and their expansion really gets expanded to integers (unless someone start to write .0 on the numbers). Well, this is what I am hoping for. and this was deliberately to gain speed (maybe I should have been more explicit). So the numbers will be slightly not perfect inline, and they really do not need to. We just need to know where they where probed and calculate from the same position. If the squares are slightly different doesn't matter, what matters is that we calculate from the same "grid". So, multiplications with i might get shorter as i increases (because we lost that fractional part).
It actually looks like Prusa saw this and force all those numbers to floats. Well, then he actually loses the intended speed gain we get there.
Again, the squares do not have to be exactly the same, what matters is that probing and calculations are aligned. And the best way, I think, is to use the same source, here the methods get_x/y() then there is no question if there is a difference, it is the same.
Anyone? Doesn't that make sense?
And using perfect dividable numbers in an example like 200 does not show this (I think).
The code to derive probe or cell indices from points works correctly according to my testing. The mesh option values are virtually always going to be integers.
Prusa saw this and force all those numbers to floats
We should slap users or forks that set any of the mesh options to floats. Alternatively, we can cast them to integers with (int) and I believe the compiler will treat them as though they were integer literals and produce code as though integers had been used in the first place.
float get_x(int8_t i) { return (int)MESH_MIN_X + (int)(MESH_X_DIST) * i; }
probing and calculations are aligned
Everything works fine, so long as configuration is done properly (with integers). But let's "science the sh*t out of this thing" instead of just speculating. Test the code, see whether it produces the desired values. If there's a discrepancy, patch it. I ran loops and timed them, found that all the numbers come out correct the calculation time is also reduced over the loops originally used in the index-figuring methods.
Is a cel the same as a cell
"Cel" is an animation term. Changed to "cell."
The code to derive probe or cel indices from points works correctly according to my testing.
I changed the names just because I was getting confused. The only place these routines are used is in the M421 handler. It is very possible that M421 doesn't need to exist in the next generation of Bed Leveling.
int8_t find_closest_x_index(float x) {
int8_t px = int(x - (MESH_MIN_X) + (MESH_X_DIST) / 2) / (MESH_X_DIST);
return (px >= 0 && px < (MESH_NUM_X_POINTS)) ? px : -1;
}
int8_t find_closest_y_index(float y) {
int8_t py = int(y - (MESH_MIN_Y) + (MESH_Y_DIST) / 2) / (MESH_Y_DIST);
return (py >= 0 && py < (MESH_NUM_Y_POINTS)) ? py : -1;
}
find_closest_x_index
Index of the nearest probe line or index of the containing grid cell?
You might be making an assumption that is not in what I'm doing. Given a location on the bed, this function tells you where the closest mesh point's index is . (By adding (MESH_X_DIST/2) on to the location, it is trying to round up or down to the 'correct' & 'closest' mesh point)
tells you where the closest mesh point's index is
Yes, of course. I asked because the name does not make clear whether you will get the index of the "probe grid line" ("cell border") or the index of the "cell" that the point lies inside. That is why I added the word "probe" to the method name. To make a clearer distinction, that it gets the index of the nearest probed grid line and not the index of the bounded region ("cell").
I re-implemented these "nearest probe index" methods in the manner that I did so that they would be compatible with M421. I originally designed that command to take coordinates (X Y) rather than indices, for compatibility with many potential probe methods. Although now it also takes indices (I J) in case that makes more sense sometimes.
So, when your MBL is configured to probe 3 x 3 points, the probe_index_x and probe_index_y methods will return an index from 0 to 2.
For the same (3 x 3) mesh, cell_index_x and cell_index_y return the index of the "cell" that the point falls inside. A value from 0 to 1, since there are only 2 x 2 cells.
For the same (3 x 3) mesh, cell_index_x and cell_index_y return the index of the "cell" that the point falls inside. A value from 0 to 1, since there are only 2 x 2 cells.
No. The index's actually can be de-referenced to get the Z height at that location. And because of that, the code needs to be able to acquire the index for a location. Right now, those locations are the Mesh Point (where we have a Z Value) that is closest to the specified location. (Everything keys off of the Mesh Points and their Z-Heights) Once you know the one you are closest to, you can easily figure out which cell you are in.
@thinkyhead I have a quick question related to all of this. Can I just delete all of the example_configurations Configuration.h files? Or do I need to edit them all? The problem is a million options for Auto Bed Leveling are going away (at least in the Development Branch I'm doing). I'm wondering if I can use ExamProDiff to quickly get the changes to the main Configuration.h file pushed into all of the Example_Configuration files? Can you give me some guidance on what the 'Right' thing to do is?
Right now, those locations are the Mesh Point (where we have a Z Value) that is closest to the specified location.
No…
probe_index_x and probe_index_y return the index of the nearest probe position.cell_index_x and cell_index_y return the index of the "cell" that a point falls inside.To get the Z at a given point on the bed, you first need to know which "cell" you're inside. Then you know which probe indices bound it on the left and right (cx and cx + 1). From these probe indices you get Z at all 4 corners and interpolate Z for the XY sub-point within the "cell".
float get_z(float x0, float y0) {
int8_t cx = cell_index_x(x0),
cy = cell_index_y(y0);
if (cx < 0 || cy < 0) return z_offset;
float z1 = calc_z0(x0,
get_probe_x(cx), z_values[cy][cx],
get_probe_x(cx + 1), z_values[cy][cx + 1]);
float z2 = calc_z0(x0,
get_probe_x(cx), z_values[cy + 1][cx],
get_probe_x(cx + 1), z_values[cy + 1][cx + 1]);
float z0 = calc_z0(y0,
get_probe_y(cy), z1,
get_probe_y(cy + 1), z2);
return z0 + z_offset;
}
Yes. I understand what you are saying. The problem is the wording:
I need help with the example_configuration Configuration.h question. Can you offer guidance?
I ran loops and timed them, found that all the numbers come out correct the calculation time is also reduced over the loops originally used in the index-figuring methods.
I was curious about that. Did you do that on you laptop or an actual board? Well, I did this test, on a Mega256 board. I copied in the necessary part and cut out the fat and tried a couple of versions. I am always amazed of things the optimizer can do. Code below, _borrowed_ @jbrazio timing thing.
Things I found.
1) Most amazingly didn't the "fix1" change anything substantially. In fix1 I changed so get_probe_x() is called twice and not 4 times. Seems like the optimizer realizes that and just keeps the value from the first call. Yay optimizer!
2) Timing wise I see your "div" version clocking out at appr. 280-290 us while the _naive_ loop approach (fix2) actually in worst case (7x7) clock out at 268 us. In best case, close to the origin, it clocks out at 243 us.
3) In fix3 I added an array as I have been suggesting. And it was surprising for me to see that actually was worse than the loop. Not much but just a little.
4) The loop seem to be faster (in worst case) just up to 7x7 and then after that gets slower. So I added fix4 that does a bin search, so testing is always the same number of tests and not linearly worse. For 7x7 I get 246 us and 265us for 15x15, still faster than the "div" version.
Is this science enough? Have I missed something?
Still the _naive_ method to use get_x() is so we should not have to speculate and worry and have extra headache. _Is it correct or is it not correct?_ Using the same values from probing guarantees correctness.
Output for 7x7 and worst case, x = 191.4 y = 193.4;
fix0: 283.14uS # Current "div" version
zz: 1.666455268
fix1: 282.62uS # Remove extra get_probe_x calls.
zz: 1.666455268
fix2: 265.32uS # Naive loop version
zz: 1.666455268
fix3: 279.04uS # With array
zz: 1.666455268
fix4: 246.58uS # With bin search
zz: 1.666455268
Code below for anyone that want to verify, here set to 7x7 and calling get_z() with big x and y for worst case for loop.
I have been using Arduino IDE 1.6.5, default so it compiles with -Os (for size)
#define X_MIN_POS 0
#define Y_MIN_POS 0
#define X_MAX_POS 200
#define Y_MAX_POS 200
#define MESH_INSET 10 // Mesh inset margin on print area
#define MESH_NUM_X_POINTS 7 // Don't use more than 7 points per axis, implementation limited.
#define MESH_NUM_Y_POINTS 7
#define MESH_MIN_X (X_MIN_POS + MESH_INSET)
#define MESH_MAX_X (X_MAX_POS - (MESH_INSET))
#define MESH_MIN_Y (Y_MIN_POS + MESH_INSET)
#define MESH_MAX_Y (Y_MAX_POS - (MESH_INSET))
#define MESH_X_DIST ((MESH_MAX_X - (MESH_MIN_X))/(MESH_NUM_X_POINTS - 1))
#define MESH_Y_DIST ((MESH_MAX_Y - (MESH_MIN_Y))/(MESH_NUM_Y_POINTS - 1))
class mesh_bed_leveling {
public:
float z_offset;
float z_values[MESH_NUM_Y_POINTS][MESH_NUM_X_POINTS];
// Added for array lookup test
float probe_x[MESH_NUM_X_POINTS];
float probe_y[MESH_NUM_Y_POINTS];
mesh_bed_leveling();
void reset();
float get_probe_x(int8_t i) { return MESH_MIN_X + (MESH_X_DIST) * i; }
float get_probe_y(int8_t i) { return MESH_MIN_Y + (MESH_Y_DIST) * i; }
// Use the array instead
float f_get_probe_x(int8_t i) { return probe_x[i]; }
float f_get_probe_y(int8_t i) { return probe_y[i]; }
// Original version
int8_t o_cell_index_x(float x) {
int8_t cx = 1;
while (x > get_probe_x(cx) && cx < MESH_NUM_X_POINTS - 1) cx++;
return cx - 1;
}
int8_t o_cell_index_y(float y) {
int8_t cy = 1;
while (y > get_probe_y(cy) && cy < MESH_NUM_Y_POINTS - 1) cy++;
return cy - 1;
}
// Original w/ array
int8_t f_cell_index_x(float x) {
int8_t cx = 1;
while (x > f_get_probe_x(cx) && cx < MESH_NUM_X_POINTS - 1) cx++;
return cx - 1;
}
int8_t f_cell_index_y(float y) {
int8_t cy = 1;
while (y > f_get_probe_y(cy) && cy < MESH_NUM_Y_POINTS - 1) cy++;
return cy - 1;
}
// bin search
int8_t b_cell_index_x(float x) {
int8_t start = 1;
int8_t end = MESH_NUM_X_POINTS-1;
while (start < end) {
int8_t mid = start + (end-start)/2;
if (f_get_probe_x(mid) < x)
start = mid + 1;
else
end = mid;
}
return start-1;
}
int8_t b_cell_index_y(float y) {
int8_t start = 1;
int8_t end = MESH_NUM_Y_POINTS-1;
while (start < end) {
int8_t mid = start + (end-start)/2;
if (f_get_probe_y(mid) < y)
start = mid + 1;
else
end = mid;
}
return start-1;
}
// New version
int8_t cell_index_x(float x) {
int8_t cx = int(x - (MESH_MIN_X)) / (MESH_X_DIST);
return constrain(cx, 0, (MESH_NUM_X_POINTS) - 2);
}
int8_t cell_index_y(float y) {
int8_t cy = int(y - (MESH_MIN_Y)) / (MESH_Y_DIST);
return constrain(cy, 0, (MESH_NUM_Y_POINTS) - 2);
}
float calc_z0(float a0, float a1, float z1, float a2, float z2) {
float delta_z = (z2 - z1) / (a2 - a1);
float delta_a = a0 - a1;
return z1 + delta_a * delta_z;
}
// Current
float get_z(float x0, float y0) {
int8_t cx = cell_index_x(x0),
cy = cell_index_y(y0);
if (cx < 0 || cy < 0) return z_offset; // Should be removed
float z1 = calc_z0(x0,
get_probe_x(cx), z_values[cy][cx],
get_probe_x(cx + 1), z_values[cy][cx + 1]);
float z2 = calc_z0(x0,
get_probe_x(cx), z_values[cy + 1][cx],
get_probe_x(cx + 1), z_values[cy + 1][cx + 1]);
float z0 = calc_z0(y0,
get_probe_y(cy), z1,
get_probe_y(cy + 1), z2);
return z0 + z_offset;
}
// Fix 1, call 'get_probe_x()' once
float get_z1(float x0, float y0) {
int8_t cx = cell_index_x(x0),
cy = cell_index_y(y0);
float cpx0 = get_probe_x(cx);
float cpx1 = get_probe_x(cx+1);
float z1 = calc_z0(x0,
cpx0, z_values[cy][cx],
cpx1, z_values[cy][cx + 1]);
float z2 = calc_z0(x0,
cpx0, z_values[cy + 1][cx],
cpx1, z_values[cy + 1][cx + 1]);
float z0 = calc_z0(y0,
get_probe_y(cy), z1,
get_probe_y(cy + 1), z2);
return z0 + z_offset;
}
// Fix 2
float get_z2(float x0, float y0) {
int8_t cx = o_cell_index_x(x0),
cy = o_cell_index_y(y0);
float cpx0 = get_probe_x(cx);
float cpx1 = get_probe_x(cx+1);
float z1 = calc_z0(x0,
cpx0, z_values[cy][cx],
cpx1, z_values[cy][cx + 1]);
float z2 = calc_z0(x0,
cpx0, z_values[cy + 1][cx],
cpx1, z_values[cy + 1][cx + 1]);
float z0 = calc_z0(y0,
get_probe_y(cy), z1,
get_probe_y(cy + 1), z2);
return z0 + z_offset;
}
// Fix 3
float get_z3(float x0, float y0) {
int8_t cx = f_cell_index_x(x0),
cy = f_cell_index_y(y0);
float cpx0 = f_get_probe_x(cx);
float cpx1 = f_get_probe_x(cx+1);
float z1 = calc_z0(x0,
cpx0, z_values[cy][cx],
cpx1, z_values[cy][cx + 1]);
float z2 = calc_z0(x0,
cpx0, z_values[cy + 1][cx],
cpx1, z_values[cy + 1][cx + 1]);
float z0 = calc_z0(y0,
f_get_probe_y(cy), z1,
f_get_probe_y(cy + 1), z2);
return z0 + z_offset;
}
// Fix 4
float get_z4(float x0, float y0) {
int8_t cx = b_cell_index_x(x0),
cy = b_cell_index_y(y0);
float cpx0 = f_get_probe_x(cx);
float cpx1 = f_get_probe_x(cx+1);
float z1 = calc_z0(x0,
cpx0, z_values[cy][cx],
cpx1, z_values[cy][cx + 1]);
float z2 = calc_z0(x0,
cpx0, z_values[cy + 1][cx],
cpx1, z_values[cy + 1][cx + 1]);
float z0 = calc_z0(y0,
f_get_probe_y(cy), z1,
f_get_probe_y(cy + 1), z2);
return z0 + z_offset;
}
};
extern mesh_bed_leveling mbl;
mesh_bed_leveling mbl;
mesh_bed_leveling::mesh_bed_leveling() {
reset();
for (int8_t x=0; x<MESH_NUM_X_POINTS; x++)
probe_x[x] = get_probe_x(x);
for (int8_t y=0; y<MESH_NUM_Y_POINTS; y++)
probe_y[y] = get_probe_x(y);
}
void mesh_bed_leveling::reset() {
z_offset = 1.1;
for (int8_t y = MESH_NUM_Y_POINTS; y--;)
for (int8_t x = MESH_NUM_X_POINTS; x--;)
z_values[y][x] = rand()/32768.0;
}
float zz;
float x1 = 191.4;
float y1 = 193.4;
void fix0()
{
zz = mbl.get_z(x1, y1);
}
void fix1()
{
zz = mbl.get_z1(x1, y1);
}
void fix2()
{
zz = mbl.get_z2(x1, y1);
}
void fix3()
{
zz = mbl.get_z3(x1, y1);
}
void fix4()
{
zz = mbl.get_z4(x1, y1);
}
void setup()
{
Serial.begin(115200);
uint32_t s, e;
uint16_t j;
for (uint8_t i = 0; i < 5; i++) {
s = micros();
for (j = 0; j<1000; j++) {
fix0();
}
e = micros();
Serial.print("fix0: ");
Serial.print((float)(e-s)/j);
Serial.println("uS");
Serial.print("zz: ");
Serial.println(zz, 9);
s = micros();
for (j = 0; j<1000; j++) {
fix1();
}
e = micros();
Serial.print("fix1: ");
Serial.print((float)(e-s)/j);
Serial.println("uS");
Serial.print("zz: ");
Serial.println(zz, 9);
s = micros();
for (j = 0; j<1000; j++) {
fix2();
}
e = micros();
Serial.print("fix2: ");
Serial.print((float)(e-s)/j);
Serial.println("uS");
Serial.print("zz: ");
Serial.println(zz, 9);
s = micros();
for (j = 0; j<1000; j++) {
fix3();
}
e = micros();
Serial.print("fix3: ");
Serial.print((float)(e-s)/j);
Serial.println("uS");
Serial.print("zz: ");
Serial.println(zz, 9);
s = micros();
for (j = 0; j<1000; j++) {
fix4();
}
e = micros();
Serial.print("fix4: ");
Serial.print((float)(e-s)/j);
Serial.println("uS");
Serial.print("zz: ");
Serial.println(zz, 9);
Serial.println();
}
}
void loop() {}
index of the lower left corner of the "cell"
Corners have nothing to do with it, unless you're thinking in terms of grid lines.
It's the index of the "cell" itself. Cells:
0 1
|---+---|
0 |0,0|1,0|
|---+---|
1 |0,1|1,1|
|---+---|
The probe indices apply to the grid lines:
0 1 2
0 |---+---|
| | |
1 |---+---|
| | |
2 |---+---|
In your diagram, cell (0,0) has a corner at grid lines (0,0). Similarly, cell (1,1) has a corner at grid lines (1,1). We are talking about the same thing.
Is this science enough? Have I missed something?
It's the best science I have seen around here in a long while. I wish more contributors would do this.
The place where efficiency is most vital is in the get_z method. As you mentioned, the optimizer is smart enough to drop multiple calls to get_probe_x. Of course, we can also do the following to save 3 more calls (which should be inlined by the compiler, really) and get rid of 2 multiplications:
float px1 = get_probe_x(cx), px2 = px1 + MESH_X_DIST,
py1 = get_probe_y(cy), py2 = py1 + MESH_Y_DIST;
…so (although the optimizer will do this for us) we get…
float get_z(float x0, float y0) {
int8_t cx = cell_index_x(x0),
cy = cell_index_y(y0);
if (cx < 0 || cy < 0) return z_offset;
float px1 = get_probe_x(cx), px2 = px1 + MESH_X_DIST, py1 = get_probe_y(cy);
float z1 = calc_z0(x0,
px1, z_values[cy][cx],
px2, z_values[cy][cx + 1]);
float z2 = calc_z0(x0,
px1, z_values[cy + 1][cx],
px2, z_values[cy + 1][cx + 1]);
float z0 = calc_z0(y0,
py1, z1,
py1 + MESH_Y_DIST, z2);
return z0 + z_offset;
}
I found in my testing that cell_index_x and cell_index_y were more efficient as loops in the first quadrant, but with 2 or more loops they were no faster than the float subtraction plus integer division. And certainly they are nowhere near the cost of common delta calculations.
One other small optimization I can think of is to get rid of the float subtraction too. So instead of…
int8_t cx = int(x - (MESH_MIN_X)) / (MESH_X_DIST);
…you get…
int8_t cx = (int(x) - (MESH_MIN_X)) / (MESH_X_DIST);
…and this should work okay, as I don't think the fractional part of x is important here.
We are talking about the same thing.
Not entirely. The method to get the nearest grid line is different from the method to get the nearest cell. In the former case, if you've gone 75% across cell 0,0, for example, then you'll get a "probe index" of 1, based on the grid line at the right edge. But if you ask for the "cell index" at that point you will get a "cell index" of 0 at that position. Sometimes (as with M421 X Y) this distinction matters.
@thinkyhead Ok, so I tested this (your code below), and I actually get 4-5 us worse (and I cringe because this is spreading the calculations all over the place)
The numbers I get are ~270 us with this change and ~265 us without.
The int(x - to (int(x) - change did gain ~15 us...(which is part of the diff from my 280 and 265 now). Validity in all cases? Still, using the same method/numbers that is used when probing will secure any numerical errors and everything will line up.
float get_z(float x0, float y0) {
int8_t cx = cell_index_x(x0),
cy = cell_index_y(y0);
if (cx < 0 || cy < 0) return z_offset;
float px1 = get_probe_x(cx), px2 = px1 + MESH_X_DIST, py1 = get_probe_y(cy);
float z1 = calc_z0(x0,
px1, z_values[cy][cx],
px2, z_values[cy][cx + 1]);
float z2 = calc_z0(x0,
px1, z_values[cy + 1][cx],
px2, z_values[cy + 1][cx + 1]);
float z0 = calc_z0(y0,
py1, z1,
py1 + MESH_Y_DIST, z2);
return z0 + z_offset;
}
Do people agree that we need to change the code to look like this:
for (int8_t y = MESH_NUM_Y_POINTS-1; y--;)
for (int8_t x = MESH_NUM_X_POINTS-1; x--;)
z_values[y][x] = 0;
@Roxy-3DPrintBoard Do you mean swapping x/y or use that hard to read for statements? x/y swap is fine I guess. Don't think that should change anything, as long as you get all places changed etc. Reason was memory layout, but is probably less important on AVR. The for statements I do not think we need to change. Readability vs Speed? Necessary here? I'd say no. And really how many us are we talking about?
No! I mean the fact that both x & y are initialized past the end of their respective subscripts. The array is declared as:
float z_values[MESH_NUM_Y_POINTS][MESH_NUM_X_POINTS];
And we are going off the end of the array. I'm wondering if this is the root cause of some of those complaints we have seen about Mesh Bed Leveling not staying activated and such. That is why I'm suggesting the -1 on both MESH_NUM_Y_POINTS and MESH_NUM_X_POINTS.
Remember... A typical for() loop says something like for(i=0; i There is no storage declared for z_values[MESH_NUM_Y_POINTS][MESH_NUM_X_POINTS] The storage is only declared up to z_values[MESH_NUM_Y_POINTS-1][MESH_NUM_X_POINTS-1] Right????
Just did a timing test on reset()
So which one do you think is fastest? Actually for 7x7 I get 56us for reset2() and 66us reset() (Mega256 board "original" + Arduino IDE 1.6.5)
void mesh_bed_leveling::reset() {
for (int8_t y = MESH_NUM_Y_POINTS; y--;)
for (int8_t x = MESH_NUM_X_POINTS; x--;)
z_values[y][x] = 0;
}
void mesh_bed_leveling::reset2() {
for (int8_t y =0; y< MESH_NUM_Y_POINTS; y++)
for (int8_t x =0; x< MESH_NUM_X_POINTS; x++)
z_values[y][x] = 0;
}
To be fair... reset() is doing an extra iteration of each loop. Tell it to print out what the index variable is at. x & y are both doing an iteration with a value of MESH_NUM_x_POINTS and reset2() is only going up to MESH_NUM_x_POINTS - 1.
@Roxy-3DPrintBoard I think the loop does what it should, though hard to read. And actually not efficient it turns out.
I get this output from the method below with 7x7.
6:6 5:6 4:6 3:6 2:6 1:6 0:6
6:5 5:5 4:5 3:5 2:5 1:5 0:5
6:4 5:4 4:4 3:4 2:4 1:4 0:4
6:3 5:3 4:3 3:3 2:3 1:3 0:3
6:2 5:2 4:2 3:2 2:2 1:2 0:2
6:1 5:1 4:1 3:1 2:1 1:1 0:1
6:0 5:0 4:0 3:0 2:0 1:0 0:0
void mesh_bed_leveling::reset() {
for (int8_t y = MESH_NUM_Y_POINTS; y--;) {
for (int8_t x = MESH_NUM_X_POINTS; x--;) {
z_values[y][x] = 0;
Serial.print(" ");
Serial.print(x);
Serial.print(":");
Serial.print(y);
}
Serial.println();
}
}
with the other form I get this output
0:0 1:0 2:0 3:0 4:0 5:0 6:0
0:1 1:1 2:1 3:1 4:1 5:1 6:1
0:2 1:2 2:2 3:2 4:2 5:2 6:2
0:3 1:3 2:3 3:3 4:3 5:3 6:3
0:4 1:4 2:4 3:4 4:4 5:4 6:4
0:5 1:5 2:5 3:5 4:5 5:5 6:5
0:6 1:6 2:6 3:6 4:6 5:6 6:6
void mesh_bed_leveling::reset() {
for (int8_t y =0; y< MESH_NUM_Y_POINTS; y++) {
for (int8_t x =0; x< MESH_NUM_X_POINTS; x++) {
z_values[y][x] = 0;
Serial.print(" ");
Serial.print(x);
Serial.print(":");
Serial.print(y);
}
Serial.println();
}
}
Oh! The second term happens even before the loop runs the first time. OK! My mistake!
@Roxy-3DPrintBoard Yeah, what I mean by hard to read.
Well if it can run faster and be easier to read... We should do the reset2() style.
x & y are initialized past the end of their respective subscripts
Do people agree that we need to change the code to look like this…
Hells no. for (int i=5; i--;) cout << i << " "; gives you 4 3 2 1 0.
what I mean by hard to read
This is a fairly common shorthand. Essentially the same as: int i = 5; while (i--) { ... };. It's a familiar idiom to me, but then I have been programming for over 30 years.
…if it can run faster…
Forward iteration should run no faster on any CPU with even a modest cache. I would say use memset but you can't safely do that with a float array.
Breaking it down…
uint8_t y = MESH_NUM_Y_POINTS; // costs the same as init to zero
while (y--) { // Equals-zero? Always decrement after
// y is between 0 and MESH_NUM_Y_POINTS - 1
}
uint8_t y = 0; // costs the same as init to MESH_NUM_Y_POINTS
while (y < MESH_NUM_Y_POINTS) { // Less-than int is more expensive than equals-zero
// y is between 0 and MESH_NUM_Y_POINTS - 1
y++; // always increment
}
Anyway… mbl.reset() is not a bottleneck, so not worth worrying about optimizing for speed, as much as for size.
@thinkyhead So you actually run it, measured it and compared?
When I run the code below I get below result (Mega256 "original" + Arduino IDE 1.6.5). I have shorted it down so it should be a very simple copy'paste test. Please do.
Maybe you should re-evaulate your idiom "book"? Frankly I have been programming for a pretty long time too (in many companies and many projects), and this idiom has rarely appeared. Actually never I think, but there has been a lot of discussions about ++obj vs obj++, but that usually work good for iterators. And nowadays compilers and optimizers are so capable that writing code that gives more options to the compiler will help (which I can imagine is happening here).
So, the real reason for re-writing it was to enforcing your idiom? Or was it speed?
reset1: 67.87uS
reset2: 56.86uS
#define MESH_NUM_X_POINTS 7
#define MESH_NUM_Y_POINTS 7
float z_values[MESH_NUM_Y_POINTS][MESH_NUM_X_POINTS];
void reset1() {
for (int8_t y = MESH_NUM_Y_POINTS; y--;)
for (int8_t x = MESH_NUM_X_POINTS; x--;)
z_values[y][x] = 0;
}
void reset2() {
for (int8_t y =0; y< MESH_NUM_Y_POINTS; y++)
for (int8_t x =0; x< MESH_NUM_X_POINTS; x++)
z_values[y][x] = 0;
}
void setup() {
Serial.begin(115200);
uint32_t s, e;
uint16_t j;
for (uint8_t i = 0; i < 5; i++) {
s = micros();
for (j = 0; j<1000; j++) {
reset1();
}
e = micros();
Serial.print("reset1: ");
Serial.print((float)(e-s)/j);
Serial.println("uS");
s = micros();
for (j = 0; j<1000; j++) {
reset2();
}
e = micros();
Serial.print("reset2: ");
Serial.print((float)(e-s)/j);
Serial.println("uS\n");
}
}
void loop() {}
So, the real reason for re-writing it was to enforcing your idiom? Or was it speed?
Generated code size. Speed in mbl.reset() is not vital, not at all.
@thinkyhead Ok lets look at that then. (did quick test by copy pasting to have two reset1/reset2, lazy editing)
Sketch uses 4Â 330 bytes code above but with 2 times reset1().
Sketch uses 4Â 310 bytes code above but with 2 rimes reset2().
So it is actually 10 bytes longer.
My copy of Arduino 1.6.7 is producing binaries of identical size.
This is smaller, and will also work, because IEEE float zero is binary zero:
memset(z_values, 0, sizeof(z_values));
Another likely change to the mesh_bed_leveling class is to make all its data and methods static, as I have done with the other "singleton" classes. This both reduces code size and improves speed.
@thinkyhead I did try 1.6.7 and 1.6.9 and both produce 10 bytes longer code. I am on a Macbook Pro retina 15".
I did try 1.6.7 and 1.6.9 and both produce 10 bytes longer code.
@epatel Not for me. Arduino 1.6.7 produced the same size binary (59,148) for both cases. And I haven't messed with the optimization settings.
However, the memset version saves 58 bytes. (59,090)
@thinkyhead With code below (which have a #if 0)
On Arduino IDE 1.6.7
#if 0 -> Sketch uses 4Â 168 bytes (1%) of program storage space. Maximum is 253Â 952 bytes.
#if 1 -> Sketch uses 4Â 178 bytes (1%) of program storage space. Maximum is 253Â 952 bytes.
#define MESH_NUM_X_POINTS 7
#define MESH_NUM_Y_POINTS 7
float z_values[MESH_NUM_Y_POINTS][MESH_NUM_X_POINTS];
#if 1
void reset() {
for (int8_t y = MESH_NUM_Y_POINTS; y--;)
for (int8_t x = MESH_NUM_X_POINTS; x--;)
z_values[y][x] = 0;
}
#else
void reset() {
for (int8_t y =0; y< MESH_NUM_Y_POINTS; y++)
for (int8_t x =0; x< MESH_NUM_X_POINTS; x++)
z_values[y][x] = 0;
}
#endif
void setup() {
Serial.begin(115200);
uint32_t s, e;
uint16_t j;
for (uint8_t i = 0; i < 5; i++) {
s = micros();
for (j = 0; j<1000; j++) {
reset();
}
e = micros();
Serial.print("reset: ");
Serial.print((float)(e-s)/j);
Serial.println("uS");
}
}
void loop() {}
@thinkyhead ...about memset yeah I was thinking that first too. But just like you wrote earlier, better not do that on floats.
Actually, I checked. It's fine for floats.
Actually, I checked. It's fine for floats.
I actually had a reason to check this because I need to know how different values are represented. I need a couple of values as flags to determine if a mesh point has been measured and is valid on Delta's when you go past Printable (and Probeable) Radius.
Just in case you guys are interested, here is the binary representation for some of the important floating point values:
0.0 = 00 00 00 00
NAN = 00 00 c0 7f
-NAN = 00 00 c0 ff
Infinity = 00 00 80 7f
-Infinity = 00 00 80 ff
And removed the Verified Bug label!
Something interesting I just found out (after sever hours of debugging). If you have a float (or a double) the statement:
if ( f == f ) {
some other logic;
}
will always pass except in one case. If f has been assigned a value of NAN, it will always fail. You can't check to see if f is NAN by doing an if statement like this:
f = NAN;
if ( f == NAN ) {
this never happens;
}
because it will ALWAYS fail. Even if f has been assigned a value of NAN, that comparison will fail. You have to use the library function isnan() to check for NAN. Either that or compare f==f and see if that fails.
@Roxy-3D can we close this issue ?
Closing due to inactivity.
Please re-open this issue if still valid.
Most helpful comment
Something interesting I just found out (after sever hours of debugging). If you have a float (or a double) the statement:
will always pass except in one case. If f has been assigned a value of NAN, it will always fail. You can't check to see if f is NAN by doing an if statement like this:
because it will ALWAYS fail. Even if f has been assigned a value of NAN, that comparison will fail. You have to use the library function isnan() to check for NAN. Either that or compare f==f and see if that fails.