The following code iterates over a Block-distributed array using for and forall
and gets different results, when they should be the same.
This occurs even with single-locale Chapel.
Note that if I declare const D = DSpace (no distribution), the code does the correct thing and yields
diff=0.
Source Code:
use BlockDist;
use Random;
config const Ng=128;
const DSpace={0.. #Ng, 0.. #Ng, 0.. #(Ng+2)};
var targets : [0.. #numLocales, 0..0,0..0] locale;
targets[..,0,0] = Locales;
const D : domain(3) dmapped Block(boundingBox=DSpace, targetLocales=targets) = DSpace;
const Dre = D[..,..,0.. #(Ng+2) by 2 align 0];
const Dim = D[..,..,0.. #(Ng+2) by 2 align 1];
var GG,GG1 : [D]real;
fillRandom(GG, seed=1234);
GG1 = GG;
const twopi = 2*pi;
// Do an iteration over this using for
for (x0, x1) in zip(GG[Dre], GG[Dim]) {
  // Box-Muller
  x0 += if (x0 < 1.0e-30) then 1.0e-30 else 0.0;
  const rr = sqrt(-2*log(x0));
  const ang = twopi*x1;
  x0 = rr*cos(ang);
  x1 = rr*sin(ang);
}
// Do an iteration over this using forall
forall (x0, x1) in zip(GG1[Dre], GG1[Dim]) {
  // Box-Muller
  x0 += if (x0 < 1.0e-30) then 1.0e-30 else 0.0;
  const rr = sqrt(-2*log(x0));
  const ang = twopi*x1;
  x0 = rr*cos(ang);
  x1 = rr*sin(ang);
}
// This should be zero!!
var diff = max reduce abs(GG-GG1);
writeln(diff);
Compile command:
chpl bug.chpl
Execution command:
./a.out
This should print 0, but does not.
chpl Version 1.16.0 pre-release (4924299)
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: mpi-gnu *
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none *
CHPL_TASKS: fifo *
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_JEMALLOC: jemalloc
CHPL_MAKE: make
CHPL_ATOMICS: intrinsics
CHPL_GMP: none
CHPL_HWLOC: none
CHPL_REGEXP: re2
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none
I came up with a simpler test case. It's simpler in a few ways:
Ng == 2.
GG is initialized with non-random unique values.
forall over just GG[Dim].
forall x in GG[Dim] produces the same values that forall x in GG[Dre] did correctly.
forall is serial forall.
use BlockDist;
use Random;
config const Ng=2;
const DSpace={0.. #Ng, 0.. #Ng, 0.. #(Ng+2)};
var targets : [0.. #numLocales, 0..0,0..0] locale;
targets[..,0,0] = Locales;
const D : domain(3) dmapped Block(boundingBox=DSpace, targetLocales=targets) = DSpace;
const Dre = D[..,..,0.. #(Ng+2) by 2 align 0];
const Dim = D[..,..,0.. #(Ng+2) by 2 align 1];
var GG : [D]real;
for (x,y,z) in D do
  GG[x,y,z] = x * 10 + y + z:real / 10;
writeln("GG:");
writeln(GG);
writeln("\n");
writeln("GG Dre:");
writeln(GG[Dre]);
writeln("");
writeln("GG Dim:");
writeln(GG[Dim]);
writeln("\n");
serial { forall x in GG[Dre] do write("(",x,") "); } writeln("\n");
serial { forall x in GG[Dim] do write("(",x,") "); } writeln("");
GG and its two slices print ok, but iterating over GG[Dim]
produces the same values as iterating over GG[Dre]:
fortytwo@magrathea:~/src/chpl/6383$ ./bug -nl 1
GG:
0.0 0.1 0.2 0.3
1.0 1.1 1.2 1.3
10.0 10.1 10.2 10.3
11.0 11.1 11.2 11.3
GG Dre:
0.0 0.2
1.0 1.2
10.0 10.2
11.0 11.2
GG Dim:
0.1 0.3
1.1 1.3
10.1 10.3
11.1 11.3
(0.0) (0.2) (1.0) (1.2) (10.0) (10.2) (11.0) (11.2)
(0.0) (0.2) (1.0) (1.2) (10.0) (10.2) (11.0) (11.2)
And motivated by @cassella, here's another simple case:
use BlockDist;
const DSpace = {0.. #10};
const D : domain(1) dmapped Block(DSpace) = DSpace;
const Dre = D[.. by 2 align 0];
const Dim = D[.. by 2 align 1];
serial { forall ix in Dre do write(" ",ix," "); }
writeln();
serial { forall ix in Dim do write(" ",ix," "); }
writeln();
which produces
0 2 4 6 8
0 2 4 6 8
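For comparison, here is a sketch of the same loop with the Block distribution dropped, in line with the earlier note that a plain const D = DSpace gives the right answer in the original program. The name Dplain is made up for this illustration, and I'd expect this version to list the odd indices rather than repeat the even ones.

```chapel
// Control case: the same aligned slice, taken from a non-distributed domain.
// Dplain is a hypothetical name used only for this illustration.
const DSpace = {0.. #10};
const Dplain = DSpace[.. by 2 align 1];

// serial forces the forall to run in order so the output is readable.
serial { forall ix in Dplain do write(" ", ix, " "); }
writeln();
```

If this prints the odd indices while the Block version above does not, that points at the distribution's iterators rather than at the slicing itself.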
Motivated by @npadmana and @cassella, here's a proposed fix: PR #6384
@bradcray -- stupid question -- why does one only need to fix the follower and not the leader?
I'll note that I finally understood the difference between the low and first methods on a domain. Do you think it's worth giving an example in the documentation? I think many times, I've meant first when I've used low....
I'm happy to file another issue... :)
UPDATE -- actually -- never mind my question about the leader; I see it uses first which must fix this. I guess I'm now confused as to why there isn't a standalone iterator.... (but then again, there is much about distributions I don't understand!).
> I'll note that I finally understood the difference between the low and first methods on a domain. Do you think it's worth giving an example in the documentation? I think many times, I've meant first when I've used low....
> I'm happy to file another issue... :)
If it hasn't been sufficiently clear to you then yes, I think it's worth adding an example or improving the documentation.
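As a starting point for such an example, here is a minimal sketch of the low/first distinction on a strided, aligned domain. It assumes the behavior of the 1.16 pre-release discussed in this issue, where I'd expect low to report the bound (0 here) and first to report the first index actually in the domain (1 here); other releases may treat low differently, so no output is shown. The name Dodd is hypothetical.

```chapel
// A 1D domain whose alignment (1) differs from its low bound (0).
// Dodd is a hypothetical name used only for this illustration.
const Dodd = {0..9 by 2 align 1};

// low is a bound of the domain; on the version discussed here it
// is assumed not to take the alignment into account.
writeln("low   = ", Dodd.low);

// first is the first index that iteration actually visits.
writeln("first = ", Dodd.first);

// Iterating shows which indices are really in the domain.
for i in Dodd do write(" ", i);
writeln();
```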
I believe the leader doesn't need to be updated because the leader is responsible for yielding dense 0-based sets of indices, such that 1..n can be zipped with 0..n-1 and 2..2n by 2. Block ought to have a standalone iterator as well, but since these iterators were a more recent addition to the language, they haven't made their way through all of the code base yet. There's no reason not to provide them, and there may be a (modest?) performance benefit in doing so.
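To make the leader/follower split concrete, here is a rough sketch using plain loops in place of real iterators. It is not the BlockDist code: the "leader" is just a dense 0-based range, and the two "followers" are hypothetical translations back to real indices, one anchored at first and one at low. On a version where low ignores alignment, I'd expect only the first-based translation to land on the range's actual indices.

```chapel
// Stand-in for leader/follower iteration; NOT the actual BlockDist code.
const r = 0..9 by 2 align 1;

// Leader's job: hand out dense, 0-based positions 0..n-1.
// These are the same for any range with r.size elements,
// which is why zippered iteration can share one leader.
const dense = 0.. #r.size;

// Follower anchored at first: position i maps to first + i*stride,
// so the alignment of r is respected.
for i in dense do write(" ", r.first + i*r.stride);
writeln();

// Follower anchored at low (the assumed shape of the bug): wherever low
// is the unaligned bound, this lands on indices that are not in r at all.
for i in dense do write(" ", r.low + i*r.stride);
writeln();
```

Since the dense positions carry no alignment information, the only place an alignment mistake can creep in is the follower's translation step, which matches the observation that only the follower needed fixing.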
I have taken over the development of a bug fix for this issue. See PR #6406 for more information.
I'd like to rename this issue to the more precise "forall loops over Block don't respect alignment" -- any objections?