Chapel 1.16.0 produced correct results for the following code, but as of 1.17.1 it now produces wrong results. Not sure what's going on.
0 1 2 3 4 5 6 7
4 5 6 7 4 5 6 7 // Correct
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7 // Wrong
Source Code:
module ReIndex {
config var data_size = 8;
proc reindex_test(A: [?D] ?t) {
const half = data_size / 2;
var tmp: [0..#half] t;
// Generate view into A via array.reindex
ref reA = A.reindex(D);
// ref reA = A; // This works.
tmp = reA[(half + D.low)..#half];
reA[D.low..#half] = tmp;
}
proc main() {
var data = [i in 0..#data_size] i;
writeln(data);
reindex_test(data);
writeln(data);
}
}
Note that this code works fine if the "Generate view" line is replaced with ref reA = A;, so what makes array.reindex erroneous?
The change in behavior happened at PR #7775.
Thanks for tracking this down David. @benharsh, is there a way (e.g., config param) to disable bulk transfer as a workaround?
The useBulkTransfer config param currently defaults to true, so to disable:
chpl -suseBulkTransfer=false ...
I confirmed that -suseBulkTransfer=false works around the problem.
I believe I have a fix for this bug in #9460
Note that this code works fine if the "Generate view" line is replaced with
ref reA = A;, so what makes array.reindex erroneous?
The part of the example that had incorrect results was the assignment to tmp:
tmp = reA[(half + D.low)..#half];
During execution of the bulk-transfer, we will eventually translate the reindexed slice's indices to their original space. This was done incorrectly, and the indices were translated from {5..8} to {1..4}. The result should have been {5..8} because A was reindexed over its own domain. Apparently none of our tests had this pattern 馃槥
The linked PR adds additional testing to catch this and other failing reindex transfers going forward.
@benharsh -- From a later master that includes 2861bb5 (I think), chpl version 1.18.0 pre-release (9db63162ca), the original code still yields incorrect results. Can you tell me if I'm doing something wrong?
Huh, not sure what went wrong in the PR that 'fixed' the issue, I can still replicate this bug on master... I'll investigate.
Sorry to have gotten your hopes up! #9573 should finally resolve the original code's bug.
I added additional testing in the previous PR, but overlooked the fact that all domains were converted to stridable domains. This meant that the non-strided case (and therefore the original example) was not being tested.
The new PR adds a smaller reproducer of your original code to testing, and I intend to add a dedicated test for non-strided reindexes.
@BryantLam : I have merged #9573, but won't close this Issue until you confirm that the original bug is resolved.
@benharsh -- The quick two-line addition solves this problem. Thanks!
Most helpful comment
The part of the example that had incorrect results was the assignment to
tmp:During execution of the bulk-transfer, we will eventually translate the reindexed slice's indices to their original space. This was done incorrectly, and the indices were translated from
{5..8}to{1..4}. The result should have been{5..8}becauseAwas reindexed over its own domain. Apparently none of our tests had this pattern 馃槥The linked PR adds additional testing to catch this and other failing reindex transfers going forward.