julia> any!(falses(10), trues(10, 10))
10-element BitArray{1}:
false
false
false
false
false
false
false
true
true
true
julia> any!(Vector(falses(10)), trues(10, 10))
10-element Array{Bool,1}:
true
true
true
true
true
true
true
true
true
true
I'm guessing this is an AVX512 codegen issue with the new compiler, but #27182 does not fix it. Doesn't occur with a 44-day old build (9f5351c36a) which is before the new compiler was enabled.
Minimal example:
function g(f, op, R::AbstractArray, A::AbstractArray)
@inbounds for IA in axes(A, 2)
@simd for i in axes(A, 1)
R[i] = op(R[i], f(A[i,IA]))
end
end
return R
end
g(identity, |, falses(10), trues(10, 10))
Yeah, I just got to the same reduction. That @simd
is illegal, because the memory accesses are not loop-parallel.
We could have a less strong version of @simd
that merely allows floating point reordering. That should be sufficient to retain most of the performance benefit without having the problematic semantics with respect to memory access.
Oh, interesting. This restriction makes sense, but it's news to me. So we cannot @simd
over assignments to BitArrays because they affect the same UInt64
value across multiple iterations?
So we cannot @simd over assignments to BitArrays because they affect the same UInt64 value across multiple iterations?
Correct. Technically not over reads from BitArrays either.
Dang. That really hamstrings our ability to use @simd
in generic array code. Looks like we're doing it wrong in reductions, extrema!
, and broadcast
.
I'm guessing lots of packages will hit this, too. I'd really love it if we could just fail to vectorize in cases like this instead of returning wrong results.
I'm guessing lots of packages will hit this, too. I'd really love it if we could just fail to vectorize in cases like this instead of returning wrong results.
Remove @simd
;)
Just for what it's worth — yesterday before we had totally figured this out I had kicked off a bisect. You probably already know this, but it was 56d7ebeed2ec7edbd3e1b8c5315561b650a65d67 that started causing the incorrect results.
For what it's worth, I don't think our compiler "hints" should ever have license to produce incorrect results like this. I understand that this is a bug; working around it and documenting it is fine for now, but at some point this does need fixing.
That's literally the only thing that there annotations are for. Giving the compiler license to do transformations that it otherwise thinks are illegal. If the compiler though it was allowed to do this, it'd have done it iself.
Typical uses of @simd
give the compiler license to re-associate floating-point operations even though that technically changes the result, although typically not by much. This does not seem like the same class of license.
Keno's definitely correct. Could we do something like having an @simd_unsafe
primitive though that we use to mark the BitArray getters and setters? I suspect that bitarray is fairly unique in its violation of memory reference expectations.
That's a fair point that bit arrays are a bit unusual in this regard.
Sparse arrays would also incur trouble here. And even structured arrays won't have a consistent read memory access pattern — does that mean you have liberty to give me garbage? Or does that just mean it'll fail to vectorize?
The guarantee we appear to be making is that each iteration of the loop will not observe the memory write from any other loop iteration. Random access pattern shouldn't be a problem (it'll likely just not be possible to vectorize). The BitArray issue is that a write actually consists of a read-modify-write (RMW) that makes it appear race-y here.
So we have to remove simd annotation on all generic code?
Definitely generic setindex!
is not safe. Beyond that I'm not sure.
Instead could we make @simd
do nothing for bitarrays
and have a special function when you want simd
for them? That seems better than breaking the ability to write fast generic code.
This bug makes Pkg3 completely useless on Skylake machines. We know from bugreports that we have a significant numbers of users on such machines. Bumping this for the next 0.7 pre-release.
Yeah, this is definitely a priority to get a fix in for.
In some sense, the problem is not that BitArray get/set are bad for SIMD, it is that they are too good for SIMD. That is, BitArrays getindex/setindex should talk to the simd machinery in order to try to get data processed in 64 bit chunks (or even better 128/256/512) instead of dropping down to 1-bit sequential processing.
It's not just BitArray
, though. We were allowing arbitrary functions inside @simd
loops — functions which could have similar sorts of side-effects that have a catastrophic interaction with SIMD.
Fixed by #27670
Most helpful comment
Dang. That really hamstrings our ability to use
@simd
in generic array code. Looks like we're doing it wrong in reductions,extrema!
, andbroadcast
.I'm guessing lots of packages will hit this, too. I'd really love it if we could just fail to vectorize in cases like this instead of returning wrong results.