Booleans are represented by i8, but logically represent i1. Now, if we ever come into the situation of having a denormalized Bool, like reinterpret(Bool, 0x03), we get very confusing behavior. I am not sure whether this is
julia> @noinline h(a,u) = push!(a,reinterpret(Bool, u))
julia> a=Set{Bool}(); h(a, 0x03); h(a, 0x03)
Set(Bool[1, 1])
julia> a=Set{Any}(); h(a, 0x03); h(a, 0x03)
Set(Any[true])
julia> function f(b)
b==true && 1
b==false && 2
3
end
julia> f(reinterpret(Bool, 0x03))
3
julia> reinterpret(Bool, 0x03) === true
true
Especially the last test suggests that this is pretty UB in practice
What is h?
bitcast is generally supposed to sanitize the input, but is probably failing in this case
What is h?
Oops, julia> @noinline h(a,u) = push!(a,reinterpret(Bool, u))
So the general idea is that we assume that every Bool is normalized and defend this at the boundaries? I.e. we could e.g. emit
%rest = and i8, %someBool, 248
%isNormal = icmp eq i8 %rest, 0
call assume(i1 %isNormal)
and make it hard UB?
I don't think we can realistically defend the invariant that every Bool is normalized. E.g.
julia> struct bc
a::Bool
end
julia> r=Ref(bc(false));unsafe_store!(convert(Ptr{UInt8}, pointer_from_objref(r)), 0xf0);
julia> r[]==bc(false)
true
julia> r[]===bc(false)
false
julia> @inline hh(a,b)=a===b
julia> function foo(a)
if hh(a,bc(true)) return 1
elseif hh(a,bc(false)) return 2
else return 3
end
end
julia> foo(r[])
3
julia> @noinline hh(a,b)=a===b
julia> foo(r[])
2
This example shows (1) a mismatch between interpreted and compiled code (egal vs equal, which just calls egal but in a compiled situation) and (2) that behavior depends on inlining decisions.
More realistic is to treat the other 7 bits of Bool as structure padding:
That is, egal on Bool would be changed to only consider the lowest bit.
3. If you reinterpret / unsafe_store! / unsafe_load between things with incompatible structure padding, then you get undefined bits. If properly documented, that's totally fine.
This seems like it's pretty clearly going outside of the system. You can make lots of things break by using unsafe_* and reinterpret in various ways. The language is only responsible for maintaining its abstractions if you don't go behind its back to break them.
This seems like it's pretty clearly going outside of the system. You can make lots of things break by using unsafe_* and reinterpret in various ways. The language is only responsible for maintaining its abstractions if you don't go behind its back to break them.
Absolutely correct. However, one would naively expect that foo(r[]) is semantically well-defined for non-crazy foo, if r is a ref to bitstype, regardless how the memory is filled. This is the case for normal structure-padding.
So, if we decide to docfix this, we should write:
Bool is a bitstype with one byte length, and has two valid values,
false, represented by0x00, andtrue, represented by0x01. All other bit-patterns are effectively poison and lead relatively directly to undefined behavior. This means that special care must be taken withreinterpret,unsafe_load,unsafe_store!, or when writing toVector{Bool}from C code, in order to preserve this invariant (i.e. you must never write invalid bitpatterns into memory that belongs toBool, and you must never load aBoolor any struct containingBoolfrom memory that is not guaranteed to have no invalid bitpatterns).
It is near impossible to recover from situations where invalidBool(or structs containing invalidBool) have entered the system: The compiler and runtime know that invalidBoolare impossible and will happily optimize out silly checks or assertions.
...if I was reading this docfix, I'd say think that julia is bananas.
My preferred behavior would be
Structure padding. The in-memory representation of structs may contain alignment padding; for this the usual C rules apply. Julia makes no assumptions nor guarantees about the content of alignment padding. Comparison, hashes, etc all ignore the padding. Special care must be taken when handling cryptographic keys: The alignment padding may be filled with arbitrary leftovers of your keys from registers (e.g.:
secretHandlingfun(); writeStructWithPaddingToPublicBuffer();is incorrect and cannot be made safe short of inspecting the generated@code_native; instead, one should use structures without padding). Booleans contain 7 leading bits of structure padding.
Bool. Blah, blah. Also, see cross-ref structure padding.
...if I was reading this docfix, I'd say think that julia is bananas.
Why? See e.g. https://doc.rust-lang.org/reference/behavior-considered-undefined.html
Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
- A value other than false (0) or true (1) in a bool.
In your text, the part that starts with
It is near impossible to recover from situations where invalid
and the rest after it can be removed since it is already established that you mustn't do this and it seems you are just adding some passive-aggressive extra stuff because you don't like it :P.
For your second suggestion, that seems much more complicated and just worse.
unsafe_* are already fairly well documented as being, well, unsafe. reinterpret, on the other hand, could definitely use more warnings about how you can easily cause trouble with padding — both in the scalar and array case.
Also, @chethega fixing this isn't that difficult. Bool(Int(bad_bool)%2) will fix the problem. Furthermore, if you know your data might have invalid bit-patterns, it's pretty easy to load the data properly.
If there are low-cost (zero, actually) ways to make this behavior less bad in the presence of unnormalized bools, that would be great to have. Some doc addition might also be good, but the two notes above are both a bit overly alarming. The first one essentially says "if you use unsafe tools incorrectly, you might hurt yourself" which is pretty reasonable, it's just written in an alarming way. The second one is better, but I'm not sure that the padding stuff will be clear to people (or that most Julia programmers need to know or care about it).
Most helpful comment
This seems like it's pretty clearly going outside of the system. You can make lots of things break by using
unsafe_*andreinterpretin various ways. The language is only responsible for maintaining its abstractions if you don't go behind its back to break them.