Julia: bug: Behavior of denormalized Bool depends on inlining decisions

Created on 27 Feb 2020  Â·  10Comments  Â·  Source: JuliaLang/julia

Booleans are represented by i8, but logically represent i1. Now, if we ever come into the situation of having a denormalized Bool, like reinterpret(Bool, 0x03), we get very confusing behavior. I am not sure whether this is

  1. Intentional UB (needs docs)
  2. A bug
  3. Not UB, there are coherent semantics (unlikely)
julia> @noinline h(a,u) = push!(a,reinterpret(Bool, u))

julia> a=Set{Bool}(); h(a, 0x03); h(a, 0x03)
Set(Bool[1, 1])

julia> a=Set{Any}(); h(a, 0x03); h(a, 0x03)
Set(Any[true])

julia> function f(b)
       b==true && 1
       b==false && 2
       3
       end

julia> f(reinterpret(Bool, 0x03))
3

julia> reinterpret(Bool, 0x03) === true
true

Especially the last test suggests that this is pretty UB in practice

codegen

Most helpful comment

3. If you reinterpret / unsafe_store! / unsafe_load between things with incompatible structure padding, then you get undefined bits. If properly documented, that's totally fine.

This seems like it's pretty clearly going outside of the system. You can make lots of things break by using unsafe_* and reinterpret in various ways. The language is only responsible for maintaining its abstractions if you don't go behind its back to break them.

All 10 comments

What is h?

bitcast is generally supposed to sanitize the input, but is probably failing in this case

What is h?

Oops, julia> @noinline h(a,u) = push!(a,reinterpret(Bool, u))

So the general idea is that we assume that every Bool is normalized and defend this at the boundaries? I.e. we could e.g. emit

%rest = and i8, %someBool, 248
%isNormal = icmp eq i8 %rest, 0
call assume(i1 %isNormal)

and make it hard UB?

I don't think we can realistically defend the invariant that every Bool is normalized. E.g.

julia> struct bc
       a::Bool
       end
julia> r=Ref(bc(false));unsafe_store!(convert(Ptr{UInt8}, pointer_from_objref(r)), 0xf0);
julia> r[]==bc(false)
true
julia> r[]===bc(false)
false
julia> @inline hh(a,b)=a===b
julia> function foo(a)
       if hh(a,bc(true)) return 1
       elseif hh(a,bc(false)) return 2
       else return 3
       end
       end
julia> foo(r[])
3
julia> @noinline hh(a,b)=a===b
julia> foo(r[])
2

This example shows (1) a mismatch between interpreted and compiled code (egal vs equal, which just calls egal but in a compiled situation) and (2) that behavior depends on inlining decisions.

More realistic is to treat the other 7 bits of Bool as structure padding:

  1. The value of the top 7 bits is undefined
  2. Comparison / identity only compares the LSB, just like truthiness only considers the LSB
  3. If you reinterpret / unsafe_store! / unsafe_load between things with incompatible structure padding, then you get undefined bits. If properly documented, that's totally fine.

That is, egal on Bool would be changed to only consider the lowest bit.

3. If you reinterpret / unsafe_store! / unsafe_load between things with incompatible structure padding, then you get undefined bits. If properly documented, that's totally fine.

This seems like it's pretty clearly going outside of the system. You can make lots of things break by using unsafe_* and reinterpret in various ways. The language is only responsible for maintaining its abstractions if you don't go behind its back to break them.

This seems like it's pretty clearly going outside of the system. You can make lots of things break by using unsafe_* and reinterpret in various ways. The language is only responsible for maintaining its abstractions if you don't go behind its back to break them.

Absolutely correct. However, one would naively expect that foo(r[]) is semantically well-defined for non-crazy foo, if r is a ref to bitstype, regardless how the memory is filled. This is the case for normal structure-padding.

So, if we decide to docfix this, we should write:

Bool is a bitstype with one byte length, and has two valid values, false, represented by 0x00, and true, represented by 0x01. All other bit-patterns are effectively poison and lead relatively directly to undefined behavior. This means that special care must be taken with reinterpret, unsafe_load, unsafe_store!, or when writing to Vector{Bool} from C code, in order to preserve this invariant (i.e. you must never write invalid bitpatterns into memory that belongs to Bool, and you must never load a Bool or any struct containing Bool from memory that is not guaranteed to have no invalid bitpatterns).
It is near impossible to recover from situations where invalid Bool (or structs containing invalid Bool) have entered the system: The compiler and runtime know that invalid Bool are impossible and will happily optimize out silly checks or assertions.

...if I was reading this docfix, I'd say think that julia is bananas.

My preferred behavior would be

Structure padding. The in-memory representation of structs may contain alignment padding; for this the usual C rules apply. Julia makes no assumptions nor guarantees about the content of alignment padding. Comparison, hashes, etc all ignore the padding. Special care must be taken when handling cryptographic keys: The alignment padding may be filled with arbitrary leftovers of your keys from registers (e.g.: secretHandlingfun(); writeStructWithPaddingToPublicBuffer(); is incorrect and cannot be made safe short of inspecting the generated @code_native; instead, one should use structures without padding). Booleans contain 7 leading bits of structure padding.

Bool. Blah, blah. Also, see cross-ref structure padding.

...if I was reading this docfix, I'd say think that julia is bananas.

Why? See e.g. https://doc.rust-lang.org/reference/behavior-considered-undefined.html

Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):

  • A value other than false (0) or true (1) in a bool.

In your text, the part that starts with

It is near impossible to recover from situations where invalid

and the rest after it can be removed since it is already established that you mustn't do this and it seems you are just adding some passive-aggressive extra stuff because you don't like it :P.

For your second suggestion, that seems much more complicated and just worse.

unsafe_* are already fairly well documented as being, well, unsafe. reinterpret, on the other hand, could definitely use more warnings about how you can easily cause trouble with padding — both in the scalar and array case.

Also, @chethega fixing this isn't that difficult. Bool(Int(bad_bool)%2) will fix the problem. Furthermore, if you know your data might have invalid bit-patterns, it's pretty easy to load the data properly.

If there are low-cost (zero, actually) ways to make this behavior less bad in the presence of unnormalized bools, that would be great to have. Some doc addition might also be good, but the two notes above are both a bit overly alarming. The first one essentially says "if you use unsafe tools incorrectly, you might hurt yourself" which is pretty reasonable, it's just written in an alarming way. The second one is better, but I'm not sure that the padding stuff will be clear to people (or that most Julia programmers need to know or care about it).

Was this page helpful?
0 / 5 - 0 ratings

Related issues

helgee picture helgee  Â·  3Comments

Keno picture Keno  Â·  3Comments

manor picture manor  Â·  3Comments

StefanKarpinski picture StefanKarpinski  Â·  3Comments

ararslan picture ararslan  Â·  3Comments