Julia: revise Statistics.middle

Created on 6 Nov 2018  Â·  10Comments  Â·  Source: JuliaLang/julia

Statistics.middle does not conform to the draft IEEE 1788 Standard on Interval Arithmetic. Perhaps it can be revised along the lines discussed in the paper:

http://doi.acm.org/10.1145/2493882

(free version: https://hal.archives-ouvertes.fr/hal-00576641/document)

Specifically, they propose the following implementation (pseudo-code):

def midpoint ([ a , b ]):
  if [a , b ] is empty:
    return NaN
  if a == -inf:
    if b == +inf:
      return 0.0
    else:
      return -realmax
  if b == +inf:
    return +realmax

  round_nearest() # set rounding to nearest even
  mid = 0 . 5 *( a + b )
  if mid == - inf or mid == + inf:
    return 0 . 5 * a + 0 . 5 * b
  else
    return mid

Here is a Julia equivalent (except for the rounding to nearest even part that I don't know how to set)

"""midpoint of the interval [a,b]"""
function midpoint(a::Float64, b::Float64)
    #= Based on F. Goualard. 2014, Table VII
    DOI: 10.1145/2493882 =#

    if !(a ≤ b)
        return NaN
    elseif a == -Inf
        if b == Inf
            return 0.
        else
            return nextfloat(-Inf)
        end
    elseif b == Inf
        return prevfloat(Inf)
    end

    mid = 0.5*(a + b)
    if isfinite(mid)
        return mid
    else
        return 0.5 * a + 0.5 * b
    end
end

Most helpful comment

but for bisection you can argue that you do not have or should take the arithmetic middle anyway, but you should maybe bisect the space of possible answers evenly taking the density of floating points into account

Note that this is quite straightforward to do:

function bisect(x::Float64, y::Float64)
    a = reinterpret(Int64, x)
    b = reinterpret(Int64, y)
    reinterpret(Float64, (a + b) >>> 1)
end

With some fun and surprising results:

julia> bisect(1.0, 0.0)
1.118751109680031e-154

julia> bisect(1.0, 0.5)
0.75

julia> bisect(Inf, 0.0)
1.5

julia> bisect(Inf, 0.5)
1.0055855947456948e154

All 10 comments

I am not easily convinced by this. If Inf represents (roughly) either ∞ or an unknown number x with x >> floatmax(), then the midpoint of [a, x] with a in [-floatmax(), floatmax()] will be also an unknown number typically larger than floatmax() or ∞.
Returning floatmax() representing a large number known up do floating point precision is inadequate, the output should not be more specific than the input.

I'm with @mschauer here. I'm not at all convinced that returning anything but NaN for middle(-Inf,Inf) makes sense and Inf for middle(x,Inf) if x is finite and vice versa. The current implementation doesn't seem to overflow either so I don't think there is anything to change here.

I think the reasoning is that the IEEE draft proposal is more useful for bissection type algorithms.

I see, but for bisection you can argue that you do not have or should take the arithmetic middle anyway, but you should maybe bisect the space of possible answers evenly taking the density of floating points into account. See also https://www.shapeoperator.com/2014/02/22/bisecting-floats/
Unfortunately I do not have IEEE journals here.

It looks like we currently compute this as x/y + y/2; it might be worthwhile to try 0.5 * (x+y) first, which I guess will handle denormals better. Are there any other values where there is a difference?

I'm also skeptical of this behavior for infinities. If a particular bisection algorithm wants to handle infinity that way it can, but baking it into a standard function makes less sense. I feel like the IEEE spec has pulled this kind of thing before, for example with max(NaN,1) == 1, which is arguably useful in specific instances but doesn't fit the general meaning of max and NaN.

but for bisection you can argue that you do not have or should take the arithmetic middle anyway, but you should maybe bisect the space of possible answers evenly taking the density of floating points into account

Note that this is quite straightforward to do:

function bisect(x::Float64, y::Float64)
    a = reinterpret(Int64, x)
    b = reinterpret(Int64, y)
    reinterpret(Float64, (a + b) >>> 1)
end

With some fun and surprising results:

julia> bisect(1.0, 0.0)
1.118751109680031e-154

julia> bisect(1.0, 0.5)
0.75

julia> bisect(Inf, 0.0)
1.5

julia> bisect(Inf, 0.5)
1.0055855947456948e154

The IEEE interval "midppint" function is named mid.
I use this for floating point midpoint of signed magnitudes.

function midpoint(x::T, y::T) where {T<:Base.IEEEFloat}
    m = abs(x) < abs(y) ?     # midpoint(x,y) === midpoint(-x,-y)
    y - (y/2 - x/2) :         # midpoint(x,y) === midpoint(y,x)
    x - (x/2 - y/2)           # !isinf(x) && midpoint(x,x) === x 
    return m + zero(T)        # -0.0 + 0.0 === 0.0
end

a few more surprising results (see Stefan's note above)

bisect(-1.0, 1.0)
Inf

bisect(-1.0, 2.0)
NaN

Yep, that code is broken for negative numbers. Here's a version I came up with:

~~~
function toOrderedUInt64(x::Float64)::UInt64
v=reinterpret(UInt64,x)
if( (v & 0x8000000000000000) != 0)
return ~v
end
return v+0x8000000000000000
end

function fromOrderedUInt64(v::UInt64)
if( (v & 0x8000000000000000) != 0)
return reinterpret(Float64, v-0x8000000000000000)
end
return reinterpret(Float64, ~v)
end

function bisect(x::Float64, y::Float64)::Float64
a = toOrderedUInt64(x)
b = toorderedUInt64(y)
return fromOrderedUInt64(a÷2 + b÷2)
end
~~~

This gives seemingly sensible results even for Inf: Though I suspect it goes haywire for NaNs.

~~~~
julia> bisect(-1.0,1.0)
-0.0

julia> bisect(-1.5,-1.0)
-1.2500000000000002

julia> bisect(1.5,1.0)
1.25

julia> bisect(0.0,Inf)
1.5

julia> bisect(-0.0,-Inf)
-1.5000000000000002

julia> bisect(Inf,-Inf)
-0.0
~~~~

one more

@inline function midfloat(a::Float64, b::Float64)
    ia = reinterpret(Int64, a)
    ib = reinterpret(Int64, b)
    m = (ia & ib) + ((ia ⊻ ib)>>>1)
    return reinterpret(Float64, m)
end

@inline function midvalue(a::Float64, b::Float64)
    if abs(a) < abs(b)
        return a - (a/2 - b/2)
    else
        return b - (b/2 - a/2)
    end
end

function bisect(a::T, b::T) where {T<:Base.IEEEFloat}
    if !isfinite(a) || !isfinite(b)
        result = a + b
    end
    if signbit(a) !== signbit(b)
        result = return midvalue(a, b)
    else
        absa, absb = abs(a), abs(b)
        if absa > absb
            a, b = b, a
            absa = absb
        end
        if absa < 1.0
            result =  a - (a/2 - b/2)
        else
            result = midfloat(a, b)
        end
    end
    return result
end
Was this page helpful?
0 / 5 - 0 ratings

Related issues

StefanKarpinski picture StefanKarpinski  Â·  3Comments

omus picture omus  Â·  3Comments

arshpreetsingh picture arshpreetsingh  Â·  3Comments

TotalVerb picture TotalVerb  Â·  3Comments

Keno picture Keno  Â·  3Comments