Statistics.middle
does not conform to the draft IEEE 1788 Standard on Interval Arithmetic. Perhaps it can be revised along the lines discussed in the paper:
http://doi.acm.org/10.1145/2493882
(free version: https://hal.archives-ouvertes.fr/hal-00576641/document)
Specifically, they propose the following implementation (pseudo-code):
def midpoint([a, b]):
    if [a, b] is empty:
        return NaN
    if a == -inf:
        if b == +inf:
            return 0.0
        else:
            return -realmax
    if b == +inf:
        return +realmax
    round_nearest()  # set rounding to nearest even
    mid = 0.5 * (a + b)
    if mid == -inf or mid == +inf:
        return 0.5 * a + 0.5 * b
    else:
        return mid
Here is a Julia equivalent (except for the rounding-to-nearest-even part, which I don't know how to set):
"""midpoint of the interval [a,b]"""
function midpoint(a::Float64, b::Float64)
#= Based on F. Goualard. 2014, Table VII
DOI: 10.1145/2493882 =#
if !(a ≤ b)
return NaN
elseif a == -Inf
if b == Inf
return 0.
else
return nextfloat(-Inf)
end
elseif b == Inf
return prevfloat(Inf)
end
mid = 0.5*(a + b)
if isfinite(mid)
return mid
else
return 0.5 * a + 0.5 * b
end
end
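For what it's worth, here is what that definition gives at the REPL (a quick check of the function above):

julia> midpoint(1.0, 2.0)
1.5

julia> midpoint(-Inf, Inf)
0.0

julia> midpoint(1.0, Inf)                # clamped to +floatmax()
1.7976931348623157e308

julia> midpoint(floatmax(), floatmax())  # no intermediate overflow
1.7976931348623157e308

julia> midpoint(NaN, 1.0)
NaN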
I am not easily convinced by this. If Inf represents (roughly) either ∞ or an unknown number x with x >> floatmax(), then the midpoint of [a, x] with a in [-floatmax(), floatmax()] will also be an unknown number, typically larger than floatmax(), or ∞. Returning floatmax(), which represents a large number known up to floating-point precision, is inadequate; the output should not be more specific than the input.
I'm with @mschauer here. I'm not at all convinced that returning anything but NaN for middle(-Inf, Inf) makes sense, or anything but Inf for middle(x, Inf) when x is finite (and vice versa). The current implementation doesn't seem to overflow either, so I don't think there is anything to change here.
I think the reasoning is that the IEEE draft proposal is more useful for bisection-type algorithms.
I see, but for bisection you can argue that you do not have to (or should not) take the arithmetic middle anyway; instead you should bisect the space of possible answers evenly, taking the density of floating-point numbers into account. See also https://www.shapeoperator.com/2014/02/22/bisecting-floats/
Unfortunately I do not have access to IEEE journals here.
It looks like we currently compute this as x/2 + y/2; it might be worthwhile to try 0.5 * (x + y) first, which I guess will handle denormals better. Are there any other values where there is a difference?
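To make the denormal case concrete, here is a minimal check comparing the two formulas directly (x/2 + y/2 versus 0.5 * (x + y)) on the smallest subnormal:

julia> x = nextfloat(0.0)   # smallest subnormal, 2^-1074
5.0e-324

julia> x/2 + x/2            # each half underflows to 0.0 (round to nearest even)
0.0

julia> 0.5 * (x + x)        # sum first, then halve: exact
5.0e-324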
I'm also skeptical of this behavior for infinities. If a particular bisection algorithm wants to handle infinity that way it can, but baking it into a standard function makes less sense. I feel like the IEEE spec has pulled this kind of thing before, for example with max(NaN, 1) == 1, which is arguably useful in specific instances but doesn't fit the general meaning of max and NaN.
but for bisection you can argue that you do not have to (or should not) take the arithmetic middle anyway; instead you should bisect the space of possible answers evenly, taking the density of floating-point numbers into account
Note that this is quite straightforward to do:
function bisect(x::Float64, y::Float64)
    # average the bit patterns (reinterpreted as Int64)
    a = reinterpret(Int64, x)
    b = reinterpret(Int64, y)
    reinterpret(Float64, (a + b) >>> 1)
end
With some fun and surprising results:
julia> bisect(1.0, 0.0)
1.118751109680031e-154
julia> bisect(1.0, 0.5)
0.75
julia> bisect(Inf, 0.0)
1.5
julia> bisect(Inf, 0.5)
1.0055855947456948e154
The IEEE interval "midpoint" function is named mid.
I use this for the floating-point midpoint of signed magnitudes.
function midpoint(x::T, y::T) where {T<:Base.IEEEFloat}
    m = abs(x) < abs(y) ?     # midpoint(x,y) === midpoint(-x,-y)
        y - (y/2 - x/2) :     # midpoint(x,y) === midpoint(y,x)
        x - (x/2 - y/2)       # !isinf(x) && midpoint(x,x) === x
    return m + zero(T)        # -0.0 + 0.0 === 0.0
end
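A quick check of this definition at the REPL (assuming the function above): it avoids overflow for large finite endpoints, but gives NaN as soon as an endpoint is infinite.

julia> midpoint(1.0, 2.0)
1.5

julia> midpoint(floatmax(), floatmax())
1.7976931348623157e308

julia> midpoint(1.0, Inf)    # Inf - (Inf/2 - 0.5) == Inf - Inf
NaN

julia> midpoint(-Inf, Inf)
NaN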
A few more surprising results from bisect (see Stefan's note above):
bisect(-1.0, 1.0)
Inf
bisect(-1.0, 2.0)
NaN
Yep, that code is broken for negative numbers. Here's a version I came up with:
~~~
# map Float64 bit patterns to UInt64 so that unsigned integer order
# matches the ordering of the floats, including negative values
function toOrderedUInt64(x::Float64)::UInt64
    v = reinterpret(UInt64, x)
    if (v & 0x8000000000000000) != 0
        return ~v
    end
    return v + 0x8000000000000000
end

function fromOrderedUInt64(v::UInt64)
    if (v & 0x8000000000000000) != 0
        return reinterpret(Float64, v - 0x8000000000000000)
    end
    return reinterpret(Float64, ~v)
end

function bisect(x::Float64, y::Float64)::Float64
    a = toOrderedUInt64(x)
    b = toOrderedUInt64(y)
    return fromOrderedUInt64(a ÷ 2 + b ÷ 2)
end
~~~
This gives seemingly sensible results even for Inf, though I suspect it goes haywire for NaNs:
~~~~
julia> bisect(-1.0,1.0)
-0.0
julia> bisect(-1.5,-1.0)
-1.2500000000000002
julia> bisect(1.5,1.0)
1.25
julia> bisect(0.0,Inf)
1.5
julia> bisect(-0.0,-Inf)
-1.5000000000000002
julia> bisect(Inf,-Inf)
-0.0
~~~~
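As for the NaN suspicion above, a quick check (assuming the definitions in the block above) confirms it: the result is a large finite number rather than NaN.

julia> isnan(bisect(NaN, 1.0))
false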
One more:
@inline function midfloat(a::Float64, b::Float64)
    ia = reinterpret(Int64, a)
    ib = reinterpret(Int64, b)
    m = (ia & ib) + ((ia ⊻ ib) >>> 1)  # bitwise average, no intermediate overflow
    return reinterpret(Float64, m)
end

@inline function midvalue(a::Float64, b::Float64)
    if abs(a) < abs(b)
        return a - (a/2 - b/2)
    else
        return b - (b/2 - a/2)
    end
end
function bisect(a::T, b::T) where {T<:Base.IEEEFloat}
    if !isfinite(a) || !isfinite(b)
        return a + b                 # propagate Inf / NaN
    end
    if signbit(a) !== signbit(b)
        result = midvalue(a, b)
    else
        absa, absb = abs(a), abs(b)
        if absa > absb
            a, b = b, a
            absa = absb
        end
        if absa < 1.0
            result = a - (a/2 - b/2)
        else
            result = midfloat(a, b)
        end
    end
    return result
end
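A quick sanity check of this combined bisect at the REPL (assuming the definitions above, with the non-finite case simply returning a + b):

julia> bisect(1.0, 2.0)    # same sign, magnitudes ≥ 1: midfloat path
1.5

julia> bisect(-1.0, 2.0)   # opposite signs: midvalue path
0.5

julia> bisect(0.25, 0.5)   # same sign, magnitudes < 1
0.375

julia> bisect(1.0, Inf)
Inf

julia> bisect(-Inf, Inf)
NaN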