The current implementation of samplers for e.g. Float32 and Float64 effectively generates numbers as n*eps(T), where n is an integer between 0 and 1/eps(T)-1 (inclusive). E.g. for Float64 this range is 0:(2^52-1).
This makes the random draws (slightly) biased, by -eps(T)/2.
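To see the bias concretely, Float16 is small enough to enumerate every possible draw under this n*eps(T) model (a quick sketch):

julia> using Statistics
julia> T = Float16;
julia> grid = [n * eps(T) for n in 0:Int(1/eps(T)) - 1];  # all 1024 possible draws
julia> mean(Float64.(grid))  # exact mean of the grid
0.49951171875
julia> 0.5 - eps(T)/2        # i.e. biased by exactly -eps(T)/2
0.49951171875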
The proposal is to add this term back to the result in the implementation. Cf. this discussion, especially the proposal of @perrutquist.
Just as a data point: to observe this bias, you would need ballpark this many trials (i.e., enough draws that the standard error of the mean shrinks to the size of the bias):

julia> using Statistics, Distributions
julia> Ts = [Float16, Float32, Float64];
julia> [Ts [(std(Uniform())/(eps(T)/2))^2 for T in Ts]]
3×2 Array{Any,2}:
 Float16  3.49525e5
 Float32  2.34562e13
 Float64  6.7608e30
Possibly, but since we can do it right with a very simple correction, why shouldn't we?
cc @rfourquet
Some more checks (right column after adding eps(T)/2):
julia> T = Float16;[["P(rand() <= 2^(-$i)" for i in 0:16] [mean([(n)*eps(T) <= 2.0^(-i) for n in 0:(1/eps(T)-1)]) for i in 0:16] [mean([(n)*eps(T) + eps(T)/2 <= 2.0^(-i) for n in 0:(1/eps(T)-1)]) for i in 0:16] ]
17×3 Array{Any,2}:
"P(rand() <= 2^(-0)" 1.0 1.0
"P(rand() <= 2^(-1)" 0.500977 0.5
"P(rand() <= 2^(-2)" 0.250977 0.25
"P(rand() <= 2^(-3)" 0.125977 0.125
"P(rand() <= 2^(-4)" 0.0634766 0.0625
"P(rand() <= 2^(-5)" 0.0322266 0.03125
"P(rand() <= 2^(-6)" 0.0166016 0.015625
"P(rand() <= 2^(-7)" 0.00878906 0.0078125
"P(rand() <= 2^(-8)" 0.00488281 0.00390625
"P(rand() <= 2^(-9)" 0.00292969 0.00195313
"P(rand() <= 2^(-10)" 0.00195313 0.000976563
"P(rand() <= 2^(-11)" 0.000976563 0.000976563
"P(rand() <= 2^(-12)" 0.000976563 0.0
"P(rand() <= 2^(-13)" 0.000976563 0.0
julia> T = Float16;[["P(rand() < 2^(-$i)" for i in 0:16] [mean([(n)*eps(T) < 2.0^(-i) for n in 0:(1/eps(T)-1)]) for i in 0:16] [mean([(n)*eps(T) + eps(T)/2 < 2.0^(-i) for n in 0:(1/eps(T)-1)]) for i in 0:16] ]
17×3 Array{Any,2}:
"P(rand() < 2^(-0)" 1.0 1.0
"P(rand() < 2^(-1)" 0.5 0.5
"P(rand() < 2^(-2)" 0.25 0.25
"P(rand() < 2^(-3)" 0.125 0.125
"P(rand() < 2^(-4)" 0.0625 0.0625
"P(rand() < 2^(-5)" 0.03125 0.03125
"P(rand() < 2^(-6)" 0.015625 0.015625
"P(rand() < 2^(-7)" 0.0078125 0.0078125
"P(rand() < 2^(-8)" 0.00390625 0.00390625
"P(rand() < 2^(-9)" 0.00195313 0.00195313
"P(rand() < 2^(-10)" 0.000976563 0.000976563
"P(rand() < 2^(-11)" 0.000976563 0.0
"P(rand() < 2^(-12)" 0.000976563 0.0
This doesn't look too bad at first sight.
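One more number from the same model: the probability mass sitting exactly at zero is one grid point out of 1024 (a quick check):

julia> using Statistics
julia> T = Float16; mean(n*eps(T) == 0 for n in 0:Int(1/eps(T))-1)  # P(rand() == 0)
0.0009765625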
@mschauer: I am sorry, I don't understand the purpose of your exercise.
The bias exists and can be described exactly. There is no need to simulate anything.
The bias is correctable, rather cheaply. Whether it "looks bad" in some statistical tests that may or may not have power for detecting it (it is a rather small bias, after all) is not the issue here.
The key question is whether there are any objections on the table to correcting this bias. Otherwise, I would just make a PR. @rfourquet, I would appreciate your opinion on this.
I am sorry, I don't understand the purpose of your exercise.

A good implementation of rand() would for example have the property that the probabilities of rand() <= p and rand() < p are both equal to p. I checked*) this for our Random.rand(T) and for your Random.rand(T) + eps(T)/2 and concluded "it doesn't look too bad" for your idea, because e.g. P(rand() <= 0.5) = 0.5 and not 0.500977. I am not sure why you are a bit defensive here. Why not discuss the consequences of this change?

*) Using the information "rand() effectively generates numbers as n*eps(T), where n is an integer between 0 and 1/eps(T)-1 (inclusive)" you gave above.
Sorry if I came across as defensive, that was not the intention.
The consequence of this change is that rand will be unbiased for floats, including the bias you demonstrated, at no measurable performance loss (the difference seems negligible compared to noise):
julia> using Random, BenchmarkTools
julia> rand_corrected(T) = rand(T) + eps(T)/2
rand_corrected (generic function with 1 method)
julia> Random.seed!(1);
julia> @btime foreach(_ -> rand(Float64), 1:1024)
4.382 μs (0 allocations: 0 bytes)
julia> Random.seed!(1);
julia> @btime foreach(_ -> rand_corrected(Float64), 1:1024)
4.368 μs (0 allocations: 0 bytes)
The performance loss wouldn't just be negligible, it would be exactly zero. The current implementation is something like rand(r, CloseOpen12()) - 1.0, and this could be replaced by rand(r, CloseOpen12()) - prevfloat(1.0). I.e. the only change is in a compile-time constant.
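To make the one-constant change concrete, here is a self-contained model of that bit trick (draw12, rand01_current and rand01_corrected are illustrative names, not the actual Random internals):

# fill the 52 mantissa bits of a Float64 in [1, 2), then subtract a constant
draw12(bits::UInt64) = reinterpret(Float64, 0x3ff0000000000000 | (bits >> 12))

rand01_current(bits)   = draw12(bits) - 1.0             # grid n*eps(T), biased
rand01_corrected(bits) = draw12(bits) - prevfloat(1.0)  # same grid shifted by eps(T)/2

# e.g. rand01_corrected(rand(UInt64))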
My understanding is that this rearrangement is valid because eps is 2^-52 in the [1, 2) range and 1.0 - prevfloat(1.0) == 2^-53, but I don't know enough about floating point error to prove it formally. But I can of course make the PR do this.
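For what it's worth, the claim can be checked exhaustively for Float32, enumerating all 2^23 values of r in [1,2) (a quick sketch, assuming IEEE round-to-nearest):

julia> all((reinterpret(Float32, b) - 1.0f0) + eps(Float32)/2 ==
           reinterpret(Float32, b) - prevfloat(1.0f0)
           for b in 0x3f800000:0x3fffffff)
true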
I can verify the bias. First the Float16 case: with

using Statistics

# deterministic model of rand(Float16): put i in the top 10 mantissa bits of
# a Float32 in [1, 2), subtract a constant, and convert to Float16
rnd(::Type{Float16}, i)    = Float16(reinterpret(Float32, (UInt32(i) << 13) | 0x3f800000) - 1)
rndnew(::Type{Float16}, i) = Float16(reinterpret(Float32, (UInt32(i) << 13) | 0x3f800000) - 0.9995117f0)

being a deterministic model for rand(Float16) generation, where i ranges over 0:0x000003ff, I get

julia> mean(rnd(Float16, i) for i in 0:0x000003ff)
Float16(0.4995)
julia> extrema(rnd.(Float16, 0:0x000003ff))
(Float16(0.0), Float16(0.999))
julia> mean(rndnew(Float16, i) for i in 0:0x000003ff)
Float16(0.5)
julia> extrema(rndnew.(Float16, 0:0x000003ff))
(Float16(0.0004883), Float16(0.9995))
Although floating-point addition/subtraction is not associative in general, it's quite easy to show that (r - 1.0) + eps()/2 == r - (1.0 - eps()/2) in this case (on systems that adhere to the IEEE standard). It follows directly from the fact that (r - 1.0), eps()/2 and (1.0 - eps()/2) can all be expressed exactly, i.e. are not affected by rounding.

We do not need to add eps(Float16)/2; we need to subtract the right number from r, where r is in [1,2), such that the result is unbiased in [0,1], which is 0.9995117f0 in my code above. This coincides with

julia> Float32(1.0 - eps(Float16)/2)
0.9995117f0

but we do not really need to worry about it.
The problem discussed here appears to be of very small significance in practice. See https://github.com/JuliaLang/julia/pull/8958#issuecomment-62491681 and https://github.com/JuliaLang/julia/pull/8958#issuecomment-66564921
Can you please elaborate on this? Which of the TestU01 tests do you think would pick this up?
My understanding is that to detect an eps()/2 difference empirically would take a lot of samples, too many for practical purposes, and very precise mean calculations at the same time. I don't think that any "random" test would pick this up practically, but that does not mean it is not worth fixing.
Not for Float64, but for Float16 this can be observed in a fraction of a second:
julia> mean((Float64(rand(Float16)) for i in 1:10^7))
0.4994191243164062
julia> mean((Float64(rand(Float16) + eps(Float16)/2) for i in 1:10^7))
0.5000916360351563
Which of the TestU01 tests do you think would pick this up?
If an issue with a pseudo RNG isn't picked up by the best random number test suite around then I don't think there is an issue. To my understanding, it's the only meaningful way of evaluating pseudo RNGs for statistical applications.
I think that this is a pretty narrow view: in this case the bias is too small to be picked up using a black-box test, but it can be demonstrated analytically, and fixed very easily.
TestU01 is a very nice suite, but I am not sure that even its authors would claim that it was designed to pick up everything, or, conversely, that whatever it doesn't show is by construction not important.
Thinking about it, having P(rand() == 0) too large (see my table) can break Metropolis-Hastings:
using Distributions
using Makie
using Colors
using LinearAlgebra
using Random  # for randperm

logπ(x) = logpdf(Normal(), x)

function mh!(corr, xs, N = 10_000_000, samples = 1:100:N, σ = 2.)
    x = xs[end]
    xmax = maximum(x)
    xmin = minimum(x)
    acc = 0
    for i in 1:N
        xᵒ = x
        # propose until the proposal lands outside the ball of radius 3
        while true
            xᵒ = x + σ*randn(Point2)
            norm(xᵒ) > 3 && break
        end
        # Metropolis acceptance step; corr = 0 reproduces the current rand
        if rand(Float16) + corr <= exp(logπ(xᵒ[1]) - logπ(x[1]) + logπ(xᵒ[2]) - logπ(x[2]))
            x = xᵒ
            acc += 1
            xmax = max(x[1], x[2], xmax)
            xmin = min(x[1], x[2], xmin)
        end
        if i in samples
            push!(xs, x)
        end
    end
    @show (xmin, xmax)
    @show acc/N
end

xs = [Point2(3.1, 3.1)]
ys = [Point2(3.1, 3.1)]
mh!(0.0, xs)
mh!(eps(Float16)/2, ys) # with correction
N = length(ys)
ii = randperm(2N)
scatter([xs; ys][ii], markersize=0.2, color=[fill(:blue, N); fill(:red, N)][ii])
Output
(xmin, xmax) = (-11.804179996165205, 10.500512563920687)
acc / N = 0.1778087
(xmin, xmax) = (-5.7318743113302615, 5.766862517806166)
acc / N = 0.176106
Short explanation: a bivariate Normal(0,1) conditioned on having distance larger than 3 to the origin, sampled with symmetric random walk Metropolis-Hastings. It should "never" produce values as large as 11, because
julia> cdf(Normal(), -11)
1.9106595744986566e-28
That's a nice demo for Float16. In practice I think one would use Float64 though, where the issue may not be this prominent. But it should be fixed; I will make a PR.
Or -randexp() < ll instead of x < exp(ll). Anyway: yes, this is just meant as illustration.
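For reference, a minimal sketch of that log-space acceptance test (accept_log is an illustrative name; randexp is from the Random stdlib):

using Random

# -randexp() is distributed like log(rand()) for an ideal uniform generator,
# and it sidesteps the P(rand() == 0) mass entirely: no exp(), no comparison
# against a value that can be exactly 0
accept_log(ll) = -randexp() < ll   # ll = log of the M-H acceptance ratio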
If an issue with a pseudo RNG isn't picked up by the best random number test suite around then I don't think there is an issue.
There's nothing special about the tests in TestU01. Testing that the mean of values sampled from [0,1] is 1/2 is perhaps such a simple test that they didn't think to include it, but clearly it is a valid statistical test, and one that we would currently fail given enough samples. So why wouldn't we want to fix it? Or, put another way: if the current "best random number test suite" in existence doesn't include a bias test, then we could add a bias test (which we'd currently be failing), and then we would fail the best random number test suite (i.e. the new one that we just added the bias test to).
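For illustration, such a bias test could be as simple as a z-test on the sample mean (a sketch with names of my own choosing, not an actual TestU01 test):

# z-score of the sample mean under H0: draws are uniform on [0,1) with mean
# 1/2 and variance 1/12. Under H0 this is approximately N(0, 1); a bias of
# size b drifts like sqrt(12n)*b, so any fixed bias eventually fails the test.
function mean_bias_z(draw, n)
    s = 0.0
    for _ in 1:n
        s += draw()
    end
    (s/n - 0.5) / sqrt(1/(12n))
end

mean_bias_z(() -> rand(Float64), 10^7)           # |z| ~ 1: bias far below noise
mean_bias_z(() -> Float64(rand(Float16)), 10^7)  # z ≈ -5: bias clearly visible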
Would we fail that test suite because of the bias test? It would take a huge number of draws for Float64. It may be that another test already in it would fail us with fewer draws.
However, I am a fan of this issue and associated PR.
No, detecting such a small bias using black-box statistical techniques is prohibitively expensive. From the CLT, approximately

√n (S - 1/2) ∼ N(0, 1/12)

where S is the running mean and 1/12 is the variance of the uniform on [0,1]. The 99.5% quantile of N(0, 1/12) (assuming we are aiming for a two-tailed test at a 1% significance level) is around 0.75, so detecting a bias of eps()/2 requires √n · eps()/2 to exceed roughly 0.75. That means around

julia> abs2(2*0.75/eps())
4.563542160821626e31

draws to detect this reliably.
Even detecting something much more crude is expensive. E.g. a bias of 1000*eps() would require around 4e25 draws. At 1e9 Float64s/second (give or take factors of 10, depending on the hardware), which is a reasonable ballpark, this would still take on the order of 5e9 years. (Hope I calculated these things right; corrections welcome.)
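Packaging that arithmetic into a throwaway helper (nsamples is a hypothetical name, not from any library):

# draws needed before a mean bias of size δ reaches the two-sided 1% critical
# value of the sample mean, i.e. solve z*sqrt(1/12/n) = δ for n
nsamples(δ; z = 2.576) = abs2(z/sqrt(12)/δ)

nsamples(eps()/2)     # ≈ 4.5e31, matching the estimate above
nsamples(1000*eps())  # ≈ 1e25, the same ballpark as the 4e25 above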
That's correct, I got numbers of the same order (see my first post, using a slightly simpler approach of just checking after how many samples the bias exceeds the standard error of the mean).