Godot: RandomNumberGenerator::randi_range is biased.

Created on 15 Mar 2019  路  18Comments  路  Source: godotengine/godot

Godot version: 775e74e (current master)

OS/device including version: N/A

Issue description: RandomNumberGenerator::randi_range generates integers by the naive modulo method: ret % (to - from + 1) + from. This method is biased: when the range of the underlying RNG cannot be perfectly divided by to - from + 1, some samples will be underrepresented. When to - from + 1 is larger than half of the RNG's range, some results can occur up to twice as often than others.

See also: a detailed comparison of methods to generate ranged random number, by the author of PCG herself: http://www.pcg-random.org/posts/bounded-rands.html

Steps to reproduce:

  • Suppose randbase.rand() has the uniform range of 0, 1, 2 (three numbers).
  • Suppose we call randi_range(0, 1).
  • We can see that should the base RNG be uniform, 0 will be output twice as often as 1.

Minimal reproduction project: N/A

bug discussion core

Most helpful comment

A RandomNumberGenerator::set_mode can be problematic for many reasons:

  • States between different RNGs generally aren't convertible. There is no sane way to dictate an initial state for a RNG that have been used previously, on which set_mode is called later.
  • If a set_mode is added, then all uses of RNGs from now on have to begin with a set_mode. This have to be documented, taught, and kept in mind by users at all times. Again, too much baggage to carry around. Deprecation is much better at this, because users will be notified if they try to use the old class in new code.
  • The API of RandomNumberGenerator itself is not properly designed, and is not suitable for the general case. In general, seeds can't be deduced from a RNG's internal state, because the seeding function may be "one-way" and infeasible to inverse. Yet, the way seed exists as a property in RandomNumberGenerator gives the illusion that one can say rng.seed = rng.seed and expect the internal state to not change, which is not, and can not be the case, if the RNG is properly implemented. You can remove get_seed, but then it's a breaking change, and we're back to square one.
  • On the C++ side, that would require all methods of randbase to be virtual and accessed through a pointer, because different RNGs have different sizes, and you don't want to put them into an union because you 1. don't want to allocate space for the biggest one at all times, and 2. don't want an exceptional opportunity to shoot yourself in the foot.

All 18 comments

Hmm, this might explain why I'm getting the same generated world from apparently different seed values.

Hmm, this might explain why I'm getting the same generated world from apparently different seed values.

I don't think this can completely explain your situation unless you're generating very large integers. Maybe you're using low quality seeds? PRNGs often give similar output streams given similar seeds -- they don't have the avalanche effect characteristic to hashes.

@Chaosus Actually I don't think Godot's PCG is properly seeded. This should probably be a new issue. This is from random_pcg.h:

_FORCE_INLINE_ void seed(uint64_t p_seed) {
    current_seed = p_seed;
    pcg.state = p_seed;
    pcg32_random_r(&pcg); // Force changing internal state to avoid initial 0
}

Actual srandom function from PCG's C minimal implementation: https://github.com/imneme/pcg-c-basic/blob/master/pcg_basic.c

// Copyright 2014 Melissa O'Neill <[email protected]>, Licensed under the Apache License, Version 2.0
void pcg32_srandom_r(pcg32_random_t* rng, uint64_t initstate, uint64_t initseq)
{
    rng->state = 0U;
    rng->inc = (initseq << 1u) | 1u;
    pcg32_random_r(rng);
    rng->state += initstate;
    pcg32_random_r(rng);
}

Quite different. I don't believe they do the same thing...

Yeah, I can look into this..

This should probably be a new issue.

Could you create ?

See #27166.

Maybe you're using low quality seeds?

How low quality seed is differentiated from high quality one? Currently I set the seed to be in the range of 0..1000 as an integer as a simple way to regenerate a world from editor and produce deterministic results that way due to testing purposes.

Currently I set the seed to be in the range of 0..1000 as a simple integer

That's pretty low quality. A seed is "high quality" if it contains a lot of randomness to begin with, i.e. something that looks like 0xf1492ace35fef0c34575f1d791131a5f. If you want easy to remember numbers you can use a hash function to transform it before giving it to the RNG, using the hash's avalanche effect.

The hash does not need to be a secure one. Something like xxHash is likely enough.

Interestingly, does this affect randf_range too ?

I don't think this particular issue affects randf_range, as it does not involve modulus. However now that I look into it, it is possible that the float version (also the default, according to math_defs.h) is buggy:

float RandomPCG::random(float p_from, float p_to) {
    unsigned int r = rand();
    float ret = (float)r / (float)RANDOM_MAX; // Here, a uint32 casted to float before division
    return (ret) * (p_to - p_from) + p_from;
}

A float is a 32-bit floating point number. I'm sure that already sound alarming: float can only represent integers up to 2^24 exactly. Integers greater than that will be rounded to a multiple of 2, 4, 8, and so on.

I believe you'll want to either stick to the double version, or truncate the uint down to 24 bits before dividing by 2^24 instead. I made a PR that does the latter: #27193


Also, a certain Taylor R. Campbell claims that division by max is not uniform in the floating point domain, however I haven't attempted to verify their claim nor their proposed algorithm. I consider the supposed bias small enough to be ignored for gamedev purposes, but you can check it out if you want. Be sure you do your own research though: https://mumble.net/~campbell/2014/04/28/uniform-random-float

Currently I set the seed to be in the range of 0..1000 as a simple integer

That's pretty low quality. A seed is "high quality" if it contains a lot of randomness to begin with, i.e. something that looks like 0xf1492ace35fef0c34575f1d791131a5f. If you want easy to remember numbers you can use a hash function to transform it before giving it to the RNG, using the hash's avalanche effect.

The hash does not need to be a secure one. Something like xxHash is likely enough.

I remember encountering this issue myself and having quite a hard time finding out about this nuance of setting the seed. Adding a line or two about this to the docs would definitely be helpful to other clueless devs like me.

@toasteater Its still have bias ?

@Chaosus randi_range's implementation didn't change, so yes, it still does have bias. The merged PRs that show up here are not fixes to this issue.

I can make a PR to fix this, but debiasing the function will by definition break compatibility. I'm not sure whether that's welcome as is, or do we need to make changes to the API to address different needs, e.g. keeping a "frozen" PRNG around as I suggested the other time.

@toasteater what about adding an optional boolean parameter - like "debiased" (which will be off by default) and then implement logic around that? - the compatibility will be preserved in that case.

@Chaosus I tend to think that it's too much baggage to carry around, if from now on everyone is required to write randi_range(low, high, true) just because some people wrote code that relied on stability between versions which was never promised in the first place. If we needed to do that, we might as well just declare RandomNumberGeneratordeprecated, and introduce a new, properly written RNG class that should be used from now on.

Or, we can acknowledge that the RandomNumberGenerator class had never been stable in the first place, and see improvements to it as non-breaking changes.

Why create different classes? Maybe the concept around making different random generator modes helps. PCK, PCK_DEBIASED etc. And user can set it up using random.set_mode()

A RandomNumberGenerator::set_mode can be problematic for many reasons:

  • States between different RNGs generally aren't convertible. There is no sane way to dictate an initial state for a RNG that have been used previously, on which set_mode is called later.
  • If a set_mode is added, then all uses of RNGs from now on have to begin with a set_mode. This have to be documented, taught, and kept in mind by users at all times. Again, too much baggage to carry around. Deprecation is much better at this, because users will be notified if they try to use the old class in new code.
  • The API of RandomNumberGenerator itself is not properly designed, and is not suitable for the general case. In general, seeds can't be deduced from a RNG's internal state, because the seeding function may be "one-way" and infeasible to inverse. Yet, the way seed exists as a property in RandomNumberGenerator gives the illusion that one can say rng.seed = rng.seed and expect the internal state to not change, which is not, and can not be the case, if the RNG is properly implemented. You can remove get_seed, but then it's a breaking change, and we're back to square one.
  • On the C++ side, that would require all methods of randbase to be virtual and accessed through a pointer, because different RNGs have different sizes, and you don't want to put them into an union because you 1. don't want to allocate space for the biggest one at all times, and 2. don't want an exceptional opportunity to shoot yourself in the foot.

Now that Godot 4.0 development has started, we can break backwards compatibility. Feel free to open a PR to make the RNG unbiased :slightly_smiling_face:

BTW, I noticed that the PCG32 basic implementation that we use actually has a pcg32_boundedrand method that seems to handle this use case:

https://github.com/imneme/pcg-c-basic/blob/master/pcg_basic.c

We only included the very minimal code initially but we could sync with upstream pcg-basic.c and use pcg32_boundedrand for randi_range.

Was this page helpful?
0 / 5 - 0 ratings