Runtime: Proposal: add a static, thread-static Random.Current instance

Created on 27 Oct 2020  路  22Comments  路  Source: dotnet/runtime

Background and Motivation


The Random class is a very frequently used API from even non experienced developers - it's used anywhere from creating simple programs for beginners ("let's create a guessing game"), to generating random data for unit tests and benchmarks, to all sorts of other applications (eg. shuffling a music playlist). Not everyone is aware of some of the specific details of this class, and the current API surface doesn't help in this aspect, leading a number of developers to use the type incorrectly and stumbling upon seemingly weird bugs ("why are these random numbers all zeros?!").
In particular:

  1. The Random class doesn't have static APIs, which make is less intuitive to use. Most of the time when a developer just needs a one-shot usage of an API of some class, they'd just type the type name (pun not intended) and look at the available static APIs, rather than trying to instantiate a new throwaway instance to use in that single line of code. I believe this is the same rationale that led to https://github.com/dotnet/runtime/issues/17590.
  2. To work around 1., I've seen a number of developers just storing some Random instance into a static field. This will inevitably make the code both more verbose (need to define a field) as well as more error prone, because Random is not thread safe and will just break down if used concurrently. I've seen a few developers discovering this issue, and I'd be lying if I said I didn't make this same exact mistake as well a few years ago 馃槃

The proposal is to add a new static, thread-static Random instance that devs could access. This would simply return some Random instance that they could use without worrying about concurrently or creating an instance. Making the instance thread-static would allow the BCL to reuse the same Random class without the need to actually come up with a new thread-safe implementation that could be shared across threads.

Proposed API

namespace System
{
    public class Random
    {
        public static Random Current { get; }
    }
}

Here I used the Current name from conceptually similar APIs such as SynchronizationContext.Current, which specifically refers to an instance that's directly related to the current thread. Other examples can be found in APIs such as Application.Current and Window.Current, or also a similar naming scheme is also used by APIs such as DispatcherQueue.GetForCurrentThread.

Usage Examples

For one-liners:

// BEFORE
int number = new Random().Next();

// AFTER
int number = Random.Current.Next();

For cases where multiple usages would be needed:

// BEFORE
var numbers = new int[100];
var random = new Random();

foreach (ref var x in numbers.AsSpan())
{
    // Can't use the batched API if we want a given range
    x = random.Next(100);
}

// AFTER
var numbers = new int[100];

foreach (ref var x in numbers.AsSpan())
{
    x = Random.Current.Next(100);
}

Note how in the second example we no longer need to also have the extra verbosity of storing a Random instance in a local.

Alternative Designs


I had two other possible ideas:

  • Expose static versions of the various APIs. That won't work since you can't overload just between static/instance methods in C#, and just having different names for the static versions would be extremely awkward, so no.
  • Expose a thread-safe Shared instance (脿 la ArrayPool<T>.Shared). That would make the implementation unnecessarily more complex, as a whole new Random type would have to be written.

Risks

Technically, devs could still just get Random.Current and store it in a field, then use it concurrently. But I mean, they could already do (and in fact, they do) that today anyway. Having a static field would effectively nudge them towards not doing this at all anymore, so I think the API addition would actually reduce the overall risks related to the usage of this type.

api-suggestion area-System.Runtime untriaged

Most helpful comment

If we're discussing updating Random, I think it may be worth looking into changing the API to include some unmanaged generic implementations, as so:

class Random
{
    public T Next<T>() where T : unmanaged;
    public T Next<T>(T max) where T : unmanaged;
    public T Next<T>(T min, T max) where T : unmanaged;
    public void Next<T>(Span<T> fill) where T : unmanaged;
}

Where Next<T>() simply returns a random T, Next<T>(Span<T> fill) fills the provided span with random Ts, and so on. This API would simplify more complex use-cases of Random, while still being easy-to-use in the common cases. It has the added benefit of unifying the surface of the API, so we can get rid of specialized functions such as NextDouble() or NextBytes().

There is a concern with use-cases attempting to randomize an arbitrary unmanaged type (i.e. not an int, uint, byte, etc). For this, it might be best to go the way of Vector<T> with an allow-list for whatever unmanaged types are justifiable. Additionally, generic specialization could be used for types like float, where filling the type with random bytes could result in NaN or infinity (which doesn't seem in the spirit of randomizing floating-point).

Edit: I've whipped up an example implementation here (missing the maxValue & minValue, maxValue overloads as of writing this edit).

All 22 comments

Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley
See info in area-owners.md if you want to be subscribed.

If we're discussing updating Random, I think it may be worth looking into changing the API to include some unmanaged generic implementations, as so:

class Random
{
    public T Next<T>() where T : unmanaged;
    public T Next<T>(T max) where T : unmanaged;
    public T Next<T>(T min, T max) where T : unmanaged;
    public void Next<T>(Span<T> fill) where T : unmanaged;
}

Where Next<T>() simply returns a random T, Next<T>(Span<T> fill) fills the provided span with random Ts, and so on. This API would simplify more complex use-cases of Random, while still being easy-to-use in the common cases. It has the added benefit of unifying the surface of the API, so we can get rid of specialized functions such as NextDouble() or NextBytes().

There is a concern with use-cases attempting to randomize an arbitrary unmanaged type (i.e. not an int, uint, byte, etc). For this, it might be best to go the way of Vector<T> with an allow-list for whatever unmanaged types are justifiable. Additionally, generic specialization could be used for types like float, where filling the type with random bytes could result in NaN or infinity (which doesn't seem in the spirit of randomizing floating-point).

Edit: I've whipped up an example implementation here (missing the maxValue & minValue, maxValue overloads as of writing this edit).

This addition would require users to always evaluate the Random.Current property on every random call. However, nothing prevents them from storing that instance on a field, which can happen by accident e.g. by capturing the instance in a closure. This could result in unexpected concurrent access, also impacting unrelated code that happens to depend on Random.Current.

However, nothing prevents them from storing that instance on a field, which can happen by accident

Wouldn't the same be true for APIs such as SynchronizationContext.Current? As in, other thread-static APIs that could still technically be used incorrectly. Or is the main issue just about the fact that a dev using this wrong on one thread might cause other code running in the same thread in the future to potentially break? In that case, is the only solution to go ahead and write a fully thread-safe Random implementation to expose through a Shared instance, or would there be other alternatives?

My rationale here was mostly just to have _some_ static helper to create random numbers, for convenience 馃檪

My rationale here was mostly just to have _some_ static helper to create random numbers, for convenience

FWIW, if all you need is _some_ static helper for random numbers, some exist:
https://github.com/dotnet/runtime/blob/ae9da6432360728c77ab10b1ef0c414d1a9ee287/src/libraries/System.Security.Cryptography.Algorithms/ref/System.Security.Cryptography.Algorithms.cs#L548-L549

I feel that the RNG methods are way too heavy-weight for the use cases mentioned (they hang off of the class that's in System.Security.Cryptography and is meant for generating "secure" randomness, which has its overhead).

The proposal essentially tries to address the extremely common scenario for using Random as a static readonly singleton and make it both easier to use in general and easier to use correctly. However, it does seem like the solution with a thread-static Random instance is, perhaps, not the best one.

I think straight static methods, as proposed by @avirule, would work better.

Average java's ThreadLocalRandom.current() enjoyer itt: I've recently started building my own small things with c# after several years working with java and it appears I completely forgot that stdlib lacking such a vital feature. Took me some time refreshing my memory with the help of the docs to get couple of 42's in a threadsafe manner.

A note about methods such as this public void Next<T>(Span<T> fill) where T : unmanaged;

I would be cautious about this kind of signature, since a user may write something like:

var key = new byte[32];
Random.Current.Next(key);

And would expect to get cryptographic level randomness here.

And would expect to get cryptographic level randomness here.

Note: that is a problem with Random.NextBytes(Span<byte> buffer) already.

Next<T>(Span<T> fill)

This API would be unsafe (not type-safe) for the same reason MemoryMarshal.AsBytes<T>(Span<T>) is unsafe. See the table in https://github.com/dotnet/runtime/issues/41418 for more information. To avoid the type safety issues you'd need to provide concrete implementations for all the types you care about: byte, int, double, etc.

Wouldn't the same be true for APIs such as SynchronizationContext.Current?

SynchronizationContext.Current is _intended_ to be captured and stored into a field, then called from other random threads. SynchronizationContext instances are required to be thread-safe to support this scenario. Random instances are not guaranteed to be thread-safe.

If we had a Random.Current, perhaps it should accompany an analyzer rule that discourages storing the value into a field? Though honestly just providing static methods as suggested in another comment might end up being easier for people to reason about. Neither Random.Current nor Random.<static-method> would allow for predictability, hence code which uses them isn't testable in either case.

To avoid the type safety issues you'd need to provide concrete implementations for all the types you care about: byte, int, double, etc.

If I'm understanding correctly, I believe the example implementation I provided (here) does just that. As far as I'm aware, this sort of generic specialization isn't uncommon as an optimization path.

As for whether it's sensible to have an unsafe/not-type-safe method exposed as an API for Random is another matter. To me it's not much of an issue, as the most common use cases will still end up being .Next<int>(); but anyone wanting to get more complex should be able to easily reason about what makes sense to use as T in Next<T>().

To @avirule's point, while the Next<T> idea may have started as a way to fill structs with random bytes, I personally view it as a workaround for the fact that we cannot define static methods with the same name as instance methods, and we kind of want the static API to be short and sweet. It also has an additional advantage of being extensible in the future without necessarily requiring to go through the full process of API review (one could imagine adding support for something like TimeSpan in addition to the primitive types via a simple PR).

Generic specialization is normally used to enlighten a code path around a particular _T_, but we almost never introduce generic APIs which are allowed to operate only over a subset of _T_. For example, calling Random.Next<DateTime>() would be nonsensical. And would... throw?

Vector<T> and the Unsafe-equivalent classes are the exception to this rule since they have very specialized use cases.

Yep, that's very fair. I mean, to me personally it does not matter how would the static methods be called, NextInt32 or Next<int>, whichever people here think would work better for the ecosystem. Generics have the downside of being an "advanced" feature, and I would expect these static helpers to be useful for beginners.

That said, are we in agreement on value of these methods or do you think that the current state of Random is fine as-is in in terms of ease of use?

Honestly the majority of calls I see follow this pattern:

int value = new Random().Next();

.NET devs are very familiar with object construction, perhaps to the point of being a bit _too_ eager to instantiate things. I don't really come across people storing these instances (or instances of anything, really) into a static field except for when they're trying to make a perf optimization. That might suggest that we need an analyzer for folks who try to stick Random instances into a static field, but that analyzer might incur significant false positives. (If you access it under lock, for instance, it's fine.)

What's the scenario here - power users are looking for more efficient random number generation? If that's the case then we don't really have to worry about catering to the average dev; we can cater toward users who are already looking for a higher-throughout or lower-allocation API.

I agree that exposing a generic API could potentially lead to some confusion regarding which types are supported, but I don't see how that would be an issue if that was all properly documented and if the implementation supported all numeric types out of the box. Especially if we're talking about begineer devs, wouldn't that make your average "get some random numbers" code easier?

var numbers = new int[100];

// BEFORE
var random = new Random();

for (int i = 0; i < numbers.Length; i++)
{
    numbers[i] = random.Next(100);
}

// AFTER
Random.NextValues(numbers);

This would literally shrink the code from multiple lines including a whole loop to just a single one liner. That would automatically cast the int[] array to a Span<int> and run he optimized code too. The same would be useful in all sorts of other scenarios as well, eg. imagine someone working on a neural network and needing to populate a tensor with random float values.

Even just for getting single values, right now we only have the (instance) APIs to get either an int or a double. Other than the bad efficiency in some cases (eg. why do I need to go through a double if I only need a float?), if we're still talking about individual devs wouldn't it be easier for them to just specify the type they need in angle bracket instead of forcing them to do additional steps on their end to convert the random int/double to the type they need? Eg. if they need a short, or a decimal, or something else. Especially because some types also require some tweaking to still guarantee a proper/valid distribution, so having all this be automatically handled would remove one more chance of writing error prone code as well.

I personally would like very much to have two static, generic APIs (Next<T>() and NextValues<T>(Span<T>)), with a list of supported types (including all numeric types) in the summary, and just throwing otherwise. I mean, it's full of APIs that throw with invalid parameters, I don't really see why having one that throws with eg. a negative index is fine, but one that does for an invalid generic argument is not. You're still just throwing in case of improper/unsupported usage of said API in both cases 馃

int value = new Random().Next();

Huh, that is the first time I see that kind of pattern... I mean, for me, the singletonian nature of the class always suggested that the proper usage is to store it in a static field. I guess my intuition was not to be generalized (as a sidenote, I am working on just the tool that would be able to answer such question).

What's the scenario here - power users are looking for more efficient random number generation?

No, I don't think so. For me the scenario is easy & idiomatic generation of ephemeral random data as well as the ability to evolve Random beyond the place it is in today: since the APIs wouldn't be dealing with seeds directly, the underlying algorithm could be tweaked for efficiency and quality without introducing breaks. Random today is widely used and known to be useful, but suffers from quality and efficiency problems, as well as being a bit overkill for ephemeral randomness generation. The statics proposal attempts to address all these issues. The problem with it would of course be the fact that seed-based usage would effectively become a second-class citizen without some more APIs being introduced.

Generic specialization is normally used to enlighten a code path around a particular _T_, but we almost never introduce generic APIs which are allowed to operate only over a subset of _T_. For example, calling Random.Next<DateTime>() would be nonsensical. And would... throw?

It could be nonsensical, or a generic specialization could be introduced for it. My example only includes T of numeric types, but DateTime would be a sensible candidate for a specialization鈥攁nd generating a random DateTime with this method, rather than the few others I've looked at, would be far less verbose, and without losing explicitness.

As for unsupported T, if you're using Random.Next<T>() on arbitrary T, my example actually covers this. Treat the T as a Span<byte> with sizeof(T), use .Next<T>(Span<T> fill), then a valid T can be read from the Span<byte>. While isn't a widely-applicable use-case, it is new functionality, without hindering existing use-cases.

I think the most important thing to consider is whether this new API would be functionally superior or inferior to what Random enables currently. At the moment, it's a very simple API鈥攁nd to do anything truly complex with Random requires casting and/or tinkering with its return values. While there is something to be said for the simplicity it offers, I would argue that requiring workarounds to attain a desired result (of any complexity beyond an int or double) works against that principle.

As for unsupported T, if you're using Random.Next<T>() on arbitrary T, my example actually covers this.

Yes, and I don't think this is a good idea. Please see my response at https://github.com/dotnet/runtime/issues/43887#issuecomment-718345995 and the linked issue for more info.

I have defined a thread-local random instance in a utility library of mine. It is very useful.

For the BCL, though, I think the risk for misuse is rather large. I have had bugs around this myself. I tried being clever caching the thread-local instance and inadvertently shared it across threads. Data races are hard enough to find, but when random data is generated by design it is a nasty bug.

A safe alternative would be:

static class StaticRandom {
 public int NextInt32(...);
 public double NextDouble();
 ...
}

This would be what people want 99% of the time. The other 1% of cases are where the overhead of using a thread-local instance is too large, or a repeatable sequence must be generated.

_If_ we wanted to add a Random.Current that used a thread-local, we could do so in a way that's safe even if the instance is transferred between threads, e.g. https://github.com/stephentoub/runtime/commit/62eb52fea5f377b6b26529b227125be7d548f7c9.

That's a nice technique, @stephentoub.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

omajid picture omajid  路  3Comments

EgorBo picture EgorBo  路  3Comments

GitAntoinee picture GitAntoinee  路  3Comments

jkotas picture jkotas  路  3Comments

btecu picture btecu  路  3Comments