Runtime: Tests using Random() should be seeded

Created on 30 Mar 2017 · 21 comments · Source: dotnet/runtime

There are many tests using unseeded Random number generators. When failures happen, it is difficult to debug or reproduce them.

Here are examples which are currently causing intermittent failures on arm64...

tests/src/JIT/Methodical/MDArray/basics/classarr.cs
tests/src/JIT/Methodical/MDArray/basics/doublearr.cs
tests/src/JIT/Methodical/MDArray/basics/jaggedarr.cs
tests/src/JIT/Methodical/MDArray/basics/stringarr.cs
tests/src/JIT/Methodical/MDArray/basics/structarr.cs

Many tests already use seeded random number generators via the code in tests/src/Common/CoreCLRTestLibrary/TestFramework.cs.

It would be nice to use similar code in place of all tests using Random() without a seed.
category:correctness
theme:testing
skill-level:beginner
cost:small

Hackathon JitUntriaged area-CodeGen-coreclr easy test enhancement up-for-grabs

Most helpful comment

Inner loop CI is mainly about preventing regressions and not about finding bugs.

None of the commonly used CI legs should rely on "unfixed seed" random behavior. Failures that happen unexpectedly and are unrelated to PR changes just confuse people and waste their time. For reference see all the fun we have with GC Stress.

More exhaustive testing based on randomness to expose bugs belongs in outerloop somewhere.

All 21 comments

@Jkotas Please mention the correct people.

cc @dotnet/jit-contrib

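        // Seed-selection logic in tests/src/Common/CoreCLRTestLibrary/TestFramework.cs:
        // honor CORECLR_SEED from the environment when set (and parseable); otherwise fall back to rand.Next().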
        if (Environment.GetEnvironmentVariable("CORECLR_SEED") != null)
        {
            try
            {
                seed = int.Parse(Environment.GetEnvironmentVariable("CORECLR_SEED"));
            }
            catch (FormatException) { seed = rand.Next(); }
        }
        else
        {
            seed = rand.Next();
        }

You suggest using this code in all tests which use Random?

@reaction1989 Yes: that code, plus a fixed default seed and code to print the seed into the trace. The fixed default seed may want to be unique per test.
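
A minimal sketch of that pattern, assuming a helper along these lines (the class and method names are illustrative, not the actual CoreCLRTestLibrary API): a fixed per-test default seed, an optional CORECLR_SEED override, and a line in the trace recording the seed that was used.

    using System;

    internal static class TestRandomSeed
    {
        // Hypothetical fixed default; real tests could pick a unique constant per test.
        private const int DefaultSeed = 20010415;

        public static Random Create()
        {
            int seed = DefaultSeed;

            // Allow an explicit override, mirroring the CORECLR_SEED handling quoted above.
            string overrideValue = Environment.GetEnvironmentVariable("CORECLR_SEED");
            if (overrideValue != null && int.TryParse(overrideValue, out int parsed))
            {
                seed = parsed;
            }

            // Record the seed actually used so an intermittent failure can be reproduced from the log.
            Console.WriteLine($"Using random seed: {seed}");
            return new Random(seed);
        }
    }

A test would then call TestRandomSeed.Create() (or the equivalent framework helper) instead of new Random().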

I would like to tackle this, if someone wants to assign it to me.

@reaction1989: I don't think we can formally assign you since you're not affiliated with any of the required GitHub organizations, but feel free to take this on in any case. Thanks!

@reaction1989 still interested?

@sdmaclea, @reaction1989. I don't believe a fixed seed is the right way to go. That causes you to lose the benefits of using RNG (which is verifying that arbitrary inputs behave correctly).

The issue here is that:

  1. There is no way to tell the test the seed to use
  2. The tests do not tell you the seed that was used

If we had a way of doing both, then it should resolve all the issues while keeping the benefits.

The typical usage in other tests is to use a fixed seed (i.e. the date) for CI and print the result. The seed may also come from the command line. It is certainly acceptable to randomly generate a seed, print it, and then use it, so that failures can be reproduced. However, that means the logs for all failures must be available. I have not always been able to find the logs.
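
A minimal sketch of the "generate a seed, print it, then use it" variant described here (illustrative only, not existing test code):

    // Generate an arbitrary seed, record it in the log, then use it for the test's Random instance.
    int seed = new Random().Next();
    Console.WriteLine($"Random seed: {seed}"); // this log line is what makes a failure reproducible
    Random rand = new Random(seed);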

The typical usage in other tests is to use a fixed seed (i.e. the date)

This is probably the disconnect. I wouldn't consider "the date" a fixed seed, as it could vary from run to run 😄 (I would probably say the seed comes from a fixed source instead)

However, that means the logs for all failures must be available. I have not always been able to find the logs.

Right, I think this is the real problem.

Because the logs don't tell you the seed used, there is currently no chance of reproducibility.

In the date cases, the date was the date the seed was chosen, not the current date.

For instance

tests/src/Common/CoreCLRTestLibrary/TestFramework.cs:            seed = 20010415;

@sdmaclea, I don't see how that is useful. It means that you don't get any of the benefits that using random inputs would normally provide (one of which is testing values that the dev didn't consider "special" or needing explicit tests).

At that point, why not just pick random inputs and hard code them.

Take a look at the file for the whole context. It allows an environment variable to override the seed, and it prints the seed used in the logs. CI is typically not for exhaustive checking; it is for general coverage.

If you want more exhaustive testing you can run the test with more seeds.... But that is not needed in general CI.

If you want more exhaustive testing you can run the test with more seeds.... But that is not needed in general CI.

The point is that it is no more work (the same number of tests get run), but it provides "greater" coverage (as it tests more inputs over the course of multiple runs).

The fact that there are tests without fixed seeds that are failing is "proof" that there is something wrong somewhere (some random input is causing a failure). If you change that to be a fixed seed, the inputs will never change and the tests will no longer "randomly" fail and you will miss out on the fact that something is wrong (either in the test code or in source code).

That is why using a fixed source seed (such as DateTime.Now) combined with printing said seed to the output (for reproducibility) is a better solution.
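
For example, a hedged sketch of what that could look like (a seed drawn from the current time, then printed so a failing run can be replayed later):

    // Seed from the current time (a "fixed source" whose value varies run to run),
    // then print it so the exact failing run can be reproduced from the log.
    int seed = (int)(DateTime.Now.Ticks & 0x7FFFFFFF);
    Console.WriteLine($"Random seed: {seed}");
    Random rand = new Random(seed);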

IMO, having a seed that varies run-to-run is only worthwhile if it is justified by the testing context, as it adds to the burden of producing a repro. For the HW intrinsics tests, as has been pointed out, we are less interested in getting correctness coverage, and more interested in ensuring the correct instruction has been generated.

At that point, why not just pick random inputs and hard code them.

That makes the test source more fragile, harder to understand, and less adaptable should one want to use an alternate seed.

I could live with using a pseudo-random seed such as the date, and then printing the seed used in the log file, but I am not in favor of that. I would like the general property that the behavior I get when I run my tests locally is the same as the behavior when run in the CI system. Adding a test option that would, perhaps, set an environment variable to use a pseudo-random seed would be an acceptable alternative, and that could be run as a stress test mode, but not in the standard CI tests.

@dotnet/jit-contrib - other thoughts?

Inner loop CI is mainly about preventing regressions and not about finding bugs.

None of the commonly used CI legs should rely on "unfixed seed" random behavior. Failures that happen unexpectedly and are unrelated to PR changes just confuse people and waste their time. For reference see all the fun we have with GC Stress.

More exhaustive testing based on randomness to expose bugs belongs in outerloop somewhere.

Totally agree with Andy. Test runs should be 100% repeatable, with no work on the dev side.

We definitely could add more "outerloop" testing with more randomization. E.g., for our existing COMPlus_JitStress variable, we only use the values "1" and "2". We could use other values as well, since other values cause us to run different stress mixes. It's hard to see many tests benefiting from varying the random seed, but perhaps some would.

Failures that happen unexpectedly and are unrelated to PR changes just confuse people and waste their time.

I know I'm not going to win this battle 😄, but it is my opinion that finding bugs in the code (mostly for production, but also partially for test code) far outweighs any confusion devs might see in a given CI leg from an unrelated failure. The product, as a whole, would be considerably better if devs had the habit of doing basic investigation on a failure and logging a bug if it was unrelated to their changes (most of the time, this is fairly trivial, as the tests that do fail are clearly out of scope).

If there are tests that are flaky, then there is a bug somewhere (production or test) and it needs to be investigated/fixed. Having it "hidden" because we missed an edge case in testing helps no-one in the long run.

I've been looking for some easy code changes I can make to contribute - this one seems to fit the bill...except I'm not clear if a decision has been reached on the best way to fix this:

  1. Output the seed used to the trace log
  2. Use a fixed seed
  3. Use a fixed seed if injected from the command line (if not injected, use a hardcoded fixed seed...or continue to use a random one?)
  4. Investigate why some tests fail for certain random numbers (how frequent are the intermittent failures? Would simply running the tests many times, with some logging, be a practical way to find the problems?)

...or another solution?

@PeteBoyRocket I think the consensus was reached on this prior to @tannergooding's last comment. We are driving to make CI reproducible.

I would suggest the preferred solution is to:

  • use a fixed seed, unless one is provided via the CORECLR_SEED environment variable
  • print the seed used in the logs

This will allow the preferred default behavior and possibly enable us to set up random coverage legs at a later time.
