Glow: Use deterministic PRNG seeds for testing

Created on 23 May 2018 · 5 Comments · Source: pytorch/glow

Over in #1065, we have an example of a unit test that fails intermittently depending on the pseudo-random numbers delivered by nextRand(). This function uses a static PRNG, so while it is seeded deterministically, reordering tests will give them different pseudo-random numbers.

The PRNG state is also shared between threads, so in a multithreaded environment the pseudo-random numbers will be non-deterministic.

One way of solving this problem would be to make the PRNG state a member of the Module class and change the initXavier() and randomize() methods of Tensor to take an explicit PRNG state parameter.
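A minimal sketch of what that could look like. `PseudoRNG`, `ToyTensor`, and `ToyModule` are hypothetical stand-ins rather than Glow's actual classes, and the default seed of 1 is an assumption. The point is only that the caller hands the PRNG state to `randomize()`, so nothing depends on a global generator:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical PRNG state object; not Glow's real implementation.
class PseudoRNG {
public:
  explicit PseudoRNG(std::uint64_t seed = 1) : engine_(seed) {}

  // Returns a pseudo-random value in [0, 1). The top 53 bits are scaled by
  // hand so the result does not depend on a library distribution class.
  double nextRand() {
    return static_cast<double>(engine_() >> 11) * (1.0 / 9007199254740992.0);
  }

private:
  std::mt19937_64 engine_;
};

// Simplified stand-in for Tensor: the randomness source is a parameter.
struct ToyTensor {
  std::vector<double> data;

  explicit ToyTensor(std::size_t n) : data(n) {}

  // Fills the tensor with values in [low, high) drawn from the given PRNG.
  void randomize(double low, double high, PseudoRNG &rng) {
    for (double &v : data) {
      v = low + (high - low) * rng.nextRand();
    }
  }
};

// Simplified stand-in for Module: it owns the PRNG state.
class ToyModule {
public:
  explicit ToyModule(std::uint64_t seed = 1) : rng_(seed) {}

  PseudoRNG &getPRNG() { return rng_; }

private:
  PseudoRNG rng_;
};
```

With this shape, two freshly constructed modules produce the same sequence no matter which tests ran earlier or which other threads are drawing numbers from their own modules.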

All 5 comments

Another possible approach would be to create a common test fixture that resets the internal PRNG state (and maybe make the generator thread_local to take care of multithreading issues).
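A rough sketch of that alternative, with all names in the `sketch` namespace being hypothetical and the default seed of 1 an assumption. The generator is `thread_local`, and a small RAII object plays the role of the fixture by reseeding on construction:

```cpp
#include <cstdint>
#include <random>

namespace sketch {

constexpr std::uint64_t kDefaultSeed = 1;

// One generator per thread, so threads never race on shared PRNG state.
inline std::mt19937_64 &threadLocalEngine() {
  thread_local std::mt19937_64 engine(kDefaultSeed);
  return engine;
}

// Puts the calling thread's generator back into a known state.
inline void resetPRNG(std::uint64_t seed = kDefaultSeed) {
  threadLocalEngine().seed(seed);
}

// Draws the next raw 64-bit value from the calling thread's generator.
inline std::uint64_t nextRandBits() { return threadLocalEngine()(); }

// Stand-in for a test fixture: constructing it resets the PRNG.
struct PRNGResetFixture {
  PRNGResetFixture() { resetPRNG(); }
};

} // namespace sketch
```

In a googletest suite the same reset would go in the `SetUp()` method of a fixture class deriving from `::testing::Test`. Note that this only helps tests that actually use the fixture.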

Non-deterministic compilers

Non-determinism in compilers is a problem that people spend a lot of engineering effort to eliminate. Typical sources are:

  • ASLR makes malloc() return different pointer values in different runs. If pointers are stored in a hash table, such a table will have a non-deterministic iteration order.
  • The current time or another deliberate source of (real) randomness makes it into the compiler's data. Some hash tables deliberately seed their hash function to mitigate denial of service attacks.
  • Global data shared by threads. The kernel scheduler is effectively non-deterministic.
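As an illustration of the first point, here is a sketch of one common mitigation: rather than iterating over a pointer-keyed hash set directly, copy the elements out and sort them by a stable, pointer-independent key. The `Node` type and its `id` field are hypothetical; any key that is assigned deterministically would do.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Hypothetical IR node with an ID assigned in creation order.
struct Node {
  unsigned id;
};

// Returns the nodes sorted by ID. The set's own iteration order depends on
// pointer values, which can change from run to run under ASLR.
inline std::vector<const Node *>
inStableOrder(const std::unordered_set<const Node *> &nodes) {
  std::vector<const Node *> ordered(nodes.begin(), nodes.end());
  std::sort(ordered.begin(), ordered.end(),
            [](const Node *a, const Node *b) { return a->id < b->id; });
  return ordered;
}
```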

Non-determinism in compilers can create failures that are very hard to debug because they can't be reproduced. Bug reports about intermittent failures, or about bugs that only reproduce on the reporter's machine, tend to get ignored.

Non-deterministic unit tests

Non-determinism in unit tests is a problem for similar reasons. How do you debug a unit test that fails intermittently? In practice, such tests tend to get ignored because you can just run the test suite again and everything is fine. If an intermittent test starts failing for a valid reason, it still gets ignored because it has been crying wolf for so long.

There are also quasi-non-deterministic tests. (That's not a word.) These are tests that are technically deterministic, but their behavior depends on the environment in some way. This could be a test that starts failing when another unrelated unit test is added, and when you try to debug it in isolation with --gtest_filter=..., it changes behavior and starts working again. Quasi-non-deterministic tests are typically caused by global state shared between tests.
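A small sketch of the effect, using hypothetical names in a `quasi` namespace. Both stand-in "tests" draw from one global generator that is seeded deterministically, yet the value each one sees depends on which of them runs first:

```cpp
#include <cstdint>
#include <random>

namespace quasi {

// Global PRNG, seeded deterministically, shared by every test.
inline std::mt19937 &globalEngine() {
  static std::mt19937 engine(1);
  return engine;
}

// Two stand-in "tests". Each returns the first value it draws.
inline std::uint32_t testA() {
  return static_cast<std::uint32_t>(globalEngine()());
}

inline std::uint32_t testB() {
  return static_cast<std::uint32_t>(globalEngine()());
}

} // namespace quasi
```

Running `testA()` then `testB()` from a fresh seed gives `testA()` the first number in the sequence. Running them in the other order gives it the second. Running `testA()` alone, as with a test filter, always gives it the first, which is why the failure can vanish in isolation.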

Safe by default

I think we should design Glow's libraries to be "safe by default". In this case, "safe" means deterministic and with minimal dependencies on the environment. Our unit tests should get this behavior by default, without doing anything.

Concretely, this means that pseudo-random numbers should not depend on global state.

With a PRNG that uses global state, you would need to reseed it explicitly, i.e. it wouldn't reseed by default. If you forget, you may not notice. A thread-local PRNG state addresses the non-determinism from threading, but not the "quasi-non-determinism" from interactions with other tests.

I agree that we should do the thing that leads to least flakiness in automated testing.

I'd also suggest that we should have an easy way to run our tests in a "stress mode" that uses a truly random seed for the PRNG. Otherwise it is very easy to create tests that coincidentally work with the fixed seed, but that fail to converge when real randomness is introduced. (I've made that mistake many times.)

@bertmaher that's a good idea. We could combine with the --gtest_repeat flag to stress test the unit tests.
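One way the stress mode could be wired up, as a sketch only. The function name, the fixed seed of 1, and the idea of passing in the value of an environment variable are all assumptions, not anything that exists in Glow. The seed is printed so that a failing stress run can be reproduced afterwards:

```cpp
#include <cstdint>
#include <iostream>
#include <random>

constexpr std::uint64_t kFixedTestSeed = 1;

// Picks the PRNG seed for a test run. `stressValue` is meant to be the value
// of a (hypothetical) environment variable, e.g. from std::getenv(). Unset,
// empty, or starting with '0' selects the fixed seed; anything else selects
// a fresh seed from std::random_device.
inline std::uint64_t chooseTestSeed(const char *stressValue) {
  if (stressValue == nullptr || stressValue[0] == '\0' ||
      stressValue[0] == '0') {
    return kFixedTestSeed;
  }
  // std::random_device is not guaranteed to be non-deterministic on every
  // platform, so treat this as best-effort randomness.
  std::random_device device;
  const std::uint64_t high = device();
  const std::uint64_t low = device();
  const std::uint64_t seed = (high << 32) | (low & 0xFFFFFFFFu);
  // Log the seed so a failure can be replayed with the same value.
  std::cerr << "stress mode: PRNG seed = " << seed << "\n";
  return seed;
}
```

Running the test binary in this mode with --gtest_repeat would then draw a new seed on each repetition, provided the seed is chosen in per-test setup rather than once at program start.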
