As a follow-on to https://github.com/chapel-lang/chapel/issues/9730 and a recent test framework deep dive, here is a strawperson design proposal for Chapel's test framework.
Proposal: Go-inspired test interface, Cargo-inspired test launcher
- `Test` data type, which allows one to set test metadata and provides test helper methods
- Tests defined by argument: `proc f(t: Test)`

Why argument-based test annotations? An alternative would be a keyword-based annotation, e.g. `test proc foo()`.

Why have an external test launcher?
Source code:
use UnitTest;

// Some library function to be tested
proc inc(x) {
  return x + 1;
}

// test defined by argument
proc testInc(test: Test) {
  test.assert(inc(3) == 5);
}
Running through test launcher:
# Compiles and runs with the test flags that run UnitTest.main()
mason test testProgram.chpl
=========================== test session starts ============================
collected 1 test
================================= FAILURES =================================
_______________________________ testProgram ________________________________
proc testInc(test: Test) {
> test.assert(inc(3) == 5);
E test.assert(4 == 5);
testProgram.chpl:<line number> AssertionError
========================= 1 failed in 0.12 seconds =========================
Note: This output is just a mockup inspired by pytest output and is subject to
change. The assertion introspection would require some fancy compiler features,
and is not necessary.
Compiling and running the test program with test flags enabled outside of mason is outside the scope of this proposal.
For the purpose of demonstrating the division between library and compiler implementation, below is a _sketch_ of what `UnitTest.main` may look like:
module UnitTest {
  proc main() {
    use Reflection;

    // Assuming 1 global test suite for now
    // Per-module or per-class is possible too
    var testSuite = new TestSuite();

    // Go find all the tests with `Test` as an argument
    for t in __primitive('gatherTests') { // <--- compiler features needed
      testSuite.add(t);
    }

    for test in testSuite {
      try {
        // Create a test object per test
        var testObject = new Test();
        // Could filter some tests out here
        // option 1: if test is a FCF:
        test(testObject);
        // option 2: if test is a string:
        Reflection.call(test, testObject); // <--- compiler features needed
        // other options exist as well
      }
      // A variety of catch statements will handle errors thrown
      catch e: TestSkipped {
        // Print info on test skipped
      }
      catch e: TestDependencyNotMet {
        // Pop test out of array and append to end
      }
      catch {
        // ... other errors handled here
      }
    }
  }

  class Test {
    proc assert() { } // <--- compiler features wanted (but not needed)
    proc skipIf() { }
    proc dependsOn() { }
    proc fail() { }
    // ...
    // Could store pass/fail, test name, performance timings, dependency information, etc.
  }

  class TestSuite {
    // Stores array of functions to be run
  }
}
Notable compiler features required for this are:
- `Reflection.call()`
- `__primitive('gatherTests')`

UnitTest implementation:
- `f(test: Test) throws { }` as first-class functions
- Reflection module that finds all functions of a specific signature
- Assert module asserts or UnitTest-specific asserts, e.g.:
// Assertion introspection example
assert(x + 2 > 3);
// prints:
// assertion failed:
// assert(x + 2 > 3)
// assert(2 > 3)
Some other considerations:
- `Test` is a class so that users could potentially extend it.
- `gatherTests` could work beyond this example.
- Users `use UnitTest` to access the `Test` type in their test signatures.
- The test launcher is an entirely separate program that will run the test program and parse stdout/stderr for information that it can print to the user and use as feedback for the test loop. This proposal suggests we implement this test launcher within `mason test`.
The information parsed would include:
use UnitTest;

proc s1(test: Test) throws {
}

proc s2(test: Test) throws {
  test.skipIf(here.maxTaskPar < 2); // throws "TestSkipped"
  test.dependsOn('s1');             // throws "TestDependencyNotMet"
}
# Compiles and runs
mason test testProgram.chpl
=========================== test session starts ============================
collected 2 tests
========================= 2 passed in 0.09 seconds =========================
# If the skipIf was true:
mason test testProgram.chpl
=========================== test session starts ============================
collected 2 tests
testProgram.chpl s2(test: Test) SKIPPED
========================= 1 passed, 1 skipped in 0.09 seconds ==============
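The skip path above needs no special compiler support; here is a minimal sketch of `skipIf` as ordinary library code, assuming `TestSkipped` is an Error subclass as in the UnitTest.main sketch.

```chapel
// Thrown by skipIf and caught by UnitTest.main, which marks the test skipped
class TestSkipped : Error { }

// Fragment of the Test class from the sketch above
class Test {
  proc skipIf(condition: bool) throws {
    if condition then
      throw new TestSkipped();
  }
}
```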
Some corner cases to consider:
use UnitTest;

// This test requires 8 locales
proc s1(test: Test) throws {
  test.numLocales(8);
}

// This test can run with 2-4 locales
proc s2(test: Test) throws {
  test.maxLocales(4);
  test.minLocales(2);
}

// This test can run with 8 or 16 locales
proc s3(test: Test) throws {
  test.numLocales(8);
  test.numLocales(16);
}

// This test requires both 8 and 16 locales, so we wrap it with s4 and s5
proc t(test: Test) throws {
  test.ignore(); // UnitTest.main does not consider this as a test
}

proc s4(test: Test) throws {
  test.numLocales(8);
  t(test);
}

proc s5(test: Test) throws {
  test.numLocales(16);
  t(test);
}
When the test launcher encounters a TestIncorrectNumLocales error, it will queue
up another run with the correct number of locales (accounting for tests that
have already been run).
Specifying multiple numLocales could be an error, or could be treated as a logical
"or", i.e. run with either configuration, as the example above suggests.
Some features are not demonstrated in this proposal.
Some random reactions (a few of which reflect comments from others that resonated with me):
[Disclaimer: I've never worked in a language/framework/team that's made use of unit testing, so am pretty ignorant here.]
Due to the importance of testing and the compiler's involvement, I could imagine this becoming a language feature over time if/when we gained confidence in our direction, but I think that starting with a no-language-change approach is prudent in the meantime.
I worry a little bit about the approach for specifying numLocales as (to my understanding) it feels a little passive/reactive/roundabout ("You called me and BTW I need this many locales, so maybe you should run me again with more if you didn't have that many") rather than proactive ("As this test's / package's authors, I think these tests need to be run with this many locales, so please do that.")
Similarly, the dependence-between-tests approach feels a little weak to me (e.g., the use of a string to name the test routine when we could just refer to the routine directly; a question in my mind about whether the one test should just call the other if there's a dependence between them rather than creating a constraint graph for the test harness to solve).
It seems to me that current FCF support could plausibly be sufficient for what we need here.
While I think making sure this is well-integrated with mason is a "must", my intuition is that there ought to be a way to invoke the testing outside of mason as well (?). Put another way, it's clear to me that mason ought to have a path to running unit tests in code; it's not clear to me why mason would be necessary to run unit tests in code.
I think the biggest thing that worried me when Ben was walking us through this was the statement that the unit test framework wouldn't be able to use the user's modules, so would need some special way to get access to their namespaces... This set off alarms, though I understand the challenge (my head goes to "Maybe the UnitTest.chpl module could / should be dynamically generated or modified by the compiler / mason when building unit tests?").
Yep, it'd be possible for example for buildDefaultFunctions (or some other phase in the compiler) to add to module init code some code to populate a global list of test FCFs with any test function in that module. I don't think this changes very much what the test launcher / test library needs, though (and as a result I haven't considered it a key part of the above proposal for general direction).
I.e. the compiler could generate per-module code like this:
use UnitTest;
// User-written tests
proc t1(test: Test) { ... }
proc t2(test: Test) { ... }
// Generated code at the end of module-init
UnitTest.registerTest(t1);
UnitTest.registerTest(t2);
UnitTest.registerTest would just populate a global list of test FCFs that could be run later.
If we are to have a collection of testable things, such as testSuite above, it is best built by the compiler during compilation. Think of the Locales array: Chapel code sees it as an already-populated array. Similarly, the compiler can emit code to build testSuite, and the testing framework can then access it with all elements already there. Its size can probably be a compile-time constant.
In the meeting with Krishna today, I mentioned that I thought it would be helpful to have a flag that users could use when running the launcher to say not to try tests again with a different number of locales than what they sent to the program.
Normally, when a test that specifies 4 locales gets attempted with 8 due to the launcher's argument, we would give an error about the incorrect number of locales, and then run the test again with the correct number. With this flag that I am suggesting, the user would be able to say "no, just tell me if the test can't run with that number of locales and don't do anything else".
I have implemented the dependsOn functionality and wanted to know: what should be done when a test depends on another test and they have different numLocales requirements?
proc s1(test: Test) throws {
  test.addNumLocales(6);
}

proc s2(test: Test) throws {
  test.dependsOn(s1);
  test.addNumLocales(16);
}
@bradcray Thanks for the comments. I tried implementing the dependsOn method and was successful in doing so. I think directly calling one test from another can lead to some issues, for example handling the case where the two tests have mismatched numLocales requirements, which I mentioned in my previous comment. This can lead to an infinite loop, and I was not able to find a solution for it. Waiting for feedback. Thanks.
In the future, we may consider adding test metadata such as dependencies to entire test suites, which will be in the form of modules or classes/records. This approach will have the advantage of being defined separately from the test functions themselves such that the test launcher can process the dependencies, number of locales, etc. before ever running a test function.
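A sketch of what processing such metadata up front might look like (per-test here for brevity; the record, field, and procedure names are assumptions):

```chapel
// Per-test metadata collected before any test is run
record TestMeta {
  var name: string;
  var numLocales: int;
  var dependsOn: string;   // empty string means no dependency
}

// Report mismatched locale requirements up front instead of looping at runtime
proc checkMeta(tests: [] TestMeta) {
  for t in tests {
    for d in tests {
      if t.dependsOn == d.name && t.numLocales != d.numLocales then
        writeln("error: ", t.name, " needs ", t.numLocales,
                " locales but depends on ", d.name, ", which needs ",
                d.numLocales);
    }
  }
}

// e.g. mirrors the s1/s2 example above
checkMeta([new TestMeta("s1", 6, ""), new TestMeta("s2", 16, "s1")]);
```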
> @bradcray Thanks for the comments. I tried implementing the dependsOn method and was successful in doing so. I think directly calling one test from another can lead to some issues, for example handling the case where the two tests have mismatched numLocales requirements, which I mentioned in my previous comment. This can lead to an infinite loop, and I was not able to find a solution for it. Waiting for feedback. Thanks.
At a glance, this looks related to my concern about the passivity of how the number of locales is specified currently. I don't have any specific suggestions myself at this moment, though I like the flavor of what @ben-albrecht suggests just above.
Test parameterization will be very useful for the users. There are two approaches I can think of for implementing it.
Until this feature is implemented, users can just loop through the parameters and run the test function with each of them, as sketched below.
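A sketch of that workaround, reusing the `inc` example from the top of this issue (the helper name and parameter tuples are arbitrary):

```chapel
// Parameterized test body, called once per parameter set
proc checkInc(test: Test, input: int, expected: int) throws {
  test.assert(inc(input) == expected);
}

// Single test function that loops over the parameters
proc testIncMany(test: Test) throws {
  for (input, expected) in [(1, 2), (4, 5), (-1, 0)] do
    checkInc(test, input, expected);
}
```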
A TODO for @krishnadey30 && @ben-albrecht - Split off remaining features to be implemented into standalone feature request issues, and eventually close this issue.
Some examples:
- Assertion introspection (so that `assert*()` helper functions are not needed)

Just wondering if people have thought about property-based testing rather than example-based testing. unit_threaded allows for check in D. proptest and quickcheck are available for Rust. testing/quick is available for Go. QuickCheck in Haskell is the place where all this started. Hypothesis is the Python way of doing the right thing.
I hadn't, but it seems worth a separate issue.
Agreed. @russel - would you be interested in opening an issue on that topic as a place for further discussion?
We intend to close/archive this issue, and spin off future unit test design discussions into separate issues.
I think it will be a good feature. We generally test based on the assumptions we made while writing the code, so the tests sometimes miss bugs. Having property-based testing can solve this issue to some extent.
@ben-albrecht I cheated and just did a cut and paste to a new issue, #15488 . :-)