Junit5: Introduce ArgumentsProvider that computes the cartesian product of collections of datapoints

Created on 21 May 2018  路  39Comments  路  Source: junit-team/junit5

This issue originates from a dicussion in https://github.com/junit-team/junit5/pull/1422#issuecomment-390582297.

While often it suffices to specify each set of arguments for the invocation of a parameterized test explicitly, there are cases where it's simply the combination of lists of potential values for each parameter, i.e. the cartesian product of these collections. While it's certainly already possible to compute these combinations manually, e.g. in a method referenced by @MethodSource, it would be more convenient if there were a declarative way to specify that.

Recently, @sormuras has written a sample CartesianProductProvider extension which does that by implementing TestTemplateInvocationContextProvider. However, by implementing this functionality as a ArgumentsProvider users would not have to learn a new concept and we could reuse the existing infrastructure.

One possibility would be to use fields and methods, like the Theories runner from JUnit 4.

@BaseSet
static final Set<String> strings = Set.of("foo", "bar");

@BaseSet
static Set<Integer> ints() {
    return Set.of(23, 42);
}

@ParameterizedTest
@AllCombinationsSource
void test(String a, int b) {...}

@junit-team/junit-lambda Thoughts?

Deliverables

  • [ ] Introduce new ArgumentsProvider and related annotations
Jupiter parameterized tests enhancement

Most helpful comment

Based on internal team discussions, we are currently leaning toward one of the following _programmatic_ APIs:

List<Arguments> Arguments.cartesianProduct(Set<?>... sets);

Or:

Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);

In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument _tuples_.

All 39 comments

I think that sounds reasonable in terms of a new feature.

The hardest part will be coming up with good names for the annotations. 馃槈

Maybe, instead of introducing a bunch of new annotations, a simple API to combine sets of arguments from a @MethodSource method would be enough.

Something like

static Collection<Arguments> cartesianProduct() {
    return Arguments.builder(Set.of("foo", "bar"))
        .combineWith(Set.of(23, 42))
        .build();
}

or

static Collection<Arguments> cartesianProduct() {
    return Arguments.cartesianProduct(Set.of("foo", "bar"), Set.of(23, 42));
}

True, that may prove more elegant, and it definitely avoids an explosion of annotations (which we should aim to avoid).

So I agree: let's go with a programmatic approach via @MethodSource first and see what people think of that!

Though I would have Arguments.cartesianProduct() return a Set instead of a Collection, since it _must_ be a set by definition -- right? 馃

Well, a Set is a Collection... 馃槈

While, mathematically speaking, it must be a set, I think we should be more lenient than that. I think we should make it work with arbitrary Collections and just return a Collection or Iterable. So, for instance, if one passes a List including duplicates (according to equals()), we would include the duplicates in the resulting Collection. Does that make sense?

Ummmm.... no, not really. 馃槆

My point was that the framework should actually require/enforce _set_ semantics and avoid duplicated invocations of _equivalent_ products.

Does _that_ make sense? 馃槈

I think that's debatable because of potentially broken/slow equals implementations... 馃

Hmmmmmmmmmmmmm

I can't see your method signature for Arguments.cartesianProduct().

Are you proposing the following?

```java
Collection Arguments.cartesianProduct(Collection... collections);
````

... b/c I was thinking of this:

```java
Set Arguments.cartesianProduct(Set... sets);
````

I think that's debatable because of potentially broken/slow equals implementations.

I'm of course not referring to built-in Java types, but user-defined ones. E.g. I've often encountered JPA entities where equals() only compared their id field. Or, the best ever equals() implementation I've stumbled upon which unconditionally returned false. Of course, that's nonsense and people shouldn't do that. On the other hand, we allow duplicate invocations based on the Stream/Iterable instances returned from @MethodSource referenced methods so I'm not convinced we should be this strict in this case.

I can't see your method signature for Arguments.cartesianProduct().

Almost! I was thinking of this:

Collection<Arguments> Arguments.cartesianProduct(Collection<?>... collections);

The builder may be an alternative that let's one configure whether or not duplicates should be removed.

I knew what you meant regarding equals/hashCode. I was just confirming the overall signature. 馃槈

Oops... yeah... when I typed <Object> I of course meant <?>.

On the other hand, we allow duplicate invocations based on the Stream/Iterable instances returned from @MethodSource referenced methods

That's true, but... those duplicates would be returned by user code; whereas, Arguments.cartesianProduct() would be framework code that can be more _exact_.

In any case, I see your point regarding broken equals/hashCode implementations in user code. Hence, my follow-up comment below.

The builder may be an alternative that let's one configure whether or not duplicates should be removed.

I think that would almost be a requirement to support that even if only as an opt-in feature in the builder.

the best ever equals() implementation I've stumbled upon which unconditionally returned false.

That's quite a _jewel of software engineering_, but... I've seen something that tops it (on another level): a method in the service layer annotated with @Ignore from JUnit 4. I'll just leave it at that. 馃槆

A couple of random (hopefully useful) notes from a JUnit user:

  • The annotation-driven approach does provide some (minor) advantages--it could produce some interesting testing options when mixed with nested tests (e.g. "all tests share parameters from the base class, but this nested class provides extra parameters"). I also feel that the annotation form is slightly more readable than the method-driven. (That said, it's not like someone couldn't just write a new ArgumentsSource to use annotations if they really wanted it--just throwing out a couple of opinions! 馃槃)
  • Whatever solution you decide on, I would recommend allowing the user to have multiple parameters of the same type. For example: void testAddTwoNumbers(int first, int second). I find that this is a common use case for reducers and other pieces of code that need to combine two values (or more). This ended up being a limiting factor for @ValueSource in a lot of our testing.
  • While someone is working in this codebase, you might want to fix the docs for parameterized tests--I believe that Section 3.15 was actually intended to be section 3.14.2

I believe that Section 3.15 was actually intended to be section 3.14.2

Good catch!

Fixed in 5343323db975f1f1469ea6e7601205f94dc0ab81.

For the programmatic approach, I would suggest that the method should return a lazy Stream<Arguments> instead of an already materialized collection.

I had a similar requirement some time ago. There was a huge number of input combinations generated by a cartesian product, but only a few of them were actually valid combinations for the test scenarios. An example would be that for the german version of a webshop, only certain payment methods and only a shipping address in germany would be valid.

Having this method returning a stream makes it easier to reuse it by filtering elements, without keeping all elements in memory first. I can provide an implementation if needed.

I seem to recall hearing/reading something about performance issues when attempting to do something like building up a Cartesian product using Java streams.

So if we truly implement it solely based on streams instead of building a collection upfront, we would then likely need to keep performance in mind in order to support large data sets.

I can provide an implementation if needed.

@jhorstmann, perhaps just sharing your ideas/intentions would be a good start.

I seem to recall hearing/reading something about performance issues when attempting to do something like building up a Cartesian product using Java streams.

@sbrannen Guava's Sets.cartesianProduct is relatively lazy by my understanding, so I imagine it wouldn't be impossible to make a (sequential) streams version which is at least as lazy and performant. But I admit that I've not personally read anything about the performance of stream-based Cartesian sets, so I may be speaking nonsense... :)

Sure. I'm not saying it's not possible. I'm just saying that I _assume_ it will be considerably more complicated (in terms of the actual implementation) if we accept multiple streams as input and _stream_ the Cartesian product on the fly back to the Jupiter TestEngine for processing. 馃槈

Based on internal team discussions, we are currently leaning toward one of the following _programmatic_ APIs:

List<Arguments> Arguments.cartesianProduct(Set<?>... sets);

Or:

Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);

In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument _tuples_.

Having this method returning a stream makes it easier to reuse it by filtering elements, without keeping all elements in memory first.

That sounds like a reasonable argument to me in favor of streaming results: I had not considered that a user might want to _filter_ the Cartesian product tuples before returning them to the Jupiter TestEngine.

@jbduncan, regarding possible performance issues, I think this blog is what I was recalling (or similar at least).

Quoting that blog:

this code computes the entire stream of cartesian product values before processing the first value.

I haven't verified the claim of course, but it at least sounds like the author put a lot of consideration into the presented solution.

p.s. FWIW, most _solutions_ on Stack Overflow and similar channels suggest the use of flatMap() but don't mention any issues with performance.

@sbrannen Cool, thank you for pointing me in the direction of that blog. I gave it a scan yesterday, and it looks promising to me!

Sure. I'm not saying it's not possible. I'm just saying that I assume it will be considerably more complicated (in terms of the actual implementation) if we accept multiple streams as input and stream the Cartesian product on the fly back to the Jupiter TestEngine for processing. :wink:

@sbrannen Yeah, okay. Sorry, I wasn't terribly clear earlier: I was sub-consciously hoping that we could easily derive an implementation like Guava's Sets.cartesianProduct for this issue without needing to use the full-blown spliterator-based solution given in that blog, as I vaguely recalled that Sets.cartesianProduct's javadoc gave a citation for the algorithm its implementation was based on. But looking back at said javadoc, I realise that I was mistaken!

Thus, using the blog you linked earlier sounds like the best way forward to me.

P.S. FYI, when I was scanning through the blog, I actually recognised the JDK bug report that the author opened regarding flatMap(), for I saw a similar one being discussed on the OpenJDK mailing lists not too long ago. It turns out it's actually a duplicate of this bug, which is fixed already but only for Java 10+.

If the JUnit 5 team does use that blog as inspiration for resolving this issue, might it be worth opening another issue to test out the simpler flatMap()-based solution in the future if or when JUnit 5+ depends on Java 10+?

For reference, my implementation is available at https://gist.github.com/jhorstmann/a7aba9947bc4926a75f6de8f69560c6e and actually looks very similar to the blog post, except it uses an iterator which is then wrapped into a spliterator. I can add any license if needed.

The cartesianProduct method returning Stream<Arguments> could already be implemented outside of junit. I'm thinking if we could make it even easier to use. Especially with a larger number of dimensions, mapping those to method arguments by index could be error prone. Especially if you'd want different subsets of dimensions for different test methods. Another approach could be to allow the different Source annotations on method parameters

void test(@CsvFileSource(...) @CsvToPerson Person person,
          @EnumSource(...) PaymentMethod paymentMethod,
          @MethodSource(...) Address shippingAddress,
          @MethodSource(...) Address billingAddress)

If I understand it correctly, the above annotations could also be used as meta-annotations, making the example even nicer. Different test methods could so easily use different sets of dimensions. Filtering of valid combinations could be done inside the method by using Assumptions.

Interesting idea! However, it wouldn't support multiple parameters per source at all. For starters I think we should define the cartesianProduct method in Arguments.

@jhorstmann Would you like to submit a PR with your implementation?

Is there some advances on that matter? I am just looking for such a feature, actually.

Is there some advances on that matter? I am just looking for such a feature, actually.

Nothing official, but if I recall correctly, @sormuras created a proof of concept.

@sormuras ?

Aye. The POC wants to be integrated into JUnit Pioneer. See https://github.com/junit-pioneer/junit-pioneer/issues/68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.

Would be cool if s/o took over to make the integration/implementation happen.

I think a simple cartesianProduct method would be easy to implement and provide a way to implement such use cases along with @MethodSource without requiring any additional dependencies.

For example, I needed sth. like that here:
https://github.com/marcphilipp/nexus-publish-plugin/blob/master/src/test/kotlin/de/marcphilipp/gradle/nexus/NexusPublishPluginTests.kt#L55-L61

I have a professional case (sorry I can't share it here) where I have many tests, including a bunch to be parametrized. These parameters build on the same few sets, but they are often different combinations (different size and order) of those sets. Rather than using a different function for each of them, I would rather provide the few annotations each of them needs, with the right sets in the right order. It would be way better to read than a double method every time.

@matthieu-vergne Could you please explain in more detail what you mean with "a double method every time"?

The test method + the argument provider method.

Based on internal team discussions, we are currently leaning toward one of the following _programmatic_ APIs:

List<Arguments> Arguments.cartesianProduct(Set<?>... sets);

Or:

Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);

In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument _tuples_.

@sbrannen
I like this approach, but would prefer cartesianProduct(Collection ... collections) as suggested by @marcphilipp previously. The reason being that when writing tests in Groovy, we'd have to do:
[false, true] as Set
etc. for each parameter. With Collection we won't have to add the as Set every time.

I'd be okay with a more loose definition based on Collection if it doesn't make the implementation more complex.

If you can go this way, an Iterable may be enough.

No, that's just a sample project that shows how to accomplish this in an extension. It's not an officially supported feature (yet) since it's not as extensible as we'd like it to be.

Citing myself here:

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.

Would be cool if s/o took over to make the integration/implementation happen.

It's still up-for-grabs. ;-)

@marcphilipp Thanks for the clarification.

Citing myself here:

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.
Would be cool if s/o took over to make the integration/implementation happen.

It's still up-for-grabs. ;-)

It seems it was not yet advertised: @Michael1993 brought the feature to life with junit-pioneer/junit-pioneer#321, and I'm proposing some additions in junit-pioneer/junit-pioneer#409 and junit-pioneer/junit-pioneer#416

Was this page helpful?
0 / 5 - 0 ratings