Proposal: Add a way to mark test cases as optional in the canonical data.
For example with a flag "optional": true for optional cases, null or no flag for cases that (almost) every track should implement. Perhaps with an additional comment or reference to an issue/PR that explains when to implement it and when not to.
This allows automated test generators to handle these cases and leaves it up to the maintainers of the track if they want to add them or not.
Motivation: There are test cases (e.g. #1487 or complex-numbers#L291-L300) that some tracks want to implement, but other tracks don't, for example because it would unnecessarily increase the complexity of the exercise. While every track can implement additional tests manually, once a certain amount of tracks would use such a case, it may be useful to add them to the canonical data. (full context that lead to this issue being created)
To help reviewers evaluate and decide on this proposal, I contribute three additional motivations as potential factors.
Floating-point tests in triangle:
Performance tests, such as pythagorean-triplet (https://github.com/exercism/problem-specifications/issues/1347) and alphametics (https://github.com/exercism/problem-specifications/pull/1024)
Invalid inputs: recall that we've had discussions in https://github.com/exercism/problem-specifications/issues/902 . Currently, unless they are essential to the exercise, we will not put them in canonical-data.json , but acknowledge that tracks may choose to add their own. There may be situations where multiple tracks would choose to add such cases, in which cases it could be valuable to share them, and this could be a mechanism for doing so.
I'm interested to see how this proposal goes.
A few questions:
optional" flag is not really enough. Personally, I'd like if it were a string, briefly explaining the reason for introducing the case.@pgaspar wrote:
Is using "optional" as a category too broad?
Yes, and additionally: Aren't all tests technically optional?
For example, the Haskell track's tests for acronym deliberately preserves a test case that was removed from the canonical tests at some point. I'd assume there are similar cases of deliberately leaving out canonical tests, or having them commented out / skipped by individual language tracks at the track's own discretion.
(Would we want to maintain a set of narrower categories?)
We could maintain a set of categories for why tests may be conditionally excluded.
@NobbZ wrote:
I'd like if it were a string, briefly explaining the reason for introducing the case.
Like the following?
"comment": "This test uses big numbers."
"comment": "This test uses floating-point numbers."
Since we have had no objections from makers of test generators yet, I would favor this solution for being
1) least invasive wrt. changing the format for canonical data, making existing test generators technically support the format without changes, and
2) least complex wrt. optionality: Let maintainers deal with excluding test cases, accept that some tests may not apply to all language tracks, and be more inclusive towards adding tests that only address some tracks and not others.
@NobbZ wrote:
I'd like if it were a string, briefly explaining the reason for introducing the case.
Correction: How would this reason deviate from the test message?
For example, if it were the case of armstrong-numbers in #1487, wouldn't the test description be something like "Test against large number to ensure that a coercing library function does not lose precision"?
In my opinion, just an "optional" flag is not really enough. Personally, I'd like if it were a string, briefly explaining the reason for introducing the case.
"optional": "Explanation for why this is an edge case" instead of "optional": true could work, too.
Correction: How would this reason deviate from the test message?
Imo, the test descriptions/names are more often an explanation for the student, while these comments/flags are purely for maintainers.
For example, if it were the case of armstrong-numbers in #1487, wouldn't the test description be something like "Test against large number to ensure that a coercing library function does not lose precision"?
That's not the only reason for the test and this doesn't tell maintainers that, in a language where 64 bit integers aren't typically/always used, they might not want to use this because it introduces an additional concept. There are also some examples above where the comment would be far too long for the description.
Why not just use a comment?
@ErikSchierboom brought up in the chat that one should not have to scan comments to find out if a test case should be omitted. The difference is that generators would be able to detect the kind of edge cases described above. For manual creation of test suites, there isn't really a difference.
The difference is that generators would be able to detect the kind of edge cases described above.
Maybe it would be easier for me to understand this point with examples of how a generator could use this flag.
One example I can think of is showing a warning whenever a maintainer runs the generator and it sees an optional test. Wouldn't it become a bit repetitive to always see these warnings even after initially verifying that we do want the test? Is the assumption that generators would have a way to "resolve" these warnings somehow?
Sorry if these questions are a bit of a tangent. I'm only looking at this through what I know of the Ruby generators so it's very likely I'm missing something.
It feels like an optional test really imposes a requirement for some
statefulness on a generator: a useful generator will require action from
the person using it the first time it encounters a particular optional
test, but on all subsequent runs will remember the previous choice made for
that test.
This imposes an additional engineering burden on the people maintaining the
generators. As one of them, I personally would prefer to avoid an optional
flag. All tests are implicitly optional anyway, as things currently stand.
On Mon, Apr 1, 2019 at 9:32 PM Pedro Gaspar notifications@github.com
wrote:
The difference is that generators would be able to detect the kind of edge
cases described above.Maybe it would be easier for me to understand this point with examples of
how a generator could use this flag.One example I can think of is showing a warning whenever a maintainer runs
the generator and it sees an optional test. Wouldn't it become a bit
repetitive to always see these warnings even after initially verifying that
we do want the test? Is the assumption that generators would have a way to
"resolve" these warnings somehow?Sorry if these questions are a bit of a tangent. I'm only looking at this
through what I know of the Ruby generators so it's very likely I'm missing
something.—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/exercism/problem-specifications/issues/1492#issuecomment-478714663,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHdeTqCpVkZ2xIsokdLKjgVhhstmiGXfks5vcl7lgaJpZM4cSv2U
.
This imposes an additional engineering burden on the people maintaining the generators. As one of them, I personally would prefer to avoid an optional flag. All tests are implicitly optional anyway, as things currently stand.
As @NobbZ said above:
Yes. The assumption is, that most testgenerators will just continue to work without any specialcasing of the optional tests. We are aware though, that some generators might fail badly when seeing a JSON that is not formed as expected. In the chat we assumed, leaking optionals is the lesser of all discussed evils.
Would this break the generators you're maintaining, @coriolinus?
One example I can think of is showing a warning whenever a maintainer runs the generator and it sees an optional test. Wouldn't it become a bit repetitive to always see these warnings even after initially verifying that we do want the test? Is the assumption that generators would have a way to "resolve" these warnings somehow?
This is up to the maintainer of the generator. I'm currently working on creating Julia generators and these are some of the options I'm considering:
Sorry if these questions are a bit of a tangent. I'm only looking at this through what I know of the Ruby generators so it's very likely I'm missing something.
No need to be sorry, it's very valuable to know how different generators would handle this.
"optional": "Explanation for why this is an edge case" instead of "optional": true could work, too.
This is my preferred option. I'm hesitant to use comments for this, as I feel people should be able to freely change the comments without breaking any generators.
Applying this to @petertseng's example:
{
"comments": [
" Your track may choose to skip this test ",
" and deal only with integers if appropriate "
],
"description": "sides may be floats",
"property": "scalene",
"input": {
"sides": [0.5, 0.4, 0.6]
},
"expected": true,
"optional": "floating-point-numbers"
}
Or something like that.
"optional": "Explanation for why this is an edge case" instead of "optional": true could work, too.
I also like this idea.
Is there anyone against @petertseng's proposal?
I'm 100% in favour. I do suggest we hold off on actually _adding_ flags until #1496 's other suggestions have a similar consensus. It makes sense to me to (potentially) add a bunch of flags all at once instead of many over a certain timespan in regards with strain to generator maintainers.
Is there anyone against @petertseng's proposal?
Small point of order, sorry to interrupt...
I do not perceive myself as having made any proposal in this issue - to help people evaluate whether people are for or against, I think it would be helpful if you would point out the proposal that I must have accidentally made. (Or maybe this is a case of misattribution, in which case I think it would be most courteous to the proposer if the proposal were attributed to the right person)
Sorry for chiming in late.
I'm opposed to this as the Dart track generally tries to re-run our generator every so often and it would be very repetitive to have to figure out whether we chose to do include an optional test or not.
Additionally, this suggestion would require all the tracks that use generators to update their generator on top of any other existing work or improvements.
All just for the sake of a few tracks that chose to have similar tests added.
I would prefer a more empirical driven proposal, just as exercism having a mechanism for looking at the implementations of every exercise across all the tracks. From that point, exercism could compare the number of tests, the test names, and the expected results with the problem-specification.
Once that happens and we see a better automated way of telling how many tracks have included tests that seem similar, then I think we could evaluate whether to add those specific tests to the specification.
I would much prefer that then adding another key to the specification that our generator would have to handle.
To summarize, I would prefer an automated way for Exercism to see what tests are implemented within an exercise across tracks and then determine whether to add it as a normal/default test case.
That seems better to me than having optional tests across all the specs that all the generators would have to be updated to address/ignore.
It sounds to me like the choice we're trying to balance is this: _Do we make the a call about inclusion/exclusion at the problem-specifications level, or at the track level?_
My understanding is that even if we mark some test cases as optional in the specification, then a track might want to include only some optional tests, and exclude some of the tests that are considered "core" to the exercise.
I think it would be worth considering adding a "metadata" or similar section to the spec for each individual test, which could have a unique identifier that doesn't change, a type field (e.g. "basic", "errors", "whatever", ...) and an optional comment field, which explains what the motivation for having that test. We could update all existing specs automatically in one fell swoop to give them identifiers, setting the type to "basic" (or whatever we decide the term should be).
The assumption would then be that the generator would include everything by default, but could have a configuration file that maintainers can update to exclude based on a type or an identifier.
Thoughts?
TL;DR I still prefer the simple "optional": "floating-point-numbers" approach.
My understanding is that even if we mark some test cases as optional in the specification, then a track might want to include only some optional tests, and exclude some of the tests that are considered "core" to the exercise.
I think this is indeed correct. although excluding "core" tests is really, really uncommon I think. Once a track start skipping "core" tests, maybe that is an indication that this exercise is not particularly well suited for the track?
We could update all existing specs automatically in one fell swoop to give them identifiers, setting the type to "basic" (or whatever we decide the term should be).
I'm not completely sold on this. Having the field be present in all records is great for consistency and tooling (like excluding things etc.), but it seems like it is only _really_ necessary for those corner cases in which some tracks might want to exclude a test case.
My preference would still be just a simple additional property:
"optional": "floating-point-numbers"
My reasoning for this is: it is easy to parse, simple to filter on and can be judiciously applied to only those "problematic" test cases.
I'm opposed to this as the Dart track generally tries to re-run our generator every so often and it would be very repetitive to have to figure out whether we chose to do include an optional test or not.
Well, the amount of work this entails really depends on how often we add an "optional" test, right? I expect there to be relatively little, although this is just an educated guess of course.
To summarize, I would prefer an automated way for Exercism to see what tests are implemented within an exercise across tracks and then determine whether to add it as a normal/default test case.
This would be nice to have. We currently do this a bit more informally in the PR/issue discussion.
To summarize, I would prefer an automated way for Exercism to see what tests are implemented within an exercise across tracks and then determine whether to add it as a normal/default test case.
I also think this would be very nice to have but implementing that would likely take way more work than changing the generators (which is optional! A track can also choose to include all optional tests, which is usually the default when the additional key is ignored, or ignore them entirely). One would have to create parsers for test suites of more than 50 tracks and the tests may not have a direct reference to the canonical data.
A track can also choose to include all optional tests, which is usually the default when the additional key is ignored, or ignore them entirely.
This is also my viewpoint.
In the interest of moving forward, I would like to ask @Stargator, who said that maintainers of test generators will have extra work in dealing with the "optional": "..." addition to the canonical schema:
Does the presence of a new dictionary value not only pose a problem to those who do typed parsing of the schema? (E.g. using Haskell's aeson.) My assumption would be that maintainers of test generators can ignore this property to the extent that their generated tests work fine without it.
It would in fact help the OCaml test generator if such a property was available, because then we could e.g. disable triangle's floating-point tests, since the exercise is intentionally written with integers for that track without fuzzy matching on test case comments. See exercism/ocaml#338.
All just for the sake of a few tracks that chose to have similar tests added.
You are right in saying that there are very few deviations (both in terms of loose comments in canonical data saying something is optional, and in terms of tracks that choose for themselves for something to be optional or additional).
But I don't think you can use this count as a reasonable argument against this feature, because it's structural: We want tracks to conform, because this is easier to maintain, but we want tracks to be able to break out (e.g. let the Haskell track test that pangram is made lazy without having to express that in JSON).
An example of something you don't see is Unicode in track-specific tests. We want more Unicode tests. But we don't want to force Unicode on all tracks. Having "optional": "unicode" will enable meaningful tests like exercism/haskell#827 so mentors don't have to point out trivial mistakes made in a language that is Unicode by default.
I would prefer a more empirical driven proposal [...]
So would I.
[...] looking at the implementations of every exercise across all the tracks. From that point, exercism could compare the number of tests, the test names, and the expected results with the problem-specification.
I've done empirical tests of all track repos for other purposes, and it's a lot of work. In my latest attempt I started to write a jq tutorial with use-cases from Exercism, hoping that this would eventually become a cookbook for track maintainers. But it's stashed, because it's a lot of work.
So I also don't think that this is a reasonable argument to oppose the feature.
Once that happens and we see a better automated way of telling how many tracks have included tests that seem similar, then I think we could evaluate whether to add those specific tests to the specification.
To begin with, we don't even know how many tracks that keep track of the canonical test version for each exercise in a systematic way. See for example exercism/javascript#628 for a track that's getting there. I'm willing to bet that the number is quite low.
So we either start fuzzy-matching across language syntaxes (using multiple historical phrasings for canonical data for tests where the wording changed altogether) or we push for a canonical version system for each language (which must subsequently be maintained), until we eventually get to a point where we can ask: How common are tests on each track?
It's a lot of work.
And until then mergers of problem-specifications have a good intuition about what tests are not being added when they could because our current all-or-nothing policy prevents us from seeing what we're missing out on.
To summarize, I would prefer an automated way for Exercism to see what tests are implemented within an exercise across tracks and then determine whether to add it as a normal/default test case.
So would I.
But I don't think that the work involved in this alternative is realistic.
While adding this key is extremely low-effort and parallelizable.
I, for one, will start labelling "optional": "unicode" and "optional": "big-integer" and "optional": "floating-point" to a number of exercises for the Haskell and OCaml track.
Edit: Removed a practical note to the PR itself.
But I don't think that the work involved in this alternative is realistic.
While adding this key is extremely low-effort and parallelizable.
I agree with both statements. It is a relatively minor change that will definitely have a positive impact on a number of tracks.
p.s. the C# and F# generators won't be affected by this change, they will have to opt-in to use the feature
To begin with, we don't even know how many tracks that keep track of the canonical test version for each exercise in a systematic way. See for example exercism/javascript#628 for a track that's getting there. I'm willing to bet that the number is quite low.
I believe it was 6 or so the last time I checked. That's 6 out of ~50.
Resolved by #1518 (unless I'm missing something)
Most helpful comment
This is my preferred option. I'm hesitant to use comments for this, as I feel people should be able to freely change the comments without breaking any generators.
Applying this to @petertseng's example:
Or something like that.