Problem-specifications: acronym: add test case for all caps within mixed case word

Created on 11 May 2017 · 18 comments · Source: exercism/problem-specifications

@weibeld pointed out to me that although my acronym Java solution passes all current test cases, it will not work for the word/phrase SuperHTMLParser. The correct acronym is SHP, not SH as given by my earlier solution.

I suggest we add SuperHTMLParser (or something of similar form) to the test suite, with the expected acronym value of SHP.

(Originally posted as https://github.com/exercism/xjava/issues/520)

chuckwondo on 11 May 2017

I guess you are talking about acronym and not anagram, right?

I would assume most tracks treat acronym as a simple (difficulty 1, in the first 1/3 of the exercises) problem. Would this change if this test gets added? (My unchecked feeling is it would make it more complicated.)

We would probably also need to expand the minimalistic problem description to explain our rules to build an acronym.
There is already a small mismatch as it talks about "TLA (Three Letter Acronyms)" and then we test Complementary metal-oxide semiconductor -> CMOS.

behrtam on 11 May 2017

Yes, sorry, I meant acronym. I updated the issue and description.

Regarding difficulty, I can certainly see your point. If this is meant to be simple, then that test case arguably makes it less simple, so perhaps things should be left as they are.

chuckwondo on 11 May 2017

There are already tests for:

"phrase": "HyperText Markup Language",
"expected": "HTML"

and

"phrase": "GNU Image Manipulation Program",
"expected": "GIMP"

So a test for SuperHTML Parser to SHP would be consistent, but SuperHTMLParser to SHP would not. (I would expect SH.)

Can you describe the property that you are trying to test, rather than just giving an example? This might help work out whether the test is appropriate for the problem.

Ref:
https://github.com/exercism/x-common/blob/master/exercises/acronym/canonical-data.json
https://github.com/exercism/x-common/blob/master/exercises/acronym/description.md

Insti on 11 May 2017

I don't agree that SuperHTMLParser should be SH, which is why I suggested it as an additional test case; IMHO it should be SHP. But if SH is the consensus, then never mind.

chuckwondo on 11 May 2017

What should SuperHTMLarser (:blush:) be?

Insti on 11 May 2017

SHL

The point is that you would start every "word" with a capital letter, so you wouldn't have something like SuperHTMLparser (yielding SH), or to be more extreme, superhtmlparser (yielding S).

chuckwondo on 11 May 2017

> The point is that you would start every "word" with a capital letter, so you wouldn't have something like SuperHTMLparser (yielding SH), or to be more extreme, superhtmlparser (yielding S).

I think there is a good rule that needs testing.

Sorry, I'm confused by the negative in your examples.

Are you saying:
SuperHTMLparser should be SHL
superhtmlparser should be `` (empty string)

I suspect you have identified something that needs testing, I'm trying to work out what the rule is.

Insti on 11 May 2017

Sorry, my examples were perhaps phrased poorly.

What I meant is that if you _expect_ SHP, you wouldn't get it from superhtmlparser, which would instead correctly produce S as the acronym. You would only get SHP from any of the following:

SuperHTMLParser
superHTMLParser
super html parser
SuperHTML Parser
Super HTML parser
etc.

These are the cases for finding the start of a word:

It's the first letter (case insensitive) in the phrase
It's the first letter (case insensitive) following a non-letter
It's the first capital letter following a lowercase letter
It's the first capital letter following a run of capital letters (i.e., the first capital letter following an acronym embedded within the phrase)

SuperHTMLParser fits the last case.

These cases are also why PHP: Hypertext Processor is PHP and _not_ PHTP. But PHPHypertextProcessor should _also_ yield PHP, not PP, as you seem to suggest it should.
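To make the four word-start rules above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not anyone's submitted solution: the name `acronym_shp` is hypothetical, letters are classified with the built-in `str.isalpha`, `isupper` and `islower`, and the last rule is read as "a capital letter that ends a run of capitals and is immediately followed by a lowercase letter", which needs one character of look-ahead.

```python
def acronym_shp(phrase: str) -> str:
    """Hypothetical sketch of the four word-start rules (SuperHTMLParser -> SHP)."""
    letters = []
    for i, ch in enumerate(phrase):
        if not ch.isalpha():
            continue
        prev = phrase[i - 1] if i > 0 else ""  # "" is not alphabetic
        nxt = phrase[i + 1] if i + 1 < len(phrase) else ""
        if not prev.isalpha():
            # Rules 1 and 2: first letter of the phrase, or first letter after a non-letter.
            letters.append(ch)
        elif ch.isupper() and prev.islower():
            # Rule 3: a capital letter following a lowercase letter.
            letters.append(ch)
        elif ch.isupper() and prev.isupper() and nxt.islower():
            # Rule 4 (as read here): the capital that ends a run of capitals
            # and is followed by a lowercase letter starts a new word.
            letters.append(ch)
    return "".join(letters).upper()
```

Under this reading, SuperHTMLParser, superHTMLParser, super html parser, SuperHTML Parser and Super HTML parser all give SHP, PHPHypertextProcessor and PHP: Hypertext Preprocessor both give PHP, superhtmlparser gives S, and SuperHTMLarser gives SHL.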

chuckwondo on 11 May 2017

> It's the first capital letter [, followed by lowercase letters,] following a run of capital letters (i.e., the first capital letter following an acronym embedded within the phrase)

This is the rule that is not currently tested, and is one of two(?) possible interpretations of what should happen to a run of capital letters.

> These cases are also why PHP: Hypertext Processor is PHP and not PHTP. But PHPHypertextProcessor should also yield PHP, not PP, as you seem to suggest it should.

PP is the result of the other interpretation, where uppercase letters are always grouped to the left.

I'm not making any claim as to which is "correct", but as neither is specified in the description, accepting solutions using either interpretation seems reasonable.

Insti on 11 May 2017

Yes, and I suppose that's the point. This needs clarification to remove the ambiguity. I feel that the case I'm bringing up is a valid clarification, but it doesn't matter to me which answer is deemed correct. The important point is the removal of the ambiguity, otherwise this thread wouldn't exist.

Since this particular point is clearly fraught with additional difficulty (as noted by @behrtam), perhaps the clarification to remove the ambiguity is to simply go with your suggestion that SuperHTMLParser should yield SH. That would keep it simple and require no modifications other than an additional test case to go with the problem disambiguation in the README file.

How does that sound?

chuckwondo on 11 May 2017

I agree that the issue you have raised is valid and I'm glad we've worked out what the root of it is.

Personally, I'm not sure it needs fixing.

I interpret TLA as an example rather than a specification and some think that a little ambiguity in specifications leads to interesting review discussion possibilities.

(But I'm not the boss around here.)

If you think it's important, please submit a PR with your proposed changes.
Note that it is also possible to add a HINTS.md file to the xjava repository to provide clarifications that are specific to the Java version of this problem.

Insti on 11 May 2017

An interesting case. I see it as likely to appear as an identifier in programming. The accepted conventions for some languages would say that you must write it as SuperHTMLParser, such as https://github.com/golang/go/wiki/CodeReviewComments#initialisms, while others would say it must be SuperHtmlParser such as https://msdn.microsoft.com/en-us/library/ms229043(v=vs.110).aspx

Of course, what form an identifier takes has only tangential bearing to what the tests should be, so I'm sorry for going off on a tangent.

The current https://github.com/petertseng/x-common/blob/verify/exercises/acronym/verify.rb would render SuperHTMLParser as SHTMLP, hilariously. Given my word choice, I clearly agree that it's hilariously wrong, but the hilarity may point out that the verifier only expected to take in certain kinds of inputs. Obviously a possible objection is that the verifier is inadequate. The objection is noted.

Here's one thing I noticed, if it helps: The current inputs to acronym are all written in a form I would expect them to appear as in natural language. There is no test for typical programming identifiers such as "phpHypertextPreprocessor" but there is one for "PHP: Hypertext Preprocessor". So if someone is looking for an explanation that ties the current test cases together and tells us why we don't see SuperHTMLParser, maybe that is it.

Oh yes, you probably noticed that I forgot to mention whether I personally think SuperHTMLParser should be added as a test. Silly me, always so forgetful. I'm sorry for going off topic.

petertseng on 11 May 2017

I pointed out the SuperHTMLParser case to @chuckwondo.

My opinion is that if acronym is a simple exercise (i.e. in the first 1/3 of exercises, as pointed out by @Insti), then mixed-case phrases should probably be eliminated altogether. I think the single mixed-case phrase HyperText Markup Language (among all the other easy cases) leads many people to use ugly hacks to split up the specific HyperText case instead of using a proper regex. That's at least my observation from looking at some of the submissions.

On the other hand, if the focus of acronym is on regexes and it's an intermediate exercise, then I would definitely include SuperHTMLParser, just for the sake of completeness. Anyway, once you have a regex for handling HyperText, it's relatively easy to extend it to handle SuperHTMLParser.
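A hedged Python sketch of that regex extension, using the standard `re` module. Both patterns are assumptions for illustration, not taken from any submitted solution: `LEFT_GROUPING` handles HyperText by letting a run of capitals absorb the lowercase letters after it, and `SPLIT_RUNS` adds a negative look-ahead so that a run of capitals not followed by a lowercase letter becomes its own word.

```python
import re

# Hypothetical pattern: a run of capitals plus any lowercase tail, or an all-lowercase word.
# Splits "HyperText" into "Hyper" + "Text" but keeps "HTMLParser" as a single word.
LEFT_GROUPING = re.compile(r"[A-Z]+[a-z]*|[a-z]+")

# Hypothetical extension: a run of capitals NOT followed by a lowercase letter is its own
# word (an embedded acronym such as "HTML"); otherwise a word is an optional capital
# followed by lowercase letters.
SPLIT_RUNS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def acronym(phrase: str, pattern: re.Pattern) -> str:
    """Take the first letter of every word that `pattern` finds, uppercased."""
    return "".join(word[0] for word in pattern.findall(phrase)).upper()
```

Here `acronym("SuperHTMLParser", LEFT_GROUPING)` gives SH while `acronym("SuperHTMLParser", SPLIT_RUNS)` gives SHP; both patterns still give HTML for HyperText Markup Language and GIMP for GNU Image Manipulation Program.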

weibeld on 12 May 2017

My vote would be for simple and eliminating mixed case phrases. Great suggestion. 👍

Insti on 12 May 2017

The current version of acronym is easily solvable by only considering the previous character when deciding whether I have to insert the current one. Making "SuperHTMLParser" return "SHP" rather than "SH" would require one of the following, which would make the exercise unnecessarily complex:

Consider the previous and the next char when deciding whether to add the current one to the abbreviation.
Consider the previous char to decide about the current one, and the current char to decide about the previous one.
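A minimal Python sketch of the previous-character-only approach, which also matches the "uppercase letters are always grouped to the left" interpretation discussed earlier in the thread. The name `acronym_prev_only` is hypothetical. A letter is kept if the character before it is not a letter, or if it is a capital following a lowercase letter; since nothing looks ahead, the P in SuperHTMLParser is never kept.

```python
def acronym_prev_only(phrase: str) -> str:
    """Build an acronym by looking only at the character before each letter."""
    letters = []
    prev = ""  # "" is not alphabetic, so the first letter of the phrase is always kept
    for ch in phrase:
        starts_word = not prev.isalpha() or (ch.isupper() and prev.islower())
        if ch.isalpha() and starts_word:
            letters.append(ch)
        prev = ch
    return "".join(letters).upper()
```

This returns SH for SuperHTMLParser and PP for PHPHypertextProcessor, yet it still handles HyperText Markup Language (HTML) and GNU Image Manipulation Program (GIMP). Written in the .NET style mentioned earlier, SuperHtmlParser yields SHP even with this simpler rule.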

NobbZ on 12 May 2017

My vote would also be to remove HyperText Markup Language from this exercise.

And then I could imagine a new exercise which is specifically about parsing mixed case. This new exercise would contain only mixed-case phrases like HyperText Markup Language, SuperHTMLParser and other variations. In this way, people would be more or less forced to use regexes from the beginning.

weibeld on 12 May 2017

@weibeld will you make a PR with the necessary changes?

Insti on 12 May 2017

@Insti Yes, I will do it ASAP (removing HyperText Markup Language).

weibeld on 12 May 2017
