@weibeld pointed out to me that although my acronym Java solution passes all current test cases, it will not work for the word/phrase SuperHTMLParser. The correct acronym is SHP, not SH as given by my earlier solution.
I suggest we add SuperHTMLParser (or something of similar form) to the test suite, with the expected acronym value of SHP.
(Originally posted as https://github.com/exercism/xjava/issues/520)
I guess you are talking about acronym and not anagram, right?
I would assume most tracks treat acronym as a simple (difficulty 1, in the first 1/3 of the exercises) problem. Would this change if this test gets added? (My unchecked feeling is it would make it more complicated.)
We would probably also need to expand the minimalistic problem description to explain our rules to build an acronym.
There is already a small mismatch as it talks about "TLA (Three Letter Acronyms)" and then we test Complementary metal-oxide semiconductor -> CMOS.
Yes, sorry, I meant acronym. I updated the issue and description.
Regarding difficulty, I can certainly see your point. If this is meant to be simple, then that test case arguably makes it less simple, so perhaps things should be left as they are.
There are already tests for:
"phrase": "HyperText Markup Language",
"expected": "HTML"
and
"phrase": "GNU Image Manipulation Program",
"expected": "GIMP"
so a test for SuperHTML Parser to SHP would be consistent, but SuperHTMLParser to SHP would not. (I would expect SH.)
Can you describe the property that you are trying to test, rather than just an example? This might help work out whether the test is appropriate for the problem.
Ref:
https://github.com/exercism/x-common/blob/master/exercises/acronym/canonical-data.json
https://github.com/exercism/x-common/blob/master/exercises/acronym/description.md
I don't agree that SuperHTMLParser should be SH, which is why it is given as another test case, which IMHO should be SHP, but if SH is the consensus, then never mind.
What should SuperHTMLarser (:blush:) be?
SHL
The point is that you would start every "word" with a capital letter, so you wouldn't have something like SuperHTMLparser (yielding SH), or to be more extreme, superhtmlparser (yielding S).
I think there is a good rule that needs testing.
Sorry, I'm confused by the negative in your examples.
Are you saying:
SuperHTMLparser should be SHL
superhtmlparser should be `` (empty string)
I suspect you have identified something that needs testing, I'm trying to work out what the rule is.
Sorry, my examples were perhaps phrased poorly.
What I meant is that if you _expect_ SHP, you wouldn't get it from superhtmlparser, which instead would correctly produce S as the acronym. You would only get SHP from any of the following:
- SuperHTMLParser
- superHTMLParser
- super html parser
- SuperHTML Parser
- Super HTML parser

These are the cases for finding the start of a word:
SuperHTMLParser fits the last case.
These cases are also why PHP: Hypertext Processor is PHP and _not_ PHTP. But PHPHypertextProcessor should _also_ yield PHP, not PP, as you seem to suggest it should.
- It's the first capital letter [, followed by lowercase letters,] following a run of capital letters (i.e., the first capital letter following an acronym embedded within the phrase)
This is the rule that is not currently tested, and is one of two(?) possible interpretations of what should happen to a run of capital letters.
These cases are also why PHP: Hypertext Processor is PHP and not PHTP. But PHPHypertextProcessor should also yield PHP, not PP, as you seem to suggest it should.
PP is the result of the other interpretation, where uppercase letters are always grouped to the left.
I'm not making any claim as to which is "correct", but as neither is specified in the description accepting solutions using either interpretation seems reasonable.
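To make the two readings concrete, here is a rough Python sketch. The names `GROUP_LEFT`, `SPLIT_RUN`, and `acronym` are hypothetical, the patterns assume ASCII letters only, and the second pattern is just one possible encoding of the "first capital following a run of capitals" rule, not necessarily what anyone in this thread had in mind.

```python
import re

# Reading A (hypothetical name): a run of capitals is grouped to the left,
# so only a letter after a non-letter, or a capital after a lowercase
# letter, starts a word.
GROUP_LEFT = re.compile(r"(?<![A-Za-z])[A-Za-z]|(?<=[a-z])[A-Z]")

# Reading B (hypothetical name): same as A, plus a capital that follows
# another capital and is itself followed by a lowercase letter.
SPLIT_RUN = re.compile(
    r"(?<![A-Za-z])[A-Za-z]|(?<=[a-z])[A-Z]|(?<=[A-Z])[A-Z](?=[a-z])"
)


def acronym(phrase, pattern):
    """Join every word-start letter found by `pattern`, uppercased."""
    return "".join(pattern.findall(phrase)).upper()


print(acronym("SuperHTMLParser", GROUP_LEFT))        # SH
print(acronym("SuperHTMLParser", SPLIT_RUN))         # SHP
print(acronym("PHPHypertextProcessor", GROUP_LEFT))  # PP
print(acronym("PHPHypertextProcessor", SPLIT_RUN))   # PHP
```

Both patterns agree on the phrases currently in the test data, such as "PHP: Hypertext Preprocessor" and "HyperText Markup Language". Note that this encoding of reading B turns SuperHTMLparser into SHL, so it should not be taken as a complete statement of the rule.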
Yes, and I suppose that's the point. This needs clarification to remove the ambiguity. I feel that the case I'm bringing up is a valid clarification, but it doesn't matter to me which answer is deemed correct. The important point is the removal of the ambiguity, otherwise this thread wouldn't exist.
Since this particular point is clearly fraught with additional difficulty (as noted by @behrtam), perhaps the clarification to remove the ambiguity is to simply go with your suggestion that SuperHTMLParser should yield SH. That would keep it simple and require no modifications other than an additional test case to go with the problem disambiguation in the README file.
How does that sound?
I agree that the issue you have raised is valid and I'm glad we've worked out what the root of it is.
Personally, I'm not sure it needs fixing.
I interpret TLA as an example rather than a specification and some think that a little ambiguity in specifications leads to interesting review discussion possibilities.
(But I'm not the boss around here.)
If you think it's important, please submit a PR with your proposed changes.
Note that it is also possible to add a HINTS.md file to the xjava repository to provide clarifications that are specific to the Java version of this problem.
An interesting case. I see it as likely to appear as an identifier in programming. The accepted conventions for some languages would say that you must write it as SuperHTMLParser, such as https://github.com/golang/go/wiki/CodeReviewComments#initialisms, while others would say it must be SuperHtmlParser such as https://msdn.microsoft.com/en-us/library/ms229043(v=vs.110).aspx
Of course, what form an identifier takes has only tangential bearing to what the tests should be, so I'm sorry for going off on a tangent.
The current https://github.com/petertseng/x-common/blob/verify/exercises/acronym/verify.rb would render SuperHTMLParser as SHTMLP, hilariously. Given my word choice, I clearly agree that it's hilariously wrong, but the hilarity may point out that the verifier only expected to take in certain kinds of inputs. Obviously a possible objection is that the verifier is inadequate. The objection is noted.
Here's one thing I noticed, if it helps: The current inputs to acronym are all written in a form I would expect them to appear as in natural language. There is no test for typical programming identifiers such as "phpHypertextPreprocessor" but there is one for "PHP: Hypertext Preprocessor". So if someone is looking for an explanation that ties the current test cases together and tells us why we don't see SuperHTMLParser, maybe that is it.
Oh yes, you probably noticed that I forgot to mention whether I personally think SuperHTMLParser should be added as a test. Silly me, always so forgetful. I'm sorry for going off topic.
I pointed out the SuperHTMLParser case to @chuckwondo.
My opinion is that if acronym is a simple exercise (i.e. first 1/3 of exercises as pointed out by @Insti), then mixed-case phrases should probably be eliminated altogether. I think the single mixed-case phrase HyperText Markup Language (among all the other easy cases) leads many people to use ugly hacks to split up the specific HyperText case instead of using a proper regex. That's at least my observation looking at some of the submissions.
On the other hand, if the focus of acronym is on regexes and it's an intermediate exercise, then I would definitely include SuperHTMLParser, just for the sake of completeness. Anyway, once you have a regex for handling HyperText, it's relatively easy to extend it to handle SuperHTMLParser.
My vote would be for simple and eliminating mixed case phrases. Great suggestion. 👍
The current version of acronym is easily solvable by only considering the
previous character when deciding if I have to insert the current one.
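As an illustration of that previous-character approach, here is a minimal Python sketch. The function name `acronym_prev_char` is hypothetical, and treating any non-letter as a separator is an assumption on my part rather than something the problem description states.

```python
def acronym_prev_char(phrase):
    """Build an acronym by looking only at the previous character."""
    result = []
    prev = " "  # treat the start of the phrase like a separator
    for ch in phrase:
        starts_word = ch.isalpha() and (
            not prev.isalpha()                    # letter after a separator
            or (ch.isupper() and prev.islower())  # lower-to-upper transition
        )
        if starts_word:
            result.append(ch.upper())
        prev = ch
    return "".join(result)


print(acronym_prev_char("HyperText Markup Language"))  # HTML
print(acronym_prev_char("SuperHTMLParser"))            # SH
```

Because the loop never looks ahead, it cannot tell that the P in SuperHTMLParser starts a new word, which is why it yields SH.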
Making "SuperHTMLParser" return "SHP" rather than "SH" would mean one of
the following which would make the exercise unnecessarily complex:
Tammo Behrends notifications@github.com wrote on Thu, 11 May 2017 at 12:33:
I guess you are talking about acronym and not anagram, right?
I would assume most tracks treat acronym as a simple (difficulty 1, in the first 1/3 of the exercises) problem. Would this change if this test gets added?
We would probably also need to expand the minimalistic problem description to explain our rules to build an acronym.
There is already a small mismatch as it talks about "TLA (Three Letter Acronyms)" and then we test Complementary metal-oxide semiconductor -> CMOS.
My vote would also be to remove HyperText Markup Language from this exercise.
And then I could imagine a new exercise which is specifically about parsing mixed-case. So, this new exercise would contain only mixed-case phrases like HyperText Markup Language, SuperHTMLParser and other variations. In this way, people would be kind of forced to use regexes from the beginning.
@weibeld will you make a PR with the necessary changes?
@Insti Yes, I will do it ASAP (removing HyperText Markup Language).