Yii2: BaseInflector camel2words behavour changed in 2.0.16

Created on 20 Feb 2019  Ā·  31Comments  Ā·  Source: yiisoft/yii2

What steps will reproduce the problem?

echo Inflector::camel2words("LEAD");

What is the expected result?

It should echo "Lead"

What do you get instead?

It echos "L E A D"

Additional info

| Q | A
| ---------------- | ---
| Yii version | 2.0.16
| PHP version | 7.1.25
| Operating system | Ubuntu

important

Most helpful comment

The primary point of camel2words() is to find the _words_ in the given string, without being too pedantic about whether the input was actually camelCase. Evidenced by the current test cases:

Input | Expected Output
-- | --
'camelCase' | 'Camel Case'
'lower_case' | 'Lower Case'
' tricky_stuff.it-is testing... ' | 'Tricky Stuff It Is Testing'
'ІЦеДійсноТак!' | 'І Це Дійсно Так!'

(If it were strict about only converting actually-camel-cased text, only the first and last test cases would pass.)

Given that, I think expected behavior should be that the function does its best to recognize the _words_ regardless of casing. And I suspect most would agree that FOO should be considered a single word, and FOOBar should be considered two words.

All 31 comments

Indeed in previous version it returned Lead but it was wrong because description is:

Converts a CamelCase name into space-separated words.
For example, 'PostTag' will be converted to 'Post Tag'.

2.0.16 change has been introduced in https://github.com/yiisoft/yii2/commit/b9aa0001b65400d5986488d2b64c64291dca1eeb#diff-1d93dd129d9cbdd656d16f9f4716e24e but it was not mentioned in UPGRADE file.

@cebe I hope this is considered a BC bug and not just a docs issue. Inflector::camel2words('TEST') should output Test not T E S T. And it has imposed a security bug on Craft, which was using this method to check environment variable names & other things for certain sensitive-sounding words, (e.g. key in S3_KEY).

While I understand the problem from @brandonkelly point of view we've got here this xkcd case and I would suggest using custom method and not change the core Yii helper.

Previously the regex was looking for capital letters _that weren’t preceded by another capital letter_ for this exact reason, so converting TEST => Test was the clearly intended behavior; and since there was no mention of this behavior change in the upgrade guide, I think it’s safe to say that the BC break was not intentional.

It was definitely intentional, there are tests to cover this behavior, and this is how camel case works - capital letters indicates word separators, so TEST should not result Test.

Maybe the method can be adjusted to account for whether the uppercase letter is followed by a lowercase letter. I think that will lead to generally more acceptable results.

Examples:

Input | Expected Output
-- | --
FooBARBaz | Foo Bar Baz
XFoo | X Foo
fooBar | Foo Bar
FOO | Foo

I’ll give that a shot and update the PR…

…and done (https://github.com/yiisoft/yii2/pull/17151/commits/c5d51f42e5981fa5aeba213a026e55a327377532).

How do you explain the (?<![A-Z]) lookbehind in the original regex then? Someone fell asleep on the keyboard?

You should ask author of this regexp ;), but there are no tests for this case and this is not how camel case works. In general this should be true:

Inflector::camel2words(Inflector::camelize('t e s t'), false) === 't e s t'

There was a test case to ensure lower_case gets converted to Lower Case. That’s not strictly camel case either, so I don’t think we should get too hung up on the semantics.

But still, I’m sure something like MyAWSBucket would be considered camel case, and I would expect that to be converted to My Aws Bucket, not My A W S Bucket. Latest commit in #17151 will fix that as well.

There was a test case to ensure lower_case gets converted to Lower Case. That’s not strictly camel case either, so I don’t think we should get too hung up on the semantics.

Underscore has no meaning in camel case, capitalized letter has. You're trying to create some own camel case variation which is not even internally consistent (camelize() does not match camel2words() behavior).

You're trying to create some own camel case variation which is not even internally consistent (camelize() does not match camel2words() behavior).

echo Inflector::camel2words('lower_case'); // outputs: Lower Case
echo Inflector::camelize('Lower Case');    // outputs: LowerCase (not lower_case)

Why do you expect that Inflector::camelize('Lower Case') will return lower_case?

I don’t. My point is that there’s no precedent for those two functions behaving consistently.

Obviously they're not always interchangeable - both transformations are lossy, so you may loss case of letters and type of word separators. What you should not loos is word separator itself. Your PR just make this even worse, since word separators are no longer preserved.

The primary point of camel2words() is to find the _words_ in the given string, without being too pedantic about whether the input was actually camelCase. Evidenced by the current test cases:

Input | Expected Output
-- | --
'camelCase' | 'Camel Case'
'lower_case' | 'Lower Case'
' tricky_stuff.it-is testing... ' | 'Tricky Stuff It Is Testing'
'ІЦеДійсноТак!' | 'І Це Дійсно Так!'

(If it were strict about only converting actually-camel-cased text, only the first and last test cases would pass.)

Given that, I think expected behavior should be that the function does its best to recognize the _words_ regardless of casing. And I suspect most would agree that FOO should be considered a single word, and FOOBar should be considered two words.

(If it were strict about only converting actually-camel-cased text, only the first and last test cases would pass.)

It does not matter if the input string is not correct camel case, as long as the correct camel case is handled as an actual camel case. Supporting everything else is additional value to the method, it does not affect the main purpose of it. But ignoring case (just a reminder - the case of the letters indicates word separators in camel case, so it is crucial for detecting words) and trying to guess words by using fancy regexp is something completely different - you're breaking the main purpose of this method by introducing some magic behavior. It is only a matter of time when someone hits an edge case when this will not work as expected.

If this is all about ALLCAPS strings, I suggest to lowercase them and then use old regexp to detect words - it should cover LEAD and S3_KEY and does not affect most of the edge cases with actual camel case.

Again I remind you that up until 2.0.16, the regex had a specific check for consecutive capital letters. Whether there was a test case for that use case or not, it was clearly the intent of the function to treat consecutive letters as a single word. And there is no mention of the regression in either the upgrade guide or the commit message (b9aa0001b65400d5986488d2b64c64291dca1eeb):

Fixed yii\helpers\Inflector::camel2words() to work with UTF-8

We can argue about whether the function should or should not be worrying about this until the cows come home, but this is a very clear break in backwards compatibility. Knowing how seriously Yii core devs generally take BC breaks, I’m frankly a bit shocked that we’re even debating this.

We can argue about whether the function should or should not be worrying about this until the cows come home, but this is a very clear break in backward compatibility.

Every other argument aside, the fact this breaks backward compatibility can't be denied, therefore should be considered a regression that should be addressed.

@brandonkelly For me, BC break is the only reason we're debating this. Basically, you have a string that obviously is not a camel case. And you're using a method which Converts a CamelCase name into space-separated words. And you're not getting output you're expecting, so you're proposing to change implementation of this method to not treat camel case as camel case and instead try to detect words using regexp based on some arbitrary rules (why 3 consecutive uppercase letters are treated as a word, but 2 consecutive letters is not a word?). The fact that it worked like that before is the only reason why this does not sounds insane. :P But this is pure https://xkcd.com/1172/ - you can't fix incorrect implementation because people rely on it.

why 3 consecutive uppercase letters are treated as a word, but 2 consecutive letters is not a word?

There’s nothing in the regex that cares about number of characters. Just more closely looking for what looks like actual camelCased text, rather than assuming all capital letters = camelCase.

Pre-2.0.16 behavior: Find all uppercase letters _that aren’t preceded by another uppercase letter_, and put spaces before them.

2.0.16 behavior: Find all uppercase letters and put spaces before them.

My proposed behavior: Find all uppercase letters that aren’t preceded by another uppercase letter _or_ are followed by a lowercase letter, and put spaces before them.

  | fooBar | FOO_BAR | FOOBar
-- | -- | -- | --
Pre-2.0.16 output | āœ… Foo Bar | āœ… Foo Bar | āŒ Foobar
2.0.16 output | āœ… Foo Bar | āŒ F O O B A R | āŒ F O O Bar
#17151 output | āœ… Foo Bar | āœ… Foo Bar | āœ… Foo Bar

The fact that it worked like that before is the only reason why this does not sounds insane. :P But this is pure https://xkcd.com/1172/ - you can't fix incorrect implementation because people rely on it.

You’re not the first one to link to that here, but it’s apples and oranges. In the comic, we can assume there was no code specifically instructing the CPU to overheat. In this case, the pre-2.0.16 version of the function was purposefully looking for consecutive uppercase characters to treat them as a single word. My assumption is that @SilverFire just didn’t fully understand the consequences when he removed that regex lookbehind in b9aa0001b65400d5986488d2b64c64291dca1eeb, and was only concerned with getting his new test case to pass (where ІЦе should be interpreted as two separate words).

You’re not the first one to link to that here, but it’s apples and oranges.

I disagree. Things that signals intentions are:

  • documentation (PHPDoc),
  • method name,
  • tests.

None of these suggests that this method should work in a way you're expecting.

In the comic, we can assume there was no code specifically instructing the CPU to overheat.

If there were no code for that, there would be no CPU overheat. :)

Anyway, I don't see any sense in continuing this discussion. It is not my decision how to deal with that, I just want to point that this method finally works correctly (in a way that name and PHPDoc suggest) and this issue is about reverting valid bugfix (intentional or not). If BC is a reason for this, then so be it (I still hope it will be fixed in Yii 3 and your implementation will be extracted to separate method and cale2words() will handle camel case as camel case), but note that #17151 breaks BC anyway...

I have some other examples where it makes more sense to not split up upper case characters:

  • generateCSRF -> generate CSRF
  • findCPU -> find CPU
  • ...

Also we had a similar discussion before: https://github.com/yiisoft/yii2/pull/4166 where we agree on the fact that splitting these does not make sense. It just turned out that no test had been added for camel2words() only for other similar methods.

@cebe These are just bad names - it should be generateCsrf and findCpu. For the same reason why we have Html helper instead of HTML and Json instead of JSON. I was hoping that more strict rules in Inflector will result more consistent and predictable names (including core framework itself).

@rob006 they are acronyms. The comparison does not make sense.

The Wikipedia entry on Camel Case literally has an example of this mixed-case camel-case format under Programing and Coding:

Programming identifiers often need to contain acronyms and initialisms that are already in uppercase, such as "old HTML file". By analogy with the title case rules, the natural camel case rendering would have the abbreviation all in uppercase, namely "oldHTMLFile".

@brandonkelly The same paragraph has explanation why this is problematic and unpractical. Yii already had CJSON episode, I was hopping that we've learned from the mistakes that this is the wrong approach.

Only problematic when there’s a second abbreviation; that doesn’t mean there aren’t cases where it’s valid.

Only problematic when there’s a second abbreviation

Or for consecutive one-letter words. For me its is enough to call it unreliable - loos of case will give less harm than loos of word separators.

I've re-checked everything and spoke with @SilverFire who introduced the change.

  1. The change was to add unicode only. Logic change wasn't intentional.
  2. In @brandonkelly variant it gives more logical results that are both suitable for his use case, URL slugs and other cases.

I think we'll merge it.

I am totally for redesigning these APIs in 3.0 btw. While trying to write tests for this method I found a few edge cases where a different behavior would be expected. Also the second parameter does really weird things on some inputs.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

AstRonin picture AstRonin  Ā·  49Comments

samdark picture samdark  Ā·  63Comments

alexraputa picture alexraputa  Ā·  53Comments

sapsxxxil picture sapsxxxil  Ā·  50Comments

gpoehl picture gpoehl  Ā·  47Comments