Yii2: BaseInflector camel2words behavour changed in 2.0.16

Created on 20 Feb 2019 · 31Comments · Source: yiisoft/yii2

What steps will reproduce the problem?

echo Inflector::camel2words("LEAD");

What is the expected result?

It should echo "Lead"

What do you get instead?

It echos "L E A D"

Additional info

| Q | A
| ---------------- | ---
| Yii version | 2.0.16
| PHP version | 7.1.25
| Operating system | Ubuntu

important

Source

jain2015

👎1 👍1

Most helpful comment

The primary point of camel2words() is to find the _words_ in the given string, without being too pedantic about whether the input was actually camelCase. Evidenced by the current test cases:

(If it were strict about only converting actually-camel-cased text, only the first and last test cases would pass.)

Given that, I think expected behavior should be that the function does its best to recognize the _words_ regardless of casing. And I suspect most would agree that FOO should be considered a single word, and FOOBar should be considered two words.

brandonkelly on 21 Feb 2019

👍3

All 31 comments

Indeed in previous version it returned Lead but it was wrong because description is:

Converts a CamelCase name into space-separated words.
For example, 'PostTag' will be converted to 'Post Tag'.

2.0.16 change has been introduced in https://github.com/yiisoft/yii2/commit/b9aa0001b65400d5986488d2b64c64291dca1eeb#diff-1d93dd129d9cbdd656d16f9f4716e24e but it was not mentioned in UPGRADE file.

bizley on 20 Feb 2019

👍3

@cebe I hope this is considered a BC bug and not just a docs issue. Inflector::camel2words('TEST') should output Test not T E S T. And it has imposed a security bug on Craft, which was using this method to check environment variable names & other things for certain sensitive-sounding words, (e.g. key in S3_KEY).

brandonkelly on 21 Feb 2019

👍1

While I understand the problem from @brandonkelly point of view we've got here this xkcd case and I would suggest using custom method and not change the core Yii helper.

bizley on 21 Feb 2019

👍2

Previously the regex was looking for capital letters _that weren’t preceded by another capital letter_ for this exact reason, so converting TEST => Test was the clearly intended behavior; and since there was no mention of this behavior change in the upgrade guide, I think it’s safe to say that the BC break was not intentional.

brandonkelly on 21 Feb 2019

👍1

It was definitely intentional, there are tests to cover this behavior, and this is how camel case works - capital letters indicates word separators, so TEST should not result Test.

rob006 on 21 Feb 2019

Maybe the method can be adjusted to account for whether the uppercase letter is followed by a lowercase letter. I think that will lead to generally more acceptable results.

Examples:

I’ll give that a shot and update the PR…

brandonkelly on 21 Feb 2019

…and done (https://github.com/yiisoft/yii2/pull/17151/commits/c5d51f42e5981fa5aeba213a026e55a327377532).

brandonkelly on 21 Feb 2019

👍1

How do you explain the (?<![A-Z]) lookbehind in the original regex then? Someone fell asleep on the keyboard?

You should ask author of this regexp ;), but there are no tests for this case and this is not how camel case works. In general this should be true:

Inflector::camel2words(Inflector::camelize('t e s t'), false) === 't e s t'

rob006 on 21 Feb 2019

There was a test case to ensure lower_case gets converted to Lower Case. That’s not strictly camel case either, so I don’t think we should get too hung up on the semantics.

But still, I’m sure something like MyAWSBucket would be considered camel case, and I would expect that to be converted to My Aws Bucket, not My A W S Bucket. Latest commit in #17151 will fix that as well.

brandonkelly on 21 Feb 2019

There was a test case to ensure lower_case gets converted to Lower Case. That’s not strictly camel case either, so I don’t think we should get too hung up on the semantics.

Underscore has no meaning in camel case, capitalized letter has. You're trying to create some own camel case variation which is not even internally consistent (camelize() does not match camel2words() behavior).

rob006 on 21 Feb 2019

You're trying to create some own camel case variation which is not even internally consistent (camelize() does not match camel2words() behavior).

echo Inflector::camel2words('lower_case'); // outputs: Lower Case
echo Inflector::camelize('Lower Case');    // outputs: LowerCase (not lower_case)

brandonkelly on 21 Feb 2019

Why do you expect that Inflector::camelize('Lower Case') will return lower_case?

rob006 on 21 Feb 2019

I don’t. My point is that there’s no precedent for those two functions behaving consistently.

brandonkelly on 21 Feb 2019

Obviously they're not always interchangeable - both transformations are lossy, so you may loss case of letters and type of word separators. What you should not loos is word separator itself. Your PR just make this even worse, since word separators are no longer preserved.

rob006 on 21 Feb 2019

The primary point of camel2words() is to find the _words_ in the given string, without being too pedantic about whether the input was actually camelCase. Evidenced by the current test cases:

(If it were strict about only converting actually-camel-cased text, only the first and last test cases would pass.)

brandonkelly on 21 Feb 2019

👍3

(If it were strict about only converting actually-camel-cased text, only the first and last test cases would pass.)

It does not matter if the input string is not correct camel case, as long as the correct camel case is handled as an actual camel case. Supporting everything else is additional value to the method, it does not affect the main purpose of it. But ignoring case (just a reminder - the case of the letters indicates word separators in camel case, so it is crucial for detecting words) and trying to guess words by using fancy regexp is something completely different - you're breaking the main purpose of this method by introducing some magic behavior. It is only a matter of time when someone hits an edge case when this will not work as expected.

If this is all about ALLCAPS strings, I suggest to lowercase them and then use old regexp to detect words - it should cover LEAD and S3_KEY and does not affect most of the edge cases with actual camel case.

rob006 on 22 Feb 2019

Again I remind you that up until 2.0.16, the regex had a specific check for consecutive capital letters. Whether there was a test case for that use case or not, it was clearly the intent of the function to treat consecutive letters as a single word. And there is no mention of the regression in either the upgrade guide or the commit message (b9aa0001b65400d5986488d2b64c64291dca1eeb):

Fixed yii\helpers\Inflector::camel2words() to work with UTF-8

We can argue about whether the function should or should not be worrying about this until the cows come home, but this is a very clear break in backwards compatibility. Knowing how seriously Yii core devs generally take BC breaks, I’m frankly a bit shocked that we’re even debating this.

brandonkelly on 22 Feb 2019

👍1

We can argue about whether the function should or should not be worrying about this until the cows come home, but this is a very clear break in backward compatibility.

Every other argument aside, the fact this breaks backward compatibility can't be denied, therefore should be considered a regression that should be addressed.

angrybrad on 22 Feb 2019

👍1

@brandonkelly For me, BC break is the only reason we're debating this. Basically, you have a string that obviously is not a camel case. And you're using a method which Converts a CamelCase name into space-separated words. And you're not getting output you're expecting, so you're proposing to change implementation of this method to not treat camel case as camel case and instead try to detect words using regexp based on some arbitrary rules (why 3 consecutive uppercase letters are treated as a word, but 2 consecutive letters is not a word?). The fact that it worked like that before is the only reason why this does not sounds insane. :P But this is pure https://xkcd.com/1172/ - you can't fix incorrect implementation because people rely on it.

rob006 on 22 Feb 2019

👍1

why 3 consecutive uppercase letters are treated as a word, but 2 consecutive letters is not a word?

There’s nothing in the regex that cares about number of characters. Just more closely looking for what looks like actual camelCased text, rather than assuming all capital letters = camelCase.

Pre-2.0.16 behavior: Find all uppercase letters _that aren’t preceded by another uppercase letter_, and put spaces before them.

2.0.16 behavior: Find all uppercase letters and put spaces before them.

My proposed behavior: Find all uppercase letters that aren’t preceded by another uppercase letter _or_ are followed by a lowercase letter, and put spaces before them.

The fact that it worked like that before is the only reason why this does not sounds insane. :P But this is pure https://xkcd.com/1172/ - you can't fix incorrect implementation because people rely on it.

You’re not the first one to link to that here, but it’s apples and oranges. In the comic, we can assume there was no code specifically instructing the CPU to overheat. In this case, the pre-2.0.16 version of the function was purposefully looking for consecutive uppercase characters to treat them as a single word. My assumption is that @SilverFire just didn’t fully understand the consequences when he removed that regex lookbehind in b9aa0001b65400d5986488d2b64c64291dca1eeb, and was only concerned with getting his new test case to pass (where ІЦе should be interpreted as two separate words).

brandonkelly on 22 Feb 2019

You’re not the first one to link to that here, but it’s apples and oranges.

I disagree. Things that signals intentions are:

documentation (PHPDoc),
method name,
tests.

None of these suggests that this method should work in a way you're expecting.

In the comic, we can assume there was no code specifically instructing the CPU to overheat.

If there were no code for that, there would be no CPU overheat. :)

Anyway, I don't see any sense in continuing this discussion. It is not my decision how to deal with that, I just want to point that this method finally works correctly (in a way that name and PHPDoc suggest) and this issue is about reverting valid bugfix (intentional or not). If BC is a reason for this, then so be it (I still hope it will be fixed in Yii 3 and your implementation will be extracted to separate method and cale2words() will handle camel case as camel case), but note that #17151 breaks BC anyway...

rob006 on 22 Feb 2019

I have some other examples where it makes more sense to not split up upper case characters:

generateCSRF -> generate CSRF
findCPU -> find CPU
...

Also we had a similar discussion before: https://github.com/yiisoft/yii2/pull/4166 where we agree on the fact that splitting these does not make sense. It just turned out that no test had been added for camel2words() only for other similar methods.

cebe on 22 Feb 2019

👍2

@cebe These are just bad names - it should be generateCsrf and findCpu. For the same reason why we have Html helper instead of HTML and Json instead of JSON. I was hoping that more strict rules in Inflector will result more consistent and predictable names (including core framework itself).

rob006 on 22 Feb 2019

@rob006 they are acronyms. The comparison does not make sense.

marcelodeandrade on 22 Feb 2019

The Wikipedia entry on Camel Case literally has an example of this mixed-case camel-case format under Programing and Coding:

Programming identifiers often need to contain acronyms and initialisms that are already in uppercase, such as "old HTML file". By analogy with the title case rules, the natural camel case rendering would have the abbreviation all in uppercase, namely "oldHTMLFile".

brandonkelly on 22 Feb 2019

@brandonkelly The same paragraph has explanation why this is problematic and unpractical. Yii already had CJSON episode, I was hopping that we've learned from the mistakes that this is the wrong approach.

rob006 on 22 Feb 2019

Only problematic when there’s a second abbreviation; that doesn’t mean there aren’t cases where it’s valid.

brandonkelly on 22 Feb 2019

Only problematic when there’s a second abbreviation

Or for consecutive one-letter words. For me its is enough to call it unreliable - loos of case will give less harm than loos of word separators.

rob006 on 22 Feb 2019

I've re-checked everything and spoke with @SilverFire who introduced the change.

The change was to add unicode only. Logic change wasn't intentional.
In @brandonkelly variant it gives more logical results that are both suitable for his use case, URL slugs and other cases.

I think we'll merge it.

samdark on 22 Feb 2019

🎉1

I am totally for redesigning these APIs in 3.0 btw. While trying to write tests for this method I found a few edge cases where a different behavior would be expected. Also the second parameter does really weird things on some inputs.

cebe on 22 Feb 2019

👍2

https://github.com/yiisoft/yii-core/issues/175

cebe on 22 Feb 2019

👍2

Was this page helpful?

0 / 5 - 0 ratings