Angular.js: Why does Angular need its own angular.uppercase and lowercase methods?

Created on 21 Mar 2015 · 16 Comments · Source: angular/angular.js

What's wrong with the standard toUpperCase and toLowerCase methods?
Looks like there was some issue with the Turkish locale, but does it still exist in the browsers supported by Angular? I asked about it on StackOverflow, but didn't get an answer.

misc core low investigation inconvenient

thorn0

All 16 comments

The reason is documented in a source comment: https://github.com/angular/angular.js/blob/e5d1d6587d1f38898bca673a829a0bc0ffaaf2c8/src/Angular.js#L161-L163

It's also documented on MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleLowerCase

Lastly, I think the StackOverflow answers were pretty accurate.

realityking on 21 Mar 2015

❤1

Neither the comment nor the SO answers are informative.
The question is

In exactly which JS engines are toLowerCase() & toUpperCase() locale-sensitive?

How do those answers answer it? How does the source comment answer it? Moreover, the comment and the answers contradict each other. In theory, toLowerCase() & toUpperCase() shouldn't be locale-sensitive. That's what one of the answers says, and your MDN link says the same thing. But the comment states the exact opposite:

String#toLowerCase and String#toUpperCase don't produce correct results in browsers with Turkish locale

Again, in exactly which browsers does this happen?

thorn0 on 21 Mar 2015

Searching on Google, this seems like a general problem due to the Turkish language itself - this article seems to explain it: http://www.i18nguy.com/unicode/turkish-i18n.html

If I had to guess, this is a problem with every browser.

wesleycho on 21 Mar 2015

The general problem exists, that's true. But who said that it (still) exists in JavaScript as well? Who has run into it in, say, the last 10 years?

thorn0 on 21 Mar 2015

I doubt it was pulled from thin air - that is not how the Angular team functions.

wesleycho on 21 Mar 2015

😄2

The hacks and workarounds for IE8 are being removed from the code of Angular now. But this thing looks like a workaround for a bug in some even older browser.

thorn0 on 21 Mar 2015

@thorn0 why not test on your end (you seem to be from Turkey) and send a PR to remove those workarounds if they are no longer necessary? I'm pretty sure that everyone would be more than happy to remove unnecessary code.

pkozlowski-opensource on 21 Mar 2015

I wanted to know why this code is there. If the issue still exists (if it ever existed) in some browser, it'd be nice to know about it. I'm going to check it in different browsers, but of course I don't have access to all the needed OS+device+browser combinations.

thorn0 on 21 Mar 2015

@thorn0 AFAIK the Turkish locale was the only / main reason. So if you can confirm that this bug doesn't affect users of modern browsers and submit a PR to check that all the tests pass on CI, it could potentially be merged. But as you say, no one will be able to re-test this on all the possible devices in the wild, so we might introduce a breaking change here...

pkozlowski-opensource on 21 Mar 2015

BTW, another library containing code like that is Google Closure Library. See these lines. Supposedly, it came to Angular from there.

thorn0 on 21 Mar 2015

I think this can be split into two issues:

Why does Angular need an implementation of toUpperCase and toLowerCase?
Why does Angular need to have a public method that exposes this?

The former should be easy to decide if someone can verify that this is no longer an issue with the supported browsers.
The latter would be hard to remove, but I would be OK with deprecating it (even if it still works).

lgalfaso on 30 Mar 2015

In case it helps, the code was added to the angular.js file by @mhevery in Oct 2010 as part of this commit "create HTML sanitizer to allow inclusion of untrusted HTML in safe manner".

Everything that I could find on the topic only mentioned this being a problem in Java. The ECMAScript standards (3 and 5) state that the toUpperCase and toLowerCase functions are _not_ locale sensitive and the toLocaleUpperCase and toLocaleLowerCase functions _are_ locale sensitive.

So it doesn't seem to be required, but I don't know how to test on multiple browsers in the Turkish locale.
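
The distinction drawn from the ECMAScript standards above can be illustrated with a small sketch. The helper name `compareLowercase` is hypothetical and not part of AngularJS, and the explicit `"tr"` locale argument assumes an engine that supports locale tags in `toLocaleLowerCase`. It checks one engine at a time, so it is not evidence about every browser in the wild.

```javascript
// Hypothetical helper (not part of AngularJS): returns both lowercasings
// of `s` so they can be compared side by side.
function compareLowercase(s) {
  return {
    // Specified as locale-insensitive.
    plain: s.toLowerCase(),
    // Locale-sensitive; the explicit "tr" tag requests Turkish rules.
    // Assumption: the engine accepts a locale argument here.
    turkish: s.toLocaleLowerCase("tr"),
  };
}

const result = compareLowercase("SCRIPT");

// Per the specification, this comparison should hold in any locale.
console.log(result.plain === "script");

// With Turkish tailoring, "I" lowercases to dotless "ı" (U+0131),
// so this comparison is expected to fail.
console.log(result.turkish === "script");
```

Running this in a browser whose UI language is set to Turkish would be one way to look for an engine where the first comparison fails.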

ryanhart2 on 4 Oct 2015

I think the right thing to do for now would be just undocumenting angular.uppercase and angular.lowercase without removing them.

thorn0 on 24 Oct 2015

The basic issue is that when malicious code writes SCRIPT, we need to detect it and take it out. We do this by calling toLowerCase on it, but in some locales that will produce scrıpt (notice no dot on the ı), and then 'scrıpt' === 'script' fails, which means that the malicious code gets past the sanitizer.

If you can prove that in JS "I".toLowerCase() === "i" is true in all locales, then the code is safe to remove. This code was added at the request of the Google security team.
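
A minimal sketch of this kind of workaround, assuming an ASCII-only fallback chosen by a feature test. The names `asciiLowercase` and `safeLowercase` are hypothetical, and this is written in the spirit of the workaround rather than copied from the AngularJS source.

```javascript
// Hypothetical helper: lowercases only A-Z and leaves every other
// character untouched, so the result cannot depend on the locale.
function asciiLowercase(s) {
  if (typeof s !== "string") return s;
  // Setting bit 5 (| 32) maps an ASCII uppercase letter to its lowercase form.
  return s.replace(/[A-Z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) | 32));
}

// Feature test: use the built-in only if it lowercases "I" to a dotted "i".
const builtinIsSafe = "I".toLowerCase() === "i";

// Hypothetical wrapper that picks the implementation once, up front.
const safeLowercase = builtinIsSafe
  ? (s) => (typeof s === "string" ? s.toLowerCase() : s)
  : asciiLowercase;

// The sanitizer-style comparison described above.
console.log(safeLowercase("SCRIPT") === "script");
```

The fallback only matters for comparisons against ASCII names such as tag names, which is the sanitizer case described here. It does not lowercase non-ASCII letters at all.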

mhevery on 21 Mar 2017

I notice that there are various places in the sanitizer that "don't" use our custom lowercase. E.g. https://github.com/angular/angular.js/blob/master/src/ngSanitize/sanitize.js#L378
Does this need to be fixed?

petebacondarwin on 21 Mar 2017

As far as I can tell, AngularJS and the Closure Library are the only libraries that include this workaround, and nowhere else on the Internet is it mentioned that this problem ever existed in the JS world, only in Java.

thorn0 on 21 Mar 2017
