TOML: Why are keys case sensitive?

Created on 4 Dec 2017 · 11 Comments · Source: toml-lang/toml

Is there any use case where this isn't ambiguous? I understand the values being case sensitive as it makes perfect sense, however for keys I struggle to find a rationale.

mterron

Most helpful comment

Mostly because I prefer case sensitivity as I believe it reduces sloppiness and confusion. But also since we have quoted keys, it would be strange to have those be case-insensitive (since they are essentially literals) and also open up a huge can-of-worms about what UTF-8 characters are upper and lower case of which other UTF-8 characters.

mojombo on 4 Dec 2017

👍4

All 11 comments

Can't argue with that, a preference is a preference.

The usual example I give my dev team is: When I ask you to "pass me the wine", did I say wine in lowercase or uppercase or mixed case? Is there any situation where it would matter?

Cheers and thanks for replying!

mterron on 5 Dec 2017

@mterron wrote:

Is there any situation where it would matter?

Yes. Did you ask for the wine, or the WİNE? Or maybe you asked for the wıne, or the WINE.

If you haven't spotted the problem yet, that's because you don't live in Turkey, where the uppercase of wine is not WINE, but WİNE. This is just one small part of the "can of worms" that @mojombo was referring to. There are lots of others — but the Turkish ı character (dotless i) is the one that everyone runs into first when they make assumptions like "uppercase and lowercase work the same way all around the world". It would be nice if that was true, but it's not true — and so case-sensitivity is the simplest thing to do when you're dealing with Unicode. If you want to treat Unicode text in a case-insensitive way, you should not be trying to roll your own solution, but use a library like ICU which has already worked out all of those weird special cases.

rmunn on 7 Dec 2017
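
For readers who want to see the dotless-i problem concretely, the following sketch uses only Python's built-in `str.upper()` and `str.lower()`, which apply the default (locale-independent) Unicode case mappings. It is an illustration, not something posted in the thread:

```python
DOTLESS_I = "\u0131"    # ı  LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default mappings send dotless ı to plain ASCII I ...
upper_of_dotless = DOTLESS_I.upper()

# ... so an upper/lower round trip does not give the original letter back.
round_trip = DOTLESS_I.upper().lower()

# Lowercasing İ yields two code points: "i" plus COMBINING DOT ABOVE.
lower_of_dotted = DOTTED_CAP_I.lower()

print(upper_of_dotless == "I")
print(round_trip == DOTLESS_I)
print([hex(ord(ch)) for ch in lower_of_dotted])
```

Even without any locale involved, case conversion can change a string's length and is not reversible, which is part of why "just lowercase both sides" is not a complete definition of case-insensitive comparison.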

Ooh, the good old Turkish "i/I" argument. It's a good strawman; however, how many real-life applications use Turkish names for their configuration key names? I've never run into any application that had configuration keys using anything but 7-bit ASCII in the 29 years I've been dealing with computers.

Thanks for commenting anyway @rmunn , I appreciate the input.

mterron on 7 Dec 2017

... how many real life applications use Turkish names for their configuration key names?

Only those written in Turkey, which is a small (but non-zero) number. But the question you should be asking is how many real-life applications are used in Turkey. Because the system locale, not the language of the string, is what matters. If the user's computer is set to the Turkish locale, then the uppercase of wine will be WİNE, and when your naïve program compares that to WINE then it will fail.

If you haven't read the http://www.i18nguy.com/unicode/turkish-i18n.html link I put in my previous comment, you need to. You are currently ignorant of a rather large problem, and your code will have unexpected bugs when used in Turkey, a country of about 80 million people.

rmunn on 7 Dec 2017

❤1
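
The locale effect described above can be sketched in Python, with one caveat: Python's `str.upper()` ignores the system locale, so the Turkish behaviour is imitated here by a hypothetical helper, `upper_tr`, that hand-applies the two Turkish uppercase rules. In runtimes whose case conversion does follow the locale, the same mismatch appears without any helper. This is an assumption-laden sketch, not a substitute for a library such as ICU:

```python
# Hypothetical helper: applies the Turkish-specific rules (i -> İ, ı -> I)
# before the default uppercase mapping. Not a real locale implementation.
_TURKISH_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})


def upper_tr(text: str) -> str:
    return text.translate(_TURKISH_UPPER).upper()


key_in_file = "wine"
expected = "WINE"  # what a naive comparison is written against

default_result = key_in_file.upper()    # locale-independent mapping
turkish_result = upper_tr(key_in_file)  # what a Turkish-locale uppercase gives

print(default_result == expected)
print(turkish_result == expected)
```

The first comparison succeeds and the second fails, even though the key itself contains nothing but ASCII letters.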

@rmunn I'm not a native English speaker. I've worked in code bases in at least 3 different national languages (not counting English), and I have never seen a single instance of identifiers using anything but plain old ASCII. I'm pretty sure the situation in Turkey is exactly the same and that it could work perfectly fine with the case folding rules for ASCII. In fact, that was pretty much my point: Unicode seems overkill for the use case (configuration key names).
I know it's not the be-all and end-all of code, but the following GitHub search: https://github.com/search?q=%C4%B1&type=Code&utf8=%E2%9C%93 can't find a single instance of the ı (dotless lowercase i) being used in code in any project. The results are the same for the İ (dotted uppercase I).

Anyway @mojombo gave me a perfectly acceptable answer "Because I like it that way".

Thanks for participating in the discussion and highlighting the unique situation of the Turkish alphabet.

mterron on 7 Dec 2017
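
The ASCII-only folding suggested above could look roughly like the sketch below. The function name `ascii_fold` is hypothetical, and TOML does not do this; the sketch only shows what the proposal would mean in practice:

```python
import string

# Map A-Z to a-z and leave every other code point untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_fold(key: str) -> str:
    """Fold only the 26 ASCII capitals; independent of locale and Unicode tables."""
    return key.translate(_ASCII_LOWER)


print(ascii_fold("WINE") == ascii_fold("wine"))
print(ascii_fold("W\u0130NE") == ascii_fold("wine"))
print(ascii_fold("Stra\u00dfe"))
```

The trade-off is that a format allowing quoted Unicode keys would then compare ASCII letters case-insensitively and all other letters case-sensitively, which is the kind of inconsistency the earlier comments argue against.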

Anything that's case-insensitive feels sloppy (I can't believe there are filesystems that allow such), so I also prefer it this way.

tshepang on 8 Dec 2017

simple
unicode
case-insensitive

Choose any two.

lmna on 8 Dec 2017

Choose any two.

I agree and I'd vote for TOML to be simple and unicode. :)

pradyunsg on 8 Dec 2017

Everyone here agrees that the current position is opinion based and they're all fine with it/in support of it. If anyone does not agree, I think it's best to make a new issue for that discussion.

pradyunsg on 8 Dec 2017

Just for completeness' sake, this is the Unicode Standard's official word on case folding:
https://www.unicode.org/versions/Unicode10.0.0/ch03.pdf#G33992 in particular Page 156 discusses the specific subject of case insensitive identifier comparison.

mterron on 10 Dec 2017
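
For comparison, canonical caseless matching as defined in that chapter can be approximated in Python with `str.casefold()` and `unicodedata.normalize()`. The helper names below are hypothetical, and the stricter identifier-specific variant discussed in the chapter is not reproduced; treat this as a rough sketch:

```python
import unicodedata


def canonical_caseless_key(text: str) -> str:
    # NFD(toCasefold(NFD(text))): normalize, case-fold, normalize again.
    nfd = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", nfd.casefold())


def caseless_equal(a: str, b: str) -> bool:
    return canonical_caseless_key(a) == canonical_caseless_key(b)


print(caseless_equal("Wine", "wINE"))
print(caseless_equal("Stra\u00dfe", "STRASSE"))
print(caseless_equal("W\u0130NE", "wine"))
```

Default case folding treats "ß" and "SS" as a match, but it still does not equate the Turkish İ with "i"; that requires locale-specific tailoring on top of the default rules.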
