Test:
import strutils except isUpper
import unicode
echo "A B".isUpperAscii() # Returns false
echo "A B".isUpper() # From unicode, Returns false
echo "AB?!".isUpper() # From unicode, Returns false
echo "1, 2, 3 GO!".isUpper() # From unicode, Returns false
Just for precedent, Python, Emacs Lisp return a "true" for upper-case checking of "A B", "AB?!" and "1, 2, 3 GO!".
Just to add, the isUpper documentation says:
Returns true iff s contains all upper case unicode characters.
I believe that all non-alpha characters should be skipped from the check for "upper case unicode characters".
.. because with the current logic.. both of the below return false:
echo "A B".isUpper()
echo "a b".isLower()
.. which seems counterintuitive.
It feels like the isUpper, isLower are conflating their behavior with that of isAlpha.
Seems right to me, only alphabetic characters can be either uppoer or lower
all non-alpha characters should be skipped from the check for "upper case unicode characters".
Why?
echo "A B".isUpper()
echo "a b".isLower()
Your tests contains space, it's non alpha. And this the right logic.
@data-man That's what I said.. the isUpper/isLower checks are conflating with isAlpha.
Bear with me.. here is the reasoning..
Then currently str.isUpperX is behaving like str.isAlpha and str.isUpperY. We need to remove the isAlpha check out of isUpper. All of this applies to isLower too.
When we are dealing with real world text, parsing docs, web pages, we rarely face all alphabetical characters strings. We normally deal with sentences that usually contain at least a space. In that case both isAlpha and isUpper will always be false which is not very useful.
If the implementation is changed to isUpperY, getting the current behavior is as simple as anding with isAlpha. But deriving isUpperY from isUpperX as I need to do today is very hacky.
Another way to think about this.. you would do an uppercase/lowercase check only on alphabets (or characters that can have upper and lower cases). You wouldn't do that check on spaces (and numbers, full stops, etc) as they don't have cases. So such characters should not affect the isUpper, isLower, etc.
Crude analogy follows..
It's like asking a 3-year old if he likes Guinness (beer), and if he says no, you don't order that for the whole table of 20 adults.
That space is being that 3-year old in that "A B".isUpper ().. it's giving vote for something it's not eligible for.
We normally deal with sentences that usually contain at least a space.
No. Passwords and usernames almost always can't contain spaces.
You can implement your MyRightIsUpper for your needs.
No. Passwords and usernames almost always can't contain spaces.
I agree and also almost is the key there. And consider how few cases of parsing such space-less, digit-less, punctuation-less are out there in the huge text parsing world. As I said, you can still get the current behavior by simply anding the proposed isUpperY with isAlpha.
In the upcoming unicode module (RFC #7902) will be possible this:
let myRunes = alphas + spaces + digits + '.' + ','
if check(myString, myRunes):
...
@data-man why did you close this issue? New unicode module wasn't implemented yet
Well since these have been designed with Python compat in mind, they should follow Python's behaviour (as weird as that is to me).
@Yardanico
Do you want to do double work? Well, go ahead!
@data-man what double-work? You would just make a PR with "fixes #7963", and as I can tell we didn't fully decide for the Unicode module RFC yet.
@data-man Just from language semantics, wouldn't you agree that the following statement is all uppercase: "I AM USING ALL CAPS, BUT JUST AS AN EXAMPLE, NOT BEING RUDE."
If someone wants to parse space-less passwords, relying on isUpper for "space-less check " sounds wrong.. they should first use isAlpha, isAlphaNumeric or whatever that meets the valid password chars check.
@kaushalmodi
wouldn't you agree that the following statement is all uppercase: "I AM USING ALL CAPS, BUT JUST AS AN EXAMPLE, NOT BEING RUDE."
Of course, I disagree. Spaces aren't letters.
If someone wants to parse space-less passwords
If someone wants to check whole sentences, he can use split and isUpper.
The current behavior shouldn't change, this is a breaking change with unpleasant consequences for someone.
The current behavior shouldn't change, this is a breaking change with unpleasant consequences for someone.
I have some data sample to disprove that! :D
I can of course see only the Nim code that in public on GitHub..
Github search of public repos containing isUpper in Nim files
This returned just 50 hits in all, many of which are false hits like proc g_unichar_isupper*(c: gunichar): gboolean{.cdecl, dynlib: gliblib,, or just hits in Nim forks.
I started documenting and understanding the use of isUpper in all those, and it got boring quite quickly.. below 13 rows cover the use of isUpper in public Nim code on GitHub in the last 2 years.
As you see my proposal won't break any of those.. in fact it's fixing the py2nim behavior.
| | Line | Update year | User | Will break? | Reason |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-------------------|-------------------------|-------------------------------------------------------------------------|
| 1 | regex.nim | 2018 | @nitely | No | isUpper used on Rune |
| 2 | Nim/pegs.nim | 2018 | @Araq | No | isUpper used on Rune |
| 3 | Nim/unicode.nim | 2018 | @Araq | No/Yes? | This is what we are fixing here! |
| 4 | bob/example.nim | 2018 | @amscotti | No | isUpper used on Rune |
| 5 | unicodeplus.nim | 2018 | @nitely | No | N/A - That library implements its own isUpper, again for Rune |
| 6 | string_methods.nim | 2018 | @metacraft-labs | It will actually fix it | Because that library translates Python isupper to Nim isUpperAscii. |
| 7 | builtin_methods.nim | 2018 | _same as above_ | _same as above_ | _same as above_ |
| 8 | NimMirror/tpegs.nim | 2017 | @georgiy-pruss | N/A -- Nim Mirror | |
| 9 | NimMirror/unicode.nim | 2017 | _same as above_ | _same as above_ | |
| 10 | NimMirror/strutils.nim | 2017 | _same as above_ | _same as above_ | |
| 11 | NimMirror/pegs.nim | 2017 | _same as above_ | _same as above_ | |
| 12 | uetypes.nim | 2017 | @pragmagic | No | isUpper (deprecated version from strutils) used on char |
| 13 | another Nim mirror/tpegs.nim | 2016 | @FedericoCeratto | N/A -- Nim Mirror | |
NOTE: My proposal is to fix isUpper, isUpperAscii, isLower, isLowerAscii only with the string input, not char or Rune input. See this summary: https://scripter.co/notes/string-fns-nim-vs-python/#is-upper -- The Nim implemention of the above 4 deviates from Python only for string input, and that too, only for strings that contain alphabetical chars too.
(Below "WRONG" is with respect to Python implementation.)
echo "[Wrong] ", "A B".isUpperAscii() # Returns false! BUG https://github.com/nim-lang/Nim/issues/7963
echo "[Wrong] ", "A B".isUpper() # from unicode, Returns false! BUG https://github.com/nim-lang/Nim/issues/7963
echo "[Wrong] ", "AB?!".isUpper() # from unicode, Returns false! BUG https://github.com/nim-lang/Nim/issues/7963
echo "[Wrong] ", "1, 2, 3 GO!".isUpper() # from unicode, Returns false! BUG https://github.com/nim-lang/Nim/issues/7963
echo ' '.isUpperAscii() # checking this proc on a non-alphabet char
echo ' '.Rune.isUpper() # checking this proc on a non-alphabet Rune
echo "(*&#@(^#$ ".isUpper() # checking this proc on a non-alphabet string
Output:
[Wrong] false
[Wrong] false
[Wrong] false
[Wrong] false
false
false
false
@data-man
Of course, I disagree. Spaces aren't letters.
That's a good excuse for violating general email/forum etiquette of not using all-caps :D
If someone wants to check whole sentences, he can use split and isUpper.
In real-world parsing, sentences don't have just spaces.. you have to consider commas, fullstops, other punc chars, digits, spaces with variety of ascii codes, hyphens with variety of ascii codes, so much more.
My point is plain and simple.. the current isUpper (and others) implementation is too specialized.. my proposal allows isUpper to be better composable with other functions.
isUpper (my proposed version).and that with isAlpha.I can of course see only the Nim code that in public on GitHub.
What about a private repos, GitLab, BitBucket, etc.?
P.S. I'll not argue anymore, because it's pointless.
P.S. I'll not argue anymore, because it's pointless.
I have the same feeling. I am hoping that one or more of the folks using isUpper that I pinged in my above analysis comment chimes in with their opinion.
What about a private repos, GitLab, BitBucket, etc.?
I cannot help with that, and so I made it clear about "public GitHub repos". But if anything, at least from this data-sample, I proved my understanding of current isUpper implementation:
isUpper elegantly for strings. So they resort to hacks or using just str.toUpper == str to mimick isUpper. See http://exercism.io/tracks/nim/exercises/bob to see what I mean.isUpper and family for chars and Runes.isAlpha to get back the current behavior.isupper: the case where a string (not char or Rune) contains at least one alphabetical char with optional non-alphabetical chars. But fixing this will make str.isUpper Do The Right Thing (TM), and you will see isUpper being then used for strings too in the wild.I'll rest my arguments till we get more opinions; I'm now just repeating all the points I made earlier.
We collect breaking changes and call the new thing version 0.19. Please fix it to be compatible with Python, these have been added for Python compat in the first place!
Nim's native way is s.allCharsInSet({'a'..'z', ' '}) etc
IMO:
doAssert "I AM SHOUTING".isUpper
Should hold. Especially if it's consistent with Python (which from what I can tell it is?)
@Araq
Please fix it to be compatible with Python, these have been added for Python compat in the first place!
Thank you!
@dom96
doAssert "I AM SHOUTING".isUpper
Should hold
Exactly! Below fails the assertion currently.
import unicode
doAssert "I AM SHOUTING".isUpper
Here are the tests for isUpperAscii and isLowerAscii that should pass after this behavior change:
doAssert isLowerAscii("a b")
doAssert isLowerAscii("ab?!")
doAssert isLowerAscii("1, 2, 3 go!")
doAssert(not isLowerAscii("(*&#@(^#$ ")) # None of the string chars are alphabets
doAssert isUpperAscii("A B")
doAssert isUpperAscii("AB?!")
doAssert isUpperAscii("1, 2, 3 GO!")
doAssert(not isUpperAscii("(*&#@(^#$ ")) # None of the string chars are alphabets
tagging @krux02 since I also had some ideas on where strutils should go forward (the trash? :P).
For me spaces shouldn't false isUpper or isLower, otherwise those are completely useless.
@kaushalmodi I think a better metric would be checking what other languages do. I doubt many libraries make use of isUpper/isLower, the only time I used them was for case swapping, and that was 'cause there is no swapCase(Rune). That said, I think this should be always valid: assert isUpper(toUpper(mystring)) == true which currently is not true.
That said, I think this should be always valid: assert isUpper(toUpper(mystring)) == true which currently is not true.
That is NOT true if we copy Python's design! Because isUpper("") is false.
Right, in python the string must contain at least one cased character, which makes sense, otherwise "123" would also return true. But Nim doesn't has to follow python, see what other languages with a rich stdlib do.
Same for isUpper(";: "), there are plenty of special cases here...
I realized that and edited my comment.
see what other languages with a rich stdlib do
Python is the only language I personally know that has isUpper/isLower for strings in its standard library (perhaps it's forced to since it's dynamic). Other languages only have it for chars and codepoints. In Apache Commons Lang however the StringUtils class has isAllUpperCase/isAllLowerCase methods that do what the Nim methods do currently. As bloaty as it is, my suggestion would be to change isUpper/isLower behaviour and add isAllUpper/isAllLower for the old behaviour.
@hlaaftana I didn't think that would be the case, but you are right, I checked: go, java, rust, swift, ruby, C++, C# and none of them has isUpper for strings, only for code points :open_mouth:
@nitely
Julia has is* for strings.
With the correct behavior. :-)
@data-man Except they're deprecated now with all(isupper, str) being the alternative.
IMO if those methods are not compatible with python they might as well be removed, they are nearly useless anyway. The fact that not even a single statically typed language has been found that implements this proc should be an indication that it might not be that useful...
No. Passwords and usernames almost always can't contain spaces.
In the entire discussion about this, I think this is the only actual use case that has been mentioned for either of the implementations, and this use case seems strange. Why would a website care if my username only contains uppercase letter? Passwords sometimes has rules, but they are typically much more complex than just "not entirely uppercase" (and fwiw, it's very uncommon that passwords can't contain whitespace).
@data-man Except they're deprecated now with all(isupper, str) being the alternative.
That's what Nim should do as well IMO. There are like ~20 procs in strutils that are just bloat on top of allCharsInSet and the in operator.
Well since I got tagged here, I will also provide my two cents. I don't like the idea to copy Python semantics just for the sake of copying python semantics. If Python has reasonable design decisions behind it's library and those dosign decisions do not stand in conflict with priorities that you would want in a statically typed compiled language, then yes I think it is a good idea to design the nim library around an established system.
So please provide different reasoning than "this is how it is done in python". For me isUpper(' ') == falseis totally fine, because a space is neither upper case nor lower case. So also isUpper("A B") is false, because not every character in that string is strictly upper case. What you want to check semantically is "does not contain lower case letters". so maybe something like this is what you are looking for: {'a'..'z'} notin "A B". Maybe you want to have some predefined sets, too. So lowercaseASCII notin "A B".
@krux02
So also isUpper("A B") is false
see https://github.com/nim-lang/Nim/issues/8600 in which I argue we can completely avoid this senseless debate by deprecating isUpper(string) and only have isUpper on char, Rune
Most helpful comment
Well since I got tagged here, I will also provide my two cents. I don't like the idea to copy Python semantics just for the sake of copying python semantics. If Python has reasonable design decisions behind it's library and those dosign decisions do not stand in conflict with priorities that you would want in a statically typed compiled language, then yes I think it is a good idea to design the nim library around an established system.
So please provide different reasoning than "this is how it is done in python". For me
isUpper(' ') == falseis totally fine, because a space is neither upper case nor lower case. So also isUpper("A B") is false, because not every character in that string is strictly upper case. What you want to check semantically is "does not contain lower case letters". so maybe something like this is what you are looking for:{'a'..'z'} notin "A B". Maybe you want to have some predefined sets, too. SolowercaseASCII notin "A B".