Having seen #3745, I thought it might be interesting to do something similar for string functions:
This is a comparison of string functions, with the aim of helping to identify if any functions are missing that may need to be added, etc. (in preparation for Nim V1.0)
Don't assume I correctly identified the Nim string functions (I may have missed some, or they may be available in other Nimble packages). Functions are in strutils.nim unless otherwise specified.
| Python Function | Nim Function |
| --- | --- |
| str[:-1] | str[0..<str.high] # Should be str[..^1], but I can't get this to work |
| str[1:] | str[1..str.high] # Should be str[1..], but I can't get this to work |
| str[1:-1] | str[1..<str.high] # Should be str[1..^1], but I can't get this to work |
| str.capitalize | capitalize |
| str.center | center (from Nim 0.15) |
| str.count | count |
| str.decode | |
| str.encode | |
| str.endswith | endsWith (for use with chars rather than strings, requires Nim 0.15 or greater) |
| str.expandtabs | expandTabs (from Nim 0.15) |
| str.find | find |
| str.format | format but nowhere near as powerful as python's version |
| str.index | find |
| str.isalnum | isAlphaNumeric |
| str.isalpha | isAlpha |
| str.isdigit | isDigit |
| str.islower | isLower |
| str.isspace | isSpace |
| str.istitle | unicode.isTitle (from Nim 0.15) |
| str.isupper | isUpper |
| str.join | join |
| str.ljust | _done manually with repeatChar according to Nim docs, although it is deprecated_ |
| str.lower | toLower |
| str.lstrip | strip(s, trailing=false) |
| str.partition | strmisc.partition (from Nim 0.15) |
| str.replace | replace |
| str.rfind | rfind |
| str.rindex | rfind |
| str.rjust | align |
| str.rpartition | strmisc.rpartition (from Nim 0.15) |
| str.rsplit | rsplit (from Nim 0.15) |
| str.rstrip | strip(s, leading=false) |
| str.split | split |
| str.splitlines | splitLines |
| str.startswith | startsWith for strings, str[0] == c for characters |
| str.strip | strip |
| str.swapcase | unicode.swapCase (from Nim 0.15) |
| str.title | unicode.title (from Nim 0.15) |
| str.translate | No direct equivalent (strutils.translate is for word translation). Closest equivalent is parallelReplace in pegs or re modules. |
| str.upper | toUpper |
| str.zfill | align(padding='0') |
This is brilliant! Thank you for doing this. It's exactly what I want to be done for the standard library before I can safely say "it's now 1.0 ready".
In addition to this it would be nice to think about procedures which are currently in the standard library but perhaps shouldn't be, or their API/semantics are bad.
Some things I've noticed:
- str[:-1] -> str[.. ^1], similar for other slicing examples
- str.format -> I would say that % is equivalent: "Foo bar: $1 $2" % [23, 42]

str[:-1] -> str[.. ^1], similar for other slicing examples
That's really useful: thank you. I was trying to figure out a tidy way of doing it and obviously my documentation searching failed.
str.format -> I would say that % is equivalent: "Foo bar: $1 $2" % [23, 42]
Personally I'd disagree with this - it is similar, but nowhere near as capable:
import math
s = "{greet} {name}, the square root of {x:d} is {rtx:0.4f}".format(
name="Dominik",
x=40,
rtx=math.sqrt(40),
greet="Hello"
)
I'll edit the list to make it slightly less provocative though...
Personally I'd disagree with this - it is similar, but nowhere near as capable:
Oh. Didn't realise Python's was that powerful :)
Also (python's % notation + dictionaries):
s = "%(greet)s %(name)s, the square root of %(x)d is %(rtx)0.4f" % {
'name': "Dominik",
'x': 40,
'rtx': math.sqrt(40),
'greet': "Hello"
}
Oh (probably going off topic), does Nim's format/% support any type of bracketing?
#!/usr/bin/python
"Hello {1}, {0} multiplied by 10 is {0}0".format(7, "Dominik")
# Hello Dominik, 7 multiplied by 10 is 70
s = "Hello $2, $1 multiplied by 10 is $10" % [$7, "Dominik"]
# Error: unhandled exception: invalid format string [ValueError]
s = "Hello $2, $1 multiplied by 10 is $1 0" % [$7, "Dominik"]
# Hello Dominik, 7 multiplied by 10 is 7 0
s = "Hello $name, $number multiplied by 10 is $number0" % ["number", $7, "name", "Dominik"]
# Error: unhandled exception: invalid format string [ValueError]
@dom96 I'm trying your example
str[:-1] -> str[.. ^1], similar for other slicing examples
var str = "abcdef"
echo str[.. ^1]
Without the space between .. and ^1, I get a compiler error:
test.nim(2, 10) Error: type mismatch: got (int literal(1))
but expected one of:
system...^(a: untyped, b: untyped)
With the space, it prints "abcdef" (not "abcde" as expected).
I've also tried a suggestion from @Araq: using setLength to shorten a string:
var str = "abcdef"
str.setLength(str.len-1)
echo str
This produces:
test.nim(2, 4) Error: attempting to call undeclared routine: 'setLength'
Any suggestions?
It's called setLen.
It's called setLen.
Ah: thank you (and sorry for not having found it myself).
I can add most of these functions to Nim in a pull request, along with proper comments and tests. I have some free time today :)
@dom96 For a PR, would you prefer a separate commit for each proc with tests or one big commit with all procs and tests that I have added?
Don't really mind, whichever is easiest. All of these procs are pretty similar so it's fine to put them in the same commit I think.
For string.translate, would it be acceptable to import tables? I don't really want to introduce a dependency like that just for that proc. If it's not acceptable, where should string.translate go?
Is translate char (byte) based, or something else?
Here is the method signature I was thinking of:
proc translate(s: string, dictionary: TableRef[string, string]): string
The "translate" here would be used for translating words in s to other languages.
Use this signature instead:
proc translate(s: string, replacements: proc (key: string): string): string
@Araq, that's a good idea, I hadn't thought of it. That way it works however the user chooses to implement the lookup. Very elegant :)
@Varriount The reason I thought a table would be necessary is because of lookup speed. But Araq's comment makes more sense to me now.
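To illustrate why the callback signature also covers the table-based one, here is a minimal sketch in Python (the other language compared in this thread). The `translate` helper, the `spanish` dictionary and the choice to split on single spaces are all assumptions made for illustration; this is not how any Nim proc is implemented.

```python
def translate(s, replacements):
    """Replace each space-separated word of s with replacements(word)."""
    return " ".join(replacements(word) for word in s.split(" "))

# A table-backed lookup is just one possible callback; unknown words pass through.
spanish = {"hello": "hola", "world": "mundo"}
lookup = lambda word: spanish.get(word, word)

print(translate("hello big world", lookup))  # hola big mundo
print(translate("hello world", str.upper))   # HELLO WORLD
```

The caller decides whether the lookup is a hash table, a function, or anything else, so the proc itself needs no dependency on a table type.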
For isUpper, toUpper, etc, would anyone mind if I converted all of those to be unicode compatible and move them to the unicode module?
I wouldn't remove the versions that already exist, however I would mark them as deprecated.
Right, that makes sense.
@Araq, for the translate proc, can a lambda type declare that it can't have side effects?
@Varriount and @Araq, how would I go about referencing the new unicode procs inside strutils (or vice-versa) for deprecated isUpper, toUpper, etc? Is there a special way to reference deprecations from other modules?
You wouldn't reference them, except in the documentation. If I recall correctly, you can use a bare {.deprecated.} pragma.
For isUpper, toUpper, etc, would anyone mind if I converted all of those to be unicode compatible and move them to the unicode module?
I would. I use strutils.toUpper etc because I know what I'm doing and I like the improved speed and code size. But that's just another fight I will lose and something "Araq's lib" will offer.
@Araq, if I simply deprecate them, you'd still be able to use them, right? I'm a bit confused because this comment directly contradicts your earlier comment about making things Unicode compatible. I simply want to contribute something that will be of use to the community.
There are very solid use cases for each version of toUpper and similar functions. Working correctly with unicode is much slower, and may not be what you want either. D solution for that, for example, is to have a toUpper in the unicode module, and another toUpper in the ascii module.
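The difference between the two flavours of toUpper can be sketched in Python 3, whose `str.upper` is Unicode-aware. The `ascii_upper` helper below is hypothetical and only stands in for an ASCII-only version; it is not the Nim or D implementation.

```python
def ascii_upper(s):
    """Uppercase only a-z; every other character is left untouched."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)

word = "àb"
print(word.upper())        # ÀB  (Unicode-aware)
print(ascii_upper(word))   # àB  (ASCII-only, non-ASCII letters unchanged)
print("straße".upper())    # STRASSE (Unicode case mapping can change the length)
```

The ASCII-only version is a simple per-byte operation, while the Unicode-aware one has to consult case-mapping tables and may produce a result of a different length.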
Also, for string formatting, there is strfmt that is very similar to python .format(). It was suggested many times to be included in the standard library, but I don't know the status of the discussion. It would also introduce a second or third style for formatting strings, which isn't really ideal.
@jyapayne May I suggest some additional stuff for the replace functionality that I think would be useful:
- replace (and maybe alias it to replaceFirst too)
- replaceAll
- replaceLast

Also, it would be nice to add a version of replace in the re module that accepts a callback so you can do your custom replace logic in a function (that's super useful in Scala, for example).
@ReneSac That's a good idea as well. I'm okay with having both at the same time. I would have to correct some libraries that import both unicode and strutils, but if @Araq is okay with it, it's fine with me :)
@johnnovak I'm not sure what the expectation is with Nim, but an idea from Python's replace function could be that it has a count default parameter (set to 1) that will replace count occurrences of the word in s. If set to -1, then it could replace all occurrences. That would cover both replace and replaceAll with one function. As for replaceLast, maybe replaceRight or some other name that would replace starting from the right side of the string with similar parameters as replace?
The callback sounds like a useful idea as well :)
@jyapayne I like your replace & replaceRight idea with an optional count parameter. I agree that it makes sense to be consistent with the rest of the string functions that seem to follow the Python conventions.
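For reference, this is roughly how the count convention behaves on the Python side. Note that Python's `str.replace` replaces every occurrence by default rather than just one. The `replace_right` helper is hypothetical, built on `rsplit` and `join` purely to illustrate the replaceRight idea, and it assumes `old` is non-empty.

```python
def replace_right(s, old, new, count=-1):
    """Replace up to count occurrences of old, scanning from the right."""
    # rsplit cuts at most `count` times starting from the end of the string;
    # joining the pieces with `new` performs the replacement.
    return new.join(s.rsplit(old, count))

s = "a-b-c-d"
print(s.replace("-", "+"))             # a+b+c+d  (default: all occurrences)
print(s.replace("-", "+", 1))          # a+b-c-d
print(replace_right(s, "-", "+", 1))   # a-b-c+d
print(replace_right(s, "-", "+"))      # a+b+c+d
```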
Yeah, when you need that callback regexp replace thingy, it _really_ comes in handy, for example when you want to append additional query parameters to all URLs in a block of text (that was my specific use when I discovered it in the standard Scala lib).
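A small sketch of that use case in Python, where `re.sub` accepts a function as the replacement. The URL pattern is deliberately naive, and the `ref=nim` parameter and the sample text are made up for illustration.

```python
import re

def add_param(match):
    """Append a query parameter to the matched URL."""
    url = match.group(0)
    separator = "&" if "?" in url else "?"
    return url + separator + "ref=nim"

text = "See http://a.example/x and http://b.example/y?z=1 today"
# The callback is invoked once per match and its return value is substituted.
result = re.sub(r"https?://\S+", add_param, text)
print(result)
# See http://a.example/x?ref=nim and http://b.example/y?z=1&ref=nim today
```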
I've updated the table to reflect the changes made in PR #4276.
I am using the devel version of Nim as of today and expandTabs is in strmisc instead of strutils.
@dom96
For me, I found str[.. ^2] in Nim to be equivalent to str[:-1] in Python. Also I had to put a space between .. and ^2 as @abudden also found out.
Similarly, str[1.. ^0] and str[1.. ^1] work the same as str[1:] in Python. So what's the difference between ^0 and ^1?
If I import both unicode and strutils package, how do I specify that I want to use the isUpper proc from unicode, and not the deprecated version in strutils?
Example:
import strutils, unicode
echo "A".isUpperAscii()
echo "À".isUpper()
Above gives:
Error: ambiguous call; both strutils.isUpper(s: string)[declared in lib/pure/strutils.nim(322, 5)] and unicode.isUpper(s: string)[declared in lib/pure/unicode.nim(1406, 5)] match for: (string)
UPDATE:
OK. this works:
echo unicode.isUpper("À")
Though, the earlier STR.PROC style won't work now..
echo "À".unicode.isUpper() # Does not work
Hello all, I have taken the comparison table in this thread and written a detailed post with full code and output about string functions between Nim and Python: https://scripter.co/notes/string-functions-nim-vs-python
Looking forward to comments and corrections.
@kaushalmodi there's strfmt module in nimble for enhanced string formatting
@kaushalmodi and don't forget to note that Nim is compiled and statically typed while python is interpreted and dynamic :)
@kaushalmodi
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str[0.. <str.high]
# or
echo str[.. ^2]
Why ^2?
^1 works perfectly for me
@kaushalmodi about second example - it's because you can access null terminator which is the end of any string
@kaushalmodi also - there ARE "encode"- and "decode"-like functions in Nim - take a look at the "encodings" module - its convert procedure can convert from one encoding to another
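For comparison, this is what the Python side of such a conversion looks like. The sketch assumes Python 3, where `encode` lives on `str` and `decode` on `bytes` (in Python 2 both were `str` methods, as listed in the table). The sample character is chosen only for illustration.

```python
text = "\u00c0"  # the character À

# Encode to UTF-8, then convert the same text to Latin-1 by decoding and re-encoding.
utf8_bytes = text.encode("utf-8")
latin1_bytes = utf8_bytes.decode("utf-8").encode("latin-1")

print(utf8_bytes)    # b'\xc3\x80'
print(latin1_bytes)  # b'\xc0'
```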
@kaushalmodi also
echo "À".unicode.isUpper() # Does not work
does not work because you're trying to find some procedure named unicode which can accept string.
You can use isUpper() without "unicode" as long as you don't import strutils
@TiberiumN
there's strfmt module in nimble for enhanced string formatting
Thanks, I'll need to do some digging on that.
and don't forget to note that Nim is compiled and statically typed while python is interpreted and dynamic :)
Of course, the point of this post is to just compare string functions between the two languages.
^1 works perfectly for me
It doesn't for me, at least on Linux 64-bit, Nim built from devel as of yesterday. This is what ^2 gives me, and this is what ^1/^0 give me. The results you see on my blog post are calculated live using Org Babel. So I did not manually paste the code and results separately.
about second example - it's because you can access null terminator which is the end of any string
Yes, but it's still not clear as to when to use str[1.. ^0] vs str[1.. ^1].
also - there ARE "encode"- and "decode"-like functions in Nim - take a look at the "encodings" module - its convert procedure can convert from one encoding to another
Thanks. I'll take a look.
You can use isUpper() without "unicode" as long as you don't import strutils
That's not very practical. Based on what I see, strutils looks like a must-have module in any Nim code where I am doing even basic string manipulation. So it will be unlikely for a scope to have unicode imported, but not strutils.
Thanks for your comments! I was beginning to think no one was reading this Issue thread :)
@TiberiumN Which is the official strfmt? I found lyro/strfmt and rgv151/strfmt.
Update: Looks like lyro/strfmt is the official version based on nimble install strfmt (my first nimble installed module) :)
@kaushalmodi "Yes, but it's still not clear as to when to use str[1.. ^0] vs str[1.. ^1]"
When you need to access null terminator - you use ^0, when you don't need to do it - you do ^1.
But you almost never need to access it, so just use ^1 :)
Here is how you should do it:
import strutils except isUpper
import unicode
echo "À".isUpper()
Btw, thanks for writing that blog post! Always love to see new blog posts about Nim :)
Some feedback:
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx" # echo str[1..] Does not work.. Error: expression expected, but found ']'
This looks like a bug to me, I reported it :)
You shouldn't be showing all of the different ways to write startsWith (and the other procs), the convention is startsWith so stick to it!
@TiberiumN @dom96 Thanks for your feedback! I have updated my post with those.
@Araq, @dom96 can this be closed?
str.encode and str.decode - we have module "encodings" which AFAIK does the same thing.
yeah, we don't have str[:-1], str[1:]
Instead we use
str[0..^2]
str[1..^1]
and for str[1:-1] we use str[1..^2]
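The off-by-one between the two notations comes from Python's exclusive end bound versus Nim's inclusive one. A minimal Python sketch, assuming (as in the mapping above) that ^1 denotes the last index; the `nim_slice` helper is hypothetical and exists only to emulate `s[first .. ^back]`:

```python
def nim_slice(s, first, back):
    """Emulate Nim's s[first .. ^back]: inclusive end, counted from the back."""
    last = len(s) - back          # ^1 is the last index, ^2 the one before it
    return s[first:last + 1]      # +1 because Python's end bound is exclusive

s = "abcdef"
assert s[:-1] == nim_slice(s, 0, 2) == "abcde"   # str[0..^2]
assert s[1:] == nim_slice(s, 1, 1) == "bcdef"    # str[1..^1]
assert s[1:-1] == nim_slice(s, 1, 2) == "bcde"   # str[1..^2]
```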
Is it really possible to implement something like str[..^1] ?
Also we now have better "translate"-like procedure - strutils.multiReplace
With the new strformat it seems on par with Python. Closing.