Nim: RFC: Nim String Functions

Created on 25 May 2016 · 46 Comments · Source: nim-lang/Nim

Having seen #3745, I thought it might be interesting to do something similar for string functions:

This is a comparison of Python and Nim string functions, intended to help identify any functions that are missing and may need to be added in preparation for Nim 1.0.

Don't assume I have identified the Nim string functions correctly: I may have missed some, or they may be available in other Nimble packages. Functions are in strutils.nim unless otherwise specified.

| Python Function | Nim Function |
| --- | --- |
| str[:-1] | str[0..<str.high] # Should be str[..^1], but I can't get this to work |
| str[1:] | str[1..str.high] # Should be str[1..], but I can't get this to work |
| str[1:-1] | str[1..<str.high] # Should be str[1..^1], but I can't get this to work |
| str.capitalize | capitalize |
| str.center | center (from Nim 0.15) |
| str.count | count |
| str.decode | |
| str.encode | |
| str.endswith | endsWith (the overload that takes a char rather than a string requires Nim 0.15 or greater) |
| str.expandtabs | expandTabs (from Nim 0.15) |
| str.find | find |
| str.format | format but nowhere near as powerful as python's version |
| str.index | find |
| str.isalnum | isAlphaNumeric |
| str.isalpha | isAlpha |
| str.isdigit | isDigit |
| str.islower | isLower |
| str.isspace | isSpace |
| str.istitle | unicode.isTitle (from Nim 0.15) |
| str.isupper | isUpper |
| str.join | join |
| str.ljust | _done manually with repeatChar according to Nim docs, although it is deprecated_ |
| str.lower | toLower |
| str.lstrip | strip(s, trailing=false) |
| str.partition | strmisc.partition (from Nim 0.15) |
| str.replace | replace |
| str.rfind | rfind |
| str.rindex | rfind |
| str.rjust | align |
| str.rpartition | strmisc.rpartition (from Nim 0.15) |
| str.rsplit | rsplit (from Nim 0.15) |
| str.rstrip | strip(s, leading=false) |
| str.split | split |
| str.splitlines | splitLines |
| str.startswith | startsWith for strings, str[0] == c for characters |
| str.strip | strip |
| str.swapcase | unicode.swapCase (from Nim 0.15) |
| str.title | unicode.title (from Nim 0.15) |
| str.translate | No direct equivalent (strutils.translate is for word translation). Closest equivalent is parallelReplace in pegs or re modules. |
| str.upper | toUpper |
| str.zfill | align(padding='0') |

abudden

👍5

All 46 comments

This is brilliant! Thank you for doing this. It's exactly what I want to be done for the standard library before I can safely say "it's now 1.0 ready".

In addition to this it would be nice to think about procedures which are currently in the standard library but perhaps shouldn't be, or their API/semantics are bad.

Some things I've noticed:

str[:-1] -> str[.. ^1], similar for other slicing examples
str.format -> I would say that % is equivalent: "Foo bar: $1 $2" % [23, 42]

dom96 on 25 May 2016

👍1

str[:-1] -> str[.. ^1], similar for other slicing examples

That's really useful: thank you. I was trying to figure out a tidy way of doing it and obviously my documentation searching failed.

str.format -> I would say that % is equivalent: "Foo bar: $1 $2" % [23, 42]

Personally I'd disagree with this - it is similar, but nowhere near as capable:

import math

s = "{greet} {name}, the square root of {x:d} is {rtx:0.4f}".format(
        name="Dominik",
        x=40,
        rtx=math.sqrt(40),
        greet="Hello"
        )

I'll edit the list to make it slightly less provocative though...

abudden on 25 May 2016

Personally I'd disagree with this - it is similar, but nowhere near as capable:

Oh. Didn't realise Python's was that powerful :)

dom96 on 25 May 2016

Also (python's % notation + dictionaries):

import math

s = "%(greet)s %(name)s, the square root of %(x)d is %(rtx)0.4f" % {
        'name': "Dominik",
        'x': 40,
        'rtx': math.sqrt(40),
        'greet': "Hello"
        }

abudden on 25 May 2016

Oh (probably going off topic), does Nim's format/% support any type of bracketing?

# Python
"Hello {1}, {0} multiplied by 10 is {0}0".format(7, "Dominik")
# Hello Dominik, 7 multiplied by 10 is 70

# Nim
import strutils
var s = "Hello $2, $1 multiplied by 10 is $10" % [$7, "Dominik"]
# Error: unhandled exception: invalid format string [ValueError]
s = "Hello $2, $1 multiplied by 10 is $1 0" % [$7, "Dominik"]
# Hello Dominik, 7 multiplied by 10 is 7 0
s = "Hello $name, $number multiplied by 10 is $number0" % ["number", $7, "name", "Dominik"]
# Error: unhandled exception: invalid format string [ValueError]

abudden on 25 May 2016

@dom96 I'm trying your example

str[:-1] -> str[.. ^1], similar for other slicing examples

var str = "abcdef"
echo str[.. ^1]

Without the space between .. and ^1, I get a compiler error:

test.nim(2, 10) Error: type mismatch: got (int literal(1))
but expected one of:
system...^(a: untyped, b: untyped)

With the space, it prints "abcdef" (not "abcde" as expected).

I've also tried a suggestion from @Araq: using setLength to shorten a string:

var str = "abcdef"
str.setLength(str.len-1)
echo str

This produces:

test.nim(2, 4) Error: attempting to call undeclared routine: 'setLength'

Any suggestions?

abudden on 31 May 2016

It's called setLen.

Araq on 31 May 2016

It's called setLen.

Ah: thank you (and sorry for not having found it myself).

abudden on 1 Jun 2016

I can add most of these functions to Nim in a pull request, along with proper comments and tests. I have some free time today :)

jyapayne on 4 Jun 2016

@dom96 For a PR, would you prefer a separate commit for each proc with tests or one big commit with all procs and tests that I have added?

jyapayne on 4 Jun 2016

Don't really mind, whichever is easiest. All of these procs are pretty similar so it's fine to put them in the same commit I think.

dom96 on 5 Jun 2016

For string.translate, would it be acceptable to import tables? I don't really want to introduce a dependency like that just for that proc. If it's not acceptable, where should string.translate go?

jyapayne on 5 Jun 2016

Is translate char (byte) based, or something else?

Varriount on 5 Jun 2016

Here is the method signature I was thinking of:

proc translate(s: string, dictionary: TableRef[string, string]): string

The "translate" here would be used for translating words in s to other languages.

jyapayne on 5 Jun 2016

Use this signature instead:

proc translate(s: string, replacements: proc (key: string): string): string

Araq on 5 Jun 2016

@Araq, that's a good idea, I hadn't thought of it. That way it works however the user wants to implement the lookup. Very elegant :)

@Varriount The reason I thought a table would be necessary is because of lookup speed. But Araq's comment makes more sense to me now.
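The callback-based shape discussed here can be illustrated with a minimal Python sketch. The names `translate`, `dictionary`, and `lookup` are hypothetical, and this is not the Nim proc from the thread; it only shows that a table lookup can be wrapped in a callback, so the function itself needs no table dependency.

```python
import re
from typing import Callable


def translate(s: str, replacements: Callable[[str], str]) -> str:
    """Replace every word in s with whatever the callback returns for it."""
    # \w+ matches one word at a time; punctuation and spaces are left untouched.
    return re.sub(r"\w+", lambda m: replacements(m.group(0)), s)


# The caller decides how the lookup works; here it is a plain dict
# that falls back to the original word when no translation is known.
dictionary = {"hello": "bonjour", "world": "monde"}


def lookup(word: str) -> str:
    return dictionary.get(word, word)


translated = translate("hello, world!", lookup)
print(translated)  # bonjour, monde!
```

The lookup speed is whatever the caller's callback provides, so a hash table can still be used behind it.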

jyapayne on 5 Jun 2016

For isUpper, toUpper, etc, would anyone mind if I converted all of those to be unicode compatible and move them to the unicode module?

jyapayne on 5 Jun 2016

I wouldn't remove the versions that already exist, however I would mark them as deprecated.

Varriount on 5 Jun 2016

Right, that makes sense.

jyapayne on 5 Jun 2016

@Araq, for the translate proc, can a lambda type declare that it can't have side effects?

jyapayne on 5 Jun 2016

@Varriount and @Araq, how would I go about referencing the new unicode procs inside strutils (or vice-versa) for deprecated isUpper, toUpper, etc? Is there a special way to reference deprecations from other modules?

jyapayne on 5 Jun 2016

You wouldn't reference them, except in the documentation. If I recall correctly, you can use a bare {.deprecated.} pragma

Varriount on 5 Jun 2016

For isUpper, toUpper, etc, would anyone mind if I converted all of those to be unicode compatible and move them to the unicode module?

I would. I use strutils.toUpper etc because I know what I'm doing and I like the improved speed and code size. But that's just another fight I will lose and something "Araq's lib" will offer.

Araq on 5 Jun 2016

@Araq, if I simply deprecate them, you'd still be able to use them, right? I'm a bit confused because this comment directly contradicts your earlier comment about making things Unicode compatible. I simply want to contribute something that will be of use to the community.

jyapayne on 5 Jun 2016

There are very solid use cases for each version of toUpper and similar functions. Working correctly with Unicode is much slower, and may not be what you want either. D's solution to that, for example, is to have one toUpper in the unicode module and another toUpper in the ascii module.
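To make the difference concrete, here is a small Python sketch. `to_upper_ascii` is a hypothetical helper written for this illustration, not a standard function; Python's built-in `str.upper` plays the role of the Unicode-aware version.

```python
def to_upper_ascii(s: str) -> str:
    """Uppercase only the ASCII letters a-z; leave every other character alone."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


word = "àbc"
ascii_result = to_upper_ascii(word)   # only b and c change
unicode_result = word.upper()         # à is uppercased as well

print(ascii_result)    # àBC
print(unicode_result)  # ÀBC

# Unicode case mapping can even change the length of a string,
# which an ASCII-only routine never does.
print("straße".upper())  # STRASSE
```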

Also, for string formatting, there is strfmt, which is very similar to Python's .format(). It has been suggested many times for inclusion in the standard library, but I don't know the status of that discussion. It would also introduce a second or third style for formatting strings, which isn't really ideal.

ReneSac on 7 Jun 2016

@jyapayne May I suggest some additional stuff for the replace functionality that I think would be useful:

replace (and maybe alias it to replaceFirst too)
replaceAll
replaceLast

Also, it would be nice to add a version of replace in the re module that accepts a callback so you can do your custom replace logic in a function (that's super useful in Scala, for example).

johnnovak on 7 Jun 2016

@ReneSac That's a good idea as well. I'm okay with having both at the same time. I would have to correct some libraries that import both unicode and strutils, but if @Araq is okay with it, it's fine with me :)

@johnnovak I'm not sure what the expectation is with Nim, but an idea from Python's replace function could be that it has a count default parameter (set to 1) that will replace count occurrences of the word in s. If set to -1, then it could replace all occurrences. That would cover both replace and replaceAll with one function. As for replaceLast, maybe replaceRight or some other name that would replace starting from the right side of the string with similar parameters as replace?

The callback sounds like a useful idea as well :)
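For reference, a minimal Python sketch of the count idea. Note that in Python itself `str.replace` replaces all occurrences by default and only limits them when a count is passed. `replace_right` is a hypothetical helper name taken from the suggestion above, not an existing function.

```python
def replace_right(s: str, old: str, new: str, count: int = -1) -> str:
    """Replace up to `count` occurrences of `old`, starting from the right.

    A negative count replaces every occurrence. `old` must be non-empty,
    because str.rsplit rejects an empty separator.
    """
    if count < 0:
        return s.replace(old, new)
    # rsplit cuts at most `count` times from the right; joining with `new`
    # puts the replacement exactly where those cuts were made.
    return new.join(s.rsplit(old, count))


text = "a-b-c-d"
first_only = text.replace("-", "+", 1)        # leftmost occurrence
last_only = replace_right(text, "-", "+", 1)  # rightmost occurrence
everything = text.replace("-", "+")           # Python's default: all

print(first_only)  # a+b-c-d
print(last_only)   # a-b-c+d
print(everything)  # a+b+c+d
```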

jyapayne on 7 Jun 2016

@jyapayne I like your replace & replaceRight idea with an optional count parameter. I agree that it makes sense to be consistent with the rest of the string functions that seem to follow the Python conventions.

Yeah, when you need that callback regexp replace thingy, it _really_ comes in handy—for example, when you want to append additional query parameters to all URLs in a block of text (that was my specific use when I discovered it in the standard Scala lib).
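As an illustration of that use case, Python's `re.sub` accepts a function as the replacement. The sketch below assumes a deliberately simple URL pattern and a made-up `ref=newsletter` parameter; both are placeholders, not a robust URL parser.

```python
import re

# Simplistic pattern for the sketch: a scheme followed by non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")


def add_param(match):
    """Callback: receives each match and returns its replacement text."""
    url = match.group(0)
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}ref=newsletter"


text = "See https://example.com/a and https://example.com/b?x=1 for details."
result = URL_PATTERN.sub(add_param, text)
print(result)
# See https://example.com/a?ref=newsletter and https://example.com/b?x=1&ref=newsletter for details.
```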

johnnovak on 8 Jun 2016

I've updated the table to reflect the changes made in PR #4276.

abudden on 17 Jun 2016

I am using the devel version of Nim as of today and expandTabs is in strmisc instead of strutils.

kaushalmodi on 8 Aug 2017

@dom96

For me, I found str[.. ^2] in Nim to be equivalent to str[:-1] in Python. Also I had to put a space between .. and ^2 as @abudden also found out.

Similarly, str[1.. ^0] and str[1.. ^1] both work the same as str[1:] in Python. So what's the difference between ^0 and ^1?

kaushalmodi on 9 Aug 2017

If I import both the unicode and strutils modules, how do I specify that I want to use the isUpper proc from unicode, and not the deprecated version in strutils?

Example:

import strutils, unicode
echo "A".isUpperAscii()
echo "À".isUpper()

Above gives:

Error: ambiguous call; both strutils.isUpper(s: string)[declared in lib/pure/strutils.nim(322, 5)] and unicode.isUpper(s: string)[declared in lib/pure/unicode.nim(1406, 5)] match for: (string)

UPDATE:

OK, this works:

echo unicode.isUpper("À")

Though, the earlier STR.PROC style won't work now..

echo "À".unicode.isUpper() # Does not work

kaushalmodi on 9 Aug 2017

Hello all, I have taken the comparison table in this thread and written a detailed post with full code and output about string functions between Nim and Python: https://scripter.co/notes/string-functions-nim-vs-python

Looking forward to comments and corrections.

kaushalmodi on 9 Aug 2017

@kaushalmodi there's strfmt module in nimble for enhanced string formatting

Yardanico on 9 Aug 2017

👍1

@kaushalmodi and don't forget to note that Nim is compiled and statically typed while python is interpreted and dynamic :)

Yardanico on 9 Aug 2017

@kaushalmodi

var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str[0.. <str.high]
# or
echo str[.. ^2]

Why ^2?
^1 works perfectly for me

Yardanico on 9 Aug 2017

@kaushalmodi About the second example: it's because you can access the null terminator, which is at the end of every string.

Yardanico on 9 Aug 2017

@kaushalmodi Also, there ARE "encode"- and "decode"-like functions in Nim. Take a look at the "encodings" module: its convert procedure can convert from one encoding to another.
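For comparison, this is what the Python side of that table row does. The sketch shows only Python's `str.encode` and `bytes.decode`. It makes no claim about the exact signature of Nim's convert procedure.

```python
text = "À"

# str.encode turns text into bytes in a given encoding.
utf8_bytes = text.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # one byte in Latin-1

# Converting between encodings is a decode followed by an encode.
converted = latin1_bytes.decode("latin-1").encode("utf-8")

print(utf8_bytes)               # b'\xc3\x80'
print(latin1_bytes)             # b'\xc0'
print(converted == utf8_bytes)  # True
```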

Yardanico on 9 Aug 2017

@kaushalmodi also

echo "À".unicode.isUpper() # Does not work

does not work because the compiler is looking for a procedure named unicode that accepts a string.
You can use isUpper() without the "unicode" qualifier as long as you don't import strutils.

Yardanico on 9 Aug 2017

@TiberiumN

there's strfmt module in nimble for enhanced string formatting

Thanks, I'll need to do some digging on that.

and don't forget to note that Nim is compiled and statically typed while python is interpreted and dynamic :)

Of course, the point of this post is to just compare string functions between the two languages.

^1 works perfectly for me

It doesn't for me, at least on Linux 64-bit, with Nim built from devel as of yesterday. This is what ^2 gives me, and this is what ^1/^0 give me. The results you see on my blog post are calculated live using Org Babel, so I did not manually paste the code and results separately.

About the second example: it's because you can access the null terminator, which is at the end of every string.

Yes, but it's still not clear as to when to use str[1.. ^0] vs str[1.. ^1].

Also, there ARE "encode"- and "decode"-like functions in Nim. Take a look at the "encodings" module: its convert procedure can convert from one encoding to another.

Thanks. I'll take a look.

You can use isUpper() without "unicode" as long as you don't import strutils

That's not very practical. Based on what I see, strutils looks like a must-have module in any Nim code where I am doing even basic string manipulation. So it will be unlikely for a scope to have unicode imported, but not strutils.

Thanks for your comments! I was beginning to think no one was reading this Issue thread :)

kaushalmodi on 9 Aug 2017

@TiberiumN Which is the official strfmt? I found lyro/strfmt and rgv151/strfmt.

Update: Looks like lyro/strfmt is the official version based on nimble install strfmt (my first nimble installed module) :)

kaushalmodi on 9 Aug 2017

@kaushalmodi "Yes, but it's still not clear as to when to use str[1.. ^0] vs str[1.. ^1]"
When you need to access the null terminator, you use ^0; when you don't need it, you use ^1.
But you almost never need to access it, so just use ^1 :)

Yardanico on 9 Aug 2017

Here is how you should do it:

import strutils except isUpper
import unicode

echo "À".isUpper()

Btw, thanks for writing that blog post! Always love to see new blog posts about Nim :)

Some feedback:

var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# echo str[1..] Does not work.. Error: expression expected, but found ']'

This looks like a bug to me, I reported it :)

You shouldn't be showing all of the different ways to write startsWith (and the other procs), the convention is startsWith so stick to it!

dom96 on 9 Aug 2017

👍1

@TiberiumN @dom96 Thanks for your feedback! I have updated my post with those.

kaushalmodi on 10 Aug 2017

👍1

@Araq, @dom96 can this be closed?
str.encode and str.decode - we have module "encodings" which AFAIK does the same thing.
yeah, we don't have str[:-1], str[1:]
Instead we use
str[0..^2]
str[1..^1]
and for str[1:-1] we use str[1..^2]
Is it really possible to implement something like str[..^1] ?

Also, we now have a better "translate"-like procedure: strutils.multiReplace
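The slice equivalences listed in this comment can be checked against the Python side with a short sketch. The Nim forms appear only as comments, exactly as quoted above. The underlying difference is that Python's end index is exclusive, while Nim's `..` is inclusive and `^1` refers to the last character.

```python
s = "abcdef"

drop_last = s[:-1]    # Nim form quoted above: str[0..^2]
drop_first = s[1:]    # Nim form quoted above: str[1..^1]
drop_both = s[1:-1]   # Nim form quoted above: str[1..^2]

print(drop_last)   # abcde
print(drop_first)  # bcdef
print(drop_both)   # bcde
```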

Yardanico on 1 Sep 2017

With the new strformat it seems on par with Python. Closing.
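For comparison, the Python formatting example from earlier in the thread can also be written as an f-string, the Python feature that strformat-style interpolation is usually compared with. This is a Python-only sketch and does not show Nim's syntax.

```python
import math

name = "Dominik"
greet = "Hello"
x = 40
rtx = math.sqrt(40)

# Same format specifiers as the .format() example: d for integers,
# 0.4f for four digits after the decimal point.
s = f"{greet} {name}, the square root of {x:d} is {rtx:0.4f}"
print(s)  # Hello Dominik, the square root of 40 is 6.3246
```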

Araq on 20 Dec 2017
