Nim: [RFC] String null terminator should be inaccessible.

Created on 22 Mar 2017 · 14Comments · Source: nim-lang/Nim

The purpose of this RFC is to change current Nim behavior allowing to refer to null terminator of a string. E.g. the following code is currently valid:

var s = ""
assert(s[0] == '\0') # We can refer to element at index 0 even though it exceeds string len
s[0] = 'a' # We can even change the null terminator which may potentially have bad consequences

Such logic is justified by the fact that different parsing/scanning algorithms are handier to implement relying on the null-terminator, even though its still possible to implement the same logic through index vs length checks.

The proposal is to raise index errors in cases when null terminator is referred just like it is done for indexes that exceed string len, or like it is done for seq types.

The change itself is trivial, however there are quite a few places where current behavior is relied upon. So all of these places will have to be adjusted before the change.

Pros: consistent and more intuitive behavior.
Cons: breaking change.

Thumbs up?

RFC

Source

yglukhov

👍4

Most helpful comment

After some discussion with @Araq, the null-termitator deprecation direction is approved, and we can start moving in this direction. My plan is:

No longer rely on the old behavior when str[str.len] == '\0' in new code.
Introduce charAtIndexOrNull (we might pick a better name though) in strutils module (we might pick a better place though).
Gradually patch Nim and stdlib code relying on the old behavior, with small PRs.
When all of the above is done, switch the default behavior of [] operator to raise outOfBounds, introducing a -d:allowStringNullTerminatorAccess flag for those who need more time to migrate.

Thumbs up?

yglukhov on 29 Jun 2017

👍8

All 14 comments

Originated from #5592

yglukhov on 22 Mar 2017

This behaviour is very nice for parsers so I'm afraid it's a 👎 from me.

dom96 on 24 Mar 2017

IMO, the fact that this behavior is nice for parsers doesn't justify it. This behavior can easily be emulated specifically for parsers like so:

proc `{}`(s: string, i: int): char =
  # returns char or `\0` if at end. raises if out of bounds.
  if i == s.len: return '\0'
  return s[i]

proc parse(s: string) =
  var i = 0
  while true:
    let c = s{i} # Special op for using in parsing algorithms.
    # ...

Needless to say that a lot of parsing algorithms can be changed to not use this behavior at all without sacrificing in readability.

yglukhov on 1 Apr 2017

This behavior can easily be emulated specifically for parsers like so:

Of course it can but that's slower and produces more code than is necessary.

But I'm torn on this.

Araq on 2 Apr 2017

The current behaviour makes sense to me. My rationale is that all strings in Nim are delimited by a \0 for easy C interop, and because of this even an empty string is delimited by a \0.

dom96 on 5 Apr 2017

C interop doesn't require null terminator to be accessible from nim side. C interop should not affect Nim's semantics if it is possible not to. Nim has to be a standalone language done right, not dictated by legacy state of affairs.

Null terminator is just an implementation detail that should not be considered and heavily relied upon. It also makes strings potentially unsafe allowing to overwrite it and then passing to C side. Implementing this RFC I would remove the null terminator mention from the manual, leaving a tiny statement that "string conversion to cstring is zero cost for native targets (because somewhere deep under the hood strings are null terminated :)"

Null terminators make indexing behavior inconsistent with seq/array indexing which adds another tiny detail to the language.

yglukhov on 5 Apr 2017

👍3

FWIW, Facebook are experimenting with removing the null terminator from their C++ string library (and creating it on demand when the string must be passed to a C library):

https://www.reddit.com/r/rust/comments/5732kh/the_strange_details_of_stdstring_at_facebook/

They do experiments like this in order to get tiny performance improvements, but It's an interesting idea nevertheless.

zah on 5 Apr 2017

👍1

From an implementation design perspective, I agree with @yglukhov - it feels like a leaked implementation detail.

Varriount on 2 May 2017

Of course it can but that's slower and produces more code than is necessary.

@Araq, in fact, it may be implemented as efficiently as current indexing:

proc `{}`(s: string, i: int): char {.inline.} =
  # returns char or `\0` if at end. raises if out of bounds.
  if i > s.len: raise IndexError
  return cstring(s)[i] # No check here.

That's it.

yglukhov on 13 Jun 2017

Probably the name should be a bit more scary (like charAtIndexOrNull) instead of nice {}.

yglukhov on 13 Jun 2017

After some discussion with @Araq, the null-termitator deprecation direction is approved, and we can start moving in this direction. My plan is:

No longer rely on the old behavior when str[str.len] == '\0' in new code.
Introduce charAtIndexOrNull (we might pick a better name though) in strutils module (we might pick a better place though).
Gradually patch Nim and stdlib code relying on the old behavior, with small PRs.
When all of the above is done, switch the default behavior of [] operator to raise outOfBounds, introducing a -d:allowStringNullTerminatorAccess flag for those who need more time to migrate.

Thumbs up?

yglukhov on 29 Jun 2017

👍8

Can be closed now?

yglukhov on 30 Apr 2018

Todo:

[x] Update JS codegen.
[x] Update the VM.
[x] Update the tutorials / manual.

Araq on 30 Apr 2018

😕1

Now complete. Closing.

Araq on 11 Sep 2018

Was this page helpful?

0 / 5 - 0 ratings

Related issues

ospaths.isAbsolute: uncovering out of bound bugs after updating to 0.18.1 from 0.18.0: empty string and nil string now checked for out of bound errors

timotheecour · 3Comments

feature: have 'if' treat nil as false

kobi2187 · 4Comments

Support default arguments in sugar.`=>` and sugar.`->`

hlaaftana · 3Comments

getOccupiedMem Undocumented

juancarlospaco · 3Comments

nimpretty generates broken formatting

SolitudeSF · 3Comments