When trying to run this snippet:
fn main() {
	s := 'a string with äöü (umlauts)'
	println(s)
	for i := 0; i < s.len; i++ {
		print('-')
	}
	println('')
}
you do not get the expected result:
a string with äöü (umlauts)
---------------------------
Instead, because s.len returns the number of bytes rather than the number of characters/runes, you get:
a string with äöü (umlauts)
------------------------------
Right now, it looks like the code needs to be changed to:
import encoding.utf8

fn main() {
	s := 'a string with äöü (umlauts)'
	println(s)
	for i := 0; i < utf8.len(s); i++ {
		print('-')
	}
	println('')
}
IMHO (coming from a Java background, which strictly separates characters from their representation), the implementation detail that strings are stored as bytes internally should not leak to the programmer. If you want to deal with bytes, then a byte array would be the correct choice.
Yeah, I had the same idea, but for performance reasons I went with Go's approach. Strings are used internally very often, and in most cases UTF-8 runes are not needed. The overhead of parsing UTF-8 and storing positions would be too big.
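To make the trade-off concrete, here is a rough sketch of why the two lengths cost different amounts. The byte length is just a field on the string, while a rune count has to look at every byte. count_runes is a hypothetical helper written for illustration only, not part of V's standard library; utf8.len is the call already used earlier in this thread.

```v
import encoding.utf8

// count_runes is a hypothetical helper (not a V builtin).
// It scans the whole string and counts every byte that is not a
// UTF-8 continuation byte (bit pattern 10xxxxxx), i.e. every byte
// that starts a new rune.
fn count_runes(s string) int {
	mut n := 0
	for b in s {
		if (b & 0xC0) != 0x80 {
			n++
		}
	}
	return n
}

fn main() {
	s := 'äöü'
	println(s.len) // number of bytes, no scan needed
	println(count_runes(s)) // number of runes, needs a full pass over the bytes
	println(utf8.len(s)) // library call used earlier in the thread
}
```

Assuming the source file is saved as UTF-8, each umlaut takes two bytes, so the first number should be twice the other two.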
IMHO a good programming language should make the correct way easy. If the more performant way is more complex, then this would be a compromise.
I understand that storing positions would be a bad choice. I don't have a good idea how to handle it right now, but the current way feels wrong to me.
Just for the record, here is the Go documentation on this: https://blog.golang.org/strings
One possibility is to avoid the name len, as people coming from most other languages generally assume it means the number of characters. Simply renaming it to len_bytes or something similar would avoid the confusion.
It would be clearer, but V is a very small language, the docs can be read in less than an hour, and strings being an array of bytes is a very fundamental feature of V, like in Go.
Writing len_bytes a million times across all codebases would be repetitive and verbose.
Maybe use size?