V: Consider to make UTF8 strings more prominent

Created on 7 Sep 2020 · 6Comments · Source: vlang/v

When trying run this snippet:

fn main() {
    s := 'a string with äöü (umlauts)'
    println(s)
    for i := 0; i < s.len; i++ {
        print('-')
    }
    println('')
}

you does not get the expected result:

a string with äöü (umlauts)
---------------------------

but because s.len returns the amount of bytes instead of the characters/runes you'll get

a string with äöü (umlauts)
------------------------------

Right now, it looks like the code needs to be changed to

import encoding.utf8

fn main() {
    s := 'a string with äöü (umlauts)'
    println(s)
    for i := 0; i < utf8.len(s); i++ {
        print('-')
    }
    println('')
}

IMHO (Java background which strictly separates between characters and their representations), the implementation detail, that internally bytes are used, should not leak to the programmer. If you want to deal with bytes, then a byte array would be the correct choice.

Feature Request

Source

tmssngr

All 6 comments

Yeah I had the same idea. But for performance reasons I went with Go's approach. Strings are used internally very often, and in most cases utf8 runes are not needed. The overhead of parsing utf8 and storing positions would be too big.

medvednikov on 9 Sep 2020

IMHO a good programming language should make the correct way easy. If the more performant way is more complex, then this would be a compromise.

I understand that storing positions would be a bad choice. I don't have a good idea how to handle it right now, but the current way feels wrong for me.

tmssngr on 9 Sep 2020

👍1

Just for the records - here is a go documentation: https://blog.golang.org/strings

vmcrash on 9 Sep 2020

One possibility is to avoid the name len as it is generally assumed to be number of characters for people coming from most other languages. Simply renaming it to len_bytes or something similar would avoid the confusion.

VinWare on 19 Sep 2020

👍1

It would be more clear, but V is a very small language, the docs can be read in less than an hour, and strings being an array of bytes is a very fundamental feature of V, like in Go.

Repeating len_bytes a million of times in all codebases is repetitive and verbose.

medvednikov on 19 Sep 2020

👍1

Maybe use size?

vmcrash on 19 Sep 2020

Was this page helpful?

0 / 5 - 0 ratings