V: Consider to make UTF8 strings more prominent

Created on 7 Sep 2020  路  6Comments  路  Source: vlang/v

When trying run this snippet:

fn main() {
    s := 'a string with 盲枚眉 (umlauts)'
    println(s)
    for i := 0; i < s.len; i++ {
        print('-')
    }
    println('')
}

you does not get the expected result:

a string with 盲枚眉 (umlauts)
---------------------------

but because s.len returns the amount of bytes instead of the characters/runes you'll get

a string with 盲枚眉 (umlauts)
------------------------------

Right now, it looks like the code needs to be changed to

import encoding.utf8

fn main() {
    s := 'a string with 盲枚眉 (umlauts)'
    println(s)
    for i := 0; i < utf8.len(s); i++ {
        print('-')
    }
    println('')
}

IMHO (Java background which strictly separates between characters and their representations), the implementation detail, that internally bytes are used, should not leak to the programmer. If you want to deal with bytes, then a byte array would be the correct choice.

Feature Request

All 6 comments

Yeah I had the same idea. But for performance reasons I went with Go's approach. Strings are used internally very often, and in most cases utf8 runes are not needed. The overhead of parsing utf8 and storing positions would be too big.

IMHO a good programming language should make the correct way easy. If the more performant way is more complex, then this would be a compromise.

I understand that storing positions would be a bad choice. I don't have a good idea how to handle it right now, but the current way feels wrong for me.

Just for the records - here is a go documentation: https://blog.golang.org/strings

One possibility is to avoid the name len as it is generally assumed to be number of characters for people coming from most other languages. Simply renaming it to len_bytes or something similar would avoid the confusion.

It would be more clear, but V is a very small language, the docs can be read in less than an hour, and strings being an array of bytes is a very fundamental feature of V, like in Go.

Repeating len_bytes a million of times in all codebases is repetitive and verbose.

Maybe use size?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

penguindark picture penguindark  路  3Comments

oleg-kachan picture oleg-kachan  路  3Comments

clpo13 picture clpo13  路  3Comments

vtereshkov picture vtereshkov  路  3Comments

taojy123 picture taojy123  路  3Comments