Zig: Add a "string" type alias for "[]const u8"

Created on 5 Jul 2020 · 8 Comments · Source: ziglang/zig

From Zig's documentation:

    // Zig has no concept of strings. String literals are const pointers to
    // arrays of u8, and by convention parameters that are "strings" are
    // expected to be UTF-8 encoded slices of u8.
    // Here we coerce [5]u8 to []const u8
    const hello: []const u8 = "hello";
    const world: []const u8 = "世界";

The type []const u8 is 10 characters long, and there are roughly 1000 occurrences of it in std alone. In most cases it means a UTF-8 encoded string; in the rest it means a buffer of bytes (u8) to read from. The meaning is usually not hard to work out from context (the surrounding function or struct name), but it takes a little effort. A type alias for []const u8 called string (or text, or something else) could be added to convey unambiguously what the value is. It would also be shorter and easier to type. Shorter names seem appropriate for types that are used extensively (u32 instead of uint32, f32 instead of float, etc.).

proposal

luauser32167

Most helpful comment

I don't think it's in zig's best interest to have a single type called string. There's ascii, extended ascii and utf-8 which all look like []u8 for example, and must be treated differently. What I think we do need is a way to nicely annotate what string encoding is being used.

Sobeston on 5 Jul 2020

🚀2
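
The point about encodings can be illustrated with a small sketch. It assumes a recent Zig release in which the std.testing helpers return errors; isAscii is a hypothetical helper written for this example, while std.unicode.utf8ValidateSlice and std.unicode.utf8CountCodepoints come from the standard library. All three values below have the same type, yet only two of them are valid UTF-8.

```zig
const std = @import("std");

/// Hypothetical helper: true when every byte is in the 7-bit ASCII range.
fn isAscii(bytes: []const u8) bool {
    for (bytes) |b| {
        if (b >= 0x80) return false;
    }
    return true;
}

test "same type, different encodings" {
    const ascii_text: []const u8 = "cafe";
    const utf8_text: []const u8 = "caf\xc3\xa9"; // "café" encoded as UTF-8
    const latin1_text: []const u8 = "caf\xe9"; // "café" encoded as ISO-8859-1

    // The type system cannot tell these apart; only the bytes can.
    try std.testing.expect(isAscii(ascii_text));
    try std.testing.expect(!isAscii(utf8_text));
    try std.testing.expect(std.unicode.utf8ValidateSlice(ascii_text));
    try std.testing.expect(std.unicode.utf8ValidateSlice(utf8_text));
    try std.testing.expect(!std.unicode.utf8ValidateSlice(latin1_text));

    // Byte length and code point count differ for non-ASCII UTF-8.
    try std.testing.expectEqual(@as(usize, 5), utf8_text.len);
    try std.testing.expectEqual(@as(usize, 4), try std.unicode.utf8CountCodepoints(utf8_text));
}
```

Running the file with zig test exercises the checks. A single alias called string would cover all three values equally.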

All 8 comments

For consistency, wouldn't string really just mean []u8, with const string meaning []const u8? That reduces the typing benefit and makes it less clear what exactly a string is.

justjosias on 5 Jul 2020
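
A short sketch of why the position of const matters here. In Zig, const in []const u8 is part of the pointer type, while const on a declaration only fixes the binding, so one alias cannot cover both the read-only and the writable case. The names string and mut_string are hypothetical aliases chosen for this example; shout and startsWithH are illustrative helpers.

```zig
const std = @import("std");

// Hypothetical aliases; neither name exists in Zig's standard library.
const string = []const u8; // read-only view of bytes
const mut_string = []u8; // writable view of bytes

/// Upper-cases ASCII letters in place, so it needs a writable slice.
fn shout(s: mut_string) void {
    for (s) |*c| c.* = std.ascii.toUpper(c.*);
}

/// Only reads, so the read-only alias is enough.
fn startsWithH(s: string) bool {
    return s.len > 0 and s[0] == 'H';
}

test "const on the binding is not const on the bytes" {
    var buf = [_]u8{ 'h', 'i' };
    // `view` cannot be re-pointed, but the bytes behind it stay writable.
    const view: mut_string = &buf;
    shout(view);
    try std.testing.expectEqualStrings("HI", view);
    // []u8 coerces to []const u8; the reverse is not allowed.
    try std.testing.expect(startsWithH(view));
}
```

Under these assumptions, a writable counterpart would need its own name, which is the loss of clarity the comment describes.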

Windows loves u16.

data-man on 5 Jul 2020

😄1

    std::string     std::basic_string<char>
    std::wstring    std::basic_string<wchar_t>
    std::u8string   std::basic_string<char8_t>  // (since C++20)
    std::u16string  std::basic_string<char16_t> // (since C++11)
    std::u32string  std::basic_string<char32_t> // (since C++11)

data-man on 5 Jul 2020

Sobeston on 5 Jul 2020

🚀2

An actual string type can bring along a few extra features not present with []const u8:

methods that only make sense on strings
a field .is_valid_utf8: ?bool

daurnimator on 5 Jul 2020
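
A minimal sketch of what such a type could look like in user code, assuming a recent Zig release. String, init, isValidUtf8, and codepointCount are hypothetical names invented for this example, not part of any proposal or of std. The is_valid_utf8 field starts as null ("not checked yet") and caches the result of the first validation.

```zig
const std = @import("std");

/// Hypothetical wrapper around a byte slice; not a standard library type.
const String = struct {
    bytes: []const u8,
    /// null means validation has not run yet.
    is_valid_utf8: ?bool = null,

    fn init(bytes: []const u8) String {
        return .{ .bytes = bytes };
    }

    /// Validates on the first call and caches the answer in the field.
    fn isValidUtf8(self: *String) bool {
        if (self.is_valid_utf8) |known| return known;
        const ok = std.unicode.utf8ValidateSlice(self.bytes);
        self.is_valid_utf8 = ok;
        return ok;
    }

    /// A method that only makes sense for text: the number of code points,
    /// or null when the bytes are not valid UTF-8.
    fn codepointCount(self: *String) ?usize {
        if (!self.isValidUtf8()) return null;
        return std.unicode.utf8CountCodepoints(self.bytes) catch return null;
    }
};

test "validation is lazy and cached" {
    var s = String.init("世界");
    try std.testing.expect(s.is_valid_utf8 == null);
    try std.testing.expectEqual(@as(?usize, 2), s.codepointCount());
    try std.testing.expect(s.is_valid_utf8.?);
}
```

The trade-off in this sketch is that the wrapper is no longer interchangeable with []const u8, so callers would have to pass .bytes to existing APIs.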

I was going to write the same as @Sobeston.

If a type alias will be introduced, then it makes sense to tie the alias to the encoding of the string:

    const hello: []const u8 = "hello";
    const hello: utf8 = "hello";

"utf8" conveys more information than "string" and is shorter to type, down to 4 characters now.

jorangreef on 5 Jul 2020

You can make such a type in user space. No need for it to be a language primitive.

andrewrk on 5 Jul 2020

👍1
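
For illustration, a user-space alias is a one-line declaration. The sketch below assumes a recent Zig release and uses the name utf8 from the suggestion above; byteLen is a hypothetical function. It also shows the limit of the approach: the alias is the same type as []const u8, so it documents intent without checking the encoding.

```zig
const std = @import("std");

// Hypothetical user-space alias; the name is an assumption, not a std type.
const utf8 = []const u8;

/// Accepts anything that coerces to []const u8, whatever the alias says.
fn byteLen(text: utf8) usize {
    return text.len;
}

comptime {
    // The alias introduces a new name, not a new type.
    std.debug.assert(utf8 == []const u8);
}

test "the alias is transparent" {
    const hello: utf8 = "hello";
    const raw: []const u8 = "\xff\xfe"; // not valid UTF-8
    try std.testing.expectEqual(@as(usize, 5), byteLen(hello));
    // Nothing stops arbitrary bytes from being passed as `utf8`.
    try std.testing.expectEqual(@as(usize, 2), byteLen(raw));
    try std.testing.expect(!std.unicode.utf8ValidateSlice(raw));
}
```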

"You can make such a type in user space. No need for it to be a language primitive."

Sure. But it would be nice if everyone picks the same name for the same concept, for example:

str8 for a UTF-8 encoded string (str16 for UTF-16LE?, and str32 for UTF-32)

This is very unlikely to happen if Zig doesn't pick a name and simply lets people define their own type aliases for []const u8, []const u16, and []const u32.

Some people would stick with []const u8, others might use C++'s names, and some (I guess like me) would just call it string and pretend that there's only UTF-8 or something...

Regardless, []const u8 can mean either a UTF-8 string or a source of bytes, and not every source of bytes is a UTF-8 string.

luauser32167 on 5 Jul 2020
