From Zig's documentation:
// Zig has no concept of strings. String literals are const pointers to
// arrays of u8, and by convention parameters that are "strings" are
// expected to be UTF-8 encoded slices of u8.
// Here we coerce [5]u8 to []const u8
const hello: []const u8 = "hello";
const world: []const u8 = "世界";
The type []const u8 is 10 characters long, and there are roughly 1000 occurrences of it in std alone. In most cases it means a UTF-8 encoded string; in the rest it means a buffer of bytes to read from. The meaning is usually not hard to work out from context (the surrounding function or struct name), but it takes a bit of effort. A type alias for []const u8 called string (or text, or something else) could be added to convey unambiguously what the value is. It would also be shorter and easier to type. Shorter names seem appropriate for types that are used extensively (u32 instead of uint32, f32 instead of float, etc.).
For consistency, wouldn't string really just mean []u8, with const string meaning []const u8? That would reduce the typing benefit and make it less clear what exactly a string is.
Windows loves u16.
std::string     std::basic_string<char>
std::wstring    std::basic_string<wchar_t>
std::u8string   std::basic_string<char8_t>   // (since C++20)
std::u16string  std::basic_string<char16_t>  // (since C++11)
std::u32string  std::basic_string<char32_t>  // (since C++11)
I don't think it's in zig's best interest to have a single type called string. There's ascii, extended ascii and utf-8 which all look like []u8 for example, and must be treated differently. What I think we do need is a way to nicely annotate what string encoding is being used.
An actual string type can bring along a few extra features not present with []const u8:
.is_valid_utf8: ?bool
I was going to write the same as @Sobeston.
If a type alias is introduced, it makes sense to tie the alias to the encoding of the string:
const hello: []const u8 = "hello";
const hello: utf8 = "hello";
"utf8" conveys more information than "string" and is shorter to type, down to 4 characters now.
You can make such a type in user space. No need for it to be a language primitive.
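As a rough sketch of what "user space" could mean here, there are at least two options. The names `utf8` (taken from the proposal above) and `Utf8Text` are hypothetical, not anything in std. A plain alias is the same type as []const u8, so it documents intent without adding any checking; a wrapper struct is a distinct type that can enforce validation at construction. This assumes `std.unicode.utf8ValidateSlice` is an acceptable definition of "valid".

```zig
const std = @import("std");

// Hypothetical alias: the compiler treats it as exactly []const u8,
// so any byte slice can be passed where `utf8` is expected.
const utf8 = []const u8;

// Hypothetical wrapper: a distinct type that can only be obtained
// through `init`, which rejects byte sequences that are not valid UTF-8.
const Utf8Text = struct {
    bytes: []const u8,

    fn init(bytes: []const u8) error{InvalidUtf8}!Utf8Text {
        if (!std.unicode.utf8ValidateSlice(bytes)) return error.InvalidUtf8;
        return Utf8Text{ .bytes = bytes };
    }
};
```

With the alias, `utf8 == []const u8` holds at comptime, so nothing stops a raw byte buffer from being passed as `utf8`. With the wrapper, `Utf8Text.init("\xff")` fails with `error.InvalidUtf8`, at the cost of an extra type to convert to and from.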
Sure. But it would be nice if everyone picks the same name for the same concept, for example:
str8 for a UTF-8 encoded string (str16 for UTF-16LE?, and str32 for UTF-32)
This is very unlikely to happen if Zig doesn't pick a name and simply lets people define their own type aliases for []const u8, []const u16 and []const u32.
Some people would stick with []const u8, others might use C++'s names, and some (like me, I guess) would just call it string and pretend that only UTF-8 exists.
Regardless, []const u8 can mean either a UTF-8 string or a source of bytes, and not every source of bytes is a UTF-8 string.
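To make that last point concrete, here is a minimal sketch that takes three values of the same type, []const u8, and sorts them by what they can safely be treated as. The `Kind` enum and the `isAscii` and `classify` helpers are hypothetical names for illustration; only `std.unicode.utf8ValidateSlice` comes from std.

```zig
const std = @import("std");

// Hypothetical classification of a byte slice.
const Kind = enum { ascii, utf8, bytes };

// True when every byte is below 0x80 (an empty slice counts as ASCII).
fn isAscii(data: []const u8) bool {
    for (data) |b| {
        if (b >= 0x80) return false;
    }
    return true;
}

fn classify(data: []const u8) Kind {
    // Pure ASCII is also valid UTF-8, so check the narrower case first.
    if (isAscii(data)) return .ascii;
    if (std.unicode.utf8ValidateSlice(data)) return .utf8;
    // Anything else is just bytes; it may be another encoding or binary data.
    return .bytes;
}
```

Here `classify("hello")` yields `.ascii`, `classify("世界")` yields `.utf8` (six bytes encoding two code points), and `classify("\xff\xfe")` yields `.bytes`. All three arguments have the same type, which is exactly the ambiguity being discussed. Note that extended ASCII encodings cannot be detected this way: their bytes either fail UTF-8 validation or happen to pass it by coincidence.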