Zig: proposal: type for null terminated pointer

Created on 23 Feb 2017 · 42 Comments · Source: ziglang/zig

* Progress

Currently the type of c"aoeu" is *const u8.

Instead, the type should indicate that the pointer is null terminated. Here are two ideas to represent that:

  • *0 const u8
  • *null const u8

This type would be implicitly castable to *const u8. You can explicitly cast the other way, and in debug mode this inserts a safety check to make sure there actually is a null byte there.

It should probably work for any type that supports T == 0 or T == null.

We want to steer users away from this type and toward []const u8, which includes a pointer and a length. However, we still have to deal with null-terminated data from C land and from some kernel interfaces, which is what makes this type useful. For example, we currently have this:

pub fn open_c(path: *const u8, flags: usize, perm: usize) -> usize {
    arch.syscall3(arch.SYS_open, usize(path), flags, perm)
}

pub fn open(path: []const u8, flags: usize, perm: usize) -> usize {
    const buf = @alloca(u8, path.len + 1);
    @memcpy(&buf[0], &path[0], path.len);
    buf[path.len] = 0;
    return open_c(buf.ptr, flags, perm);
}

Having the open_c prototype be *0 const u8 would make it more type-safe. Further, we could provide an open function that supported either type for path, and if it happened to be null terminated then it could avoid the stack allocation.

We could also make the type of string literals be []0 const u8 meaning that the pointer value for the slice has a 0 after the last byte. The length would still indicate the memory before the null byte. If you slice this type then the pointer component would change from *0 const u8 to *const u8.

It would be extra helpful if automatic .h import could identify when a pointer in a function is supposed to be null-terminated, and we could emit a compile error if the user passes a pointer that is not null terminated. I'm not sure how we could detect this automatically though.

accepted enhancement proposal

Most helpful comment

Implemented in #3728, landed in 5a98dd42b38b9188cfb96c9ab57dc91af923029f.

Related follow-up issues:

  • #3731
  • #3766
  • #3767
  • #3768

All 42 comments

in debug mode this inserts a safety check to make sure there actually is a null byte there.

How would that work? Wouldn't that be an unbounded linear search?

Maybe if you cast from []T to []0 T, then we check for 0 in the last spot (since we have len), and in the new slice len is one less.

That's true though, for pointers we can't really have this check.

Proposal rejected in order to discourage use of null terminated things.

How do you plan to interface with C-style null-terminated strings?

Can you elaborate on this question? We have the cstr module for getting string length and converting to a slice.

Solved thanks to that.

Reopening for #518.

My proposal is to leave the type unnamed; you have to refer to it as @typeOf(c""). This discourages people from using the type, and it clearly associates the type with the c"" syntax.

Instances of the type should not be usable in any way, except for implicitly converting to &const u8. No slicing syntax x[0..5], no array subscripting syntax x[0], no field access x.foo.

Literals with c"foo" syntax should obviously be of this type. Nothing implicitly casts to this type. Creating an instance of this type should be done through @typeOf(c"")(x), where x is of type []const u8. This is the only explicit conversion that should be allowed. In safety mode, the conversion should do this safety check:

assert(x.len > 0);
for (x[0..x.len - 1]) |b| {
    assert(b != 0);
}
assert(x[x.len - 1] == 0);

There is no way to create a @typeOf(c"") directly from a pointer. You have to slice the pointer, explicitly giving it a length, and then convert it. This way the safety check is always bounded.
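For readers more at home in C, the bounded safety check described above can be sketched as follows. This is only an illustration under assumptions: the helper name is_terminated is hypothetical and is not part of the proposal or of any Zig or C library. It accepts a pointer plus an explicit length, so the scan can never run past the given bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when the len bytes starting at p form a
 * C string, i.e. the last byte is 0 and no earlier byte is 0. */
bool is_terminated(const char *p, size_t len)
{
    assert(p != NULL);
    if (len == 0)
        return false; /* no room for a terminator */

    /* memchr returns the first 0 byte inside the bound, or NULL if none. */
    const char *first_nul = memchr(p, 0, len);
    return first_nul == p + (len - 1);
}
```

The cost is linear in len, which is why the proposal confines it to safety mode.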

There are many cases where you don't want to bother with this safety check, but you still want a @typeOf(c""). For that we can add these functions to the cstr module:

pub fn fromSliceUnsafe(x: []const u8) -> @typeOf(c"") {
    @setDebugSafety(this, false);
    return @typeOf(c"")(x);
}
pub fn fromPointerUnsafe(x: &const u8) -> @typeOf(c"") {
    return fromSliceUnsafe(x[0..0]);
}

The name "Unsafe" should properly inform people that they're taking a risk using these utilities.

I'm not sure we need any solution to null-terminated arrays of things other than u8. These cases exist, but I don't think we need special support for them.

I think this counter proposal has 2 competing ideas:

  • Introduce a type safety feature that makes it more ergonomic to have safer code.
  • Make the feature intentionally ugly so that using it is unattractive

These don't go well together. I think the original proposal is better, where the type is named.

There's also no reason to limit it to u8. A null terminated array is not inherently an evil C concept that is intruding in the Zig language. It's a general data storage technique that is valid for some memory constrained use cases.

I can probably find a Windows API that uses NULL as a sentinel for an array of pointers. The argv in libc main would be represented with &null ?&null u8. (And not just libc - this is how it is represented in the official x86_64 ABI specification).
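As a small illustration of a sentinel-terminated array whose elements are not bytes, here is a C sketch that walks an argv-style array until it reaches the NULL entry. The function name count_until_null is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries that precede the NULL sentinel, the way a program
 * could recover argc from argv alone. */
size_t count_until_null(char *const *items)
{
    assert(items != NULL);
    size_t n = 0;
    while (items[n] != NULL)
        n++;
    return n;
}
```

In the notation used in this thread, items would be the &null ?&null u8 mentioned above: the outer array ends with a null pointer, and each element is itself a 0-terminated string.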

There is no way to create a @typeOf(c"") directly from a pointer. You have to slice the pointer, explicitly giving it a length, and then convert it. This way the safety check is always bounded.

So the generated code would first have to do essentially a strlen to find the length, use that to convert to a slice, then cast that to the null-terminated type, and then the debug safety check would do a redundant strlen? That doesn't seem right.

If we had the []null u8 type as originally proposed, this would be more straightforward. I think a better way this can go is:

  • Set the null terminator in memory.
  • Cast the &u8 to []null u8. This cast does linear search for the terminator and sets len appropriately.
  • Now @typeOf(slice.ptr) == &null u8

From llvm/include/llvm/Support/MemoryBuffer.h

/// This interface provides simple read-only access to a block of memory, and
/// provides simple methods for reading files and standard input into a memory
/// buffer.  In addition to basic access to the characters in the file, this
/// interface guarantees you can read one character past the end of the file,
/// and that this character will read as '\0'.
///
/// The '\0' guarantee is needed to support an optimization -- it's intended to
/// be more efficient for clients which are reading all the data to stop
/// reading when they encounter a '\0' than to continually check the file
/// position to see if it has reached the end of the file.
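The optimization that the LLVM comment describes can be sketched in C by comparing two scans of the same buffer. This is a hedged illustration, not LLVM code: both function names are hypothetical, and the sentinel variant assumes the buffer holds no embedded '\0' before its end.

```c
#include <assert.h>
#include <stddef.h>

/* Bounds-checked scan: every step tests the position and the content. */
size_t count_lines_checked(const char *buf, size_t len)
{
    assert(buf != NULL);
    size_t lines = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            lines++;
    return lines;
}

/* Sentinel scan: relies on buf[len] == '\0', so the loop needs no
 * separate position check. It stops at the first '\0' it meets. */
size_t count_lines_sentinel(const char *buf)
{
    assert(buf != NULL);
    size_t lines = 0;
    for (const char *p = buf; *p != '\0'; p++)
        if (*p == '\n')
            lines++;
    return lines;
}
```

A reader that must tolerate embedded zero bytes would compare the position against the end only when it actually sees a '\0', which still keeps the check off the common path.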

That may be a better place to put my comment -
Does the null/0 have to be at index x.len? C strings are stored in a fixed-length array, but the string length can vary; it is not necessarily equal to the array size. The same applies to null-terminated C arrays.

@jido In C, nothing prevents you from removing all null terminators from a char array, and then passing it to strlen. This is the motivation for the null terminated type. The compiler will insert a null terminator at x.len to prevent this bug. You, as a user of a null terminated type can still insert null terminators between 0 and x.len and strlen will stop at your null terminator instead of the one at x.len. The one at x.len is there for safety, and trying to override it is undefined behavior (aka a runtime crash in debug mode).

Moving some work from Pointer Reform (#770) to here:

  • [ ] add syntax for [N]null T, []null T, and [*]null T
  • [ ] add implicit casting
    • [ ] []null T to []T
    • [ ] [*]null T to [*]T
    • [ ] [N]null T to [N]T
    • [ ] *[N]null T to *[N]T
    • [ ] []null T to [*]null T
    • [ ] [N]null T to []null const T
    • [ ] [N]null T to [*]null const T
  • [ ] make string literals [N]null u8
  • [ ] remove C string literals and C multiline string literals

Something I hadn't considered yet:

  • Would the null terminated slice/array types assert that a null/0 byte does not occur in any of the elements before the len index?

If so, this would guarantee the property that after casting a []null T to [*]null T, finding the length based on the null termination would give the same len value as before. However this would mean that casting []null T to [*]null T should have a runtime safety check, which probably means it shouldn't be an implicit cast. Hmm.

Or the type could have a weaker guarantee, which is only that there shall be a null/0 byte at the len index, and makes no guarantees about items otherwise. However, then the "length" may change when implicitly casting from []null T to [*]null T.
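The difference between the two guarantees can be seen in a small C sketch. The struct term_slice is a hypothetical stand-in for a []null u8 value under the weaker guarantee, where only ptr[len] is known to be 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a []null u8 slice: ptr[len] is guaranteed 0,
 * nothing is promised about the bytes before it. */
struct term_slice {
    const char *ptr;
    size_t len;
};

/* "Casts" to a bare terminated pointer and measures it again, the way
 * strlen would on the C side. */
size_t len_after_decay(struct term_slice s)
{
    assert(s.ptr != NULL && s.ptr[s.len] == '\0'); /* weak guarantee only */
    return strlen(s.ptr);
}
```

For the bytes "ao\0eu" with len 5, the weak guarantee holds, yet the length seen through the bare pointer is 2. Under the stronger guarantee this slice could not be constructed in the first place.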

A runtime check in debug mode is not a bad idea, as long as it's easy/cheap to get a []null u8 slice from another slice. It's not uncommon to insert a null in the middle of a larger array...

Is there a reason [N]null T, []null T, and [*]null T cannot simply be wrapper types around [N]T, []T, and [*]T? E.g. consider some code like this for []null T:

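const assert = @import("std").debug.assert;

fn TermPtr(comptime T: type, comptime term: T) type {
    return struct {
        const Self = @This();
        ptr: [*]T,
        // strlen-style scan up to the terminator
        fn len(self: Self) usize {
            var i: usize = 0;
            while (self.ptr[i] != term) : (i += 1) {}
            return i;
        }
        fn from_slice(slice: []T) Self {
            assert(slice.len > 0 and slice[slice.len - 1] == term);
            // the terminator must not occur earlier in the slice
            for (slice[0 .. slice.len - 1]) |x| assert(x != term);
            return Self{ .ptr = slice.ptr };
        }
        // etc.
    };
}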

This has the same layout as a C pointer and can be passed to and from C code as-is.

What changes if this type is implemented directly in the compiler? The reasons I can think of are basically syntactic sugar.

  • You get the []null T syntax, which is concise.
  • []null T implicitly coerces to []T.
  • String literals would be automatically null-terminated and coerce to either []u8 or []null u8. But you can already do this conversion safely with a comptime function.

Are there any other significant differences? And are the things listed above advantageous at all? The current sentiment seems to be that the feature should be intentionally ugly, so adding convenient syntax doesn't seem all that compelling.

Null terminated data is too fragile. Here are some arguments:

  • an unchecked memset could corrupt data
  • sending and receiving null terminated data over a network creates complications (for example, incoming data of unknown length requires more calls to allocate new buffers)
  • it also creates obstacles for bounds checking (either repeated calls to strlen, or storing the length, which defeats any 'saving' that zero-terminated data provides)

Even more: in C it is not a part of the language, but an API (of the past). So I think that interop with legacy C APIs should not get a dedicated feature in the language. Robustness should not be traded for runtime efficiency.

Response to @daurnimator's comment:
The POSIX standard dates back to the late 1980s. Linux syscalls are based on POSIX, so I consider 'open' and the others to be legacy APIs. Take a look at an article about file IO in Google's in-development OS. There the kernel itself doesn't know anything about 'files', and file IO is implemented in userland.

Thoughts on [N]null T, []null T, and [*]null T:
Consider a use case: I have some data (for example [N]null u8) and I want to create a zero-terminated slice of it and pass that to an API that wants one. Creating such a slice would require writing 0 into the array (the act of taking a slice corrupts data), and that is before even considering a const array. So a memory allocation and copy is required anyway. Basically, []null T can only be used to annotate that some API takes zero-terminated data.

Null terminated data is too fragile. Here are some arguments

I think you will find that the Zig community in general (and especially myself) agrees with you on this, and APIs in general should prefer slices to null terminated pointers. Even if you are using Zig to create a C library, and even in actual C libraries, I would recommend pointer and length arguments rather than null terminated pointers, like this: https://github.com/andrewrk/libsoundio/blob/1.1.0/soundio/soundio.h#L795

That being said, I want to repeat what I said earlier about null terminated pointers:

A null terminated array is not inherently an evil C concept that is intruding in the Zig language. It's a general data storage technique that is valid for some memory constrained use cases.

I also stumbled on a Real Actual Use Case inside LLVM.

The bottom line for me is that null terminated pointers exist in the real world, and especially in systems programming. You can see this in interfaces with the operating system in the standard library:

[nix-shell:~/downloads/zig/build]$ grep -RI 'issues.*265' ../std/
../std/c/index.zig:// TODO https://github.com/ziglang/zig/issues/265 on this whole file
../std/event/fs.zig:            /// must be null terminated. TODO https://github.com/ziglang/zig/issues/265
../std/event/fs.zig:            /// must be null terminated. TODO https://github.com/ziglang/zig/issues/265
../std/event/fs.zig:            // TODO https://github.com/ziglang/zig/issues/265
../std/special/bootstrap.zig:// TODO https://github.com/ziglang/zig/issues/265
../std/os/linux/index.zig:// TODO https://github.com/ziglang/zig/issues/265
(about 100 occurrences in this file)
../std/os/index.zig:// TODO https://github.com/ziglang/zig/issues/265
../std/os/darwin.zig:// TODO https://github.com/ziglang/zig/issues/265 on the whole file

@matthew-mcallister

Is there a reason [N]null T, []null T, and [*]null T cannot simply be wrapper types around [N]T, []T, and [*]T? E.g. consider some code like this for []null T:

I think it's a reasonable use case to want to decorate external functions with this null terminated attribute. Pointers are the biggest cause of unsafety in Zig, with the danger of security vulnerabilities, and we don't have a borrow checker like Rust. So we have to compensate by having pointer attributes to provide metadata about pointers to prevent footguns. I've seen the align(n) pointer attribute save myself and others from difficult to diagnose undefined behavior, and I think the null attribute will be similar. I think if it was between using a TermPtr struct from the standard library, or a [*] pointer and documenting that it has to be null terminated, most programmers, including myself, would stick with the bare pointer. Making the syntax more convenient to represent this API is to encourage people to opt in to some compiler-checked safety.

I want to note another use case: The Vulkan API is defined by an XML document that generates the .h file to include, and they independently came up with the idea of annotating null terminated pointers in this way: https://github.com/KhronosGroup/Vulkan-Docs/blob/v1.1.100/xml/vk.xml#L661
With Zig language support for null terminated pointers, I believe the Vulkan API can be represented in Zig types with no loss of information.

So here's the trade-off we're making as I see it:

  • language becomes slightly bigger, with the null attribute to pointer types
  • language becomes slightly smaller, with no more C string literals
  • one less footgun (passing non-null terminated pointer when null-terminated pointer expected)

I feel pretty confident about this trade-off being net positive.

Now that I've just made an argument in favor of moving forward with this proposal, I want to say that I am pleased with the community resisting new features. I strongly believe in keeping Zig simple, and the voices here saying "can't we get away with not having this?" echo my own thoughts.

Just seen that the new C pointer type is being implemented; I think that's where this null attribute belongs.

const envp_optional = @ptrCast([*c][*c_null]u8, argv + argc + 1); // bootstrap.zig@62

Maybe a better name is needed (c_zero), as null can be confused with abstract null in zig.

Having a 0 right after the end of string literals is a good idea, and it would allow string literals to be automatically cast to [*c_zero]u8 (and to [*c]u8? but not [*]u8?). Creating zero-terminated data at runtime would be handled manually in userland, where it belongs.

About Real Actual Use Case inside LLVM:
Is it made for while ((ch = readChar(&buf)) != 0)?
Doesn't look like 'the zig way'. Here while with error unions would be more appropriate.

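const Buffer = struct {
    data: []const u8,
    pos: usize,

    // Returns the next byte, or an error once the buffer is exhausted.
    fn readChar(self: *Buffer) !u8 {
        if (self.pos >= self.data.len) return error.EndOfBuffer;
        const ch = self.data[self.pos];
        self.pos += 1;
        return ch;
    }
};
///////////
while (buffer.readChar()) |char| {
    // use the char
} else |err| {
    // handle end of buffer
}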

It will be a compile error to have a null terminated C pointer. C pointer means "I'm sorry, I actually don't know whether this is supposed to point to one thing, or many things, or if it's supposed to be null terminated. The .h file didn't tell me." If you know that a pointer is null terminated, and you have access to edit the pointer type, then make the type [*]null T.

Well, my idea is to discourage use of 0-terminated data and associate it with C interop (which covers all of the use cases that I've seen). If you really know what you are doing and know the solid benefits, then you would use [*c_null]T.

And for converted .h files you would do
open(path: [*c]u8, flags: u32) [*c]File
open(path: [*c_null]u8, flags: u32) *File

Random thoughts:
Why is the C pointer [*c]T while the 0-terminated one is [*]null T? Why not [*]c T or [*null] T?

Edit:
If Zig's pointer semantics ever deviated from C's pointer semantics (for example, some secret parameter for debug mode), how would these [*]null T and other manually edited pointers in converted .h files handle it?

Edit 2:
I'm thinking of

  • *c T i know nothing
  • [*c]T i point to many
  • [*c_null]T i point to many that terminates with 0

I must say that I would like _zero_ better than _null_ for 0-terminated array. To me _null_ denotes a pointer with a null value.

@Rocknest your counter proposal isn't taking everything into account. Have a look at the issue description and the checklist and show how your counter proposal solves the use cases. For example, what about slices?

@jido one problem with zero (or 0 as in the original post) is that it doesn't make sense when the type is optional, e.g. [*]null ?T

About slices: maybe 0-terminated arrays are not evil at all, but 0-terminated slices are evil for sure. Taking a 0-terminated slice is, in theory, impossible without destroying the original data. Debug safety with such slices (check every write? iterate over the data when doing casts?) is complicated.

About pointers: it is nice to have a type-safe pointer that indicates 0-terminated data, but it should be discouraged.

Edit: I think the original description is a bit outdated. Isn't the type of c"aoeu" [5]const u8, which automatically converts to [*]const u8? Also, @alloca is no more.

@andrewrk I wholly agree with your stance on first-class support for null-terminated strings in Zig. I see excellent C interop as mission-critical for basically any programming language, but even more so for a systems language like Zig. However, I'm not sure I fully understand your response, or at least I partially disagree with it.

I think it's a reasonable use case to want to decorate external functions with this null terminated attribute. Pointers are the biggest cause of unsafety in Zig, with the danger of security vulnerabilities, and we don't have a borrow checker like Rust. So we have to compensate by having pointer attributes to provide metadata about pointers to prevent footguns. I've seen the align(n) pointer attribute save myself and others from difficult to diagnose undefined behavior, and I think the null attribute will be similar.

So, am I wrong in saying this is true whether the type is implemented in the compiler or in stdlib? Say c_str = TermPtr(u8, '\0'), or equivalent. Functionally, what is the difference between these two prototypes?

extern "c" fn chdir(path: [*]null const u8) c_int;
extern "c" fn chdir(path: c_str) c_int;

Both are at least type safe: I cannot pass a [*]const u8; I would have to either do an unsafe cast or go through a library function with a safety check, assuming c_str is properly encapsulated.

We're talking about using the type system to enforce an invariant on the data pointed at by the pointer, and I guess I'm not sure what the compiler would do that cannot be accomplished by a good user API. Borrow checking is different, as the algorithm requires variable scoping and lifetime information only accessible to the compiler; there's no way to enforce that kind of invariant with plain-Jane encapsulation. EDIT: Also, FWIW, Rust implements null-terminated strings in its stdlib.

I think if it was between using a TermPtr struct from the standard library, or a [*] pointer and documenting that it has to be null terminated, most programmers, including myself, would stick with the bare pointer. Making the syntax more convenient to represent this API is to encourage people to opt in to some compiler-checked safety.

That's a very reasonable proposition, and it may even be the case, but I'm not convinced there's actually a major difference in convenience/usability. Getting into naming would be bikeshed territory, but if the choice is between (say) c_str and [*]null u8, I doubt a new Zig user would find either hard to get used to. And library/binding authors have to be diligent either way; a clumsy or inexperienced author could still easily use [*]u8 or [*c]u8 in place of [*]null u8 when, say, blindly translating C headers.

Granted, there are some clear downsides. Most obviously, you would have to wrap C string constants with a comptime function to do the type conversion (e.g. c_str("quux") instead of just "quux"), but the type checker will catch you if you forget. Moreover, if it's a huge deal, the language could later add overloaded string constants like Haskell has, but perhaps that's a bridge too far.

I believe the Vulkan API can be represented in Zig types with no loss of information.

Not directly relevant, but I know of this use case. I've written two Vulkan binding generation programs before [1] [2], and I'm starting one for Zig.

So here's the trade-off we're making as I see it:
  • language becomes slightly bigger, with the null attribute to pointer types
  • language becomes slightly smaller, with no more C string literals
  • one less footgun (passing non-null terminated pointer when null-terminated pointer expected)

Basically, I think that a user datatype solution accomplishes the last two goals without having the first problem at all. There will be syntax concerns, but, hey, maybe it'll be possible to work around them. Further, a userland solution can be trialed right now with no changes to the compiler, which would be easy enough that it wouldn't cost much to have a change of heart later and go the builtin route. (The two could actually coexist, meaning there's a very mild violation of the "one obvious way to do things" principle here.)

EDIT: I did think of another difference: the compiler could allow null-terminated pointers in for loops. But I'm guessing this would only iterate over bytes, not Unicode chars, limiting its applicability.

What about type-safe opaque pointers, with the ability to define functions in their namespace?

const c_string = TermSequence(u8, 0x00);

fn open(path: const c_string, flags: c_int) [*c]File {
    _ = path.strlen();
    // ...
    const iter = path.iterator();
    while (iter.nextChar()) |char| {
        //
    } else |err| {
        //
    }
    // ...
}

@matthew-mcallister

Functionally, what is the difference between these two prototypes?

If I answer the question literally, the answer is that one of them would compile and the other would fail with "non-extern struct not allowed in function with C calling convention". If you changed the TermPtr implementation to use an extern struct, it would still not be guaranteed that the C ABIs are compatible. For example, the x86-64 architecture specification describes the C ABI for the C calling convention. It's a bit convoluted, but structs and pointers are treated separately by the spec. I believe it happens to be true that, for x86-64 in the C ABI environments I am familiar with, a small enough struct will pass its fields in registers. But it's not guaranteed to be the same, and one would have to research all the C ABIs to make sure that a struct with a pointer field and a bare pointer happen to have the same ABI on all the targets.

So at the very least, it's a lot less immediately obvious that one of them is passing a pointer.

I think a userland null terminated pointer would need to be more complicated than your example. For example, what about const, volatile, alignment?

If I answer the question literally, the answer is that one of them would compile and the other would be non-extern struct not allowed in function with C calling convention.

Sorry, I had no idea; it seems the extern modifier for structs isn't documented yet.

If you changed TermPtr implementation to use extern struct then it's not actually guaranteed that the C ABIs are compatible. ...

True, I naively assumed that all platforms are the same as x86, which in my (limited) experience does always pass this via registers. But I believe that is an orthogonal issue. If we had a "newtype" or "transparent" struct layout modifier which caused a struct to have the same binary semantics as its (sole, primitive) member, it would be generally quite useful by allowing you to create ABI-compatible, typesafe wrappers for, e.g., clock_t and various bitflag types, and it would handle this point as a side effect. I would definitely like to see that happen whichever way null-terminated pointers are implemented.

I think a userland null terminated pointer would need to be more complicated than your example. For example, what about const, volatile, alignment?

Couldn't you ask the same about any custom pointer wrapper (e.g. unique_ptr, shared_ptr)? In principle, you can always make the type template function more flexible and add typedefs for common cases. Plus anyone can define their own type if all else fails. Also, doesn't alignment go on the data type and not the pointer? But yeah, if you want to support all possible pointer attribute combinations, present and future, it might be more convenient as a new attribute.

Anyways, I'm open to discuss/answer more, but I can relent if you feel like you've read enough.

I would definitely like to see that happen whichever way null-terminated pointers are implemented.

This is definitely an interesting proposal which I would invite you to file separately from this null terminated pointers issue (and indeed if it was accepted, it could potentially reverse the decision on this one).

I think you've made a pretty strong case for userland null-terminated pointer. It's enough to make me go back and consider all the use cases that I have for it. I don't know if I'm ready to reverse the "accepted" label here yet, but I'll admit that I'm going back and questioning this decision.

This is definitely an interesting proposal which I would invite you to file separately from this null terminated pointers issue (and indeed if it was accepted, it could potentially reverse the decision on this one).

That sounds good! I'd be happy to write up a proposal and work on implementing it as well.

Also, I just tried this out in Compiler Explorer (updated) and it seems like a one-member extern struct already gets passed the same way as a plain int on x86_64, so the suggestion here could be tentatively evaluated outside stdlib without waiting for that idea to be implemented.
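Seen from the C side, the single-member wrapper idea might look like the sketch below. All names (c_str, c_str_from, c_str_len) are hypothetical. The static assertion only checks size on the target being compiled; it says nothing about how the struct is passed in registers, which is the ABI question raised above and has to be checked per target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical newtype: a struct whose only member is the raw pointer. */
struct c_str {
    const char *ptr;
};

/* Holds on mainstream targets, but the C standard does not promise it. */
_Static_assert(sizeof(struct c_str) == sizeof(const char *),
               "wrapper should add no padding on this target");

/* The only sanctioned way in: a bounded check that the last byte is 0. */
struct c_str c_str_from(const char *buf, size_t len)
{
    assert(buf != NULL && len > 0 && buf[len - 1] == '\0');
    return (struct c_str){ buf };
}

size_t c_str_len(struct c_str s)
{
    return strlen(s.ptr);
}
```

A function declared to take struct c_str cannot be called with a bare const char * by accident, which is the type-safety half of the argument. The calling-convention half remains target-specific.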

In principle, you can always make the type template function more flexible and add typedefs for common cases.

For the record, what I was thinking by this was that the template would take a struct specifying pointer attributes and you'd write one typedef per set of parameters. Without working out details, something like

const PtrAttrs = struct {
    is_const: bool,
    is_volatile: bool,
    // etc.
};
const c_str_const = TermPtr(u8, '\0', PtrAttrs{ .is_const = true, .is_volatile = false });
const w_str_mut = TermPtr(u16, '\0', PtrAttrs{ .is_const = false, .is_volatile = false });
// etc.
This should generalize to various custom pointer types.

The ability to define types such as 0-terminated pointers would be a powerful addition to the language. Also, it would probably solve #1595.

How it could possibly look:

pub fn TermPtr(comptime T: type, comptime termValue: T) type {
    return @TransparentStruct(struct { // builtin ensures that there is only one field
        ptr: [*]T, // could be any type, for example 'c_int' to pass type safe values to some c api
        fn from(comptime slice: []T) @This() {
            return @This() { .ptr = (slice ++ termValue).ptr };
        }
    });
}

Not sure about modifiers such as const, align. Maybe something like this:

pub fn TermPtr(comptime T: type, comptime termValue: T) type {
    return @TransitiveStruct(struct { // allows to forward type modifiers from the outside
        ptr: [*]T,
        // methods
    });
}
///////
const c_str = TermPtr(u8, 0x00);
extern fn someApi(path: const c_str); // the type here is '[*]const T', but it is type safe

It could even allow operators to be used on such a type, though that should be optional. For example, the [index] operator on c_str is OK, but + on a type-safe c_int wrapper for a C API is not OK.

@matthew-mcallister It seems important to me that C types that can be represented in C, can be represented in Zig language without having to import any code. This precedent is set for C integer types, float types, and C pointer types. There must be a C string literal, or Zig string literals must work for C APIs.

Something I hadn't considered yet:

  • Would the null terminated slice/array types assert that a null/0 byte does not occur in any of the elements before the len index?

If so, this would guarantee the property that after casting a []null T to [*]null T, finding the length based on the null termination would give the same len value as before. However this would mean that casting []null T to [*]null T should have a runtime safety check, which probably means it shouldn't be an implicit cast. Hmm.

Or the type could have a weaker guarantee, which is only that there shall be a null/0 byte at the len index, and makes no guarantees about items otherwise. However, then the "length" may change when implicitly casting from []null T to [*]null T.

Continuing from IRC:

To echo the common usage of null terminated strings, I think the length should always be computed at runtime (at least before the optimizer kicks in). I propose:

null slices

  • []null T (null slice) should be struct { ptr: [*]T } where .len performs a strlen-like operation.
  • []null T implicitly casts to [*]T
  • []null T can be 'cast' to [*]T by simply doing nullslice.ptr
  • [*]T to []null T needs an explicit cast (via @ptrCast)
  • []null T can be 'cast' to []T via nullslice[0..nullslice.len]. The .len here is invoking a strlen-like operation.
  • For safety, you could make indexing a null slice do a length check: null_u8_slice[5] could emit code that does: assert(strnlen(null_u8_slice.ptr, 5) == 5) before the access.
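The indexing check from the last bullet can be sketched in C as follows. This is an illustration under assumptions: bounded_len and checked_at are hypothetical names, and bounded_len is a stand-in for strnlen written with memchr, because strnlen is POSIX rather than ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Like strnlen: the number of bytes before the first 0, capped at max. */
size_t bounded_len(const char *p, size_t max)
{
    assert(p != NULL);
    const char *nul = memchr(p, 0, max);
    return nul ? (size_t)(nul - p) : max;
}

/* Checked read of p[i]: asserts that no terminator occurs before index i,
 * so i is at most the position of the terminator itself. */
char checked_at(const char *p, size_t i)
{
    assert(bounded_len(p, i) == i);
    return p[i];
}
```

Each checked access scans the i bytes before the index, so a loop over the whole string becomes quadratic in safety mode. That cost is one of the things the rest of the thread weighs against storing a length.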

null arrays

  • [N]null T is an array of max size N where the first null should be considered the length.
  • It is similar to []null T except uses a strnlen-like instead of a strlen-like.
  • It is valid to read [N]null T at N. You are guaranteed to get null.
  • [N]null T uses @sizeOf(T)*(N+1) space (or possibly less depending on array packing if T is e.g. u1?)
  • [N]null T implicitly casts to []null T
  • [N]null T implicitly casts to [*]T
  • It is a compile error to write to index N.
  • It is valid to read or write to a [N]null T at any index in the range [0..N)

null literals

  • c"aoeu" is of type [4]null const u8
  • (c"aoeu").len == 4
  • @sizeOf(c"aoeu") == 5
  • (c"ao\0eu").len == 2
  • @sizeOf(c"ao\0eu") == 6
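
These numbers mirror what C already does with string literals, which may be a useful sanity check. In the sketch below the four function names are hypothetical wrappers. `sizeof` counts the implicit terminator, while `strlen` stops at the first 0 byte.

```c
#include <stddef.h>
#include <string.h>

/* "aoeu" occupies 5 bytes (4 characters plus the terminator). */
size_t size_plain(void) { return sizeof("aoeu"); }

/* Scanning stops at the terminator, giving 4. */
size_t len_plain(void) { return strlen("aoeu"); }

/* "ao\0eu" occupies 6 bytes: 5 written bytes plus the terminator. */
size_t size_embedded(void) { return sizeof("ao\0eu"); }

/* Scanning stops at the embedded 0, giving 2. */
size_t len_embedded(void) { return strlen("ao\0eu"); }
```

The embedded-null case is where a first-null length and the storage size disagree by more than one.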

misc notes

  • [*]null T doesn't exist. The length of a null array is always "known": it's before the first null!

@daurnimator Why not keep [*]null T/[*c]null T and dispense with the others? If the intended purpose of this feature is to decorate pointers in C FFI definitions, then that would be the minimal solution, plus it would encourage use of "real" slices in Zig code and discourage overuse of strlen.

I feel like null-terminated literals can still conceivably be handled well by a builtin function. Say c"hello" produces a raw [*]null u8 pointer. Then a hypothetical @cstrToSlice function can safely check for the null and make a Zig slice at compile time. Or all string literals can have an implicit terminating null and @sliceToCstr will check that its argument has the null (and no interior nulls?) and return a [*]null u8.
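
As a sketch of what those two placeholder conversions would have to check, here is a run-time C approximation. `ByteSlice`, `cstr_to_slice`, and `slice_to_cstr` are hypothetical names, and the suggestion above has the checks happening at compile time, which C cannot express.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Zig slice: pointer plus length. */
typedef struct {
    const char *ptr;
    size_t len;
} ByteSlice;

/* cstrToSlice-like: locate the terminator once and keep the
   resulting length next to the pointer. */
ByteSlice cstr_to_slice(const char *cstr) {
    ByteSlice s = { cstr, strlen(cstr) };
    return s;
}

/* sliceToCstr-like: hand out the pointer only if a 0 sits at index
   len and no 0 occurs before it; otherwise return NULL. */
const char *slice_to_cstr(ByteSlice s) {
    if (s.ptr[s.len] != 0) {
        return NULL;
    }
    if (memchr(s.ptr, 0, s.len) != NULL) {
        return NULL;
    }
    return s.ptr;
}
```

Note that `slice_to_cstr` reads one element past the slice's length, so it is only valid when the caller knows that element is addressable, for example because the slice came from a literal with an implicit terminator.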

why would we need @cstrToSlice? std.mem.len already works fine, it'll just have an improved prototype that uses [*]null T instead of [*]const T. @sliceToCstr as you described it could also easily be a userland function.

I meant those names as placeholders; I only added the @ to satisfy the requirement that you wouldn't need to import from std. As far as my suggestion goes, it's agnostic to how they're implemented.

Moving some work from Pointer Reform (#770) to here:

  • [ ] add syntax for [N]null T, []null T, and [*]null T
  • [ ] add implicit casting

    • [ ] []null T to []T
    • [ ] [*]null T to [*]T
    • [ ] [N]null T to [N]T
    • [ ] *[N]null T to *[N]T
    • [ ] []null T to [*]null T
    • [ ] [N]null T to []null const T
    • [ ] [N]null T to [*]null const T
  • [ ] make string literals [N]null u8
  • [ ] remove C string literals and C multiline string literals

Coming in late to this, but wouldn't it be easier to create a hybrid string type?
The HLA language, for example, uses a hybrid string type that is null terminated with a header containing the max length, the current length and a pointer to the start of the string. This really simplified interactions between C libraries and internal libraries.
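
A minimal C sketch of such a hybrid type, going only by the description above. The field layout, the `HybridStr` name, and both functions are assumptions for illustration, not HLA's actual representation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hybrid string: a length header plus a pointer to
   data that is also kept null terminated. */
typedef struct {
    size_t max_len; /* capacity, not counting the terminator */
    size_t len;     /* current length, not counting the terminator */
    char *data;     /* max_len + 1 bytes; data[len] is always 0 */
} HybridStr;

/* Wrap caller-owned storage of cap bytes (cap >= 1) as an empty string. */
HybridStr hybrid_init(char *storage, size_t cap) {
    HybridStr s = { cap - 1, 0, storage };
    storage[0] = 0;
    return s;
}

/* Copy a C string in, terminator included; fail if it does not fit. */
bool hybrid_assign(HybridStr *s, const char *src) {
    size_t n = strlen(src);
    if (n > s->max_len) {
        return false;
    }
    memcpy(s->data, src, n + 1);
    s->len = n;
    return true;
}
```

With this shape, `s.data` can be passed directly to a C function expecting a null-terminated string, while code that knows about the header reads `s.len` without scanning.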

Thought it might be helpful to share my experience with this problem. In D I implemented a module to support null-terminated strings.

https://github.com/dragon-lang/mar/blob/master/src/mar/sentinel.d

The general term I've found for arrays that end in a particular value is a "sentinel array" or "sentinel ptr". In D I just implemented them as a wrapper struct around pointers and arrays.

The 2 main benefits I see from having sentinel pointers as a part of the type system are:

  1. functions that take sentinel pointers can declare this requirement, meaning that if a client passes a non-sentinel pointer then they will get a compile error rather than a runtime bug
  2. it allows the application to control when and how sentinel arrays are allocated, rather than having to convert normal zig arrays to sentinel arrays every time a C function is called

My library solution for this in D would be equivalent to something like this in Zig:

const std = @import("std");

pub fn SentinelPtr(comptime T: type) type {
    return struct {
        ptr: [*]T,
        // create a sentinel pointer from `ptr`, assume it ends in a sentinel value
        pub fn init(ptr: [*]T) @This() {
            return @This(){
                .ptr = ptr,
            };
        }
    };
}
pub fn SentinelSlice(comptime T: type) type {
    return struct {
        array: []T,
        // create a sentinel slice from `slice`, checking that a 0 follows the last element
        pub fn init(slice: []T) @This() {
            std.debug.assert(slice.ptr[slice.len] == 0);
            return assume(slice);
        }
        // create a sentinel slice from `slice`, assume it ends in a sentinel value
        pub fn assume(slice: []T) @This() {
            return @This(){
                .array = slice,
            };
        }
    };
}

It just boils down to wrapping the pointers/slices inside structs and creating helper functions to create/unwrap them.

This is one way to implement it. If you do it in a library like this, though, it will be a bit more verbose than a language solution, and you'll probably want to find a way to allow the types to perform automatic const conversion, e.g.

var chars: [2]u8 = undefined;
chars[0] = 'a';
chars[1] = 0;
var x = SentinelPtr(u8).init(&chars);
var y: SentinelPtr(const u8) = x; // is there a way to support this in zig?

Just to extend and visualize @Rocknest's syntactical proposal:

[]const u8
[*]const u8
[5]const u8
[_]const u8
[*c]const u8
[null]const u8
[*null]const u8
[5 null]const u8
[_ null]const u8

I appreciate having the null inside the brackets because it describes a property of the array/slice type itself (like 5/_/*), as opposed to the const which qualifies the element type.
(Minor thing)

> I appreciate having the null inside the brackets because it describes a property of the array/slice type itself (like 5/_/*), as opposed to the const which qualifies the element type.

But e.g. align is on the outside of the []

> But e.g. align is on the outside of the []

Ah, true

Implemented in #3728, landed in 5a98dd42b38b9188cfb96c9ab57dc91af923029f.

Related follow-up issues:

  • #3731
  • #3766
  • #3767
  • #3768