Zig: make null terminated pointer syntax support any sentinel value, not just null

Created on 20 Nov 2019 · 14 Comments · Source: ziglang/zig

Idea: Instead of [*]null u8, have [*]sentinel(0) u8.

It's not obvious that null means 0: nowhere else in Zig do we call 0 "null".
Additionally, we could open things up to allow arbitrary values as a sentinel in the future.

_Originally posted in https://github.com/ziglang/zig/pull/3728_

accepted proposal

daurnimator

👍3

All 14 comments

Here's an alternate syntax proposal:

*[10:0]u8 - pointer to array with sentinel 0 at len
[:0]u8 - slice with sentinel 0 at len
[*:0]u8 - unknown length pointer with sentinel 0

pub extern "c" fn printf(format: [*:0]const u8, ...) c_int;

test "allow any sentinel" {
    assert(@typeOf("hello") == *const [5:0]u8);
    var slice: [:0]const u8 = "hello";
    var ptr: [*:0]const u8 = "hello";
}

test "with enums" {
    const Number = enum {one, two, sentinel};
    var ptr: [*:.sentinel]Number = &[_:.sentinel]Number{.one, .two, .two, .one};
    comptime assert(ptr[4] == .sentinel); // comptime because index is comptime known
}

test "with optional thing" {
    var ptr: [*:null]?i32 = &[_:null]?i32{1, 2, 3, 4};
    comptime assert(ptr[4] == null); // comptime because index is comptime known
}

test "with floats" {
    const nan = std.math.nan_f32;
    var ptr: [*:nan]f32 = &[_:nan]f32{1.1, 2.2, 3.3, 4.4};
    comptime assert(ptr[4] == nan); // comptime because index is comptime known
}

[x] basic implementation
[ ] index of a sentinel-terminated array which is comptime-known to be len, is a comptime-known value when loading. no-op with safety check when storing.

andrewrk on 20 Nov 2019

👍7 👎1
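
For readers trying the proposed types today, here is a minimal sketch in newer Zig syntax. It assumes a recent Zig release, where `@typeOf` is spelled `@TypeOf` and `std.testing.expect` returns an error that must be handled with `try`; neither detail comes from this thread. The test name and the local names `literal`, `slice`, `many`, and `array` are only illustrative.

```zig
const std = @import("std");
const expect = std.testing.expect;

test "sentinel-terminated pointer, slice, and array types" {
    // A string literal is a pointer to a 0-terminated array.
    const literal = "hello";
    try expect(@TypeOf(literal) == *const [5:0]u8);

    // It coerces to a sentinel-terminated slice and to a
    // sentinel-terminated unknown-length pointer.
    const slice: [:0]const u8 = literal;
    const many: [*:0]const u8 = literal;

    // The sentinel sits at index len and may be read there.
    try expect(slice.len == 5);
    try expect(slice[slice.len] == 0);
    try expect(many[5] == 0);

    // Array literal with an inferred length and a 0 sentinel.
    const array = [_:0]u8{ 1, 2, 3 };
    try expect(@TypeOf(array) == [3:0]u8);
    try expect(array[array.len] == 0);
}
```

The sentinel does not count toward `len`, which is why indexing at `len` is the one extra position these types allow.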

Another thing that could be part of this proposal: slicing syntax that obtains sentinel-terminated types from non-sentinel-terminated types:

const std = @import("std");
const assert = std.debug.assert;

test "obtaining a null terminated slice" {
    // here we have a normal array
    var buf: [50]u8 = undefined;

    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    buf[3] = 0;

    // now we obtain a null terminated slice:
    const ptr = buf[0..3 :0];

    // ptr is a pointer to null-terminated array,
    // because the len was comptime known (See #863)
    comptime assert(@typeOf(ptr) == *[3:0]u8);

    var runtime_len: usize = 3;
    const ptr2 = buf[0..runtime_len :0];
    // ptr2 is a null-terminated slice
    comptime assert(@typeOf(ptr2) == [:0]u8);

    buf[3] += 1;
    _ = buf[0..3 :0]; // safety panic: slice sentinel assertion failed
}

Along with this:

[ ] slicing a sentinel-terminated pointer without this syntax should yield a non-sentinel-terminated result.

andrewrk on 20 Nov 2019

👍5
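
The two behaviors above (slicing with `:0` yields a sentinel-terminated type, slicing without it drops the sentinel) can be sketched as follows. This assumes a recent Zig release with `@TypeOf` and an error-returning `std.testing.expect`; the names `len`, `terminated`, and `plain` are made up for illustration. A runtime-known length is used so that both results are slices rather than pointers to arrays.

```zig
const std = @import("std");
const expect = std.testing.expect;

test "slicing with and without a sentinel" {
    var buf: [50]u8 = undefined;
    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    buf[3] = 0;

    var len: usize = 3;
    // Taking the address keeps `len` runtime-known and stops newer
    // compilers from reporting that it is never mutated.
    _ = &len;

    // With the `:0` suffix the result is a sentinel-terminated slice.
    const terminated = buf[0..len :0];
    try expect(@TypeOf(terminated) == [:0]u8);
    try expect(terminated[terminated.len] == 0);

    // Slicing again without the suffix gives an ordinary slice.
    const plain = terminated[0..len];
    try expect(@TypeOf(plain) == []u8);
    try expect(plain.len == 3);
}
```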

related: #1838

emekoi on 21 Nov 2019

Putting the sentinel value inside the brackets (i.e. [*:0] u8) has the benefit that you know you can't do this with a single-value pointer: *sentinel(0) u8. This is good since sentinel pointers don't make sense with single-value pointers.

marler8997 on 21 Nov 2019

👍4

I like the semantics of "sentinel-terminated pointers", and andrewrk's syntax proposal is also quite good. This would also allow safe slicing of C strings:

test "index-out-of-bounds" {
    // here we have a normal array
    var buf: [50]u8 = undefined;

    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    buf[3] = 0;

    const ptr = buf[2..10 :0]; // this will @panic with "index-out-of-bounds"
}

MasterQ32 on 21 Nov 2019

👍1

const ptr = buf[2..10 :0]; // this will @panic with "index-out-of-bounds"

Not saying I'm against the runtime check, just consider this (admittedly contrived) case in debug/safe builds:

var buf: [50]u8 = undefined;
buf[0] = 'a';
buf[1] = 'b';
buf[2] = 'c';
buf[3] = 0xa;

assert(buf[11] == 0x0a);
const ptr = buf[2..10 :0xa]; // this will not @panic

I admit this could be a non-issue as undefined handling at runtime is expected to evolve.

mikdusan on 21 Nov 2019

👍1

@MasterQ32 your example in https://github.com/ziglang/zig/issues/3731#issuecomment-557154045 would panic, but not for the reason in your comment. It would panic because buf[10] != 0. My proposal for slicing syntax with a sentinel value is that it only does an O(1) assertion: assert(ptr[len] == sentinel);

andrewrk on 21 Nov 2019
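
To make the O(1) assertion concrete, here is a rough sketch of the same comparison written out by hand. `hasSentinelAt` is a hypothetical helper invented for this illustration; it is not part of the proposal or of the standard library, and the sketch assumes a recent Zig release where `std.testing.expect` returns an error.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: spells out the single comparison that the
// sentinel slicing syntax is proposed to perform in safe builds.
fn hasSentinelAt(buf: []const u8, len: usize, sentinel: u8) bool {
    // One bounds check and one load, however long the buffer is.
    return len < buf.len and buf[len] == sentinel;
}

test "the check looks only at buf[len]" {
    const buf = [_]u8{ 'a', 'b', 'c', 0, 'x' };
    try expect(hasSentinelAt(&buf, 3, 0)); // buf[3] is the sentinel
    try expect(!hasSentinelAt(&buf, 2, 0)); // buf[2] is 'c'
    try expect(!hasSentinelAt(&buf, 5, 0)); // index 5 is past the end
}
```

Nothing before index `len` is inspected, so an earlier occurrence of the sentinel value inside the slice would go unnoticed by this check.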

@andrewrk ah okay, makes sense to not burn that much perf for safety, even in debug builds

MasterQ32 on 21 Nov 2019

About NaN's behavior:

Is NaN equal to NaN in the Zig language?
For [*:nan]f32, how is the length of the sequence determined? Is NaN equal to NaN?

In the example above, NaN is used as the sentinel.

test "with floats" {
    const nan = std.math.nan_f32;
    var ptr: [*:nan]f32 = &[_:nan]f32{1.1, 2.2, 3.3, 4.4};
    comptime assert(ptr[4] == nan); // comptime because index is comptime known
}

I am very confused about whether NaN can be a sentinel.

hyu1996 on 22 Nov 2019

Is nan equal to nan in the zig language?

The zig language's floating point formats are specified by IEEE-754. https://ziglang.org/documentation/master/#Floats

[*:nan]f32, how to judge the length of the sequence? Is nan equal to nan?

The memory has a particular bit pattern. The pointer type allows specifying the sentinel bit pattern. If a == b would return true then it is the sentinel value.

andrewrk on 22 Nov 2019
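
As a sketch of how the length question plays out for an ordinary sentinel, the loop below walks a `[*:0]const u8` and stops at the first element that compares equal to the sentinel with `==`. `lenUntilSentinel` is a hypothetical name used only here; the comparison against `std.mem.len` assumes a recent Zig release, where that function accepts sentinel-terminated pointers.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: counts elements until the first one for which
// `element == sentinel` is true.
fn lenUntilSentinel(ptr: [*:0]const u8) usize {
    var i: usize = 0;
    while (ptr[i] != 0) : (i += 1) {}
    return i;
}

test "length is found by scanning for the sentinel" {
    const ptr: [*:0]const u8 = "hello";
    try expect(lenUntilSentinel(ptr) == 5);
    // The standard library performs the same kind of scan.
    try expect(std.mem.len(ptr) == 5);
}
```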

The pointer type allows specifying the sentinel bit pattern.

Okay... so same bit-patterned nan would be equal?

If a == b would return true then it is the sentinel value.

nan == nan is false (as IEEE-754 demands)... which means the comparison is not by bit pattern.

daurnimator on 22 Nov 2019

Thanks, I forgot that nan != nan. My example above is no good. Maybe negative zero would be better to use for this example, but really, it's beside the point. The point is that the compiler lets you use any value, provided that the == operator is allowed for the type, and that a == a holds for all possible values of the type.

I think it could even work fine for float types. The safety assertion would use the isNan operator rather than == for this case.

andrewrk on 22 Nov 2019
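
A small sketch of the NaN problem and of the isNan idea discussed above. `matchesSentinel` is a hypothetical helper written for this illustration, not the compiler's actual safety check. It assumes a recent Zig release that provides `std.math.nan(f32)` and `std.math.isNan`.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: falls back to isNan when the sentinel is NaN,
// and uses == for every other sentinel value.
fn matchesSentinel(value: f32, sentinel: f32) bool {
    if (std.math.isNan(sentinel)) return std.math.isNan(value);
    return value == sentinel;
}

test "NaN as a sentinel needs isNan" {
    var nan: f32 = std.math.nan(f32);
    // Taking the address keeps `nan` a runtime value and stops newer
    // compilers from reporting that it is never mutated.
    _ = &nan;

    // IEEE-754: NaN never compares equal, not even to itself.
    try expect(nan != nan);

    try expect(matchesSentinel(nan, nan));
    try expect(!matchesSentinel(1.5, nan));
    try expect(matchesSentinel(-0.0, -0.0));
}
```

Note that `==` also treats -0.0 and 0.0 as equal, so a negative-zero sentinel compared with `==` would match positive zero as well.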

Done as part of #3728

daurnimator on 25 Nov 2019

I moved the unfinished parts of https://github.com/ziglang/zig/issues/3731#issuecomment-556436531 and https://github.com/ziglang/zig/issues/3731#issuecomment-556483080 to #3770.

andrewrk on 25 Nov 2019
