Zig: make null terminated pointer syntax support any sentinel value, not just null

Created on 20 Nov 2019 · 14 Comments · Source: ziglang/zig

Accepted Proposal


Idea: Instead of [*]null u8, have [*]sentinel(0) u8.

It's not obvious that null means 0: nowhere else in Zig do we call 0 "null".
Additionally, we could open things up to allow arbitrary values as a sentinel in the future.

Originally posted in https://github.com/ziglang/zig/pull/3728

All 14 comments

Here's an alternate syntax proposal:

  • *[10:0]u8 - pointer to array with sentinel 0 at len
  • [:0]u8 - slice with sentinel 0 at len
  • [*:0]u8 - unknown length pointer with sentinel 0

pub extern "c" fn printf(format: [*:0]const u8, ...) c_int;

test "allow any sentinel" {
    assert(@typeOf("hello") == *const [5:0]u8);
    var slice: [:0]const u8 = "hello";
    var ptr: [*:0]const u8 = "hello";
}

test "with enums" {
    const Number = enum {one, two, sentinel};
    var ptr: [*:.sentinel]Number = &[_:.sentinel]Number{.one, .two, .two, .one};
    comptime assert(ptr[4] == .sentinel); // comptime because index is comptime known
}

test "with optional thing" {
    var ptr: [*:null]?i32 = &[_:null]?i32{1, 2, 3, 4};
    comptime assert(ptr[4] == null); // comptime because index is comptime known
}

test "with floats" {
    const nan = std.math.nan_f32;
    var ptr: [*:nan]f32 = &[_:nan]f32{1.1, 2.2, 3.3, 4.4};
    comptime assert(ptr[4] == nan); // comptime because index is comptime known
}

  • [x] basic implementation
  • [ ] indexing a sentinel-terminated array at a comptime-known index equal to len yields a comptime-known value when loading, and is a no-op with a safety check when storing.
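
For readers on a recent Zig release, here is a minimal sketch of the three forms listed above. It assumes the current builtin spelling @TypeOf (the examples in this thread use the 2019 spelling @typeOf) and that std.mem.len and std.mem.span accept sentinel-terminated many-item pointers, as they do in recent versions of the standard library.

```zig
const std = @import("std");

test "three sentinel-terminated forms" {
    // A string literal is a pointer to a 0-terminated array.
    const lit = "hello";
    try std.testing.expect(@TypeOf(lit) == *const [5:0]u8);

    // It coerces to a sentinel-terminated slice and to an unknown-length pointer.
    const slice: [:0]const u8 = lit;
    const many: [*:0]const u8 = lit;

    // The sentinel sits at index len and may be read there.
    try std.testing.expectEqual(@as(usize, 5), slice.len);
    try std.testing.expectEqual(@as(u8, 0), slice[slice.len]);

    // For the unknown-length pointer, the length is found by scanning for the sentinel.
    try std.testing.expectEqual(@as(usize, 5), std.mem.len(many));
    try std.testing.expectEqualStrings("hello", std.mem.span(many));
}
```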

Another thing that could be part of this proposal is slicing syntax that obtains sentinel-terminated types from non-sentinel-terminated types:

const std = @import("std");
const assert = std.debug.assert;

test "obtaining a null terminated slice" {
    // here we have a normal array
    var buf: [50]u8 = undefined;

    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    buf[3] = 0;

    // now we obtain a null terminated slice:
    const ptr = buf[0..3 :0];

    // ptr is a pointer to null-terminated array,
    // because the len was comptime known (See #863)
    comptime assert(@typeOf(ptr) == *[3:0]u8);

    var runtime_len: usize = 3;
    const ptr2 = buf[0..runtime_len :0];
    // ptr2 is a null-terminated slice
    comptime assert(@typeOf(ptr2) == [:0]u8);

    buf[3] += 1;
    _ = buf[0..3 :0]; // safety panic: slice sentinel assertion failed
}

Along with this:

  • [ ] slicing a sentinel-terminated pointer without this syntax should yield a non-sentinel-terminated result.
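
A small sketch of the checklist item above, written against a recent Zig release (an assumption, since the thread predates it). The end index is deliberately runtime-known so that the results are slices rather than pointers to arrays. The `_ = &end;` line is only there to keep newer compilers from rejecting a `var` that is never mutated.

```zig
const std = @import("std");

test "plain slicing drops the sentinel from the type" {
    const s: [:0]const u8 = "hello";

    // Runtime-known end index, so the result is a slice.
    var end: usize = 3;
    _ = &end;

    // Without the sentinel syntax, the result type carries no sentinel.
    const plain = s[0..end];
    try std.testing.expect(@TypeOf(plain) == []const u8);

    // Opting back in with the sentinel syntax: safe builds check s[end] == 'l'.
    const terminated = s[0..end :'l'];
    try std.testing.expect(@TypeOf(terminated) == [:'l']const u8);
    try std.testing.expectEqual(@as(usize, 3), terminated.len);
}
```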

related: #1838

Putting the sentinel value inside the brackets (i.e. [*:0] u8) has the benefit that you know you can't do this with a single-value pointer: *sentinel(0) u8. This is good since sentinel pointers don't make sense with single-value pointers.

I like the semantics of "sentinel-terminated pointers", and Andrew's syntax proposal is also quite good. This would also allow safe slicing of C strings:

test "index-out-of-bounds"
{
    // here we have a normal array
    var buf: [50]u8 = undefined;

    buf[0] = 'a';
    buf[1] = 'b';
    buf[2] = 'c';
    buf[3] = 0;

    const ptr = buf[2..10 :0]; // this will @panic with "index-out-of-bounds"
}

> const ptr = buf[2..10 :0]; // this will @panic with "index-out-of-bounds"

Not saying I'm against the runtime check, but consider this (admittedly contrived) case in debug/safe builds:

var buf: [50]u8 = undefined;
buf[0] = 'a';
buf[1] = 'b';
buf[2] = 'c';
buf[3] = 0xa;

assert(buf[10] == 0x0a); // suppose the undefined byte at the sentinel position happens to be 0x0a
const ptr = buf[2..10 :0xa]; // this will not @panic

I admit this could be a non-issue as undefined handling at runtime is expected to evolve.

@MasterQ32 your example in https://github.com/ziglang/zig/issues/3731#issuecomment-557154045 would panic, but not for the reason given in your comment: it would panic because buf[10] != 0. My proposal for slicing syntax with a sentinel value is that it only does an O(1) assertion: assert(ptr[len] == sentinel);
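
To make the cost of that check concrete, here is a sketch of the same O(1) test written as an ordinary function. The helper name sliceWithSentinel is hypothetical and not part of std. It returns null instead of panicking so the behavior can be tested, and it assumes a recent Zig release.

```zig
const std = @import("std");

// Hypothetical helper: performs the single-element check described above.
// It never scans the buffer; it only looks at buf[len].
fn sliceWithSentinel(buf: []const u8, len: usize, comptime sentinel: u8) ?[:sentinel]const u8 {
    if (len >= buf.len) return null; // the sentinel position must lie inside the buffer
    if (buf[len] != sentinel) return null; // the one comparison the proposal calls for
    return buf[0..len :sentinel];
}

test "O(1) sentinel check" {
    const buf = [_]u8{ 'a', 'b', 'c', 0, 'x' };

    // buf[3] == 0, so a 0-terminated slice of length 3 is accepted.
    const s = sliceWithSentinel(&buf, 3, 0).?;
    try std.testing.expectEqual(@as(usize, 3), s.len);

    // buf[2] == 'c', so length 2 is rejected.
    try std.testing.expect(sliceWithSentinel(&buf, 2, 0) == null);

    // Index 5 is past the end of the buffer.
    try std.testing.expect(sliceWithSentinel(&buf, 5, 0) == null);
}
```

Note that only the position at len is inspected, so an earlier occurrence of the sentinel inside the slice goes unnoticed. That is the trade-off being discussed here.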

@andrewrk ah okay, makes sense to not burn that much perf for safety, even in debug builds

About nan's behavior:

  • Is nan equal to nan in the zig language?
  • [*:nan]f32, how to judge the length of the sequence? Is nan equal to nan?

In the above example I saw that nan is used in the sentinel grammar.

test "with floats" {
    const nan = std.math.nan_f32;
    var ptr: [*:nan]f32 = &[_:nan]f32{1.1, 2.2, 3.3, 4.4};
    comptime assert(ptr[4] == nan); // comptime because index is comptime known
}

I am very confused about whether nan can be a sentinel.

> Is nan equal to nan in the zig language?

The zig language's floating point formats are specified by IEEE-754. https://ziglang.org/documentation/master/#Floats

> [*:nan]f32, how to judge the length of the sequence? Is nan equal to nan?

The memory has a particular bit pattern. The pointer type allows specifying the sentinel bit pattern. If a == b would return true then it is the sentinel value.

> The pointer type allows specifying the sentinel bit pattern.

Okay... so same bit-patterned nan would be equal?

> If a == b would return true then it is the sentinel value.

nan == nan is false (as IEEE-754 demands)... which means the comparison is not by bit pattern.

Thanks, I forgot that nan != nan. My example above is no good. Maybe negative zero would be better to use for this example, but really, it's beside the point. The point is that the compiler lets you use any value, provided that the == operator is allowed for the type, and that a == a holds for all possible values of the type.

I think it could even work fine for float types. The safety assertion would use the isNan operator rather than == for this case.
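
A sketch of what such a check could look like, assuming a recent Zig release where std.math.nan(f32) and std.math.isNan are available. The helper isSentinel is hypothetical and only illustrates the idea; it is not how the compiler implements the safety assertion.

```zig
const std = @import("std");

// Hypothetical helper: treats any NaN as matching a NaN sentinel,
// and falls back to == for every other sentinel value.
fn isSentinel(value: f32, sentinel: f32) bool {
    if (std.math.isNan(sentinel)) return std.math.isNan(value);
    return value == sentinel;
}

test "a nan sentinel needs isNan rather than ==" {
    var x: f32 = std.math.nan(f32);
    _ = &x; // keep x a runtime value

    // IEEE-754: nan never compares equal, not even to itself.
    try std.testing.expect(!(x == x));

    // The isNan-based check still recognizes it.
    try std.testing.expect(isSentinel(x, std.math.nan(f32)));
    try std.testing.expect(!isSentinel(1.5, std.math.nan(f32)));

    // Negative zero compares equal to zero under ==.
    try std.testing.expect(isSentinel(-0.0, 0.0));
}
```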

Done as part of #3728
