Idea: Instead of [*]null u8, have [*]sentinel(0) u8.
It's not obvious that null means 0: nowhere else in Zig do we call 0 "null".
Additionally, we could open things up to allow arbitrary values as a sentinel in the future.
_Originally posted in https://github.com/ziglang/zig/pull/3728_
Here's an alternate syntax proposal:
- *[10:0]u8 - pointer to array with sentinel 0 at len
- [:0]u8 - slice with sentinel 0 at len
- [*:0]u8 - unknown length pointer with sentinel 0
pub extern "c" fn printf(format: [*:0]const u8, ...) c_int;
test "allow any sentinel" {
assert(@typeOf("hello") == *const [5:0]u8);
var slice: [:0]const u8 = "hello";
var ptr: [*:0]const u8 = "hello";
}
test "with enums" {
const Number = enum {one, two, sentinel};
var ptr: [*:.sentinel]Number = &[_:.sentinel]Number{.one, .two, .two, .one};
comptime assert(ptr[4] == .sentinel); // comptime because index is comptime known
}
test "with optional thing" {
var ptr: [*:null]?i32 = &[_:null]?i32{1, 2, 3, 4};
comptime assert(ptr[4] == null); // comptime because index is comptime known
}
test "with floats" {
const nan = std.math.nan_f32;
var ptr: [*:nan]f32 = &[_:nan]f32{1.1, 2.2, 3.3, 4.4};
comptime assert(ptr[4] == nan); // comptime because index is comptime known
}
Another thing that could be part of this proposal is slicing syntax that obtains sentinel-terminated types from non-sentinel-terminated types:
const std = @import("std");
const assert = std.debug.assert;
test "obtaining a null terminated slice" {
// here we have a normal array
var buf: [50]u8 = undefined;
buf[0] = 'a';
buf[1] = 'b';
buf[2] = 'c';
buf[3] = 0;
// now we obtain a null terminated slice:
const ptr = buf[0..3 :0];
// ptr is a pointer to null-terminated array,
// because the len was comptime known (See #863)
comptime assert(@typeOf(ptr) == *[3:0]u8);
var runtime_len: usize = 3;
const ptr2 = buf[0..runtime_len :0];
// ptr2 is a null-terminated slice
comptime assert(@typeOf(ptr2) == [:0]u8);
buf[3] += 1;
_ = buf[0..3 :0]; // safety panic: slice sentinel assertion failed
}
Along with this:
related: #1838
Putting the sentinel value inside the brackets (i.e. [*:0] u8) has the benefit that you know you can't do this with a single-value pointer: *sentinel(0) u8. This is good since sentinel pointers don't make sense with single-value pointers.
I like the semantics of "sentinel-terminated pointers", and Andrew's syntax proposal is also quite good. This would also allow safe slicing of C strings:
test "index-out-of-bounds"
{
// here we have a normal array
var buf: [50]u8 = undefined;
buf[0] = 'a';
buf[1] = 'b';
buf[2] = 'c';
buf[3] = 0;
const ptr = buf[2..10 :0]; // this will @panic with "index-out-of-bounds"
}
const ptr = buf[2..10 :0]; // this will @panic with "index-out-of-bounds"
not saying I'm against the runtime check, just to consider this (admittedly contrived) case in debug/safe builds:
var buf: [50]u8 = undefined;
buf[0] = 'a';
buf[1] = 'b';
buf[2] = 'c';
buf[3] = 0xa;
assert(buf[11] == 0x0a);
const ptr = buf[2..10 :0xa]; // this will not @panic
I admit this could be a non-issue as undefined handling at runtime is expected to evolve.
@MasterQ32 your example in https://github.com/ziglang/zig/issues/3731#issuecomment-557154045 would panic but not for the reason in your comment. It would panic because buf[10] != 0. My proposal for slicing syntax with a sentinel value is that it only does an O(1) assertion: assert(ptr[len] == sentinel);
@andrewrk ah okay, makes sense to not burn that much perf for safety, even in debug builds
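The O(1) check described above can be sketched as a plain function. This is only an illustration, assuming a recent Zig release; `sentinelHolds` is a hypothetical helper name, not something the compiler or standard library provides. It looks at exactly one element, `buf[len]`, and never scans the contents before it.

```zig
const std = @import("std");

// Hypothetical helper: mirrors the proposed safety check for
// `buf[0..len :sentinel]` by inspecting only the element at index `len`.
fn sentinelHolds(buf: []const u8, len: usize, sentinel: u8) bool {
    // The sentinel lives one past the slice, so it must still be in bounds.
    return len < buf.len and buf[len] == sentinel;
}

test "O(1) sentinel check" {
    var buf = [_]u8{ 'a', 'b', 'c', 0, 'x' };
    try std.testing.expect(sentinelHolds(&buf, 3, 0));

    // Overwriting the terminator makes the check fail, which is the
    // situation where the proposed slicing syntax would panic in safe builds.
    buf[3] += 1;
    try std.testing.expect(!sentinelHolds(&buf, 3, 0));
}
```

Note that an earlier occurrence of the sentinel inside the range is not detected by this check; only the element at `len` is examined.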
about nan's behavior:
In the above example I saw that nan is used in the sentinel grammar.
test "with floats" {
const nan = std.math.nan_f32;
var ptr: [*:nan]f32 = &[_:nan]f32{1.1, 2.2, 3.3, 4.4};
comptime assert(ptr[4] == nan); // comptime because index is comptime known
}
I am very confused about whether nan can be a sentinel.
- Is nan equal to nan in the Zig language?
The Zig language's floating point formats are specified by IEEE-754. https://ziglang.org/documentation/master/#Floats
- [*:nan]f32: how is the length of the sequence determined? Is nan equal to nan?
The memory has a particular bit pattern. The pointer type allows specifying the sentinel bit pattern. If a == b would return true then it is the sentinel value.
The pointer type allows specifying the sentinel bit pattern.
Okay... so same bit-patterned nan would be equal?
If a == b would return true then it is the sentinel value.
nan == nan is false (as IEEE-754 demands)... which means the comparison is not by bit pattern.
Thanks, I forgot that nan != nan. My example above is no good. Maybe negative zero would be better to use for this example, but really, it's beside the point. The point is that the compiler lets you use any value, provided that the == operator is allowed for the type, and that a == a holds for all possible values of the type.
I think it could even work fine for float types. The safety assertion would use the isNan operator rather than == for this case.
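To make the NaN point concrete, here is a minimal sketch, assuming a recent Zig release where `std.math.nan(f32)` and `std.math.isNan` are available. `matchesSentinel` is a hypothetical helper written for this illustration only; it shows why `==` cannot find a NaN sentinel and how an isNan-based comparison could stand in for it.

```zig
const std = @import("std");

// Hypothetical helper (not part of Zig or of this proposal): decide whether
// `value` matches a float sentinel. A NaN sentinel can never be matched with
// `==`, so that case falls back to std.math.isNan.
fn matchesSentinel(value: f32, sentinel: f32) bool {
    if (std.math.isNan(sentinel)) return std.math.isNan(value);
    return value == sentinel;
}

test "nan sentinel needs isNan rather than ==" {
    const nan = std.math.nan(f32);

    // IEEE-754: NaN compares unequal to everything, including itself.
    try std.testing.expect(!(nan == nan));

    try std.testing.expect(matchesSentinel(nan, nan));
    try std.testing.expect(!matchesSentinel(1.1, nan));
    try std.testing.expect(matchesSentinel(2.5, 2.5));
}
```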
Done as part of #3728
I moved the unfinished parts of https://github.com/ziglang/zig/issues/3731#issuecomment-556436531 and https://github.com/ziglang/zig/issues/3731#issuecomment-556483080 to #3770.