Zig: Proposal: New pointer type for C pointers

Created on 5 Jun 2018 · 16 Comments · Source: ziglang/zig

Latest Progress


Given any C function prototype, a human can look at it and decide which Zig pointer type should be used. Is it a * pointer to a single item? Or is it a [*] pointer to multiple items?

Unfortunately, Zig has no way to automatically distinguish, and when translating from C, it must pick one or the other. To always pick * makes c.printf(c"hello") give a compile error, and to always pick [*] makes c.scanf("%d", &value) give a compile error.

Currently I've chosen [*] and you can see in this commit that updates the Tetris example that I had to make a workaround function c.ptr and use it everywhere we pass the result of address-of to a C function:
https://github.com/andrewrk/tetris/commit/afdc72e7a932bbfec83a612b01bd2d3d7719391c

Given that one of Zig's goals is to beat C at its own game - even at interfacing with C code, interop with the result of @cImport should be better than this.

So, just like we have integer types that are only intended to be used for C interop, I propose adding a new pointer with this syntax: [*c]T

This pointer would be the pointer type that translate-c (and therefore @cImport) chooses. It has the same semantics as pointers in C:

  • ~It can be null. Adding ? in front would be an additional nullability on top of the fact that the pointer might == null.~
  • It implicitly casts to zig pointers (it can still catch alignment and const/volatile violations)
  • It implicitly casts to [*c]c_void.
  • It supports pointer arithmetic.
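
As a rough sketch of the semantics listed above, assuming a recent Zig release (the syntax at the time of this issue differed in places), the following shows one [*c] parameter accepting both a single-item pointer and a pointer to an array, plus C-style pointer arithmetic. `sumInts` and `secondItem` are hypothetical stand-ins for translated C functions, not part of any real API.

```zig
const std = @import("std");

// Hypothetical stand-in for a translated C function that takes `const int *`.
fn sumInts(ptr: [*c]const c_int, len: usize) c_int {
    var total: c_int = 0;
    var i: usize = 0;
    while (i < len) : (i += 1) {
        total += ptr[i];
    }
    return total;
}

// Hypothetical helper: pointer arithmetic on a C pointer, like C's `*(ptr + 1)`.
fn secondItem(ptr: [*c]const c_int) c_int {
    return (ptr + 1).*;
}

test "single-item and array pointers both coerce to a C pointer" {
    const one: c_int = 5;
    const many = [_]c_int{ 1, 2, 3 };

    // *const c_int coerces to [*c]const c_int, so no wrapper call is needed.
    try std.testing.expectEqual(@as(c_int, 5), sumInts(&one, 1));
    // *const [3]c_int coerces to the same parameter type.
    try std.testing.expectEqual(@as(c_int, 6), sumInts(&many, many.len));
    try std.testing.expectEqual(@as(c_int, 2), secondItem(&many));
}
```

Running this with `zig test` exercises both call shapes that previously forced a choice between * and [*].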

In other words, it's a type that should only ever be used when using @cImport. And @cImport should only be used when a .h file is unstable and provided by the system. For many cases, one can use translate-c one time, and then manually improve the API by changing [*c] pointers to (possibly nullable) zig pointers, and then committing the generated and manually adjusted code to the source repository.

Here are a couple simple alternatives to this proposal:

  • Accept that there will be some workarounds for FFI when using @cImport
    • Perhaps we could add a @ptrToMany builtin which is considered a safe (but explicit) cast to get from *T to [*]T. For example: @ptrToMany(&variable).
      • Note: This is already possible to implement in userland, albeit with a gnarly implementation, and the userland implementation even works at comptime.
  • Allow implicitly casting *T to [*]T
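
For illustration only, here is a minimal sketch of what a userland version of the first alternative could look like, assuming a recent Zig release in which *T coerces to *[1]T. `ptrToMany` is a hypothetical function name mirroring the proposed builtin; it is not the implementation referred to above, and it takes the element type explicitly to keep the example short.

```zig
const std = @import("std");

// Hypothetical userland helper (not a builtin): view a single-item pointer
// as a many-item pointer that is valid for exactly one element.
fn ptrToMany(comptime T: type, ptr: *T) [*]T {
    // *T coerces to *[1]T, which in turn coerces to [*]T on return.
    return @as(*[1]T, ptr);
}

test "ptrToMany makes the cast explicit at the call site" {
    var value: c_int = 42;
    const many = ptrToMany(c_int, &value);
    many[0] = 7;
    try std.testing.expectEqual(@as(c_int, 7), value);
}
```

The trade-off is the one discussed in this thread: every call site that passes the result of address-of to a [*] parameter has to spell out the conversion.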
accepted proposal

All 16 comments

I like the first alternative you listed, leaving the implementation in userland, because it separates the C interop from the language. Adding another pointer variant for one part of one feature is a lot when it can be implemented with a simple function.

Converting between all the pointers might get confusing too. I like the explicitness of a function call.

Another thing I just realized is that pointer of unknown length to opaque type does not make sense. And forward-declared structs, which are very common in .h files, are translated as opaque types. So we can translate this case as single-item pointers. That solves a big chunk of the issue here, but it does leave integers and float values requiring the cast.

What would the pointer turn into when unwrapping the null? @typeOf(c_ptr.?) == ????.
If it turns into [*], then you lose "It implicitly casts to [*c]c_void".

Edit: Also, since [*c]Opaque is a valid C pointer, unwrapping this pointer is invalid because [*]Opaque is invalid

Implicitly casting *T to ?[*]T would probably solve most of the pain points I've had while using @cImports directly without having to add another pointer type.

There are plans to add a [*]null T... could this type be appropriated for what you're talking about here?

"It can be null. Adding ? in front would be an additional nullability on top of the fact that the pointer might == null"

are a b and c equivalent?

var a: [*]null u8 = null;
var b: [*]null u8 = "";
var c = [*]null u8 { 0 };

EDIT: Removed noise

What's the downside of allowing implicitly casting *T to ?[*]T?

It's good to know whether you're passing one item or multiple items... I guess it's the same downside as having only one type of pointer.

"What's the downside of allowing implicitly casting *T to ?[*]T?"

That's an important question. If we can't come up with any code examples where this could lead to bugs, then we can just allow this cast and that solves the problem perfectly.

Currently I can think of a great example of why this is dangerous: if the [*]T is null terminated, then implicitly casting a *T would compile successfully and then at runtime would be a bug/security vulnerability because the *T isn't null terminated.

After #265 is implemented, this may be less of an issue, but there could be other situations like this.

That's a bit of a weird example imho as you have exactly the same problem with any arbitrary [*]T

"For many cases, one can use translate-c one time, and then manually improve the API by changing [*c] pointers to (possibly nullable) zig pointers, and then committing the generated and manually adjusted code to the source repository."

Is this workflow documented and supported? I'd like to do this for SDL's .h file(s).

I'd also like to consider looking into some kind of userland function that does pattern matching to automate this pointer annotation process. This may play into the code generation ideas I batted around in https://github.com/ziglang/zig/issues/383#issuecomment-423354433 .

"Is this workflow documented and supported?"

#1596

I'm not sure what you're asking with regards to support. translate-c is pretty heavily tested and it's the same code that @cImport uses. You end up with .zig code that has extern function declarations, types, constants, and extern variables. If the ABI that it represents is stable then it works great. In fact the workflow is preferred unless the ABI is changing and parsing the .h file is how you discover changes to the ABI, in which case it makes sense to use @cImport instead.

@thejoshwolfe I did a translation of SDL2 to Zig a little while ago which may be a useful start for you: https://github.com/tiehuis/zig-sdl2

I'm going to accept this, for the same reason that we have c_int, c_void, c_char, and @cImport. The point is seamless C interaction, and we simply need this pointer type to seamlessly operate with C libraries. Maybe we can have compile flags that disallow C features such as this pointer type, for codebases that have no reason to use it.

Proposal Edit: nevermind the part about null. That's just confusing. Even the C pointers will not be allowed to be null, and ? interacts with C pointers in the same way as normal pointers.

Inspired by #1831, what is the benefit of having ?*T be a special case of optional types outside of C compatibility? It's definitely valuable for C compatibility, but what if ?[*c]T was the only special case for optional types?

(To give background, the special case is that @sizeOf(?*T) == @sizeOf(*T) (as long as @sizeOf(T) > 0) which is different from the usual semantics where @sizeOf(?T) > @sizeOf(T). Furthermore, @intToPtr(*u8, 0) violates language-level assumptions that non-optional pointers must never have an address of 0, because address 0 is the special value for null optional pointers.)
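
The size relationships described above can be checked directly. This is a small sketch assuming a recent Zig release and a typical target; the element types u8 and u32 are arbitrary choices for the example.

```zig
const std = @import("std");

test "optional pointers are a special case of optionals" {
    // The null state of ?*u8 is encoded as address 0, so no extra storage is needed.
    try std.testing.expect(@sizeOf(?*u8) == @sizeOf(*u8));
    // An ordinary optional has to store its null state somewhere else.
    try std.testing.expect(@sizeOf(?u32) > @sizeOf(u32));
    // A C pointer is pointer-sized as well.
    try std.testing.expect(@sizeOf([*c]u8) == @sizeOf(*u8));
}
```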

If you're writing embedded code, and you actually have an object at address 0, Zig's assumptions would be violated by taking a pointer to that object. I looked into whether C has this problem or not, and it's one of those "technically x, but in practice y" complicated situations. And "does C have this problem?" is different from "does LLVM have this problem?", and I don't know the answer to the latter.

C's concept of "null pointers" is important for compatibility with C, but why does Zig have a special case concept of "null pointers" that are different from null everything else in Zig?

The more I read about null pointers in C, the more horrified I am by the semantics. For example, a C null pointer is not necessarily represented by the value zero, which means that technically sometimes doing a pointer<->integer conversion in C requires branching logic to special case null pointers maybe sometimes maybe. I'm starting to think Zig should have a special semantics for C null pointers to insulate the rest of the Zig language from this nonsense. This is another one of those "technically x, but in practice y" situations, but it might be a good idea to protect ourselves early on.

As @thejoshwolfe notes, the actual value of NULL is kind of a nightmare. What are the use cases within Zig for translated C pointers? It seems like you are going to need to know that there is a special value, C's NULL or C++'s nullptr, and deal with it.

How painful is it to write wrappers to deal with this case? I.e. have code that checks the value of the C pointer:

if (myCPtr == C_NULL) { ... }

Where C_NULL is a special value that is specific to the target platform (LLVM has to deal with this today due to the C/C++ spec so it will have this value). 99% of the time it happens to be zero, but it might not. As @thejoshwolfe notes, embedded programming throws all the assumptions about addresses out the window.

If the use case is to make Zig's null handling interoperate with C data, then this would not work, but I question whether that is going to be a large part of the code.

  • [x] add tokenizer/parser code
  • [x] docs update
  • [x] allow implicit int-to-ptr
  • [x] allow comparison with integers
  • [x] allow implicit casting between it and any other pointer len (single-item / unknown length)
  • [ ] ~allow implicit casting to [*c]c_void~
  • [x] revisit implicit cast from *T to ?*c_void. maybe it shouldn't be allowed anymore
  • [x] allow pointer arithmetic
  • [x] make translate-c choose this type
  • [x] update zig fmt and peg grammar
  • [ ] ~compile error test for integer value %s outside of pointer address range~
  • [x] disallow [*c]T where T is a type not allowed in extern functions
  • [x] compile error test for casting integer to c pointer when the int has more bits than pointers
  • [x] safety check when implicitly casting c pointer to non-c pointer, to make sure it's not null
  • [x] compile error for implicitly casting between c pointer and non-c pointers, make sure const/volatile are respected
  • [x] compile error for implicitly casting between c pointer and non-c pointers, make sure alignment is respected
  • [x] update @typeInfo
  • [x] peer type resolution: any other pointer type and [*c]T turns into [*c]T.
  • [ ] ~compile error test for compile time pointer arithmetic overflowing~ #1918
  • [ ] ~runtime safety test for pointer arithmetic overflowing~ #1918
  • [x] comptime pointer arithmetic
  • [ ] ~compile error test for comptime ptr arithmetic with undefined~ #1918
  • [x] implicit casting between C pointer and non-C pointer when the element is itself a C pointer vs non-C pointer
  • [x] fix the nonnull llvm attribute
  • [x] ~disallow pointer attributes on C pointers~
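
As a hedged illustration of how the null-related items in this checklist surface in user code, the sketch below assumes a recent Zig release, where a [*c] pointer can hold null and be compared against it. `firstOr` is a hypothetical helper written for this example.

```zig
const std = @import("std");

// Hypothetical helper: read through a C pointer only after checking for null.
fn firstOr(ptr: [*c]const c_int, fallback: c_int) c_int {
    if (ptr == null) return fallback;
    return ptr[0];
}

test "a C pointer can hold null and be compared with it" {
    const value: c_int = 3;
    try std.testing.expectEqual(@as(c_int, 3), firstOr(&value, -1));
    try std.testing.expectEqual(@as(c_int, -1), firstOr(null, -1));
}
```

Checking before use matters here because, per the checklist, implicitly casting a null C pointer to a non-C pointer trips a safety check.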