Zig: Proposal: Type variables

Created on 26 Mar 2020  路  18Comments  路  Source: ziglang/zig

Zig seems to be set up to support generic functions quite well, but is lacking some of the flexibility or expressiveness I'd personally hope for. Currently, you can use param: var (or, soon to be, param: anytype if #4820 lands) and then @TypeOf to infer the types.

fn add(a: var, b: @TypeOf(a)) @TypeOf(a) {
    return a + b;
}

add(1, 2) //-> 3

However, this is a bit messy not only in readability, but maintainability; if the type of a changes, but b and/or the return type should not, then _all three_ types will have to be updated.

There exists the other way of doing it, with first-class comptime type variables:

fn add(comptime T: type, a: T, b: T) T {
    return a + b;
}

add(i32, 1, 2) //-> 3

But this requires explicit type specification at the call-sites, which also includes a bit of maintenance overhead.

What would be helpful is some syntactic sugar: Type identifiers prefixed with an immediate $ (e.g. $FooType but not $ FooType) are type variables that can be used in place of types anywhere within the function (including the signature) and the compiler will automatically deduce them.

It may even be syntactic sugar for normal type variables (in the second form above) if that's easier to implement (it'd also be easier to reason about, too).

For example, the above signature becomes:

fn add(a: $T, b: $T) $T {
    return a + b;
}

add(1, 2) //-> 3

This last form should be nearly semantically similar to the first form (depending on how the type checking would be implemented) and identical to the second form (if this proposal were to be implemented as syntactic sugar for the type variables).

proposal

Most helpful comment

I personally dislike the proposal. I myself prefer highly verbose types but even despite my own bias I think that there are some pretty big issues here.

In particular take the example you gave here:

fn add(comptime T: type, l: T, r: T) T {
    return l + r;
}

fn mul(comptime T: type, l: T, r: T) T {
    return l * r;
}

fn muladd(l: var, r: @TypeOf(l), a: @TypeOf(l)) @TypeOf(l) {
    // Extreme example, of course.
    return add(@TypeOf(l), mul(@TypeOf(l), l, r), a);
}

I'd imagine for most programmers, even without knowing much about Zig or anything about the compiler, you can fully reason about how the return types got deduced and therefore how implicit casting rules would behave.

For muladd if r isn't an l or can't be safely cast implicitly to an l, this a compile error. The same is true for a, also the return type is always the same as that of l.

// Proposed syntax
fn add(l: $T, r: $T) $T {
    return l + r;
}

fn mul(l: $T, r: $T) $T {
    return l * r;
}

fn muladd(l: $T, r: $T, a: $T) $T {
    return add(mul(l, r), a);
}

Here its not immediately obvious how the compiler would deduce what $T is. Clearly something must happen but who knows?
Does this just enforce that all of l, r, and a are the same exact type? But why, Zig allows implicit type coercion on function calls. So why do generic functions get to somehow be special?
Does this mean that $T should be @TypeOf(l)? What if I would rather this all coerce to @TypeOf(r)?

Now its not actually possible to do so, as the order of the arguments matter, and if you tried doing @TypeOf(r) for the type of l and left r as var then you'd get an undeclared identifier error. Knowing this you could deduce that $T must be @TypeOf(l).

That's the beauty of the verbose syntax though, you don't need to understand much at all to know why calling add(f64, f32) returns a f64 and why add(f32, f64) is a compile error.

Maybe I'm just being salty here, but this whole $T syntax reminds me a lot of Java's generics where I've seen some pretty frustrating compile errors appear because of really complicated deduction rules in some generic function calls.

All 18 comments

if the type of a changes, but b and/or the return type should not, then _all three_ types will have to be updated

It's unclear to me how the proposal affects this. Wouldn't $T have the same problem?

fn add(a: $T, b: $T) $T {
   return a + b;
}

add(1, 2) //-> 3

This is similar to what Odin does; there, you only specify the $ on the first usage of the new type name to indicate where it's being declared:

add :: proc(a: $T, b: T) -> T {
    return a + b;
}

To this end, Zig's could be:

fn add(a: $T, b: T) T {
    return a + b;
}

As a sidenote, Odin also has extra specialization syntax for this where you can do stuff like this:

sum :: proc(elems: $T/[]$E) -> E { // T=[]E, E=elem type
    tot := E(0);
    for e in elems {
        tot += e;
    }
    return tot;
}

.. meaning that sum can then be passed a slice of any numeric type.

You can also declare mutliple "polymorphic" types in the signature if you'd like:

add :: proc(a: $T, b: $U) -> T {
   return a + T(b);
}

Vow, I love this proposal. It resolves #4820, and gives an elegant way to support generic params.

It's not that I don't like this proposal, but I fail to see how it solves your example problem.

add :: proc(a: $T, b: T) -> T {
    return a + b;
}

Isn't this exactly the same, except for giving it a name, as

fn add(a: var, b: @TypeOf(a)) @TypeOf(a) {
    return a + b;
}

??
Or maybe that was your point?

Another question I have is about this example:

fn add(a: $T, b: $T) $T {
    return a + b;
}

What do you expect to happen if a changes type? A compile error?

I agree the explicit type parameters can get messy, but it also makes it a bit easier to reason about what types things are. I'm fearful of moving in the same direction as C++ templates, where you get weird errors about types which you have no idea where they come from or what they actually are.

@Qix- You used this example, but what is T in this case?

fn add(a: $T, b: $T) $T {
    return a + b;
}

// comptime_int / u8 / ... ?
add(1, 2) //-> 3

Especially when the return type does not depend on the parameters, you will have to specify it explicitly again:

_ = add(@as(f32, 1), 2);

That being said, I actually like the current way of doing

fn myfunction(comptime T: type, a: T, b: T) T {}

it makes it very obvious what is going on when calling a "generic" function.

That being said, I actually like the current way of doing

fn myfunction(comptime T: type, a: T, b: T) T {}

it makes it very obvious what is going on when calling a "generic" function.

This works great when one type parameter is "shared". It doesn't work so well in contexts like this though:

// exaggerated example for the sake of demonstration:
fn myFunction(comptime T1: type, comptime T2: type, comptime T3: type, comptime T4: type, a: T1, b: T2, c: T3) T4 {}

// less messy, but less information
fn myFunction(a: var, b: var, c : var) DeduceReturnType(a,b,c) {}

// or if it was possible to "attach" type predicates to the var keyword
// type predicate: fn(type) bool
fn myFunction(a : var(isType1), b: var(isType2), c: var(isType3)) DeduceReturnType(a,b,c) {}

At some point you also have to wonder what you're doing that you need so many generic type parameters which are all (potentially) different.

I've been wondering, what are valid use cases for var? And why pick var over an explicit type? One example I can think of is when using 'tuples'/structs with unnamed fields, such as for string formatting. But are there any other cases than tuples where var is obviously better or the only possible choice?

My question is: Should var just be used for tuples and actually be renamed to anytuple? And then just use explicit types for all the other cases?

@user00e00 Good point. I never had a function with more than one type parameter, though. So I can't really comment on that.

@BarabasGitHub Because explicit types require knowing the type up-front, which means either duplicating types for explicit calls, or using @TypeOf repeatedly - namely when chaining generic functions and mixing var and comptime T: type.

fn add(comptime T: type, l: T, r: T) T {
    return l + r;
}

fn mul(comptime T: type, l: T, r: T) T {
    return l * r;
}

fn muladd(l: var, r: @TypeOf(l), a: @TypeOf(l)) @TypeOf(l) {
    // Extreme example, of course.
    return add(@TypeOf(l), mul(@TypeOf(l), l, r), a);
}

This is, of course, an extreme example - it's meant more to demonstrate how complicated things can get, especially when mixing the two paradigms together.

However, with type variables, none of this is an issue - it's reduced to what is probably the minimum amount possible (aside from having type-less parameters entirely) while still enforcing proper types at the same time.

// Proposed syntax
fn add(l: $T, r: $T) $T {
    return l + r;
}

fn mul(l: $T, r: $T) $T {
    return l * r;
}

fn muladd(l: $T, r: $T, a: $T) $T {
    return add(mul(l, r), a);
}
  • The functions are easy to visibly discern as generic (given that the $ character is not allowed anywhere else in the Zig grammar aside from strings, as far as I'm aware).
  • They allow potentially non-trivial type configurations be expressed simply.
  • Maintaining these types no longer affects e.g. argument ordering; this reduces the number of callsites that would have to be updated in many cases.
  • In theory, this could replace both param: var (entirely) and the need for type parameters (in many cases) while introducing a simpler syntax that is easy for text editors to highlight and can be expanded to typed functions in a straightforward way.

I personally dislike the proposal. I myself prefer highly verbose types but even despite my own bias I think that there are some pretty big issues here.

In particular take the example you gave here:

fn add(comptime T: type, l: T, r: T) T {
    return l + r;
}

fn mul(comptime T: type, l: T, r: T) T {
    return l * r;
}

fn muladd(l: var, r: @TypeOf(l), a: @TypeOf(l)) @TypeOf(l) {
    // Extreme example, of course.
    return add(@TypeOf(l), mul(@TypeOf(l), l, r), a);
}

I'd imagine for most programmers, even without knowing much about Zig or anything about the compiler, you can fully reason about how the return types got deduced and therefore how implicit casting rules would behave.

For muladd if r isn't an l or can't be safely cast implicitly to an l, this a compile error. The same is true for a, also the return type is always the same as that of l.

// Proposed syntax
fn add(l: $T, r: $T) $T {
    return l + r;
}

fn mul(l: $T, r: $T) $T {
    return l * r;
}

fn muladd(l: $T, r: $T, a: $T) $T {
    return add(mul(l, r), a);
}

Here its not immediately obvious how the compiler would deduce what $T is. Clearly something must happen but who knows?
Does this just enforce that all of l, r, and a are the same exact type? But why, Zig allows implicit type coercion on function calls. So why do generic functions get to somehow be special?
Does this mean that $T should be @TypeOf(l)? What if I would rather this all coerce to @TypeOf(r)?

Now its not actually possible to do so, as the order of the arguments matter, and if you tried doing @TypeOf(r) for the type of l and left r as var then you'd get an undeclared identifier error. Knowing this you could deduce that $T must be @TypeOf(l).

That's the beauty of the verbose syntax though, you don't need to understand much at all to know why calling add(f64, f32) returns a f64 and why add(f32, f64) is a compile error.

Maybe I'm just being salty here, but this whole $T syntax reminds me a lot of Java's generics where I've seen some pretty frustrating compile errors appear because of really complicated deduction rules in some generic function calls.

@AssortedFantasy

Here its not immediately obvious how the compiler would deduce what $T is. Clearly something must happen but who knows?

The example uses $T _everywhere_; my example, like Odin has, is more reasonable.

In which case, only _one_ of those Ts actually has the $ and whichever one it is determines what the type of T is.

Does this mean that $T should be @TypeOf(l)? What if I would rather this all coerce to @TypeOf(r)?

You could just make the signature muladd(l: T, r: $T, a: T) T instead.
Now it's r that determines the type of them all.
In Odin, this doesn't work because you cannot refer to a type variable which is declared later on in the signature, but maybe Zig could allow this to work.

I'm not a huge fan of the single $T and references of T aspect - the _intent_ shown by using the variables is that any use of the $T variable refers to the type variable itself, and there's no other way to refer to it.

Further, there wouldn't be a single variable that determined the type; instead, Zig would ensure that _all_ declarations with the type variable had equal types.

I feel pretty strongly that a single $T followed by references to T would be confusing syntax and thus less helpful than the status quo. I'd personally prefer to enforce all references of a type variable include the $ character.

Zig would ensure that all declarations with the type variable had equal types.

That's what $T followed by T does though?

AFAICT, it's the same as now with comptime T: type, except that you don't have to pass the type as an extra argument.

@Tetralux Yes, it can be done that way, but I feel pretty strongly that it creates more confusion than it solves by allowing $ in some places and not in others. All usages of the type variable should have $ in order to remain consistent.

namely when chaining generic functions and mixing var and comptime T: type.

Yes, I'm arguing that var is the problem here. It's better to be explicit, especially when things get complicated, because if you make a mistake somewhere and everything is a generic 'whatever' then it can be very hard to figure out what's going on.
For example this

fn muladd(l: var, r: @TypeOf(l), a: @TypeOf(l)) @TypeOf(l) {
    // Extreme example, of course.
    return add(@TypeOf(l), mul(@TypeOf(l), l, r), a);
}

I agree this is horrible and I'd argue that you should do it like so:

fn muladd(comptime T: type, l: T, r: T, a: T) T {
    // Extreme example, of course.
    return add(T, mul(T, l, r), a);
}

Not so bad, is it?
It's also clear you're calling generic functions, because you're passing a type.
Instead if you do everything as with var it becomes like C++ templates, but worse because you have no named types. Or with your suggestion it is pretty much like C++ templates. (And for those who don't know, C++ templates are hell to debug and why you sometimes get 20 pages of text for one compile error.)

And then you have questions like what happens if I call: add(@as(i10, 1), @as(i5, 2)) or add(@as(i5, 1) @as(u10, 2))? With var the first will work, the second won't. With your suggestion I honestly don't know, I hope for a compile error due to ambiguity. With explicit types it's obvious. I can do even do add(u15, @as(i5, 1), @as(i10, 2)).

@BarabasGitHub

Or with your suggestion it is pretty much like C++ templates.

(And for those who don't know, C++ templates are hell to debug and why you sometimes get 20 pages of text for one compile error.)

I find it interesting that you feel this way, considering that Odin has very similar approach to what's being suggested here, yet has significantly better error messages, and I've never had problems figuring out what was wrong from them, so I suspect your objection is unfounded.
C++ is a different story though - those errors are complete garbage.

@Tetralux I have no experience with Odin, so it's possible that it can be made to work. Personally I prefer being explicit and keeping things simple. Sometimes it can be a bit more work to type, but it is often easier to understand when reading.

One function in Zig std where I think an explicit parameter might have been better is testing.expectEqual.

Half the time you end up writing this:

    testing.expectEqual(@as(usize, 0), container.size());

while you might as well do

    testing.expectEqual(usize, 0, container.size());

Of course that also makes tests like these longer:

    testing.expectEqual(test_values.len, container.span("a").len);

which now needs to be:

    testing.expectEqual(usize, test_values.len, container.span("a").len);

but I don't feel that's all that bad.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

fengb picture fengb  路  3Comments

dobkeratops picture dobkeratops  路  3Comments

andrewrk picture andrewrk  路  3Comments

andersfr picture andersfr  路  3Comments

andrewrk picture andrewrk  路  3Comments