Zig: return type inference

Created on 8 Sep 2017 · 21 Comments · Source: ziglang/zig

use case:

fn max(a: var, b: var) var {
    return if (a > b) a else b;
}
proposal

All 21 comments

I'm worried that this is too easy. var is much easier to type than &const T, which means authors are incentivized to omit function return types out of laziness. This will lead to a paradigm shift where Zig functions can optionally declare their return type, rather than optionally inferring their return type.

Lessons learned from Python and Haskell say that explicit return types greatly improve readability, so I'm opposed to this proposal as it is.

A counter proposal would be some syntax that is sufficiently painful to use, so that authors are incentivized not to use it, except where it's really the best solution. Perhaps:

fn max(a: var, b: var) -> @typeOf(this.bodyExpression) {
    if (a > b) a else b
}

That still seems too easy to me. And of course in order to make that work and make sense, we'd be dragging in a whole lot of other features with this, so I don't recommend this either.

Here's another way to increase pain:

fn max(a: var, b: var) -> var {
    @setReturnTypeInferrable(this);
    if (a > b) a else b
}

And another way:

fn max(a: var, b: var) -> @inferReturnType(this) {
    if (a > b) a else b
}

All of these approaches are fundamentally flawed, because they're all a fixed snippet of code you could paste in without thinking. The thinking is what we want from the authors. We want the authors to document the return types when possible.

Here's another idea:

fn max(a: var, b: var) ->
        if (@isComptime(a) and @isComptime(b))
            @typeOf(if (a > b) a else b)
        else
            @typeOf(a, b)
{
    if (a > b) a else b
}

Now THAT's painful! (See also #439.)

Can we come up with any examples beyond min() and max()? That last thing I wrote there is actually not that bad of a solution in my opinion.

D has this feature and I think it significantly reduces readability of code and documentation. The standard library is full of functions returning auto. I call a function, it returns something, I have no idea what that something is, and no idea what I can or can't do with it.

Since so many things in Zig are lazily compiled, the following suggestion might be a bit tricky.

It seems that most of the worries above are about reducing readability, especially of documentation/APIs. If the return type is inferred, then the documentation/autocomplete hints could contain the inferred type instead of var, no?

Just leaving a note here that came up while doing the stage2 parser rewrite.

FnProto <- FnCC? KEYWORD_fn IDENTIFIER? LPAREN ParamDeclList RPAREN ByteAlign? LinkSection? EXCLAMATIONMARK? (KEYWORD_var / TypeExpr)

var is currently a grammatically accepted return type, but I see now that it is not implemented.

The grammar does not include Keyword_var as a choice for PrimaryTypeExpr, but the original iterative stage2 parser treats it as one. If inferred return types are to be supported, then that behavior actually kinda makes sense, if I understand correctly.

If var were to be included in PrimaryTypeExpr, I think this could be a valid update to the FnProto rule:

- FnProto <- FnCC? KEYWORD_fn IDENTIFIER? LPAREN ParamDeclList RPAREN ByteAlign? LinkSection? EXCLAMATIONMARK? (KEYWORD_var / TypeExpr)
+ FnProto <- FnCC? KEYWORD_fn IDENTIFIER? LPAREN ParamDeclList RPAREN ByteAlign? LinkSection? EXCLAMATIONMARK? TypeExpr

Java opted to allow type inference only for local variables; the reasoning was that return type inference makes it too easy to accidentally break API compatibility when changing implementation details (especially when such methods call each other: changing a method deep down might change the API return type at a distance). There was debate over whether private (non-pub) fields/methods should allow type inference, since they are not part of the API, but ultimately it was decided against in order to keep the rules simple. Zig could make a different tradeoff here, but I think the argument for enforcing explicit commitment to a module's API (the public one, at least) carries a lot of weight.
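To make the breakage-at-a-distance concern concrete, here is a minimal sketch using the proposed var return syntax (hypothetical; status-quo Zig rejects it): changing an implementation detail of the inner function would silently change the public return type of everything layered on top of it.

// Hypothetical sketch only: var as a return type is the syntax proposed in this issue.
fn helper() var {
    return @as(u32, 123); // change this to u64 and every caller's type changes
}

pub fn publicApi() var {
    // The public return type silently tracks helper()'s implementation detail.
    return helper() + 1;
}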

Zig already infers error sets; following the argument above, it might make sense to enforce explicitly listing the possible errors at API boundaries as well, but that's another issue.

Here's a good use case for this: 2cd5e555818583e77e5601d43d55339e8c4017b0. With this issue implemented, the Min function from that commit would not be needed.

I'll repeat my question from above, as this still seems to be an important question:

Can we come up with any examples beyond min() and max()?

Can we come up with any examples beyond min() and max()?

Implementing a multi-typed glGetUniform for hypothetical GL bindings might be a nice example. Note that in WebGL, the return type varies.

glGetUniform

I don't understand how the bindings layer could figure out the type at compile time. I've never used GL shaders, but it seems like there are a few layers of runtime values that are getting in the way of determining the type at compile time.

Is there a statically typed language binding that knows the type at compile time?

Ah, I see that you might be right! I was thinking you'd get the "inferred" type at the call site, like

var line_width: f32 = try gl.GetUniform("line_width");

And then it would error out if line_width wasn't an f32. But this might be over-complicating things...
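For comparison, a status-quo binding would presumably take the desired type as an explicit comptime parameter, so nothing needs to be inferred. This is only a sketch under that assumption; the wrapper name, header path, and supported types are not from any real bindings.

const c = @cImport(@cInclude("GL/gl.h")); // header path depends on the platform

// Hypothetical binding sketch: the caller names the type explicitly.
fn getUniform(comptime T: type, program: c_uint, location: c_int) T {
    var value: T = undefined;
    switch (T) {
        f32 => c.glGetUniformfv(program, location, &value),
        i32 => c.glGetUniformiv(program, location, &value),
        else => @compileError("unsupported uniform type"),
    }
    return value;
}

// Usage: the type is explicit at the call site.
// const line_width = getUniform(f32, program, location);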

For a use case, how about a function that loads a file at comptime (using embedFile) and puts some info from the file in the return type? (e.g. an image loaded at comptime, which returns [w*h]u8)

I could almost have done that here, if I decided not to enforce a fixed width and height.

I'm not actually sure I like the idea of return type inference though, I just thought this was a funny idea.

I like where you're going with that use case @dbandstra. I can imagine a case that's even more extreme, where you load an asset bundle from a .tar.xz at compile time.

const assets = loadAssets("assets.tar.xz");
fn loadAssets(comptime filename: []const u8) [getEntryCount(filename)]Asset {
    // do everything at compile time
}

In order to implement getEntryCount(), you'd need to decompress and iterate over the entire archive (at compile time), and then to implement loadAssets() you'd need to decompress and iterate over it all again. That's real bad.

Consider this workaround that works (untested) in status quo:

const assets = LoadAssetsT("assets.tar.xz").value;
fn LoadAssetsT(comptime filename: []const u8) type {
    comptime var assets = [_]Asset{};
    // read the assets at compile time
    while (something) {
        assets = assets ++ [_]Asset{entry};
    }
    return struct {
        const value: [assets.len]Asset = assets;
    };
}

That trick only works for entirely comptime values.

This reminds me a lot of C++ templates, like std::is_same. Very unergonomic.

I wonder if the compromise is that functions with inferred return type have to be entirely run at compile time.

Here's a use case I just ran into:

Constructing anything where the structure needs to be altered through the course of development, but will be constant at runtime (so there's no need for switches / function pointers).

I was just writing some noise generation functions (to generate maps for games procedurally), and it's incredibly useful to have composable noise functions so you can combine them quickly and experiment with different combinations. Here are the two I was using:

/// A radial weight, which returns higher values for points closer to the center
const RadialWeight = struct {
    ...
    pub fn gen(self: @This(), x: f32, y: f32) f32 {
    }
};

/// A gradient noise function, for smooth noise
const SimplexNoise = struct {
    ...
    pub fn gen(self: @This(), x: f32, y: f32) f32 {
    }
};

Here are two really common functions. The intended usage is to set up the noise with some parameters (for example, with RadialWeight you'd give a radius and a width/height; with SimplexNoise you'd give a scaling factor), then call the gen() function repeatedly with different points in 2D.

Simplex noise will generate something similar to this, for reference -> https://i.stack.imgur.com/LNK39.png

I want to combine various scales of simplex noise, plus a radial weight, to generate an island map (where higher values indicate a higher elevation). Here's an example -> https://shanee.io/imagesT/blog/island-generation/mask_with_height.png

This generation requires a lot of playing around with different combinations of noise, different weights, etc - this may change through development. So I created a third type of noise, for combining any two noises:

pub fn CombinedNoise(comptime Noise1: type, comptime Noise2: type) type {
    ...
}

You can probably guess how this works. I can combine 3 noises by nesting them:

const MyNoise = CombinedNoise(CombinedNoise(SimplexNoise, RadialWeight), SimplexNoise);
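(For reference, a minimal sketch of what such a CombinedNoise factory might look like internally; the body is elided above, and the weighted blend here is only a guess, not the original implementation.)

pub fn CombinedNoise(comptime Noise1: type, comptime Noise2: type) type {
    return struct {
        n1: Noise1,
        n2: Noise2,
        weight: f32,

        pub fn init(n1: Noise1, n2: Noise2, weight: f32) @This() {
            return .{ .n1 = n1, .n2 = n2, .weight = weight };
        }

        /// Blend the two wrapped gen() results by weight.
        pub fn gen(self: @This(), x: f32, y: f32) f32 {
            return self.n1.gen(x, y) * self.weight + self.n2.gen(x, y) * (1.0 - self.weight);
        }
    };
}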

I can't put this experimental noise generation in a function, however, because I need to know the return type for the function, which means duplicating the code - one version of the code to work out the types, and another version of the code to actually create the values.

Here's the actual function WITH inferred return types - try and figure out the return type & write it down yourself, as an exercise to the reader ;)

pub fn createNoise(seed: usize, map_size: f32) var {
    const SIMPLEX_SCALE = 1.0 / 90.0;
    const rad = map_size / 2.0;
    // Gen base radial noise with some extra octaves of simplex noise
    const base_noise = CombinedNoise(RadialWeight, SimplexNoise)
        .init(RadialWeight{ .cx = rad, .cy = rad, .r = rad },
              SimplexNoise.init(seed, SIMPLEX_SCALE), 0.7);
    const octave_1 = CombinedNoise(@typeOf(base_noise), SimplexNoise)
        .init(base_noise, SimplexNoise.init(seed, SIMPLEX_SCALE * 2.0), 0.92);
    const octave_2 = CombinedNoise(@typeOf(octave_1), SimplexNoise)
        .init(octave_1, SimplexNoise.init(seed, SIMPLEX_SCALE * 3.0), 0.95);
    const octave_3 = CombinedNoise(@typeOf(octave_2), SimplexNoise)
        .init(octave_2, SimplexNoise.init(seed, SIMPLEX_SCALE * 3.0), 0.98);
    const final_noise = octave_3;
    return final_noise;
}

Now imagine 'hey, I actually want another octave of noise' later on in development.

This might seem like an esoteric example, but would hold true for anything where you want to be able to compose functions at compile time, but may want to alter that composition through DEVELOPMENT (not runtime).

The solution I've had to use for this (and the one I'd use in C) is a big switch statement and a heap-allocated array of enums that I loop through. Now the burden of figuring out the types is left to runtime code, rather than the compiler. This is the perfect example of something which would run faster in C++ (and be much nicer to maintain) because the facilities of the language let me compose these things in a reasonable way.
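For contrast, a rough sketch of that runtime fallback (the tagged union, the dispatch, and the omitted weighting logic are all hypothetical simplifications):

// Runtime composition: a slice of tagged-union nodes walked in a loop,
// with a switch doing the dispatch that comptime composition avoids.
const NoiseNode = union(enum) {
    simplex: SimplexNoise,
    radial: RadialWeight,
};

fn genCombined(nodes: []const NoiseNode, x: f32, y: f32) f32 {
    var value: f32 = 0;
    for (nodes) |node| {
        value += switch (node) {
            .simplex => |n| n.gen(x, y),
            .radial => |n| n.gen(x, y),
        };
    }
    return value;
}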

I'd also mention that figuring the types out manually would likely be less readable than just putting 'var' there.

Just weighing in: I agree that allowing var return types adds too much potential for unreadable/non-understandable types. I think that using @typeOf in the return type gives enough flexibility. For the min/max example:

fn max(a: var, b: @typeOf(a)) @typeOf(a) {
    return if (a > b) a else b;
}

If allowing a and b as different types and figuring out a compatible type is really necessary, you can write a separate type function to do so and use that:

fn CompatibleNumericType(comptime a: type, comptime b: type) type {
    //...
    return SomeType;
}

fn max(a: var, b: var) CompatibleNumericType(@typeOf(a), @typeOf(b)) {
    return if (a > b) a else b;
}

Clunky? Maybe. But it's a lot more explicit about what's going on, and should encourage people to stick to simpler forms unless they really need complicated stuff.

This also works today with no modification as far as I'm aware.

Some variant of this would be helpful for translating C macro functions.

For functions that return comptime arrays/strings, it might be nice to use [_] to infer the length of the returned array:

/// convert comptime string literal to utf16 string
fn utf16(comptime utf8: []const u8) [_]u16 {
    // ...
}

This proposal might interact with the language change introduced with #2749.

This also popped up when I was trying to be generic over function types:

const std = @import("std");

pub fn main() anyerror!void {
  thing(a);
  thing(b);
}

fn thing(f: fn()var) void {
  f();
}
fn a() i32 {
  std.debug.warn("a", .{});
  return 4;
}
fn b() *const[_]u8 {
  std.debug.warn("b", .{});
  return "hey";
}
./src/main.zig:8:13: error: TODO implement inferred return types https://github.com/ziglang/zig/issues/447
fn thing(f: fn()var) void {
            ^
./src/main.zig:4:3: note: referenced here
  thing(a);
  ^

You can work around this with some wrappers:

pub const WrappedAnytype = struct {
    val: anytype,
};

pub fn wrap_val(val: anytype) WrappedAnytype {
    return WrappedAnytype { .val = val };
}
pub fn unwrap_val(wrapped_val: anytype) @TypeOf(wrapped_val.val) {
    return wrapped_val.val;
}

pub fn wrapped_anytype_decrement(val: anytype) WrappedAnytype {
    return wrap_val(val - 1);
}
pub fn anytype_decrement(val: anytype) @TypeOf(unwrap_val(wrapped_anytype_decrement(val))) {
    return unwrap_val(wrapped_anytype_decrement(val));
}

comptime {
    @compileLog(anytype_decrement(0));
}

Output:

| -1
<source>:20:5: error: found compile log statement
    @compileLog(anytype_decrement(0));
    ^
Compiler returned: 1

If you're willing to deal with anonymous struct & function literals and awkward @call syntax, here's a helper:

pub fn unwrap_anytype_func_wrapped(func: anytype) WrappedAnytype {
    const Closure = struct {
        fn func(options: @import("std").builtin.CallOptions, args: anytype) @TypeOf(unwrap_val(@call(options, func, args))) {
            return unwrap_val(@call(options, func, args));
        }
    };
    return wrap_val(Closure.func);
}
pub fn unwrap_anytype_func(func: anytype) @TypeOf(unwrap_val(unwrap_anytype_func_wrapped(func))) {
    return unwrap_val(unwrap_anytype_func_wrapped(func));
}

pub const anytype_decrement_call = unwrap_anytype_func(struct {
    fn anytype_decrement(val: anytype) WrappedAnytype {
        return wrap_val(val - 1);
    }
}.anytype_decrement);

comptime {
    @compileLog(anytype_decrement_call(.{}, .{0}));
}

Output:

| -1
<source>:42:5: error: found compile log statement
    @compileLog(anytype_decrement_call(.{}, .{0}));
    ^
Compiler returned: 1

@bb010g That only works when executing the function at compile time. Structs containing var/anytype fields are comptime-only, so the function is always implicitly evaluated at compile time. This won't compile with your example:

export fn foo(x: i32) i32 {
    return anytype_decrement(x);
}

There is no current plan for return type inference. This is a simplification of the language for the person reading the code as well as the compiler implementation.
