Zig: syntax flaw: return type

Created on 9 Feb 2018 · 65Comments · Source: ziglang/zig

Fundamentally we have this ambiguity:

fn foo() A { }

Is this a function that returns A with an empty body? Or is this a function that returns the struct literal A{} and we're about to parse the body?

Another example

fn foo() error {}

Function that returns an error, or empty error set, and we're about to find the function body?

fn foo() error{}!void {
}

The above is not allowed because zig thinks the return value is error and the function has an empty body, and then is surprised by the !. So we have to do this:

fn foo() (error{}!void) {
}

Unfortunate. And the compile errors for forgetting to put a return type on a function point at the wrong place.

Let's come up with a better syntax for return types. All of these issues are potentially on the table:

could reintroduce -> or => if it helps.
could change the struct literal syntax / error set declaration syntax
multiple return values are potentially in zig's future. but also maybe not.
could potentially go back to allowing leaving off the return type for void return types.
error set inference - which currently looks like this: fn foo() !T {} - is special syntax that belongs to a function definition that marks the function as having an inferred error set. It doesn't necessarily have to be specified as it currently is by leaving off the error set of the ! binary operator.

The syntax change should be satisfying, let's make this the last time everybody has to update all their return type code.

breaking

Source

andrewrk

Most helpful comment

I have another idea. The only change from 0.3.0's syntax is to rename the error type to anyerror or whatever works best.

We split the expression syntax into two parts: TypeExpr and Expr. A TypeExpr is an expression that can return the type type.

[]<Attributes> <TypeExpr>
[*]<Attributes> <TypeExpr>
[<Expr>]<TypeExpr>
<Expr> "." <Symbol>
<Expr> "(" <Exprs> ")"
@<Symbol>  "(" <Exprs> ")"
"(" <Expr> ")"
...

An Expr is all other expressions + TypeExpr.

<Expr> + <Expr>
...
<TypeExpr> "{" <Exprs> "}"
<TypeExpr> "{" <FieldInits> "}"
<TypeExpr>
...

We then change the function syntax to this:

"fn" <Symbol> "(" <Params> ")" <TypeExpr> "{" <Statements> "}"

And now, this is a compiler error:

fn a() A{} {} // error: expected top level decl, found "{"
// (fn a() (A) {}) {} // parens, to show how it was parsed

// Now, if a literal should be parsed for function return type, parens are required:
fn a() (A{}) {}

// Note, that <Expr> "." <Symbol> is a <TypeExpr>. This is therefor possible:
fn a() A{.T = u8}.T {}

Hejsil on 18 Oct 2018

👍7 🎉1

All 65 comments

I'm mentioning it for the sake of completeness, not as a serious suggestion, but the ambiguity can be resolved with arbitrary look-ahead / back-tracking.

For tooling reasons though you probably want zig's grammar to be LL(1) or LR(1).

bnoordhuis on 10 Feb 2018

As an aside, I do like having the -> before the return type, just for consistency with similarly recent languages (Swift, Rust) that employ this syntax. It makes reading code a bit easier; I find.

That said, for this issue it seems the problem is figuring out where the type declaration ends and the bod begins. The main flaw is that { is a valid token inside the return type declaration.

Possible solutions I see:

1) Introduce a token to designate the end of return type and start of body. For example, =, as in:

    const abc = fn(x: X, y: Y) -> Z = {
        // function body
    }

I don't think this is a very good idea. It deviates too much from what everyone is used to in similar languages, and adds a needless clutter to a lot of code just to avoid an ambiguity that only arises some of the time.

2) Forbid { inside the return type, unless surrounded by parenthesis. For example:

    fn(...) -> (A{}) { the return type is `A{}`
    }

     fn(...) -> Z { // the return type is just `Z`
     }

This avoids the ambiguity without requiring the `(..)` to be always present in the return type, but potentially violates the "only one obvious way of doing things", and can cause "code style" wars, with one camp saying "you should always use parenthesis around your return types" and another one saying "never put parenthesis around return types, and never put `{` inside the return types; always use type aliases".

Which brings me to the next idea:

3) Forbid { inside the return type completely; require the use of type aliases.

If you need a function return a type that requires `{` in its definition, the only way to return such a type is to give an alias first. This can be good for readability, and avoids the introduction of having multiple ways of doing the same thing.

I don't have other ideas at the moment.

hasenj on 10 Feb 2018

👍3

@hasenj Your first suggestion is pretty ok if we can declutter it a bit. With number 3, readability can be hindered by forcing everything to have a name/alias. Number 2 seems like a good compromise.

As an iteration on the first, using => to denote the start of the function body:

const S = struct {
    someFn: fn(x: X, y: Y) error{} ! Z,
    foo: u8
}

fn bar(x: X, y: Y) error{} ! Z => {
     // function body
}

// Single expression functions!
fn barWhenXEquals2(y: Y)  !Z => try bar(2, y);

const s = S {.someFn = bar, .foo = 0 };

raulgrell on 27 Feb 2018

@raulgrell taking your => suggestion to its natural conclusion:

if x + y > 10 => {

} else {

}

if foo() => |value| {

} else |err| {

}

while it : |node| it = node.next => {

}

while x < 10 : x += 1 => {

}

switch expressions already match this pattern.

I'm not so sure about this. I'd like to explore changing container initialization syntax and container declaration syntax to disambiguate.

andrewrk on 27 Feb 2018

To clarify:

() on if, while, switch, for, etc is solving the same syntactic ambiguity. So for certain we will do one of these two things:

require () on return types, just like if and friends
solve the syntactic ambiguity a different way, and drop the () on if and friends

andrewrk on 27 Feb 2018

I'd love to rid if and friends of their parentheses, so my vote is strongly for the =>.

The curly brackets for container initialization is common in C, C++, Rust, C#. On the other hand, of those, only Rust allows the omission of the parentheses in if statements.

Alternative container initialization methods include:

const S = struct {a: u8, b: u8};

// Some of these would probably mean adding named parameters in function calls

// keyword - it's a familiar keyword, but doesn't really look right in the language
const s = new S(.a = 0, .b = 1);

// builtin type function - more consistent with other language features. other function names
// can include make, create, init, initialize
const s = S.new(.a = 0, .b = 1);

// Callable type with named parameters - builtin types are callable, so there's some
// consistency here.
const s = S(.a = 0, .b = 1);

// We could probably do with compile time enforcement of Capitalized struct names if
// calling with parentheses or square brackets. On the other hand, it would mean
// comptime created types could look awkward

fn V(comptime T: type) type {
 return struct { value: T }
}

const vs = V(.T = S)(.a = 0, .b = 1);

// We could do square brackets on type is initialization
const s = S[.a = 0, .b = 1];
const vs = V(.T = S)[.a = 0, .b = 1];

// named initializers - I'm not sure how we'd define an initializer outside of the struct's scope.
// I do have a slightly more complicated idea that involves enum arrays, but I'll
// get to that later
const T = struct {
      a: u8,
      b: u8,

      new zeroes => { .a = 0, .b = 0 }
}

const t_zeroes = T::zeroes;

// builtin function
const t_ones = @new(T) { .a = 1, .b = 1};
// with named args
const t_ones = @new(T, .a = 1, .b = 1};

raulgrell on 27 Feb 2018

This is the cleanest iteration of the above ideas I have found, using the new keyword and changing declaration syntax.

Declaration syntax changes to use parentheses instead of curly brackets, and square brackets instead of parentheses. Initialization requires new (T) { .fields = t }. Somewhat inspired by #661

const S = struct(a: u8, b: u8);
const s = new(S) {.a = 0, .b = 0};

fn G(comptime T: type) type { return struct( a: T ); }
const g = new (G(S)) { .a = s };

const E = enum(A, B);
const TE = enum[u8] (A = 3, B = 4);

error NoMem;
const errorSet = error(NoMem);

const C = union[E] ( A: u8, B: void);
const c = new (C) { .A = 0 };

const U = union[enum] (Number: f64, String: []const u8, None)
const n = new (U) { .Number = 0.99};

// This a function that returns A with an empty body
fn foo() A { }

// This is a function that returns the struct literal and an empty body
fn foo() struct(a: u8) { }

// This is a function that returns the struct literal A() and an empty body
fn foo() A() { }

// This is a function that returns an error with an empty body
fn foo() error {}

// This is a function that returns empty error set and an empty body
fn foo() error() {}

// This is fine
fn foo() error()!void {}

Actually, with these changes to declaration, I think initialization syntax can stay the same. I'm just not sure about the "struct literal A()" bit.

const s = S {.a = 0, .b = 0};
const g = G(S) { .a = s };
const c = C { .A = 0 };
const n = U { .Number = 0.99};

raulgrell on 27 Feb 2018

Here's my proposal:

Add the requirement of . before the { when instantiating types and values
Remove the requirement of () for if, while, for, switch, etc.

const Foo = struct.{
    a: i32,
    b: i32,
};

const Id = enum.{
    One,
    Two,
};

const Blah = union(enum).{
    Derp,
    Bar: struct.{
        aoeu: f64,
    },
};

const FileError = error.{
    OutOfMemory,
    AccessDenied,
};

test "aoeu" {
    var array = []u8.{'a', 'b', 'c', 'd'};

    var x = Foo.{
        .a = 1234,
        .b = 5678,
    };

    if x.a == 1234 {
        while true {
            eatStinkyCheese();
        }
    }
}

This fixes all the parsing ambiguity. Now { proceeded by a . is always type/value instantiation, and { not proceeded by a . is always the beginning of a block.

andrewrk on 5 Mar 2018

👍8 👎3

It looks a little awkward but I guess that's no reason not to do it. That said, I didn't see discussion on reintroducing ->. Are there drawbacks besides extra typing?

bnoordhuis on 5 Mar 2018

Are there drawbacks besides extra typing?

Not really. I removed -> because with mandatory return types, it was rendered unnecessary.

-> does not solve the return type syntax problem though. Whether or not we have ->, there is still the fn foo() -> error { } { } ambiguity.

andrewrk on 5 Mar 2018

Ah right. And mandating parens around the return type makes for bad error messages if you forget them. I see the issue now.

bnoordhuis on 5 Mar 2018

I was wondering whether it would be beneficial to change the built in field/reflection syntax from using the regular scope accessor . to a new operator like ::, ie:

Function.return_type becomes Function::return_type and slice.len becomes slice::len. This signals to the reader that whatever you're accessing is something built-in.

Then keep your proposal and make container instantiation requiring a :: followed by a {

const Blah = union(enum)::{
    Derp,
    Bar: struct::{
        aoeu: f64,
    },
};

raulgrell on 5 Mar 2018

Ok with the proposal, but:

var x = Foo.{
        .a = 1234,
        .b = 5678,
    };

Do we still need dot a, dot b here? We already have a dot after Foo.

jido on 8 Mar 2018

Only because this was mentioned in a separate issue, I just wanted to voice my support for your most recent proposal - disregard my :: suggestions as I've come to appreciate the value of the universal . operator.

Regarding @jido's point about the dot (ha, pun), I think they are consistent with the rest of the language.

// Here there is no dot because the identifier is being declared
const Foo = struct.{
    a: i32,
    b: i32,
};

// Here, we would have a dot because the name already exists
var x = Foo.{
    .a = 1234,
    .b = 5678;
};

If we look at consider the enum type inference in switches proposed in #683, they mention of issues with shadowing without the .. This would also be consistent with other parts of the language

const Value = enum(u32).{
    Hundred = 100,
    Thousand = 1000,
    Million = 1000000,
};

const v = Value.Hundred;
switch (v) {
    .Hundred => {},
    .Thousand => {},
    .Million => {}
}

raulgrell on 30 Mar 2018

Thanks for doing that work, @Hejsil .

To those who prefer the previous syntax, I empathize because I also prefer the previous syntax. However parsing ambiguities are simply not acceptable. And so we're stuck with the least worst solution. Proposals for different syntax/solutions are welcome, provided that they are better than status quo and do not introduce any syntax ambiguities.

andrewrk on 15 Oct 2018

@andrewrk as I mention in my previous comment in #1628, using the dot here harm readability, as it's used usually as a member access operators, here it's overloaded with two extra meaning define and new.

If I'm getting this right, the problem is to solve the ambiguity of expressions like this

fn foo() A {}
fn foo() error{}!void {}

All what we need is to surround the return type with symbols that are not used somewhere else, how about

fn foo() `A {}`
fn foo() `error{}!void` {}

To clarify:

() on if, while, switch, for, etc is solving the same syntactic ambiguity. So for certain we will do one of these two things:

require () on return types, just like if and friends

solve the syntactic ambiguity a different way, and drop the () on if and friends

I much prefer to require () on return types and keep it with if, while, switch, for, etc, like it is with C like languages than create a new meaning for .

allochi on 15 Oct 2018

@allochi I'm going to amend your proposal and change it from surrounding the return type with backticks to surrounding it with parentheses. I think that's actually a reasonable proposal. If we end up doing named return values to solve #287, then parentheses around return values is especially reasonable.

andrewrk on 15 Oct 2018

👍2

How about in case of ambiguity use a dot?

fn foo() A {} // function returns A
fn foo() A{} {} // compile error
fn foo() A.{} {} // function returns A{} struct literal

Now everything stays the same for the most part, assuming this works

ghost on 15 Oct 2018

Here's how I'm thinking it:

{} is always the body of what's to the left of it.

const A = struct {} // works fine
fn foo() {} // easily detects that return type is missing

The function decl parsing code is special cased to expect the {} to always refer to its body, and it must be escaped if you want to refer to something else.

fn foo() A.{} {}

ghost on 16 Oct 2018

If you are not using the dot I would use inset operators e.g.

const FileError = error in {
    OutOfMemory,
    AccessDenied,
};

test "aoeu" {
    var array = []u8 of {'a', 'b', 'c', 'd'};

    var x = Foo with {
        .a = 1234,
        .b = 5678,
    };

    if x.a == 1234 {
        while true {
            eatStinkyCheese();
        }
    }
}

jido on 16 Oct 2018

How about in case of ambiguity use a dot?

I don't think this is compatible with the best way of solving the problem, which is that the parser is independent of the type system. Otherwise known as Context Free Grammar.

andrewrk on 16 Oct 2018

If you are not using the dot I would use inset operators e.g.

I don't think in as a keyword is better than ..

andrewrk on 16 Oct 2018

I don't think this is compatible with the best way of solving the problem, which is that the parser is independent of the type system. Otherwise known as Context Free Grammar.

All that's changed (I think) is types in function signatures have a different grammar, .{} instead of {}. I don't know language theory and I haven't thought this through either, but the change sounds benign.

Edit: I'm going to embarrass myself, but a mini language:

Decls -> Decl Decls | Decl
Decl -> Func | Var
Func -> fn Ident () Type2 FuncBody
FuncBody -> { Variable }
Variable -> var Ident : Type;
Type -> Ident {}
Type2 -> Ident . {}

ghost on 16 Oct 2018

@allochi I'm going to amend your proposal and change it from surrounding the return type with backticks to surrounding it with parentheses. I think that's actually a reasonable proposal. If we end up doing named return values to solve #287, then parentheses around return values is especially reasonable.

@andrewrk which is better than what I have proposed, I suggested backticks cause I thought the parentheses are out of question to make parsing easier.

I use parentheses with Go, and it doesn't bother me, although in Go it's only required if you have multiple return values (named or not), with parentheses we will have the following, which is quite reasonable

fn foo() (A{})
fn foo() (error{}!void) {}

pub fn main() (!void) {
   var i: i32 = 42;
}

We can also make zig artistic by introducing

fn foo() ._=|A{}|=_.
fn foo() ._=|error{}!void|=_. {}

Just joking, or maybe not, humm... 😄

allochi on 16 Oct 2018

👍2

Here is another way we could go. What causes ambiguity is initializers and error sets. struct/union/enum type decls are not ambiguous, as it is required that they a followed by a lbrace.

fn a() struct {} {}
fn b() union {} {}
fn c() enum {} {}

Error sets are ambiguous because the error type is a thing.

// is this fn()error or fn()error{} ?
fn d() error {}

We could rename error to anyerror or something, and now, error sets are also not ambiguous.

Now, we just need to solve initializers. @andrewrk and I discussed removing our current initializers, in favor of inferred initializers and implicit cast.

Point(.{.x = 1, .y = 2})

Hejsil on 17 Oct 2018

@Hejsil nice!

But the language allows for this anonymous struct definition in a return type of a function

fn something() ?struct{ x:i32 } {
    return null;
}

I don't know the point and use cases of this, I'm not aware of all Zig rules yet, but this compilers, and it shouldn't IMHO.

allochi on 17 Oct 2018

@andrewrk and I discussed removing our current initializers, in favor of inferred initializers and implicit cast.

Point(.{.x = 1, .y = 2})

I hope you mean this

Point{.x = 1, .y = 2}

back to the original form, (.{}) is slightly clutter.

allochi on 17 Oct 2018

@allochi Your first example is unambiguous. The first pair of braces always has to be part of the struct decl. There is no other way to parse this code.

We can't have Point{.x = 1, .y = 2}, with my proposal, because it is ambiguous.

// Is this a fn()Point or is it a fn()Point{} ?
fn a() Point {} {}

Hejsil on 17 Oct 2018

I feel I'm missing couple of points here:

are we talking about initializing when we talk about Point(.{.x = 1, .y = 2})? like we can't have this back?

const point = Point{.x = 1, .y = 2};

we will have this instead?

const point = Point(.{.x = 1, .y = 2});

are return types only ambiguous when they have empty brackets? like this

fn foo() A{} {
  ...
}

but not like this?

fn foo() A{ x: i32 } {
  ...
}

What I mean:

a. is struct definition not a problem when it's not empty?
b. but it's a problem when it's empty, because we don't know if it's an empty struct A{} or an initialization A{}?

But then in the context of a function definition at return types, why would we have an initializer? and why would we have empty structs/union/enum as return types? what would be the use of them?

Why allow this in the compiler?

fn something() ?struct{} {
    return null;
}

allochi on 17 Oct 2018

Yes, that's what we're talking about. We can't have Point{.x = 1, .y = 2} currently because it is ambiguous during parsing.
Well, in theory, yes, return types are only ambiguous with empty brackets, but it is a little more complicated than this. We're trying to keep the grammar of the language simple, so we keep the lookahead requirements as small as possible (currently, we need 2 token lookahead to parse Zig).

a. No, struct/union/enum is never a problem. The reason for this is that they are required to have braces. You can't have the struct/union/enum keywords without them being followed by braces, as that is the invalid syntax. Therefore, we always know where the braces belong:

// I'll add parens to show why this is not ambiguous
fn a() struct {} // error: expected function body
// fn a() (struct {})

fn a() struct {} {}
// fn a() (struct {}) {}

Now, let's look at initializers:

fn a() A {}
// fn a() (A) {}
// fn a() (A {})
// ^ Ambiguous because an initializer is optional 

fn a() A {} {}
// fn a() (A) {} {}
// fn a() (A {}) {}
// ^ Ambiguous because an initializer is optional

The reason this is allowed is because the gramma is something like this:

"fn" <Ident> "(" List("name" ":" <Expr>) ")" <Expr> <Body>

We need the gramma to be like this, because any expression could be @typeOf(expr) == type, and the function require that the expression of the return type is of type type. This allows Zig generics to work the way they do. A generic data structure in Zig is a struct returned from a function taking a type.

fn Point(comptime T: type) type {
    return struct { x: T, y: T };
}

As for the last example. Any programming language allows alot of weird programs to compile. A programming language cannot be turning complete, if you cannot write an ugly hack in it (I have no proof for this, but I'm pretty sure this is true).

Hejsil on 17 Oct 2018

So can't we just come up with an initialization for empty struct only?

I'm assuming in the parsing phase (I really need to dig more into Zig grammer and compiler) if we find a dot after the braces {. we conclude it's an initialization, if not it's a definition (am I sort of right?), so:

empty vs non-empty struct definition would look like

A{} // empty struct definition
B{ x: i32} // non-empty struct definition

while initialization looks like

A{.} // empty struct initialization
B{ .x = 42 } // non-empty struct initialization

This way I think we don't even need parentheses around return types (maybe), we already now know that A{} is a definition and A{.} is an initialization.

What do you think? and thanks for you patience 😄

allochi on 17 Oct 2018

@allochi No problem. We have these issues so we can discuss and come up with solutions for Zigs problems :)

I still think there is a problem. Consider if we had inferred initializers as they are currently proposed:

fn a() A { .{}; }

We would need more than the . to solve this (or we could change the inferred initializer syntax)

Hejsil on 17 Oct 2018

What about moving the return type definition into the parens of the function declaration?

// status quo
fn sum(a: i32, b: i32) i32 { return a + b; }

// return type moved into the parens
fn sum(a: i32, b: i32 -> i32) { return a + b; }

// longer signature
fn quiteTheLongFunctionName(
    the_first_argument: SomeElaborateType,
    another_argument: AnotherType,
-> TheResultType) {
    return /* ... */;
}

// Again with inline error sets
fn quiteTheLongFunctionName(
    the_first_argument: SomeElaborateType,
    another_argument: AnotherType,
-> error {
    FirstFailure,
    SecondFailure,
}!TheResultType) {
    return /* ... */;
}

Manuzor on 17 Oct 2018

@Manuzor This indeed does solve the problem.
Idk if this deviates too much from calling a function, but it's definitely not bad :thinking:

Hejsil on 17 Oct 2018

@Hejsil Thanks, if the dot is a problem in {.}, maybe another symbole ({0} or {..} or {~}) for empty initializer. This way we can remove the leading dot before braces for initializers.

@Manuzor PostgrSQL functions use similar style, I can't say it's nice to read

CREATE OR REPLACE FUNCTION fn_sqltestout(param_subject text, 
    OUT subject_scramble text, OUT subject_char text)

This could easily lead to have clutter between parentheses:

allocator
function parameters
generics types
return type
4b. or multiple return types (when we have it)

it could take some time to read and understand a function signature, and find out which is which.

allochi on 17 Oct 2018

I propose that we use a different symbol for initialization to remove confusion between defining the type and initializing.

# sign is unused

var p: Point  = #{.x = 1, .y = 0}; // pound sign

Also the empty set seems like a corner case, maybe that should change

bheads on 17 Oct 2018

Adding to @allochi list of empty initializers
{-} {!} {_} {=/=} {#}

bheads on 17 Oct 2018

@allochi I'm not sure how far syntax can help you reduce clutter in function signatures. We didn't even look at qualifiers such as extern "kernel32" or the function calling convention here yet. They all add clutter.

Another alternative was mentioned before which enforces parens for the return type. It solves the ambiguity but it requires me to chase parens to figure out where the set of input parameters ends and the return types begin. I chose the arrow because it stands out.

In the case of multiple return values, my arrow-inside-parens version allows the syntax to extend naturally. You just continue specifying stuff separated by commas.

fn getMultipleResults(
    a: FirstType,
    b: SecondType,
-> error {
    FirstError,
    SecondError
}
! FirstResult,
  SecondResult,
) {
    // ...
}

Now that I'm looking at this, maybe we should consider swapping how return types and the error set are declared... I guess that's another proposal, though.

fn getMultipleResults(
    a: FirstType,
    b: SecondType,
->  FirstResult,
    SecondResult,
! error {
    FirstError,
    SecondError
}) {
    // ...
}

Manuzor on 17 Oct 2018

I have another idea. The only change from 0.3.0's syntax is to rename the error type to anyerror or whatever works best.

We split the expression syntax into two parts: TypeExpr and Expr. A TypeExpr is an expression that can return the type type.

[]<Attributes> <TypeExpr>
[*]<Attributes> <TypeExpr>
[<Expr>]<TypeExpr>
<Expr> "." <Symbol>
<Expr> "(" <Exprs> ")"
@<Symbol>  "(" <Exprs> ")"
"(" <Expr> ")"
...

An Expr is all other expressions + TypeExpr.

<Expr> + <Expr>
...
<TypeExpr> "{" <Exprs> "}"
<TypeExpr> "{" <FieldInits> "}"
<TypeExpr>
...

We then change the function syntax to this:

"fn" <Symbol> "(" <Params> ")" <TypeExpr> "{" <Statements> "}"

And now, this is a compiler error:

fn a() A{} {} // error: expected top level decl, found "{"
// (fn a() (A) {}) {} // parens, to show how it was parsed

// Now, if a literal should be parsed for function return type, parens are required:
fn a() (A{}) {}

// Note, that <Expr> "." <Symbol> is a <TypeExpr>. This is therefor possible:
fn a() A{.T = u8}.T {}

Hejsil on 18 Oct 2018

👍7 🎉1

And, we could make the syntax for try this:

"try" <TypeExpr>

Which allows this:

try somefunc() + 1
// parsed as (try somefunc()) + 1

Maybe TypeExpr is the wrong name. It's an expression who can return all types, including type, where Expr also includes expressions which is limited to returning only certain types.
<Expr> + <Expr> always returns uN or iN.
<TypeExpr>{<FieldInit>} never returns type, errorset, uN, iN ect.

Maybe AnyTypeExpr is a better name.

Hejsil on 18 Oct 2018

We split the expression syntax into two parts: TypeExpr and Expr. A TypeExpr is an expression that can return the type type.

An Expr is all other expressions + TypeExpr.
<Expr> + <Expr>
...
<TypeExpr> "{" <Exprs> "}"
<TypeExpr> "{" <FieldInits> "}"
<TypeExpr>
...
We then change the function syntax to this:
"fn" <Symbol> "(" <Params> ")" <TypeExpr> "{" <Statements> "}"
And now, this is a compiler error:
fn a() A{} {} // error: expected top level decl, found "{"
// (fn a() (A) {}) {} // parens, to show how it was parsed

// Now, if a literal should be parsed for function return type, parens are required:
fn a() (A{}) {}

// Note, that <Expr> "." <Symbol> is a <TypeExpr>. This is therefor possible:
fn a() A{.T = u8}.T {}

That looks quite sane.

jido on 18 Oct 2018

I approve of https://github.com/ziglang/zig/issues/760#issuecomment-430938743. anyerror is a great name for the primitive type that represents the global error set.

andrewrk on 18 Oct 2018

So it's "fn foo() anyerror {}"?

For shorter typing you could have "fn foo() error {}" and "const FileOpenError = errorset {};"

ghost on 18 Oct 2018

@UniqueID1 It is encouraged/good style to use error sets more than error, so I think it makes sense to change error.

Hejsil on 18 Oct 2018

If the standalone error is the only case that causes ambiguity then I also think special casing it is a much better solution than anything else mentioned here before. The identifier anyerror states the intent very clearly.

Manuzor on 18 Oct 2018

@Hejsil Wanting to have (subjectively) uglier syntax for encouragement reasons is a valid viewpoint, if I'm not simplifying your viewpoint too much. I think it's in a way risky. I think it'll encourage people to switch over from the global error set at the wrong time for the wrong reason.

When I say "at the wrong time" I'm channeling my inner Jonathan Blow who has a way of writing really imperfect code at the right time which works for him to stay hyper productive. So if you're not handling errors yet, and you might not for years, then you'll use the global error set to keep yourself from going off on a tangent such as having to define the errors correctly before you'll ever need to handle them.

ghost on 18 Oct 2018

@UniqueID1 I see your concern, but I'm pretty sure inferred error sets cover 99% of this use case (fast prototyping), because in 99% of cases you (or the compiler), knows which errors can be returned from a function.

If you don't care, then just add ! to the return type of all functions that can fail. It's easier to type, and more correct than anyerror.

Hejsil on 19 Oct 2018

❤1

Ahh okay, also anyerror is very descriptive so that's a plus for it. It's actually not bad at all and it tells the newbies exactly what it does.

ghost on 19 Oct 2018

Alright, seems like we might have found a better solution than what was merged in #1628. Any objections should be voiced as soon as possible, as I wanna revert #1628, and start implementing the proposed solution.

One downside to this is that that we have to reject #1659.

Hejsil on 19 Oct 2018

👍3

// Note, that <Expr> "." <Symbol> is a <TypeExpr>. > This is therefor possible:
fn a() A{.T = u8}.T {}

This doesn't seem right. Everything else about the proposed TypeExpr seems good, but I think this example should fail to parse.

The formal grammar specification for Zig isn't always up to date, but it might be able to shed light on this if it's updated with this proposal.

This is also not very important and shouldn't block any forward progress. This is just something to iron out before 1.0.0.

thejoshwolfe on 19 Oct 2018

I don't think we can have that example fail as simple cases like fn a() A.B {} and fn a() A().B {} should work.

My plan was to work on #208, but it seems we still haven't reached a conclusion on this (and this issue seemed to be related, so I switched my focus to this).

I'll probably work towards having the grammar up to date, as that will help with the stage2 parser rewrite when that is gonna happen. Then add some kind of test that ensures that it stays up to date (have the grammar as a bison parser, and run this parser on all Zig src code in this repo on every commit). This will also help when prototyping new syntax.

Hejsil on 19 Oct 2018

@Hejsil, I really like this proposal (https://github.com/ziglang/zig/issues/760#issuecomment-430938743) a lot better than the one using dot to disambiguate. 😁

I don't love the anyerror name but I'm not quite clear on the semantics so I hesitate to comment. It seems kind of abstract and poorly-defined though. This is where Go uses a builtin interface type error that can easily be implemented by any type. Nested / chained errors are also a really common need and all of Go, Rust, C, and C++ have awkward tooling and conventions around that problem. Using a special name for a special error type to imply that it can contain other error values (?) just feels icky and non-obvious.

Is there a place for discussing the error naming or semantics? Please feel free to disregard, just my two cents as an outside observer. But maybe there is an even better solution here.

binary132 on 19 Oct 2018

@binary132 docs and error set issue :)

Hejsil on 19 Oct 2018

@Hejsil thanks a lot for being so communicative and agile toward this issue.

I'm glad that we are getting to agree on a solution for this, I don't mind parentheses in if, for, ..., I admit I got used not to use them in Go for example, but I don't mind them in other languages, and since these are expressions in zig it kind of works well, take for example this from the standard library

const n = if (self.pos + bytes.len <= self.slice.len)
    bytes.len
  else
    self.slice.len - self.pos;

// -- vs --

const n = if self.pos + bytes.len <= self.slice.len
    bytes.len
  else
    self.slice.len - self.pos;

I don't know about the others, but with parentheses it seems to be easier to read.

Now regarding the global error set, why name it anyerror? why not errorset or errors wouldn't that be more descriptive of what it is?

allochi on 19 Oct 2018

@allochi There are a few reasons for anyerror:

error{A,B} is an error set, so I don't think we should rename the error to errorset as that is just confusing.
There is a low chance of anyerror to collid with identifiers, as the standard style is snake_case (any_error).
anyerror communicates quite well if you don't know the language:

// `a` could return any error or u8. 
fn a() anyerror!u8 { ... }

// `a` could return errors or u8. (I could read `fn a() !u8` the same way)
fn a() errors!u8 { ... }

// `a` could return all errors or u8. (It could return all errors at the same time?)
fn a() allerrors!u8 { ... }

Hejsil on 19 Oct 2018

@Hejsil Thanks, no strong feeling againstanyerror, it seems to be fine.

allochi on 19 Oct 2018

error{A,B} is an error set, so I don't think we should rename the error to errorset as that is just confusing.
I mean I don't know about that.

const A = struct {}; // a struct
const A = errorset {}; // an error set

ghost on 19 Oct 2018

@UniqueID1 Sorry, that was a confusing statement. I'm talking about this rename:

fn a() error{}!u8 {} -> fn a() error{}!u8 {}  
fn a() error!u8 {}   -> fn a() anyerror!u8 {}

We could even rename both examples:

fn a() error{}!u8 {} -> fn a() errorset{}!u8 {}  
fn a() error!u8 {}   -> fn a() anyerror!u8 {}

But that's not the goal of this issue.

Hejsil on 19 Oct 2018

Or maybe best:
``rust // Leaveerror` as is. This allows error.A to work without parser hacks
fn a() error { return error.A; }

// Rename error for error sets
const E = errorset {A, B};

Hejsil on 19 Oct 2018

@Hejsil Yeah, this is what I was thinking of too, it make sense this way.

allochi on 20 Oct 2018

That was my proposal.

For shorter typing you could have "fn foo() error {}" and "const FileOpenError = errorset {};"

Edit: ..maybe, I don't know. I'm confused haha.

ghost on 20 Oct 2018

Let me get this straight. The __new__ keyword errorset would be used to declare a new error set, just like struct and friends. The __existing__ keyword error would be used to return a specific error, like return error.OutOfMemory.

If that is correct, then I think there is no need for anyerror anymore.

// no errors from this function.
fn foo() void {
    // ...
}

// inferred errorset from this function.
fn bar() !void {
    // ...
}

// explicit errorset that is declared inline.
fn baz() errorset{ OutOfMemory, FileNotFOund }!void {
    // ...
}

// new explicit errorset
const TheErrors = errorset{ OutOfMemory, FileNotFound };
fn qux() TheErrors!void {
    // ...
}

// status quo to return "any error" still works.
fn corge() error!void {
    // ...
}

// Not sure about this one, but it's certainly not ambiguous.
fn waldo() error.OutOfMemory!void {
    // ...
}

Manuzor on 20 Oct 2018

I believe that there are 3 options

// keywords are error and errorset
fn foo() error {}
const E = errorset {};

// keywords are anyerror and error
fn foo() anyerror {}
const E = error {};

// just the error keyword
fn foo() error {}
fn foo() (error {A,B}) { ... }
const E = error {};

I prefer error + errorset over anyerror + error. However, with functions accepting TypeExpr instead of Expr, I believe we could also just have the error keyword and use parenthesis in some cases as shown above.

Edit: I see that @andrewrk finds the parenthesis unfortunate (first post, 4th example). My bad Hejsil, forgot about that. In that case having two keywords might be better, and it might be clearer overall anyways. Personally I love error + errorset. Whatever the case, Hejsil's change with functions accepting TypeExpr is absolutely excellent. Super cool change!

ghost on 20 Oct 2018

👍1

I agree with @UniqueID1. If there's an ambiguity in parsing inline return types in fn signatures, the right way to resolve the ambiguity is by explicit grouping. ISTM this would only be necessary when declaring a type inline in a fn signature's return type, and it makes it more readable, so the increased write-time effort is minimal compared to the benefits.

a) It does not reducing readability overall, and in fact enhances it. Otherwise, fn signatures can read like a run-on sentence. Easy for humans to parse from left to right.
b) Does not require an awkward new special-purposed builtin keyword.
c) Familiar and known to be usable in Go.

binary132 on 20 Oct 2018

The . syntax has been reverted as of #1685, which implements option 2 in this comment

Hejsil on 13 Nov 2018

👍2

Was this page helpful?

0 / 5 - 0 ratings

Related issues

fix inability to interact with C ABI symbols with underscore name (`_`) by making it a keyword

andrewrk · 3Comments

make @mulAdd support integers, comptime integers, and comptime floats, and no explicit type parameter

andrewrk · 3Comments

question: async/await and threadpools for fs/cpu work

jorangreef · 3Comments

replace "&&" and "||" with "and" and "or"

andrewrk · 3Comments

QOL Proposal: use(xxx) statement to reduce repeating element

bheads · 3Comments