Zig: Lambdas, Closures and everything in between

Created on 4 Jun 2018  ·  36Comments  ·  Source: ziglang/zig

I've been thinking about this topic for a few weeks now, and after quite a bit of research I think I have a solution that fits Zig's needs :).

This is building on #229 (and similar issues), since this is talking about implementation specifically and the issue is old I felt it was worth creating a new one.

Step 1: Make all functions anonymous; I.e. you can define a function like; const Foo = (bar: i32) i32 => { ... }; however of course we could support using fn Foo(bar: i32) i32 { ... } as the short form. In this way you can define functions inline, these aren't closure functions and when compiled will be placed outside the scope of course.

// i.e. to sort
sort(array, (a: i32, b: i32) bool => { return a < b; });

Step 2: Lambdas; Building onto anonymous functions you can define 'inline' lambda functions which are just a single statement that returns like;

sort(array, ($) => $0 < $1);

The $ just acts as a wild card effectively matching whatever the types the input requires, if the input is a 'var' type then it makes a function def that is fn X(a: var, b: var) var, perhaps? Or maybe that is just a compile error, I'm not sold either way.

Step 3: Closures; In the case where you actually want a closure you would define it like any other function but indicate type of function input;

var x = someRuntimeValue();
// 'Long form'
where(array, (a: i32, x = x) bool => { return a < x });
// And as a lambda (implicit 'var' return)
where(array, ($, x = x) => $0 < x));
// The above cases are by value, if you wanted by reference you would just do
where(array, ($, x = &x) => $0 < x));

The above is synonymous to the following Zig code if we allow some kind of implicit cast;

const Closure = struct {
   x: i32, // note: if doing the by value it would be `x: *i32`
   f: fn(i32) bool,
};
where(array, Closure { .x = x, .f = (a: i32, closure: &const Closure) bool => { return a < closure.x } });

HOWEVER, this is where the problem occurs, you require this pointer to exist in the definition, and so we need someway to get around this call and it's been suggested in the past that you can pass around some kind of 'closure' type that allows you to call it like a function but is really just this struct, personally this hides information from the coder and I feel goes against Zig's core, and furthermore would you allow the above 'closure' type to be passed into a function with definition (a: i32) bool?

Instead I propose that we can use LLVM Trampolining in quite a few cases to 'excise' a parameter from the call, which would be the closure information and the call would rather become something like;

const Closure = struct {
   x: i32, // note: if doing the by value it would be `x: *i32`
   // Note: no f
};
fn Foo(env: &Closure, a: i32) bool {
    return a < env.x;
}

var x = runtimeValue();
const c = Closure { .x = x };
where(array, @trampoline(Foo, c));

Note: of course in this case I'm using trampoline as if it was an inbuilt, I'm not actually sure if we want it as an inbuilt, but in reality it is more like generating the LLVM code that trampolines the function. A trampoline (said that word too many times now) just basically stores the value on the stack in advance to the call this would be much more efficient then a typical closure as it would avoid the need for a function pointer and would avoid the ugliness of indirection. HOWEVER, there is not much information on how this effects things like 'arrays of closures' which may instead require a different approach. Note: this is how GCC implements nested functions this is a relevant section.

So basically I propose that we approach this in the order I've given, and perhaps implement that builtin before integrating the closure x = x syntax.

proposal

Most helpful comment

this proposal seems way to complicated to me

the Syntax of funcs should always be nearly the same.

A normal func looks like this:

fn foo(a : i32, b : i32) bool {
    return a < b;
}

thus a lamda should look like this:
no magic $ involved.

sort(   some_arr, 
        (a : i32, b : i32) bool {
            return a < b;
        } 
    )

And it should be possible to have a func inside a func so that it can only be used locally. This is good for code hygiene and refactoring. The Syntax is the same as for a global function thus its very easy to move them around.

closures are simple as well

fn adder( x : i32) type {
    return fn(y : i32) i32 {
        return x + y;
    };
}

var add40 = adder(40);
assert(add40(2)==42);

Lambdas have not been added to C because over 45+ years, no one has ever felt they they needed them.

that is obviously false, just cause its not in c does in no way mean no one needed it

All 36 comments

Trampolines require that the stack is marked executable. That's not the default on most systems and it reduces security, it makes injecting shellcode much easier.

I have some ideas on how to tackle closures but I don't want to derail this issue with counterproposals.

How the closures is alloc?
How to free this alloc?

@bnoordhuis Huh, you got me there, completely forgot that damn. I guess the big challenge with this problem is trying to figure out an efficient solution that isn't just 'every closure gets a struct with a function ptr' since that just deliberately hides information from the coder. Counter proposals are fine since its more an implementation detail. The other idea I had was a 'fat ptr' like solution however that is just bloating a lot of calls, and forces us to use C style function calls (that allow extra parameters).

@bronze1man sorry I don't understand your questions? No need to free closures as we are talking about stack allocations, you can read up on trampolining if you really want the nitty gritty implementation detail.

Could we use @newStackCall as part of the allocation.

var needle : i32 = 10;
var lambda = fn [needle] (x: var) bool { return x >needle; };  // yes cpp capture syntax

What I image is lambda is a struct, that has captured init stack instance and a function pointer:

struct {
    stack: Struct { needle : i32 },
    func: (x: var) bool,
}

Calling the lambda would use newStackCall, I am just not sure how to put the stack into scope for the call.

Could we use @newstackcall as part of the allocation.

Actually I think you are onto something, originally I dismissed your idea since I was like "nah that can't work", however upon considering it I definitely think it could work. What if you could bind a stack to a function, regardless of the functions purpose such as;

This is what the lambda would decompose to basically.

var stack : [8]u8;
var x : i32 = &stack[0]; // skipping cast to make it easier to understand
x.* = 10;
var lambda = fn (a: var) bool { return a < @stack(0, i32); }; // Longer syntax for clearness, @stack giving you the stackptr, then asking for an 'i32' integer
doWhatever(@bindstack(stack[4..], lambda, i32)); // @bindstack returns a function ptr, 4... to indicate stack begins from the '4'th position

Basically the language would have the concept of allowing you to bind a stack to a function, so there is no need to 'implicitly' cast. In effect yes it would look like that struct you gave (kinda) but instead of requiring some weird obscure cast or changing how functions work, we just allow functions to carry a different stack then the one we give it. Of course this requires the function to have a scope within the caller, if the scope is outside perhaps we allocate at start of program then deallocate at end?

As a side note I'm also not too sure about cpp syntax, it looks clunky; and is the cause of much confusion and bugs due to having some finer details that are a little odd. I prefer the more common style that is prevalent in languages like C#, Go, and Python for example since it is much clearer.

I get the capture syntax is tricky when your learning but it was added to c++ for good reason. One of the good things about the capture syntax is it gives you control over what goes into the closure. This fits nicely with the zen and my personal rule of the principle of least surprise (ie, I have no idea what is getting capture in the closure and I have run into performance issues and bugs in D where it was capturing and trying to make deep copies of things I did not want/need.)

Of course, however I still feel that the syntax I originally proposed is clearer i.e. (a: i32, x = x) => { ... } where x = x is by value and x = &x would be by reference; of course you could change the name of the variable to better match its context i.e. upperbound = x, and maybe could even just not include the new name if it is going to be the same therefore being (a: i32, =x, =&y). I guess it also comes down to not having the [] at the beginning of each one which kinda makes it look uglier.

Honestly a personal preference and we both agree that a clear copy/by ref is really needed which is important, which way it goes I think is not important right now; more important is figuring out how the lambda will be implemented.

Ah sorry, I was not trying to defend the c++ syntax, just the concept of capturing. So yeah I am on the same page as you.

@BraedonWooding
I really like how neatly this proposal fits into Zig! Making all functions anonymous is a great way to prevent the language from becoming more complex, especially in implementing closures. It would probably be the best way to implement closures in Zig.

When I say that more discussion is needed before Zig gets closures, it has nothing to do with _your_ proposal, only with the concept in general. I just don't think that this anonymous functions and closures are a good idea in Zig.

Zig aims to be a C alternative, not a C++ alternative. C does not have anonymous functions. Lambdas have not been added to C because over 45+ years, no one has ever felt they they needed them. Anonymous functions are simply not useful in imperative languages such as C, and Zig.

I love Zig because it aims to be an alternative for C, and it is _the only_ language with this goal that is on the right track. As of now, Zig is small and simple, and fit for low-level/embedded/os development. Rust is horrible for this purpose: it is endlessly bloated with features, lang items, and 3 standard libraries (core, alloc, std). But individually, these features are very small, like Rust closures. But boy, do they add up, and make for a hell on earth.

So, unless you can come up with some real-life code examples of why we _need_ lambdas, rather than just some quick convenience in sort(), I remain unconvinced.

Everyone, please give your own perspectives and ideas! I don't want to be "that guy", just spouting disapproval in every Zig github issue. ;)

this proposal seems way to complicated to me

the Syntax of funcs should always be nearly the same.

A normal func looks like this:

fn foo(a : i32, b : i32) bool {
    return a < b;
}

thus a lamda should look like this:
no magic $ involved.

sort(   some_arr, 
        (a : i32, b : i32) bool {
            return a < b;
        } 
    )

And it should be possible to have a func inside a func so that it can only be used locally. This is good for code hygiene and refactoring. The Syntax is the same as for a global function thus its very easy to move them around.

closures are simple as well

fn adder( x : i32) type {
    return fn(y : i32) i32 {
        return x + y;
    };
}

var add40 = adder(40);
assert(add40(2)==42);

Lambdas have not been added to C because over 45+ years, no one has ever felt they they needed them.

that is obviously false, just cause its not in c does in no way mean no one needed it

that is obviously false, just cause its not in c does in no way mean no one needed it

Here are your examples written in current zig.

First:

fn foo(a : i32, b : i32) bool {
    return a < b;
}

sort(some_arr, foo);

Second:

fn add(a: i32, b: i32) i32 {
    return a+b;
}

assert(add(40,2) == 42);

I don't see the benefit of these functional programming features here. I'm not saying I hate closures: I just want to see an example of them being legitimately useful.

Zig zen bullet 4:

Only one obvious way to do things.

local functions would be useful as described

And it should be possible to have a func inside a func so that it can only be used locally. This is good for code hygiene and refactoring. The Syntax is the same as for a global function thus its very easy to move them around.

Making foo available although it is just used for sorting is bad because it makes it subject to being used elsewhere thus unwanted dependency arise and refactoring becomes a problem. That is why with local functions and lambdas are useful. Of course those are contrived examples.

I'm not all for closures, I was merely pointing out that they can be added to the language without fuzz/ real new syntax

Only one obvious way to do things.

If you look at how zig does OO this is not the case currently https://github.com/ziglang/zig/issues/1205#issuecomment-403499288 (its clearly not the obvious way but rather the hacked way). Thus some language features might indeed be needed to improve code quality.

@monouser7dig

If you look at how zig does OO this is not the case currently #1205 (comment) (its clearly not the obvious way but rather the hacked way). Thus some language features might indeed be needed to improve code quality.

Ok, finished my case study of std/io.zig. It's a bit cluttered, and you're right: it's a bit hacked. But the Plan 9 extension thing is the solution: not closures. I can imagine an InputStream interface being defined as a struct with only abstract functions, which would be anonymously included in a FileInputStream. That would be a nice, clean, "OO" way to do this. (Besides, the code for FileInputStream and InputStream could just be merged into one "InputStream", everything should be abstracted as a File in the first place, but that's a discussion for another issue/MR)

EDIT:
Even better, a File can be directly embedded into an InputStream. No need for crazy abstractions. A big problem here is the implementations for os.File and friends for different operating systems is all mushed together in the same code. This means a nice UNIX "everything is a file" ideal is thrown out the window, since it's forced to share its codebase with something like Windows. A Zig "file" should be higher level than an OS file, in such a case.

Is a function obj one ptr size or two ptr size?
Closure syntax like golang can not avoid hidden alloc if zig want support apple ios operate system which do not support generate code at runtime.
I think a closure means a function with a context.
If you want to put some context data with a function obj,you need two ptr size if you do not generate asm code at runtime,one for the function address,one for the context object.
But when you pass in a global function, it only have a function address, and the context object is null.
I think golang implement function type with two pointers. And the context data may more than one pointer size. So the runtime need to alloc that context and store the pointer of that alloc object in the function object. That will cause a hidden alloc if you store that function object in a hashmap or list.
I know zig hates hidden alloc. I think alloc is not avoidable if you use a closure with a lot of context data and store it to hashmap to use it after the function return.

If zig design a syntax that can support closure without hidden alloc. Then there may be two type for runnable type. One for no context plain function which is one pointer size. One for closure with context which is two pointer size.
That two runnable types will cause a lot of other problems.

Just a note, local, anonymous functions can technically be done already, albeit with a very non-intuitive syntax.

const std = @import("std");

pub fn main() void {
    var s = []u8{ 0, 41, 0, 3 };

    std.sort.sort(u8, s[0..], (struct {
        fn local(a: u8, b: u8) bool {
            return a < b;
        }
    }).local);

    for (s) |b| {
        std.debug.warn("{} ", b);
    }
    std.debug.warn("\n");
}

Closure syntax like golang can not avoid hidden alloc if zig want support apple ios operate system which do not support generate code at runtime.
I think a closure means a function with a context.

then just do it like coroutines:

// coroutines 
async<*std.mem.Allocator> fn simpleAsyncFn() void {
    x += 1;
}

const p = try async<std.debug.global_allocator> simpleAsyncFn();
// closure 
fn adder( x : i32) type {
    return fn<*std.mem.Allocator>(y : i32) i32 {
        return x + y;
    };
}

var add40 = adder<std.debug.global_allocator>(40);

the syntax is already established in zig you just have to combine it if needed

I put the <> behind the fn because there is no keyword like async and I think it is more readable.
Async syntax could be changed to

// coroutines changed 
async fn simpleAsyncFn<*std.mem.Allocator>() void {
    x += 1;
}

const p = try async simpleAsyncFn<std.debug.global_allocator>();

@bronze1man, why can't a closure be allocated the same way any other object on the stack is allocated?

@binary132
You can not only allocate closure on the stack.
You have to alloc it on the heap if you want all the functional of closure that golang support.
Because you can store data in the context of the closure.
And the closure can be return to the caller, or store in a global hash map.

It looks like that coroutines add a context is a closure.It already alloc it's stack on the heap or other place, so add a context to it should not add hidden alloc.

what do you mean?

You have to alloc it on the heap if you want all the functional of closure that golang support.
Because you can store data in the context of the closure.

yes I think this would be possible with the proposed syntax that would mimic coroutines, right?

what do you mean?

Sorry, I read the document of the coroutines, and I found I am wrong.

I have a propose of the syntax of anonymous function and closure:

anonymous function:

const std = @import("std");

pub fn main() void {
    var s = []u8{ 0, 41, 0, 3 };

    std.sort.sort(u8, s[0..],
        fn (a: u8, b: u8) bool {
            return a < b;
        }
    });

    for (s) |b| {
        std.debug.warn("{} ", b);
    }
    std.debug.warn("\n");
}

anonymous function is a function type, it is one pointer size. It can not capture any local variables.

closure:

const std = @import("std");

pub fn main() void {
    var s = []u8{ 0, 41, 0, 3 };
    var biggerThan: i32 = 10;

    std.sort.sort(u8, s[0..],
        std.closure(std.debug.global_allocator,
            fn (a: u8, b: u8,biggerThan: i32) bool {
                if (biggerThan>a and biggerThan>b){
                    return a>b;
                }else{
                    return a < b;
                }
            }
        ),biggerThan);

    for (s) |b| {
        std.debug.warn("{} ", b);
    }
    std.debug.warn("\n");
}

closure is a type define in std package. it is not the same as function type.It can pass in local variables by function call.

If the user wants to capture variables in a closure, they should be careful not to capture stack-locals by reference if that closure is going to be passed back. I guess that's why C++ lambdas default to capture by-value. Otherwise, I don't see the problem, and I definitely do not agree that "you have to alloc it on the heap" except in specific cases.

Zig should be careful not to introduce a syntax that causes heap allocation implicitly or by default.

On that note, I actually really like the C++ syntax for lambdas. It is maybe too featureful, but it is cool.

Another point re. C++ closures: if the value is _moved_ into the lambda, then you can just deallocate it when the lambda's lifetime ends, just as you would with any other value. I don't know if that is supported in Zig, but it would be great.

@binary123
Zig does not have move although it probably needs to, see https://github.com/ziglang/zig/issues/782#issuecomment-405097221

... the other issue about „CPP Satax is nice“ really destroys the language because having two totally unrelated ways of defining functions/ closures makes for very bad experience as described earlier.

Zig must keep the syntax coherent and not just copy paste different languages together.

I said _I personally_ like the C++ lambda syntax. I did not say I think Zig should use it. As a matter of fact, I criticized it. I am not at all suggesting Zig should bolt on syntax features from other languages. I _am_ suggesting Zig should _learn from the semantics_ of C++ lambdas.

I agree, having a single consistent syntax for functions is a good thing. But if function declarations can capture variables, that syntax should take capture semantics (move, stack reference, allocation? please no. etc.) into account. This is an unending source of foot-shootings in Go.

I personally like the way Rust does it where the closure context is an implicit struct having the closure as a "method". If you then disallow structs borrowing stack-local references to leave the lifetime of those references, that might be one way around it. But that would add a lot of language complication to get something which could be solved just by letting the user decide whether to capture by reference or not.

@binary132 C++ does the same thing as Rust. If you know C++03, there were only hand-written functor structs. C++11 just added compiler-generator anonymous structs.

@monouser7dig borrowing from other languages can be an invaluable tool. Programming languages have been around for a lot longer than Zig, and many encountered similar issues. Looking at how they solved certain problems should be encouraged.

You could just have a comptime function to create a struct type, and construct the struct, from some names in local scope, optionally by reference, and a function that would become an "eval" method on that struct. That new struct type would then be your "functor".... To evaluate the functor, you'd need to then call .eval() on the returned functor (or add a feature for eval on a struct itself, like an anonymous method?) Maybe that's too explicit, but then the more you capture, the more you have to be aware that you're capturing?

That might fit nicely with the "everything is a struct" issue.

AFAIK the idea of comptime does not extend to complete code generation. It isn't the same as a preprocessor. From what I understand, there is no way to generate a struct on the fly from a set of identifiers using comptime. I came across a simpler instance of this recently. As far as I know, there is no way to generate an array of strings corresponding to enum members at compile time. This is a somewhat unrelated issue, but thought it was worth bringing up to point out comptime isn't a panacea.

you'd need some more type/reflection capabilities in comptime, and something like a "context" object containing metadata of the local context.

@isaachier thats no excuse for inventing new language constructs that could have been done equally well with existing syntax which does exist as shown above by me, or are there any remaining shortcomings in that syntax that you see?

@monouser7dig 100% agree there. If there is a straightforward path from existing syntax to a certain feature, that is the best approach.

http://number-none.com/blow/blog/programming/2014/09/26/carmack-on-inlined-code.html
from John Carmack

Besides awareness of the actual code being executed, inlining functions also has the benefit of not making it possible to call the function from other places. That sounds ridiculous, but there is a point to it. As a codebase grows over years of use, there will be lots of opportunities to take a shortcut and just call a function that does only the work you think needs to be done

making another point for local functions https://github.com/ziglang/zig/issues/1048#issuecomment-405043360

(and a sane syntax, not structs, that encourage usage of such)

and as a further IMO very good suggestion

It would be kind of nice if C had a “functional” keyword to enforce no global references.

It's already possible to define a "closure" using a function within a struct. This program compiles successfully, for example:

pub fn main() void {
    const j = 1;
    var b = struct{
        fn function(x: i32) i32 {
            return x+j;
        }
    }.function;

    var c = b(1);
}

@jarble Note that this only works as j is comptime known and thus can be captured at compile time. When you initialize j with a runtime-known value, this code will fail

Lambdas have not been added to C because over 45+ years, no one has ever felt they they needed them.

@bjornpagen this is not true.
GCC and Clang have lambda like blocks.

See: https://gist.github.com/gburd/4113660

Was this page helpful?
0 / 5 - 0 ratings