Rfcs: Filtered Blocks

Created on 4 Aug 2016 · 24 Comments · Source: rust-lang/rfcs

In https://www.youtube.com/watch?v=QM1iUe6IofM the speaker describes an interesting idea that may be summarized as a block that explicitly states which variables from the enclosing scope will be used. Read/write is not explicit, only usage.

Today a plain nested block that accesses the enclosing scope looks like this,

fn main() {
    let mut foo = 5;
    // Other variables

    {
        foo += 2;
    }

    assert_eq!(7, foo);
}

For lengthier blocks such as,

fn main() {
    let mut foo = 5;
    let bar = 6;
    // Many other variables

    {
        foo += 2;
        // Other lines of code, only by inspection
        // can we tell what was used. Maybe bar was
        // used, maybe not. We'd have to study the code
        // to know
    }

    assert_eq!(7, foo);
}

there may be several of the preceding variables in the enclosing scope used inside the block.

If we could specify which variables we'll access inside the nested block, we could write something like,

using foo, bar {
    // uses foo, bar. Linter could catch if one of them is unused
}

While semantically equivalent to a named function, this spares us from having to come up with a descriptive name, and since the block cannot be called again, unintended reuse becomes "impossible".

I'd humbly suggest that this would be a nice addition to the Rust language.
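For comparison, a sketch of the closest enforceable equivalent available today: a nested `fn`, whose parameter list doubles as the `using` list, because nested functions cannot capture from their enclosing scope. The name `restricted_block` is illustrative, not proposed syntax.

```rust
// A nested or sibling `fn` cannot capture locals of its caller, so
// its parameter list is an enforced "using" list: mentioning `bar`
// inside the body would be a compile error, not just a lint.
fn restricted_block(foo: &mut i32) {
    *foo += 2;
}

fn main() {
    let mut foo = 5;
    let bar = 6;

    restricted_block(&mut foo);

    assert_eq!(7, foo);
    assert_eq!(6, bar);
}
```

Unlike the proposed block, though, this costs us a name and can be called again from elsewhere in scope, which is exactly the trade-off the proposal is about.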

T-lang


All 24 comments

Unfortunately, that would be one more special syntax we'd have to teach users and, IMO, it doesn't pull its weight (I've never personally wanted something like this). As you pointed out, you can just use a function. If you don't want to define a separate function, you could have a macro do it:

macro_rules! restricted {
    (using $($var:ident),*; $($code:tt)*) => { /* stuff */ },
}

restricted! {
    using foo, bar;
    println!("{}, {}", foo, bar);
}

Implementation (which, unfortunately, does require explicitly declaring mutability):

macro_rules! restricted {
    (@inner ($($refs:expr,)*) ($($vars:tt)*) , &mut $var:ident $($rest:tt)+) => {
        restricted!(@inner ($($refs,)* &mut $var,) ($($vars)* $var: &mut _,) $($rest)+)
    };
    (@inner ($($refs:expr,)*) ($($vars:tt)*) , &$var:ident $($rest:tt)+) => {
        restricted!(@inner ($($refs,)* &$var,) ($($vars)* $var: &_,) $($rest)+)
    };
    (@inner ($($refs:expr,)*) ($($vars:tt)*) , $var:ident $($rest:tt)+) => {
        restricted!(@inner ($($refs,)* $var,) ($($vars)* $var: _,) $($rest)+)
    };
    (@inner ($($refs:expr,)*) ($($vars:tt)*) ; $($code:tt)*) => {{
        let inner = |$($vars)* _: ()| {
            $($code)*
        };
        {
            fn assert_static<T: 'static>(_: &T) {}
            assert_static(&inner);
        }
        inner($($refs,)* ())
    }};
    (using $($rest:tt)*) => {{
        restricted!(@inner () (), $($rest)*)
    }};
}

fn main() {
    let foo = 1u32;
    let bar = 2u32;
    let mut sum = 0u32;
    restricted! {
        using &foo, &bar, &mut sum;
        *sum = foo + bar;
    };
    println!("{}", sum);
}

This is not worth the price of complexity.

I do not understand the point of this proposal if using peach, banana { ... } isn't sugar for { let peach = peach; let banana = banana; ... }, but even if it were I'm not really keen on the idea.
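A sketch of that desugaring, for concreteness: the listed variables are rebound at the top of the block. `rebind_sum` is an illustrative name, not part of any proposal.

```rust
// The `using peach, banana { ... }` block desugared as described
// above: rebind the listed variables inside an ordinary block.
fn rebind_sum(peach: i32, banana: i32, cherry: i32) -> i32 {
    let sum = {
        let peach = peach;
        let banana = banana;
        // Note: `cherry` is still freely accessible here, so the
        // rebinding alone does not enforce any restriction on
        // which variables the block may touch.
        peach + banana
    };
    let _ = cherry;
    sum
}

fn main() {
    assert_eq!(3, rebind_sum(1, 2, 3));
}
```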

@nagisa Basically, he doesn't want to accidentally use/mutate the wrong variable. IMO, the best way to deal with this is to just use descriptive variable names (or define a new function).

@nagisa You hit it square on the 'noggin. Unlike descriptive variable names, a feature like this lets us declare intent that can be verified at compile-time. As for new functions, they fill the same need but they create a named entity that can be called anywhere within its enclosing scope.

Programming is all about managing complexity effectively. As your codebase grows, you want to be able to keep your head around what exactly is going on. Keeping cognitive load to a minimum is vital in this regard.

There are a couple of reasons to extract a block of code into a function:
The first is abstraction. Being only human, there's only so much we can keep in our head at once. We want to be able to reason at a higher level without implementation details clogging up our cognitive resources. We want to think about _what_ is being done, not _how_. So we break our programs up into smaller, logical, modular chunks, and our brains don't overheat as a result.

The second is reuse. Don't Repeat Yourself. We don't want to be writing code with the same functionality over and over again.

The problem with extracting code into a function is that it conflates these two goals, and comes with a few downsides. If you have a single logical unit of behaviour, you don't _necessarily_ want to be able to reuse it. If not, extracting to a function leads to unnecessary indirection. You can't just read through the outer function to understand its behaviour, now you have to firstly _find_ the implementation of the extracted function, read it, and then come back and continue reading. Unnecessary extra complexity, poor use of limited cognitive resources.

Not only that, having a new name floating around gives you another thing to worry about. Is this function being called anywhere else? Who knows? It takes a few extra steps to find out. Another thing to worry about. Another potential way to complect the call graph.

The way I see it, the main benefit of a construct like using is to get the abstraction benefit of modular chunks of code, without the complexity cost of having to look elsewhere for the implementation. It also avoids the complexity of having to think about whether this chunk of modular code is called anywhere else (it isn't).

This seems to me to be an incredibly useful thing to have.

As mentioned in the video, it should be easy to add editor/ide support for extracting a use block to a function, if you do end up finding another use for it.

(Side note, I'm not super keen on the keyword using; too similar to use, and I'd prefer a shorter one. I can't think of anything better for now, though.)

@Stebalien

Basically, he doesn't want to accidentally use/mutate the wrong variable.

I feel like this doesn't quite capture the benefit. It's not about writing (it's not that hard to just not use a variable you don't want to use), it's about reading. Signalling to the reader that they don't have to worry about any effects of the block on the surrounding state, _except_ those caused by mutating the "using-ed" variables. Modularity.

@sullyj3 You described it as I wish that I would have, it is for precisely those reasons that you outlined that I think that this language construct would be worthwhile. Can't come up with a better keyword myself either.

As an aside: using a function isn't always a good solution, because they interfere with flow control, the variables being captured have to be specified twice, you have to manually ascribe all the types, and you can't pass by "owned pointer". All this means it's not just a question of slapping some tokens around a block and having it work.
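The control-flow point above can be made concrete. In the sketch below (illustrative names, not from the thread), the inline block can `break` out of the enclosing loop; the same code extracted into a function or closure could not.

```rust
fn sum_until_limit(items: &[i32]) -> i32 {
    let mut total = 0;
    for &x in items {
        // An inline block participates in the enclosing control
        // flow: `break` and `continue` work here. Extracted into a
        // function, this code could only `return` from itself,
        // which is one reason a function is not a drop-in
        // replacement for a block.
        {
            if x > 2 {
                break;
            }
            total += x;
        }
    }
    total
}

fn main() {
    // 1 + 2 are summed, then 3 triggers the break.
    assert_eq!(3, sum_until_limit(&[1, 2, 3, 4]));
}
```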

Jon Blow also gave a pretty good justification for this in his own language, Jai: it exists as an intermediate step when refactoring code into a function: specify the variables you _expect_ the block is capturing, fix the code to actually conform to those expectations, then you can easily lift into a function.

This syntax, if introduced, should _also_ be applicable to closures.

@DanielKeep I disagree that the syntax should be applicable to closures. Whatever keyword is chosen, a closure declaration might end up being _even_ more verbose. Consider,

let plus_one = |x: i32| -> i32 { x + 1 };

Adding another keyword into that mix will, in my opinion, have a negative effect on legibility.
Furthermore, a closure is already named, and if you want to explicitly declare which variables are used you may as well define a function (possibly nested). Move semantics and borrowing are already well-defined with respect to closures.

@sullyj3 Would with make for a nice keyword? It is shorter than using and not at all similar to use.

Then, we'd see code such as

with foo, bar {
    // uses foo, bar. Linter could catch if one of them is unused
}

which I read as "with foo and bar do all of these things".

I like that. One potential issue is that for Python programmers, with means invoke context management. I don't think it'd be too big a deal though.

I think that the usage is enough to distinguish a Python with from a Rust with.

In my opinion this Python code

with open("x.txt") as f:

reads very differently from any Rust with statement I imagine myself writing.

@leksak Well, it's not like you have much choice in the matter. The body of a closure _is an expression_, so it would be _really_ weird if you _couldn't_ use this construct for a closure. Actually, now that I think about it, you could also just have the body of the block _be_ a closure.

I happen to think that since one of the purposes of a closure is to abstract out repeated code, it might be wise to consider how to most ergonomically integrate the new syntax with them, that's all. Just to be clear, since you might have misconstrued what I wrote: I'm not saying you _have_ to specify captures on closures, I'm saying it should be _possible._

Also, here's a description of how Jon Blow saw this being used for code refactoring purposes. In particular, this little snippet showing the progression from block to named function:

                                 { ... } // Anonymous code block
                       [capture] { ... } // Captured code block
     (i: int) -> float [capture] { ... } // Anonymous function
f := (i: int) -> float [capture] { ... } // Named local function

@DanielKeep:

I'm not saying you have to specify captures on closures, I'm saying it should be possible.

My interpretation was exactly that, I might not have a choice in the matter but I don't have to be keen on how it would look. But I'd rather have this feature, and extend the docs for closures, than not have this feature for the reasons outlined by @sullyj3.

@DanielKeep, I am trying to dream up some consistent syntax for this language construct, that aligns well with the existing syntax. Maybe you can weigh in?

For plain with blocks, I think it is reasonable to state the used variables up-front, like so

with foo, bar {
    // uses foo, bar. Linter could catch if one of them is unused
}

We have multiple ways of writing out a closure today,

let num = 5;
let plus_num_v1 = |x| x + num; // No type annotation
let plus_num_v2 = |x: i32| x + num; // Type annotation for input argument

// Type annotation for input argument and return value
let plus_num_v3 = |x: i32| -> i32 { x + num };

As mentioned in the Rust documentation, this is consistent with function declarations,

fn  plus_num_v1   (x: i32) -> i32 { x + num } // note: a fn cannot actually capture `num`; shown for syntax comparison only
let plus_num_v2 = |x: i32| -> i32 { x + num };
let plus_num_v3 = |x: i32|             x + num;  

We have to solve the problem of adding in the keyword (we will use with for now), as well as the arguments to the with keyword, without making the resulting code look entirely too muddy.

The noisiest closure, in my opinion, is the one that uses type annotations. Hence, I want to "solve" adding the language construct to let plus_num_v2 = |x: i32| -> i32 { x + num };. Personally, I cannot figure out an appealing look; for instance, including the keyword inside the || makes the semantics unclear,

let plus_num_v2 = |x: i32 with num| -> i32 { x + num };

Adding with at the end does not impact the current closure syntax, but is inconsistent with how I imagine a with block would look, i.e., having

let plus_num_v2 = |x: i32| -> i32 { x + num } with num;

should arguably change

with foo, bar {
    // uses foo, bar. Linter could catch if one of them is unused
}

to

{
    // uses foo, bar. Linter could catch if one of them is unused
} with foo, bar;

which I think is a poor trade-off. Also, we have to imagine how it would look with type annotations in both cases. I think

with foo: i32, bar: i32 {
    // uses foo, bar. Linter could catch if one of them is unused
}

plays out fine, but the closure examples I have offered suffer even further in those cases. Any suggestions?

I'd probably just go with |x| -> i32 with num { x + num }. We already have where coming after the return type on functions. That way, the rule can just be "it goes before braces".

As an aside, there should probably be an explicit syntax for "captures nothing"; perhaps with () { ... }.

Actually, one of the problems with global variables is that you can't tell where they might be used. As such, it might be _nice_ to extend this to functions as well: fn f() -> i32 with some_global { ... }. In that case, I can imagine a clippy lint that lets you _require_ all captures be explicit, which I bet would make some people happy.

@DanielKeep that looks nice, and I agree with both of your additions.

IMO this adds complexity for a marginal gain at best and does not pull its weight.

With regard to @sullyj3 's comment:

The problem with extracting code into a function is that it conflates these two goals, and comes with a few downsides. If you have a single logical unit of behaviour, you don't necessarily want to be able to reuse it. If not, extracting to a function leads to unnecessary indirection. You can't just read through the outer function to understand its behaviour, now you have to firstly find the implementation of the extracted function, read it, and then come back and continue reading. Unnecessary extra complexity, poor use of limited cognitive resources.

Not only that, having a new name floating around gives you another thing to worry about. Is this function being called anywhere else? Who knows? It takes a few extra steps to find out. Another thing to worry about. Another potential way to complect the call graph.

I would definitely agree with the above for C code but not for Rust. Rust allows nesting functions and that alleviates most of the above concerns. Not to mention the availability of closures.

I vote nay for this proposal.
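The closures mentioned in the comment above do keep the code local without introducing a module-level name; a brief sketch of the contrast with the proposed block:

```rust
fn main() {
    let foo = 5;
    let bar = 6;
    let baz = 7;

    // A closure keeps the helper code local and unnamed at module
    // scope, but unlike the proposed `using` block it silently
    // captures anything it happens to mention: nothing stops the
    // body from also using `baz`, and no lint would flag it.
    let combine = || foo + bar;

    assert_eq!(11, combine());
    let _ = baz;
}
```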

This would be extremely simple construct to implement in Lever programming language. But I think it will not provide sufficient advantage to be worth the cost it takes to document the added semantics. Because the function boundaries already solve the problem.

When your function grows large enough that the relationships between variables start to confuse people, that is a legitimate reason to split the function in two. In your example, when you need two functions, you would do it like this in Lever:

was_way_too_big_function = ():
    # other things, plus 'foo' and 'bar' defined.
    neater_function(foo, bar)
    # other things after
neater_function = (foo, bar):
    # too big section that would have required 'using' syntax described above.

The standard convention in my language is to put the split-out part of a function below the place it was split off from. This keeps the code in the logical order in which most people would prefer to read it.

Note on using with as a keyword: JavaScript, VB, and Kotlin (and probably more) use with to bring an object's members into scope.

That is, instead of writing:

receiver.foo();
receiver.bar();

You can write:

with receiver {
    foo();
    bar();
}

I'd be _very_ careful before using the with keyword for _anything_ in rust.

I'd rather see a procedural macro or lint plugin for this.

fn foo() {
  let mut a = something();
  let mut b = something_else();

  #[with(a)] {
    a = a.derived();
  }

  b.associate_with(a)
}

There's no need for this to be a part of the core language. Especially given its optionality.

Triage ping @leksak -- what's the status of this issue?
