Rust: MBE: more checks at definition time

Created on 22 May 2019  路  36Comments  路  Source: rust-lang/rust

There are a few checks that are currently only done at macro invocation time (or are not done at all) that really would make more sense to do a macro definition time.

This is a tracking issue to implement these checks.

A-macros C-cleanup C-tracking-issue E-help-wanted E-medium E-mentor T-compiler T-lang

Most helpful comment

@mark-i-m you can apply labels yourself by using @rustbot modify labels

All 36 comments

Could someone tag this with cleanup and macros and help-wanted please?

EDIT: and tracking-issue

cc @petrochenkov

@mark-i-m you can apply labels yourself by using @rustbot modify labels

Oh, that's good to know. Thanks :)

@rustbot modify labels: +C-cleanup

Just some a quick note for anyone who wants to implement this: the checks should probably be inserted here, where there are already some other checks:

https://github.com/rust-lang/rust/blob/dec4c5201f88efbc3020b04ba96a5ee2c3b6cfcd/src/libsyntax/ext/tt/macro_rules.rs#L356-L369

You can look at the existing checks to see what they are like.

Also, anything implemented will probably need a crater run. I think breakage here is acceptable, since any code that does the things in the OP is almost certainly wrong.

EDIT: also, we would probably need to follow a procedure like https://github.com/rust-lang/rust/issues/57742 and its PRs.

@rustbot modify labels: +E-medium +E-mentor

I am interested into implementing those checks. I would need a bit of mentoring though. Who can I ask questions and through which medium? For example, I implemented the check that variables in rhs are bound in lhs and it already prevents building stage 1 (with ./x.py build --stage 1) because quick-error has errors. All variables on this line https://github.com/tailhook/quick-error/blob/aadd84f9481dddb5f6c04743d1be0672a0589dc8/src/lib.rs#L468 are not bound. How can I locally patch this crate?

@ia0 Thanks! You can ask me on this thread :) But I am out of town this week, so I might not be super responsive.

Hmm... I am not super familiar with quick-error. I believe you can change the Cargo.toml for the faulty crate to use a path dependency. Does this help: https://internals.rust-lang.org/t/how-to-specify-a-private-repository-for-one-library-like-libc/9779 ?

Thanks, I was indeed looking for [patch.crates-io]. I think all 3 checks are implemented in https://github.com/ia0/rust/tree/issues_61053. What remains to be done are:

  • Fix all macro errors in rustc and dependencies (may take some time, there are a lot).
  • Clean up code (current code was mostly to get the check in place to get an idea of impact).
  • Improve error messages (very basic for now).
  • Add tests (none yet).
  • Run crater.
    It should take some time, but I'll keep you posted on progress.

I think most of the work is done and is ready for review. Should I create a pull request?

Here is a description of the commits available in https://github.com/ia0/rust/commits/issues_61053:

  1. Fix meta-variable binding errors in macros
    This is an independent change (could have its own pull request) fixing macro errors in the compiler itself.
  2. Run rustfmt
    This is an independent change (could have its own pull request) running rustfmt on 2 files that I'll modify in later commits.
  3. Remember the span of the Kleene operator in macros
    This change only depends on commit 2 and saves the span of the Kleene operator to give precise error messages in commit 5.
  4. Patch extern crates
    This is an independent change similar to commit 1 but for external crates. I actually don't know how we should handle this issue. One idea is to first have the checks as a warning instead of an error (which would also avoid having to run crater). Once the problematic external crates are bumped to a version fixing the issues, we can set it back to an error.
  5. Implement checks for meta-variables in macros
    This change depends on all previous commits and implements the 3 checks of this issue. I didn't run and fix all tests yet (only the ui test suite). Also note that there might be issues with #39412 that I did not investigate yet.

I will be offline for the next 4 days (leaving Thursday noon, back Sunday night).

Actually, there's an interesting problem:

macro_rules! outer {
    ($x:expr; $fragment:ident) => {
        macro_rules! inner { ($y:$fragment) => { $x + $y } }
    }
}

$y is free in the right-hand side of the outer macro, but it is correctly bound and used in the inner macro. So I would need to add support for macros defined in macros. But I'm not sure it will be easy to fix.

Ah, that's a great example.
Perhaps nested macros is the reason why this wasn't done in the first place.

Wow! Nice work @ia0!

It sounds like we need to keep track of scopes or something (perhaps like the ribs used by the normal name resolution mechanism in the compiler). That could be a lot of work though... @petrochenkov any ideas?

I'm actually not sure it is possible, just looking at the macro definition (and without looking at instantiations), to know whether a macro variable is well-scoped.

macro_rules! add {
    ([$($function:tt)*] $value:tt) => {
        $($function)* { ($x:expr) => { $x + $value } }
    }
}

macro_rules! extract {
    (($($parameters:tt)*) => { $dollar:tt x + $value:tt }) => { $value }
}

fn main() {
    add!([macro_rules! add_one] 1);  // OK
    assert_eq!(add_one!(4), 5);
    println!("{}", add!([stringify!] 1));  // OK
    println!("{}", extract! { ($x:expr) => { $x + 1 } });  // OK
    println!("{}", add!([extract!] 1));  // not OK
}

In this example, just looking at the definition of add!, it is not obvious whether $x is well-scoped or not. If $function is macro_rules! <ident> then it's a valid instantiation. However, if it's anything else, it looks like it would not be well-scoped ($stringify seems to be a particular case).

It would be a breaking change, but I guess we could reject nested scoping if macro_rules! <ident> is not literally present in the macro definition. We would apparently not lose any expressiveness doing so.

@ia0 @petrochenkov Ok, so here are some thoughts... let me know what you think.

Consider the following case:

macro_rules! foo {
    ($id:ident) => {
        $id! bar {
            ($y:ident) => {}
        }
    }
}

foo!(macro_rules);

Because macros can contain and expand to arbitrary non-rust-syntax token streams, I think it is impossible to check at definition time if a macro contains a nested macro. This means it is impossible to check for the case you pointed out, which is a rather sad result :cry:

Note that this only applies to the check that all meta-vars that are used are declared. The other checks can just operate on the declared variables (which are not ambiguous TMK).

  • Use heuristics:

    • If we see the token macro_rules anywhere, just give up.
    • A possibly stronger heuristic: if we see macro_rules anywhere in a macro, we check that all variables show up at least twice (since they must be declared and used)
    • The downside is that this would necessarily happen silently (to avoid breaking programs with nested macros), so people may erroneously assume that the lack of error messages means the compiler checked and found no errors.
  • My preference would be to provide an allow-by-default lint: when enabled, the lint assumes no nested macros and does full checking. When disabled, we do no checking (or minimal checking).

    • Perhaps in a future edition, this could be warn-by-default.
    • I imagine that the vast majority of macros don't contain nested macros, so they would benefit from this.

Note that this only applies to the check that all meta-vars that are used are declared. The other checks can just operate on the declared variables (which are not ambiguous TMK).

Indeed, this looks like a correct assumption. It seems that we cannot shadow macro variables:

macro_rules! foo {
    ($id:ident) => {
        $id! bar {
            ($id:ident) => {}  // $id is not a binder but an occurrence
        }
    }
}

foo!(macro_rules);

fn main() {
    bar!(macro_rules:ident);  // OK
    bar!(ignore);  // not OK
}

This implies that macro variables are unique (bound at most once).

Use heuristics:

I think we can even be complete if we only support explicit occurrences of macro_rules (contrary to through a meta variable). I'll try to give it a go.

My preference would be to provide an allow-by-default lint

Yes, I agree we should start with an allow-by-default lint and move to warn-by-default later. False positives should be very rare (implicit nested macro). Note that I didn't investigate non-legacy macro, aka decl_macro, yet. We may have additional issues there.

Note that I didn't investigate non-legacy macro, aka decl_macro, yet. We may have additional issues there.

Yeah, I'm not that familiar with decl_macros either. But since they are currently unstable (and it looks like they are proceeding slowly), I think we can focus on macro_rules for now. It would be good to make sure these problems don't exist in decl_macro too before we stabilize it, though.

I finally got some time to implement an approximative support for nested macro definitions. I think we should try a crater to know if it's enough. The next steps I see would be:

  • Understand the issue with decl_macro (2 failing tests in ui).
  • Make the errors a lint (allow by default).
  • Run crater.
    I should be able to look into that Thursday or this weekend.

Ok, so I checked the 2 failing tests (ui/hygiene/globs.rs and ui/hygiene/nested_macro_privacy.rs) due to decl_macro and it's actually also a nested macro definition issue. So I'm going to "fix" them the same way as for legacy macros, i.e. try to pattern match them and add the additional binders. This is going to be an approximative solution, as for legacy macros, but it should work for most cases.

Thanks @ia0! Great work :rocket:

Run crater. I should be able to look into that Thursday or this weekend.

When you are ready, just open a PR on this repo. We will do bors try to get some built artifacts. Then, we can do a crater run (with the artifacts from bors). And that's it :)

Ok, I created #62008. Should I migrate the errors to an allow-by-default before running crater? Or maybe I can migrate to a deny-by-default lint, run crater, and then switch the lint to allow-by-default.

@ia0 Thanks! I will take a look soon. Let I think we should make it an allow-by-default lint of the crater run. The goal will be to make sure we aren't breaking any existing known crates. If we want we could also try a run with the deny-by-default lint just to see how prevalent the errors are and what the false-positive rate is.

@petrochenkov raised a good point in #62008 that we don't need to specify the repetition in RHS. It is entirely determined by the LHS.

macro_rules! current { ($($x:tt)+) => { $($x)+ }; }
macro_rules! proposed { ($($x:tt)+) => { $($x) }; }

Although less noisy, I see 2 possible issues:

  • This is a breaking change. How do we know if current!(1 2 3) is 1 2 3 + (new compiler) or 1 2 3 (old compiler)? There is no issue for proposed!(1 2 3) because it would be rejected by an old compiler.
  • It may be useful as documentation to see how a token tree repeats in the RHS. For example let $x $(: $t)? = $v (current behavior) and let $x $(: $t) = $v (proposed behavior).

It might be a good thing to do for decl_macros

@ia0

This is a breaking change.

I would expect */+/? to still be accepted by the new compiler, just become optional.

I would expect */+/? to still be accepted by the new compiler, just become optional.

So you mean that current!(1 2 3) should produces 1 2 3 as with the old compiler. This looks brittle to me because if the user is using the new style throughout their crate and they want to output a *, +, or ? after a repeated sequence, they would have to use the old style (i.e. explicitly provide the Kleene operator) to avoid ambiguity: Use $($x)++ to produce 1 2 3 + because $($x)+ would produce 1 2 3.

I think it is preferable to have a breaking change (new compiler does not expect a Kleene operator after a sequence in the RHS) than to have the Kleene operator be optional in the RHS.

@petrochenkov That would be ambiguous unless we denied the use of ?/*/+ in matchers or required the first such character to be a Kleene op, as @ia0 pointed out.

Status

Implementation

62008 implements the meta_variable_misuse lint with the following checks at macro definition:

  • Meta-variables must not be bound twice in nested macro definitions (this is already an error at top-level)
  • Meta-variables must not be free
  • Meta-variables must repeat at least as many times as their binder
  • Meta-variables must repeat with the same Kleene operators as their binder

The main issue with #62008 is that it may have false negatives and false positives. This is due to the possibility to define macros in macros in ways that depend on the actual macro instantiation. The current lint tries to detect nested macro definitions by looking for one of the following patterns:

  • macro_rules! name { body }
  • macro_rules! $name { body }
  • macro name { body }
  • macro $name { body }
  • macro name(lhs) { rhs } (body is (lhs) => { rhs })
  • macro $name(lhs) { rhs } (body is (lhs) => { rhs })

In case a nested macro definition is detected, its body is analyzed as long as it matches a sequence of lhs => rhs separated and optionally terminated by ; for legacy macros and separated by , for decl_macro macros. As soon as the body does not match this pattern, the lint falls back and checks the remaining unchecked parts as if outside the nested macro definition.

Here are a few problematic examples classified by root cause.

Behavior depends on actual macro instantiation

We need a check at macro instantiation to be sound and complete.

macro_rules! foo {
    ($n:tt) => {
        // The lint does not recognize the possible nested macro definition.
        $n! bar {
            // $x is free unless $n is `macro_rules`
            ($x:tt) => { $x };
        }
    };
}
macro_rules! foo {
    ($d:tt) => {
        macro_rules! bar {
            // The lint does not recognize the possible meta-variable occurrence:
            // `$d z` is a free meta-variable if $d is `$`.
            ($y:tt) => { $d z };
        }
    };
}

Behavior depends on other macro definitions

We need some kind of abstract macro instantiation to unfold macro calls.

macro_rules! foo {
    () => {
        // The lint thinks bar is a nested macro definition but it's not.
        stringify!(macro_rules! bar { () => { $x }; })
    };
}
macro_rules! foo {
    () => {
        // The lint does not look at the definition of `def`, thus does not recognize
        // the nested macro definition, and thus reports $x as free.
        def!({ ($x:tt) => { $x }; });
    };
}
macro_rules! def { ($body:tt) => { macro_rules! bar $body }; }

Arbitrary limitations

The way we try to guess nested macro definitions and macro bodies can be improved.

macro_rules! foo {
    ($($k:tt $v:tt)*) => {
        macro_rules! bar {
            // The lint does not recognize the repetition as a body, thus falls back
            // to checking as if outside a nested macro definition, and thus reports
            // $f as free.
            $(($f:tt $k) => { $f($v) };)*
        }
    };
}

Discussion

Omit Kleene operator for repetition occurrences

As noticed by @petrochenkov, the Kleene operator on sequence occurrences is redundant. It is entirely defined by its binder. As such, we could consider a language change to omit the Kleene operator in those cases:

macro_rules! current { ($($x:tt)+) => { $($x)+ }; }
macro_rules! proposed { ($($x:tt)+) => { $($x) }; }

As noted by @mark-i-m, this may be easier to apply for decl_macro.

To measure the ratio of false positives of #62008 we ran a crater with deny-by-default: https://crater-reports.s3.amazonaws.com/pr-62008/downloads.html. In this post, we go over regressed crates to see if the regression is expected (actual errors in the crate) or if it is due to false positives. I went over all regressed crates that have a non-transitive error which is not a "different Kleene operator" (since those are most probably an actual error and it removes 355 crates). From those 53 crates, 6 have false positives:

  • reg/cluColor/0.1.5 false positives
  • reg/dyplugin/0.2.0 false positives
  • reg/easy_ffi/0.1.0 false positives
  • reg/pmutil/0.3.0 false positives
  • reg/seed/0.3.7 false positives
  • reg/webrtc-sctp/0.0.0 false positives

The 47 remaining crates only have expected errors:

  • gh/Webd01/rust-macros expected
  • gh/aylei/leetcode-rust expected
  • gh/hopv/hoice expected
  • gh/innerand/eggspire expected
  • gh/rantingpirate/spinnrd expected
  • gh/rasviitanen/svgmacro expected
  • gh/adamAndMath/AlgebraicManipulatorRust expected
  • gh/kyleheadley/type-theory-by-trait expected
  • gh/westerhack/quest expected
  • reg/abomonation/0.7.2 expected
  • reg/acacia_net/0.1.0 expected
  • reg/adapton/0.3.30 expected
  • reg/astro/2.0.0 expected
  • reg/binjs_io/0.2.1 expected
  • reg/bvh_anim/0.4.0 expected
  • reg/crack/0.1.0 expected
  • reg/crossbeam-channel/0.3.8 expected
  • reg/dockerfile-rs/0.3.0 expected
  • reg/error-chain/0.12.1 expected
  • reg/fbxcel-dom/0.0.3 expected
  • reg/func_swap/0.1.0 expected
  • reg/fungi-lang/0.1.63 expected
  • reg/galvanic-assert/0.8.7 expected
  • reg/handlebars/2.0.0-beta.2 expected
  • reg/indexed_vec/1.1.1 expected
  • reg/ld_preload/0.1.2 expected
  • reg/liftoff/0.1.1 expected
  • reg/maths-traits/0.1.3 expected
  • reg/nom-operator/0.0.2 expected
  • reg/probor/0.3.1 expected
  • reg/proptest/0.9.4 expected
  • reg/protobuf_codec/0.2.7 expected
  • reg/quick-error/1.2.2 expected
  • reg/quick-error2/2.0.1 expected
  • reg/relm/0.16.0 expected
  • reg/relm-state/0.16.0 expected
  • reg/render-tree/0.1.1 expected
  • reg/rml_rtmp/0.3.0 expected
  • reg/svgmacro/0.2.2 expected
  • reg/tensor-macros/0.2.0 expected
  • reg/tfdeploy/0.0.10 expected
  • reg/typeparam/0.0.1 expected
  • reg/ubend/0.2.1 expected
  • reg/upt/0.1.3 expected
  • reg/v9/0.1.2 expected
  • reg/wood/0.4.2 expected
  • reg/xmlhelper/0.1.0 expected

False positives

We mostly have 3 types of false positives:

  • Sequences of rules in nested macro definitions. (This shouldn't be hard to implement and would fix most issues.)
  • A meta-variable is expected to be a dollar. (This is hard to implement. To help we could consider extending the macro system with $$ that gets expanded to $ such that users don't need to take a $dollar meta-variable as argument.)
  • Macro definition using parentheses instead of braces. (Easy to implement.)

cluColor-0.1.5

On this line, we don't see that $s is a binder, because we don't support sequences for rules of nested macro definitions.

dyplugin-0.2.0

On this line, we don't see that $func is a binder, because we don't support sequences for rules of nested macro definitions.

easy_ffi-0.1.0

On this line, we think $attr is free because we don't realize that the $dol meta-variable may be a dollar which would bind attr on line 117.

pmutil-0.3.0

On this line, we don't see that $tokens is a binder, because we don't support sequences for rules of nested macro definitions.

seed-0.3.7

On this line, we think $d is free because we don't realize that the argument to with_dollar_sign! is the body of a macro definition.
On this line, we don't see that $value is a binder, because we don't support sequences for rules of nested macro definitions.

webrtc-sctp-0.0.0

On this line, we don't see that $i is a binder, because the macro body uses parentheses instead of braces. If this is a valid way to define macro, we should update this line.

How should we handle stringify!($x) where $x is free?

macro_rules! bar { () => { stringify!($x) }; }

This is not an error apparently (but the lint would report it). Note that this is specific to stringify! apparently. If I try to simulate this with another macro, I get an error.

macro_rules! foo { ($($x:tt)*) => {}; }
macro_rules! bar { () => { foo!($x) }; }

I'm currently marking those "errors" as expected when finding false positives, because I don't think this is really an issue of the lint, because the crates that have this issue seem to not rely on this deliberately.

@ia0 Thanks! Sorry for the long delay. Things have been busy...

stringify! and a few other macros are built-in macros -- they are special-cased in the compiler and are expanded in a special way. One thing we could do is just whitelist any problematic built-ins.

Also, thanks for all the work you have put into this!

No problem :-)

And by the way, I presented this lint at the Rust meetup in Z眉rich today (to try to get some people using it), and a few interesting questions came up:

  1. Given that this lint inherently has false positives (and false negatives), why is it a lint in the compiler instead of clippy?

  2. What is the future for this lint? Is it planned to stay a lint?

  3. How does it work with proc-macro? I'll look into this. I think it works fine, I just want to check at which location the errors are reported.

Also, I would like to know the following:

  1. Should I implement support for sequences in body of nested macro? I don't think this is too complicated to implement and it would fix some false positives.

  2. Are there any other implementation follow-ups I should do in the coming months? Or should we just wait until we get some user feedback in the next year?

Thanks a lot!

To respond to (3), I was wrong. It doesn't run when proc-macro outputs a macro definition.

# Cargo.toml
[package]
name = "proc_macro_experiments"
version = "0.1.0"
edition = "2018"

[lib]
proc-macro = true

```rust
// src/lib.rs

![deny(meta_variable_misuse)]

extern crate proc_macro;

use proc_macro::TokenStream;

[proc_macro]

pub fn make_foo(_: TokenStream) -> TokenStream {
format!("macro_rules! foo {{ () => {{ $x }}; }}")
.parse()
.unwrap()
}

```rust
// examples/main.rs
#![feature(proc_macro_hygiene)]
#![deny(meta_variable_misuse)]

proc_macro_experiments::make_foo!();

fn main() {
    // foo!();
}

cargo build --examples=main only generates an error if the call to foo!() is not commented.

Sorry! It's been quite a while!

Given that this lint inherently has false positives (and false negatives), why is it a lint in the compiler instead of clippy?

At the time I proposed the lint, I didn't realize it would have false positives. Also, I don't know enough about clippy to know whether it would be able to implement these.

What is the future for this lint? Is it planned to stay a lint?

Personally, I think they should be enabled by default (at least some of them can be). However, there is a bit more discussion on #68836 ...

At the very least, I think this work has been really useful in giving us an understanding of some of the downsides of the current MBEs implementation and behavior.

Sorry! It's been quite a while!

No problem :-) I wasn't waiting active follow-up. I guess this is a subject that may take quite some time to settle and depends on the future of MBE and proc-macro.

However, there is a bit more discussion on #68836 ...

I'll follow that. Thanks for the link!

At the very least, I think this work has been really useful in giving us an understanding of some of the downsides of the current MBEs implementation and behavior.

Yes, it was also very interesting to me to dig a bit into a small part of the Rust compiler :-)

Was this page helpful?
0 / 5 - 0 ratings