We take particular care to make sure that compilation errors associated with the items to which our attribute macros are applied are comprehensible. To that end we have tests to check the output of a variety of expected illegal programs to make sure developers could reasonably be expected to understand and correct those errors. In 1.46.0 we noticed a change in that behavior. While there were some beneficial changes to the way errors qualify structs, traits, etc. there seems to be a regression where 1.46.0 shows less information than in 1.45.0
I've put together a small repo that demonstrates the problem: https://github.com/ahl/span_regression
The first example is this macro:
#[proc_macro_attribute]
pub fn identity(
_attr: proc_macro::TokenStream,
item: proc_macro::TokenStream,
) -> proc_macro::TokenStream {
item
}
Now consider this test program:
struct Thing {
x: i32,
y: i32,
}
#[identity]
fn doit() -> Thing {
Thing {
"howdy".to_string(),
}
}
Under 1.45.0 one of the errors produced looks like this:
error[E0063]: missing fields `x`, `y` in initializer of `Thing`
--> examples/test.rs:10:5
|
10 | Thing {
| ^^^^^ missing `x`, `y`
Under 1.46.0 that has changed to this:
error[E0063]: missing fields `x`, `y` in initializer of `Thing`
Note that under 1.46.0 no code is underlined.
Now consider a slightly more complicated macro:
#[proc_macro_attribute]
pub fn nested(
_attr: proc_macro::TokenStream,
item: proc_macro::TokenStream,
) -> proc_macro::TokenStream {
let mut ret = proc_macro::TokenStream::new();
let pre = vec![
proc_macro::TokenTree::Ident(proc_macro::Ident::new("fn", proc_macro::Span::call_site())),
proc_macro::TokenTree::Ident(proc_macro::Ident::new("foo", proc_macro::Span::call_site())),
proc_macro::TokenTree::Group(proc_macro::Group::new(
proc_macro::Delimiter::Parenthesis,
proc_macro::TokenStream::new(),
)),
proc_macro::TokenTree::Group(proc_macro::Group::new(proc_macro::Delimiter::Brace, item)),
];
ret.extend(pre);
ret
}
(Note I had been using quote! but wanted to make sure that wasn't causing the problem)
With a similar example as above on 1.45.0:
error: expected identifier, found `"yo"`
--> examples/test.rs:20:9
|
19 | Thing {
| ----- while parsing this struct
20 | "yo".to_string(),
| ^^^^ expected identifier
With 1.46.0:
error[E0308]: mismatched types
--> examples/test.rs:17:1
|
17 | #[nested]
| ^^^^^^^^^
| |
| expected struct `Thing`, found `()`
| expected `Thing` because of return type
Rather than pointing to the offending code, the rustc error now points (unhelpfully) to the macro itself.
The general improvements made to error reporting in 1.46.0 (simpler naming, reduced duplicate errors) is greatly appreciated. This was one small regression I saw amongst the otherwise monotonic improvements.
If I remember correctly, we should still be able to use cargo-bisect-rustc to bisect a regression in 1.46.0.
@rustbot ping cleanup
Hey Cleanup Crew ICE-breakers! This bug has been identified as a good
"Cleanup ICE-breaking candidate". In case it's useful, here are some
[instructions] for tackling these sorts of bugs. Maybe take a look?
Thanks! <3
cc @AminArria @camelid @chrissimpkins @contrun @DutchGhost @elshize @ethanboxx @h-michael @HallerPatrick @hdhoang @hellow554 @imtsuki @kanru @KarlK90 @LeSeulArtichaut @MAdrianMattocks @matheus-consoli @mental32 @nmccarty @Noah-Kennedy @pard68 @PeytonT @pierreN @Redblueflame @RobbieClarken @RobertoSnap @robjtede @SarthakSingh31 @senden9 @shekohex @sinato @spastorino @turboladen @woshilapin @yerke
Assigning P-high as discussed as part of the Prioritization WG procedure.
Cc @petrochenkov
This is caused by https://github.com/rust-lang/rust/pull/73345 - since we no longer pass a Nonterminal to an attribute macro, the pretty-print/reparse check get used (and fails), which causes us to lose spans.
This is a very interesting corner case - the span_regression error is testing the behavior of a proc-macro called on a syntactically invalid Item. It looks like this relies on the parser managing to recover an Item, despite the fact that the body is syntactically invalid.
I think there are two possible ways to fix this:
span_regression repository. However, this would effectively mean that the parser recovery behavior is now part of the stable surface of the compiler, instead of just a trick for producing more meaningful error messages. Any future changes to parser recovery could cause us to start invoking an attribute macro, if we were able to parse enough of the target to produce an AST struct.TokenStream as output, this would require us to remove the target item entirely (since we can't know what the attribute macro would have produced). Effectively, custom attribute macros would disable parser recovery.I'm leaning towards option 2 (assuming that it doesn't cause significant regressions in practice). Bang-proc-macros are the way to invoke a proc-macro on an arbitrary TokenStream - however, attributes macros are (syntactically) just another attribute, so it doesn't seem to make sense to invoke them on something that's not a syntactically valid attribute target. While the span_regression example is 'good enough' for recovery to be able to kick in, I think going with approach 1 would cause us to end up with an unprincipled list of a special cases where we invoke an attribute macro on a syntactically invalid target.
On the other hand, it's possible that option 1 could lead to a large number of spurious errors in practice - if the user introduces a syntax error when modifying a commonly-used struct (which has an attribute macro applied), removing this struct entirely could cause many spurious resolution errors. Invoking the attribute macro might lead to better (but still invalid) tokens being produced, leading to cleaner error messages.
@ahl Thanks for providing such a nicer reproducer!
@Aaron1011 thanks for the quick diagnosis.
I agree with your assessment that both options have downsides (I think in your final paragraph you're describing option 2 rather than option 1 vis-a-vis removing the struct entirely).
Option 1 seems like a bit of a mess; option 2 seems more "philosophically pure" while being a little less pragmatic. I agree that option 2 is probably preferable and that a crisp semantic like "only valid items will cause the invocation of attribute macros" is a very clear delineation between proc_macro and proc_macro_attribute.
@Aaron1011 I agree with your assessment and think that the second option is the obviously correct choice for us. This might cause some weird errors for proc macros with pseudo DSLs that are no longer turned into valid code, but given that we are already recovering the parse most of the body will be treated as empty anyways, so it should be fine™.
Could we also add something where the compiler emits one of its “this is a bug” messages if an error doesn’t show span information? Or is that not feasible?
@camelid: Right now, we deliberately create these 'dummy spans' due to issue https://github.com/rust-lang/rust/issues/43081. PR https://github.com/rust-lang/rust/pull/76130 makes progress towards eliminating these cases, but there's still more work to be done.
In the future, I think it would definitely make sense to emit a warning, and direct users to open a bug report. However, I don't think we'd want to emit an actual ICE - it would just create noise, since the query stack and call stack aren't useful for debugging missing spans.
@Aaron1011 Okay, thanks!
Most helpful comment
This is caused by https://github.com/rust-lang/rust/pull/73345 - since we no longer pass a
Nonterminalto an attribute macro, the pretty-print/reparse check get used (and fails), which causes us to lose spans.This is a very interesting corner case - the
span_regressionerror is testing the behavior of a proc-macro called on a syntactically invalidItem. It looks like this relies on the parser managing to recover anItem, despite the fact that the body is syntactically invalid.I think there are two possible ways to fix this:
span_regressionrepository. However, this would effectively mean that the parser recovery behavior is now part of the stable surface of the compiler, instead of just a trick for producing more meaningful error messages. Any future changes to parser recovery could cause us to start invoking an attribute macro, if we were able to parse enough of the target to produce an AST struct.TokenStreamas output, this would require us to remove the target item entirely (since we can't know what the attribute macro would have produced). Effectively, custom attribute macros would disable parser recovery.I'm leaning towards option 2 (assuming that it doesn't cause significant regressions in practice). Bang-proc-macros are the way to invoke a proc-macro on an arbitrary
TokenStream- however, attributes macros are (syntactically) just another attribute, so it doesn't seem to make sense to invoke them on something that's not a syntactically valid attribute target. While thespan_regressionexample is 'good enough' for recovery to be able to kick in, I think going with approach 1 would cause us to end up with an unprincipled list of a special cases where we invoke an attribute macro on a syntactically invalid target.On the other hand, it's possible that option 1 could lead to a large number of spurious errors in practice - if the user introduces a syntax error when modifying a commonly-used struct (which has an attribute macro applied), removing this struct entirely could cause many spurious resolution errors. Invoking the attribute macro might lead to better (but still invalid) tokens being produced, leading to cleaner error messages.