This is a tracking issue for RFC 2151 (rust-lang/rfcs#2151).
Steps:
Unresolved questions:
r# syntax when printing identifiers that overlap keywords?r# syntax? e.g. to document pub use old_epoch::*@Centril you rock! @petrochenkov, think you could supply some mentoring instructions here?
I'd like to take a shot at this. It seems like it'd be a decent way to learn how rustc works.
The relevant code is probably this, you'll want to make the ('r', Some('#'), _) case allow for the third character to be alphabetic or an underscore, and in that case skip the r and the # before running ident_continue.
We could add an is_raw boolean to token::Ident as well.
You'll also need to feature gate this but we can do that later.
lmk if you have questions
We could add an
is_rawboolean totoken::Identas well.
This is possible, but would be unfortunate. Idents are used everywhere and supposed to be small.
Ideally we should limit the effect of r# to lexer, for example by interning r#keyword identifiers into separate slots (like gensyms) so r#keyword and keyword have different NameSymbols.
EDIT: The first paragraph is about ast::Ident, token::Ident may actually be the appropriate place.
Some clarifications are needed:
r# affects context-dependent identifiers (aka weak keywords) like default.// `union` is a normal ident, this is not an error
union U {
....
}
// `union` is a raw ident, is this an error?
r#union U {
...
}
r# affect keywords that are "semantically special" and not "syntactically special"?Self in Self::A::B is already treated as normal identifier during parsing, it only gains special abilities during name resolution when we resolve identifiers named Self (or self/super/etc) in a special way.#[derive(Default)]
struct S;
impl S {
fn f() -> S {
r#Self::default() // Is this an error?
}
}
oh, I didn't realize we reuse Ident from the lexer.
I think r#union is an error (when used to create a union). We'll need the ident lexing step to return a bool on the lexed ident's raw-ness.
I think it's ok for r#Self to work; but don't mind either way
Also, lifetime identifiers weren't covered by the RFC - r#'ident or 'r#ident.
(One more case of ident vs lifetime mismatch caused by lifetime token being a separate entity rather than a combination of ' and identifier, cc https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716).
I think it's fine if we don't have raw lifetime identifiers. Lifetimes are crate-local, their identifiers never need to be used by consumers of your crate, so lifetimes clashing with keywords can simply be fixed on epochs. Admittedly, writing a lint that makes that automatic may be tricky.
Raw identifiers are primarily necessary because people may need to call e.g. functions named catch() in crates on an older epoch. This problem doesn't occur for lifetimes.
Yeah, it's mostly a consistency question rather than a practical issue.
From what I've been seeing while looking around the codebase, I think the best way to implement this is to add a new parameter to token::Ident, rather than messing with the Symbol itself?
I think this would make implementing epoch-specific keywords easier, since there's no question of what Symbol should be used when, and in what epoch. (For example, you'd have to make sure the Symbol for catch being used as an identifier in 2015 epoch code is the same as the Symbol for r#catch being used in a epoch where it's a full keyword.) This was already something I wasn't sure how to handle with contextual keywords.
My main questions, right now, would be:
parser.rs. Would adding one there be an issue? How would I go about doing that, considering that module doesn't have other feature checks that I can use as a template?token::Ident is declared as Ident(ast::Ident). To add an is_raw field would mean having a mystery unnamed bool field in a tuple struct, or making it use named fields, in which case, matching on token::Idents becomes nastier. One idea that did come to mind is adding an RawIdent(ast::Ident) variant, but then the compiler can't help me find places I might need to worry about raw identifiers. Any advice on this?I'll implement lifetime parameters if it turns out to be easy to, I guess. As Manishearth said, it's not something you really need to escape ever.
Yes, we should not be affecting Symbol.
Regarding the feature gate, we can solve the problem later, but I was thinking of doing a delayed error or something since we don't know what feature gates are available whilst lexing
a mystery unnamed bool field in a tuple struct
I think that's fine. Folks usually do this as Ident(ast::Ident, /* is_raw */ bool)
I think it's best if we _don't_ allow this to work for lifetime parameters, actually. We restrict the number of places where raw identifiers are allowed at a first pass, and if we need this for a later epoch, we add it then
But yeah, your plan sounds good otherwise
.... right, that makes sense. The lexer obviously doesn't know what feature flags there are, because it's busy lexing them. :D
@petrochenkov
In my opinion, r#foo should always be a "generic identifier" and hence not eligible as a contextual keyword. So:
How r# affects context-dependent identifiers (aka weak keywords) like default.
They are not special anymore. r#union Foo would be an error.
How does r# affect keywords that are "semantically special" and not "syntactically special"?
Is this just about self and Self? Does this apply to super too? We currently talk about self and Self as if they were keywords. I am therefore inclined to think that r#self would be "just another name" and not have the special properties that self ordinarily has.
So e.g. use r#self::foo would be an absolute path.
cc @rust-lang/lang -- do others agree?
@nikomatsakis That's exactly what I'd have expected.
That said, I don't know what that means with macros, since apparently this works:
macro_rules! foo {
($i:ident) => {
$i Foo {
x: u32,
y: i32,
}
}
}
foo!(union);
(I wish it didn't, but it might be too late?)
I hesitate about treating self and r#self as distinct things, because it seems like this would allow overlapping names in the same scope, like:
impl Foo {
fn foo(self, r#self: Bar) {
println!("I have distinct {} and {} at the same time?", self, r#self);
}
}
Maybe that's in fact OK, but if so it's at least a corner case to test...
(In general, foo and r#foo are supposed to refer to the same thing.)
@scottmcm I'd expect your foo!(union) to work, but not foo!(r#union). That is, macros will have to preserve the metadata whether a particular :ident is raw or not.
I've found some other unexpected places where raw identifiers might show up while implementing this. Should these be allowed?
#[r#struct]macro_rules! foo { ($r#struct:expr) => { r#expr } }There's some weirdness with the built-in procedural macros taking raw identifiers too:
concat_args!(r#abc, r#def)format!("{struct}", r#struct = 0);Also, libproc_macro seems to need to be able to deal with raw identifiers somehow too. I've been creating artificial Symbols with content like r#struct, but this doesn't seem like a very good solution since these Symbols wouldn't be used anywhere else. Any advice here?
Procedural macros see them after lexing, so that will Just Work.
I don't think libproc_macro needs to know anything? Again, this is all after lexing.
I personally think it's fine to allow all those. Simplifies things.
libproc_macro has an unstable enum that represent a token: https://doc.rust-lang.org/nightly/proc_macro/enum.TokenNode.html
This needs to know about the new field in token::Ident to properly serialize/deserialize it. I don't know how much unstable stuff depends on it.
Yeah, fair. As long as it's unstable we can tweak it.
рео рдорд╛рд░реНрдЪ, реирежрезрео резреж:резрел рдо.рдЙ. рд░реЛрдЬреА, "Lymia Aluysia" notifications@github.com
рдиреЗ рд▓рд┐рд╣рд┐рд▓реЗ:
libproc_macro has an unstable enums that represent a token:
https://doc.rust-lang.org/nightly/proc_macro/enum.TokenNode.htmlThis needs to know about the new field in token::Ident to properly
serialize/deserialize it.тАФ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-371723696,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABivSCOe13j2WQTrr0q94xEQlHXUW0aUks5tch4SgaJpZM4SVdSg
.
That said, I don't know what that means with macros, since apparently this works:
macro_rules! foo { ($i:ident) => { $i Foo { x: u32, y: i32, } } } foo!(union);
Actually, checking on the playground, it looks like this isn't even unique to contextual keywords: https://play.rust-lang.org/?gist=98bba154f78cd9aba5838bf82ac2fbb4&version=stable
Hrm, another strange case that came up while writing tests:
Given this macro definition, which branch should test_macro!(r#a) match:
macro_rules! test_macro {
(a) => { ... };
(r#a) => { ... };
}
@nikomatsakis ^ ?
Could even make this change based on the epoch. idk.
macro matching is kinda-sorta-breakable already
The code currently matches the first branch, which I don't think is correct? My intuition is that macros can generally see through distinctions like this.
There's no backwards compatibility problem I can think of with new keywords and macros, since struct already matches $id:ident, etc.
The problem is that someone could have written a macro that expects
parameters to be given to it as $a:ident # $b:ident and it won't work
anymore.
реп рдорд╛рд░реНрдЪ, реирежрезрео рео:реирен рдо.рдЙ. рд░реЛрдЬреА, "Lymia Aluysia" notifications@github.com рдиреЗ
рд▓рд┐рд╣рд┐рд▓реЗ:
The code currently matches the first branch, which I don't think is
correct? There's no backwards compatibility problem I can think of with new
keywords and macros, since I believe struct already matches $id:ident, etc.
тАФ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-372001706,
or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABivSCaWeuum2fbbUN8NfuZ4nqFYnxNPks5tc1Y_gaJpZM4SVdSg
.
Ah, yeah, that's a problem.
The problem is that someone could have written a macro that expects parameters to be given to it as $a:ident # $b:ident and it won't work anymore.
This was checked in the RFC. r#foo won't match such a pattern, because the r# prefix is currently only allowed as the start of a raw string.
macro_rules! foo {
($a:ident # $b:ident) => (())
}
fn main() {
foo!(r#foo);
}
error: found invalid character; only `#` is allowed in raw string delimitation: f
Ah, perfect.
реп рдорд╛рд░реНрдЪ, реирежрезрео реп:реиреп рдо.рдЙ. рд░реЛрдЬреА, "Josh Stone" notifications@github.com рдиреЗ
рд▓рд┐рд╣рд┐рд▓реЗ:
The problem is that someone could have written a macro that expects
parameters to be given to it as $a:ident # $b:ident and it won't work
anymore.This was checked in the RFC. r#foo won't match such a pattern, because it
the r# prefix is currently only allowed as the start of a raw string.macro_rules! foo {
($a:ident # $b:ident) => (())
}fn main() {
foo!(r#foo);
}error: found invalid character; only
#is allowed in raw string delimitation: fтАФ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-372004565,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABivSAUiiWyXL1dAItPBCUfvUAsxGMTaks5tc2TEgaJpZM4SVdSg
.
I hesitate about treating self and r#self as distinct things, because it seems like this would allow overlapping names in the same scope, like:
That does look confusing. We also currently do treat self like any other identifier with respect to hygiene iirc, which is kind of annoying (i.e., if you generate an expression that references self, it will fail to match against a self in scope). I'm not sure which way that precedent points. =)
Yeah I think self and Self conceptually count as identifiers which are
magic, unlike dyn and catch which are not identifiers.
Also super though that's less problematic.
On Mar 12, 2018 10:30 AM, "Niko Matsakis" notifications@github.com wrote:
I hesitate about treating self and r#self as distinct things, because it
seems like this would allow overlapping names in the same scope, like:That does look confusing. We also currently do treat self like any other
identifier with respect to hygiene iirc, which is kind of annoying (i.e.,
if you generate an expression that references self, it will fail to match
against a self in scope). I'm not sure which way that precedent points. =)тАФ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-372329086,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABivSJxiRjXqmdxpbYsgFjrxPZ7nNMqXks5tdoaPgaJpZM4SVdSg
.
Isn't self also a keyword in some ways? fn foo(&self) is special syntax beyond self simply being an identifier (and it's implemented that way, currently, fn foo(&r#self) doesn't work)
Isn't
selfalso a keyword in some ways?
self, Self and friends are keywords in the literal sense of the word - they are not permitted in arbitrary identifier position (e.g. you can't write struct Self {}), only in some limited contexts.
However, they are permitted in more contexts (namely, in path segments) than some other keywords like struct or else, so it may create and impression that they are less "keywordy".
Maybe the best approach for now would actually be to forbid r#self, r#Self and r#super entirely? It looks like neither fully supporting r#self as fully identical to self or r#Self as fully different to Self is trivial. With no special care towards this, it's being treated as a weird hybrid.
Yeah, that's what I was suggesting -- we can be conservative here.
@Lymia
Maybe the best approach for now would actually be to forbid r#self, r#Self and r#super entirely?
Seems reasonable as a starting point. =)
@Lymia how goes your impl efforts?
@nikomatsakis
The implementation (https://github.com/rust-lang/rust/pull/48942) is almost ready to be merged, a few nits remain to be fixed.
Does rustdoc need to use the
r#syntax? e.g. to documentpub use old_epoch::*
That's more a docs question (of what we want to display) than a rustdoc question (of how it's implemented). It's probably simple enough to print a r# on an identifier if it had one originally (if that information makes it into the AST/HIR), but anything on top of that will probably require filtering any identifiers through a known list of keywords before printing.
That list probably only makes sense in the context of a given edition, though, so it may not be worth pursuing. Since a crate can be compiled in one edition and used from another, it's possible that some item would need the r# in one edition but not in another. Printing it universally may be misleading, especially if the feature isn't made available to the 2015 edition once it lands.
That's just my thoughts on it, though. @rust-lang/docs What do you think?
I didn't see any mention of it in this issue, nor could I find it in the project, but having this stable for the 2018 edition seems like a requirement, if any new keywords will also be introduced (which seems to be a certainty).
Without this, people couldn't update to the new epoch if they used any crates with identifiers like async, try, delegate, etc.
Yes, that is the plan.
On Fri, May 4, 2018, 1:00 PM Sean McArthur notifications@github.com wrote:
I didn't see any mention of it in this issue, nor could I find it in the
project https://github.com/rust-lang/rust/projects/3, but having this
stable the 2018 edition seems like a requirement, if any new keywords will
also be introduced (which seems to be a certainty).Without this, people couldn't update to the new epoch if they used any
crates with identifiers like async, try, delegate, etc.тАФ
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-386717294,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABivSItuxwcfMbjRZF6BhOSNTrMsNhdpks5tvLNugaJpZM4SVdSg
.
I believe this feature is ready for consideration for stabilization (in the runup to the Edition). It's a very simple feature, and widely understood to be necessary for Editions in general (to support adding keywords). I don't believe any of the listed unresolved questions are blockers to stabilization per se (though they should be addressed for the Edition itself).
If you haven't had a chance to try this out and think you may have an opinion, please give it a whirl and leave feedback.
@rfcbot fcp merge
Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:
No concerns currently listed.
Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
I found the other day that proc_macro::quote!(r#foo) will not do anything specific about raw identifiers, although I'm not sure if it's broken. Also, it's unstable still.
I have a small concern which I don't think has been addressed yet; quote, the library that most proc macros use, uses #foo for interpolation, because it is unable to use $foo (I forget why, maybe we can fix this).
This seems to have the potential to break serde and a lot of other crates, have we had enough time to test that out?
Since the macro uses syn it's only a problem if syn gains support for raw identifiers, but it will have to eventually, so we should perhaps ensure that the dollar sign thing works for syn/quote before pushing this out?
cc @dtolnay @mystor
I don't know about any tests, but r# was chosen initially because it should not be ambiguous with anything in macros.
IIRC, r# was already formerly consumed by the parser as the start of a raw string, so r#foo in a macro would have been a syntax error, whereas r #foo would have been and should still be legal. So I would expect that now r# is greedily a raw string or raw identifier, but never split into r #foo otherwise.
So quote!(r#foo) parses as a raw identifier, and quote!(r #foo) leads quote to try and interpolate foo.
@withoutboats quote from the quote crate is a different flavor of macro.
I think if we can be sure syn can disambiguate, this is fine
@Manishearth I don't know what you mean, but quote is a macro_rules macro. It should work just like any other macro_rules macro. If syn lexes r# the same way rustc does, anything using syn should also work correctly.
Oh, I see what you mean.
I meant that #foo carried meaning in quote that does not in regular macro-defining syntax, but that wasn't your point.
Looking at the implementation it seems to be fine.
:bell: This is now entering its final comment period, as per the review above. :bell:
You can't detect $foo properly in a macro_rules macro, but if you have a proc macro, it should work (at least now, maybe it has had issues in the past).
The final comment period, with a disposition to merge, as per the review above, is now complete.
I've nominated this mostly so that someone (lang team, maybe) can find or select someone to write up docs and stabilize this feature.
This is in need of a stabilization PR! There's a stabilization guide on the forge. This feature is already documented in the edition guide, so much of that documentation can probably be reused in the stabilization PRs. Please post here if you plan to take this issue! (I'll circulate it around and see if anyone wants to take it as a good first or third PR)
I'll have a go. This looks pretty straightforward.
When it comes to new documentation, is there anything that should be updated besides the Reference? I'm not sure it belongs in the Book.
Merged! I think some boxes can be ticked off now, @Centril. :-)
As for the unresolved questions:
pub use old_epoch::*, @Centril?Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
I'm not sure of this. What do you mean by pub use old_epoch::*
For instance, if the old_epoch crate had a fn catch() which it could declare without bothering with raw identifiers, and a new_epoch crate used and re-exported it. The new crate would have had to use raw identifiers if it had declared that function itself. Should this affect the way rustdoc presents it?
@cuviper Right, makes sense. I think the rules should be the same as what I proposed for diagnostics, in that case.
Most helpful comment
@petrochenkov
In my opinion,
r#fooshould always be a "generic identifier" and hence not eligible as a contextual keyword. So:They are not special anymore.
r#union Foowould be an error.Is this just about
selfandSelf? Does this apply tosupertoo? We currently talk aboutselfandSelfas if they were keywords. I am therefore inclined to think thatr#selfwould be "just another name" and not have the special properties thatselfordinarily has.So e.g.
use r#self::foowould be an absolute path.cc @rust-lang/lang -- do others agree?