Rust: Tracking issue for RFC 2151, Raw Identifiers

Created on 27 Feb 2018 · 63Comments · Source: rust-lang/rust

This is a tracking issue for RFC 2151 (rust-lang/rfcs#2151).

Steps:

[x] Implement the RFC (#48942)
[ ] Adjust documentation (see instructions on forge)
[x] Stabilization PR (see instructions on forge)
- [x] Settle on a final syntax for raw identifiers.

Unresolved questions:

[ ] Do macros need any special care with such identifier tokens?
Probably not.
[ ] Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
Depends on the edition?
[ ] Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*

B-RFC-implemented C-tracking-issue E-easy E-mentor P-high T-lang disposition-merge finished-final-comment-period

Source

Centril

Most helpful comment

@petrochenkov

In my opinion, r#foo should always be a "generic identifier" and hence not eligible as a contextual keyword. So:

How r# affects context-dependent identifiers (aka weak keywords) like default.

They are not special anymore. r#union Foo would be an error.

How does r# affect keywords that are "semantically special" and not "syntactically special"?

Is this just about self and Self? Does this apply to super too? We currently talk about self and Self as if they were keywords. I am therefore inclined to think that r#self would be "just another name" and not have the special properties that self ordinarily has.

So e.g. use r#self::foo would be an absolute path.

cc @rust-lang/lang -- do others agree?

nikomatsakis on 8 Mar 2018

👍2

All 63 comments

@Centril you rock! @petrochenkov, think you could supply some mentoring instructions here?

nikomatsakis on 28 Feb 2018

❤1

I'd like to take a shot at this. It seems like it'd be a decent way to learn how rustc works.

Lymia on 6 Mar 2018

👍1

The relevant code is probably this, you'll want to make the ('r', Some('#'), _) case allow for the third character to be alphabetic or an underscore, and in that case skip the r and the # before running ident_continue.

We could add an is_raw boolean to token::Ident as well.

You'll also need to feature gate this but we can do that later.

lmk if you have questions

Manishearth on 7 Mar 2018

👍1

We could add an is_raw boolean to token::Ident as well.

This is possible, but would be unfortunate. Idents are used everywhere and supposed to be small.

Ideally we should limit the effect of r# to lexer, for example by interning r#keyword identifiers into separate slots (like gensyms) so r#keyword and keyword have different ~~Name~~Symbols.

EDIT: The first paragraph is about ast::Ident, token::Ident may actually be the appropriate place.

petrochenkov on 7 Mar 2018

Some clarifications are needed:

How r# affects context-dependent identifiers (aka weak keywords) like default.
Do they lose their special context-dependent properties and turn into "normal identifiers"?

// `union` is a normal ident, this is not an error
union U {
    ....
}

// `union` is a raw ident, is this an error?
r#union U {
    ...
}

How does r# affect keywords that are "semantically special" and not "syntactically special"?
I'm talking about path segment keywords specifically.
For example, Self in Self::A::B is already treated as normal identifier during parsing, it only gains special abilities during name resolution when we resolve identifiers named Self (or self/super/etc) in a special way.

#[derive(Default)]
struct S;

impl S {
    fn f() -> S {
        r#Self::default() // Is this an error?
    }  
}

petrochenkov on 7 Mar 2018

oh, I didn't realize we reuse Ident from the lexer.

I think r#union is an error (when used to create a union). We'll need the ident lexing step to return a bool on the lexed ident's raw-ness.

I think it's ok for r#Self to work; but don't mind either way

Manishearth on 7 Mar 2018

Also, lifetime identifiers weren't covered by the RFC - r#'ident or 'r#ident.
(One more case of ident vs lifetime mismatch caused by lifetime token being a separate entity rather than a combination of ' and identifier, cc https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716).

petrochenkov on 7 Mar 2018

I think it's fine if we don't have raw lifetime identifiers. Lifetimes are crate-local, their identifiers never need to be used by consumers of your crate, so lifetimes clashing with keywords can simply be fixed on epochs. Admittedly, writing a lint that makes that automatic may be tricky.

Raw identifiers are primarily necessary because people may need to call e.g. functions named catch() in crates on an older epoch. This problem doesn't occur for lifetimes.

Manishearth on 7 Mar 2018

Yeah, it's mostly a consistency question rather than a practical issue.

petrochenkov on 7 Mar 2018

From what I've been seeing while looking around the codebase, I think the best way to implement this is to add a new parameter to token::Ident, rather than messing with the Symbol itself?

I think this would make implementing epoch-specific keywords easier, since there's no question of what Symbol should be used when, and in what epoch. (For example, you'd have to make sure the Symbol for catch being used as an identifier in 2015 epoch code is the same as the Symbol for r#catch being used in a epoch where it's a full keyword.) This was already something I wasn't sure how to handle with contextual keywords.

My main questions, right now, would be:

Does this actually sound like the best approach?
Reading the code, it looks like most feature gating is done on the AST after parsing, and not during parsing. Since nothing in the AST would reflect the raw identifiers being there at all with this approach, the feature check would have to be in parser.rs. Would adding one there be an issue? How would I go about doing that, considering that module doesn't have other feature checks that I can use as a template?
A minor code style point: right now, token::Ident is declared as Ident(ast::Ident). To add an is_raw field would mean having a mystery unnamed bool field in a tuple struct, or making it use named fields, in which case, matching on token::Idents becomes nastier. One idea that did come to mind is adding an RawIdent(ast::Ident) variant, but then the compiler can't help me find places I might need to worry about raw identifiers. Any advice on this?

I'll implement lifetime parameters if it turns out to be easy to, I guess. As Manishearth said, it's not something you really need to escape ever.

Lymia on 7 Mar 2018

Yes, we should not be affecting Symbol.

Regarding the feature gate, we can solve the problem later, but I was thinking of doing a delayed error or something since we don't know what feature gates are available whilst lexing

a mystery unnamed bool field in a tuple struct

I think that's fine. Folks usually do this as Ident(ast::Ident, /* is_raw */ bool)

Manishearth on 7 Mar 2018

I think it's best if we _don't_ allow this to work for lifetime parameters, actually. We restrict the number of places where raw identifiers are allowed at a first pass, and if we need this for a later epoch, we add it then

Manishearth on 7 Mar 2018

But yeah, your plan sounds good otherwise

Manishearth on 7 Mar 2018

.... right, that makes sense. The lexer obviously doesn't know what feature flags there are, because it's busy lexing them. :D

Lymia on 8 Mar 2018

@petrochenkov

In my opinion, r#foo should always be a "generic identifier" and hence not eligible as a contextual keyword. So:

How r# affects context-dependent identifiers (aka weak keywords) like default.

They are not special anymore. r#union Foo would be an error.

How does r# affect keywords that are "semantically special" and not "syntactically special"?

So e.g. use r#self::foo would be an absolute path.

cc @rust-lang/lang -- do others agree?

nikomatsakis on 8 Mar 2018

👍2

@nikomatsakis That's exactly what I'd have expected.

That said, I don't know what that means with macros, since apparently this works:

macro_rules! foo {
    ($i:ident) => {
        $i Foo {
            x: u32,
            y: i32,
        }
    }
}

foo!(union);

(I wish it didn't, but it might be too late?)

scottmcm on 8 Mar 2018

I hesitate about treating self and r#self as distinct things, because it seems like this would allow overlapping names in the same scope, like:

impl Foo {
    fn foo(self, r#self: Bar) {
        println!("I have distinct {} and {} at the same time?", self, r#self);
    }
}

Maybe that's in fact OK, but if so it's at least a corner case to test...

(In general, foo and r#foo are supposed to refer to the same thing.)

cuviper on 8 Mar 2018

@scottmcm I'd expect your foo!(union) to work, but not foo!(r#union). That is, macros will have to preserve the metadata whether a particular :ident is raw or not.

cuviper on 8 Mar 2018

I've found some other unexpected places where raw identifiers might show up while implementing this. Should these be allowed?

#[r#struct]
macro_rules! foo { ($r#struct:expr) => { r#expr } }

There's some weirdness with the built-in procedural macros taking raw identifiers too:

concat_args!(r#abc, r#def)
format!("{struct}", r#struct = 0);

Also, libproc_macro seems to need to be able to deal with raw identifiers somehow too. I've been creating artificial Symbols with content like r#struct, but this doesn't seem like a very good solution since these Symbols wouldn't be used anywhere else. Any advice here?

Lymia on 9 Mar 2018

Procedural macros see them after lexing, so that will Just Work.

I don't think libproc_macro needs to know anything? Again, this is all after lexing.

I personally think it's fine to allow all those. Simplifies things.

Manishearth on 9 Mar 2018

libproc_macro has an unstable enum that represent a token: https://doc.rust-lang.org/nightly/proc_macro/enum.TokenNode.html

This needs to know about the new field in token::Ident to properly serialize/deserialize it. I don't know how much unstable stuff depends on it.

Lymia on 9 Mar 2018

Yeah, fair. As long as it's unstable we can tweak it.

८ मार्च, २०१८ १०:१५ म.उ. रोजी, "Lymia Aluysia" notifications@github.com
ने लिहिले:

libproc_macro has an unstable enums that represent a token:
https://doc.rust-lang.org/nightly/proc_macro/enum.TokenNode.html

This needs to know about the new field in token::Ident to properly
serialize/deserialize it.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-371723696,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABivSCOe13j2WQTrr0q94xEQlHXUW0aUks5tch4SgaJpZM4SVdSg
.

Manishearth on 9 Mar 2018

That said, I don't know what that means with macros, since apparently this works:
macro_rules! foo {
    ($i:ident) => {
        $i Foo {
            x: u32,
            y: i32,
        }
    }
}

foo!(union);

Actually, checking on the playground, it looks like this isn't even unique to contextual keywords: https://play.rust-lang.org/?gist=98bba154f78cd9aba5838bf82ac2fbb4&version=stable

Lymia on 9 Mar 2018

Hrm, another strange case that came up while writing tests:

Given this macro definition, which branch should test_macro!(r#a) match:

macro_rules! test_macro {
    (a) => { ... };
    (r#a) => { ... };
}

Lymia on 10 Mar 2018

@nikomatsakis ^ ?

Could even make this change based on the epoch. idk.

Manishearth on 10 Mar 2018

macro matching is kinda-sorta-breakable already

Manishearth on 10 Mar 2018

The code currently matches the first branch, which I don't think is correct? My intuition is that macros can generally see through distinctions like this.

There's no backwards compatibility problem I can think of with new keywords and macros, since struct already matches $id:ident, etc.

Lymia on 10 Mar 2018

The problem is that someone could have written a macro that expects
parameters to be given to it as $a:ident # $b:ident and it won't work
anymore.

९ मार्च, २०१८ ८:२७ म.उ. रोजी, "Lymia Aluysia" notifications@github.com ने
लिहिले:

The code currently matches the first branch, which I don't think is
correct? There's no backwards compatibility problem I can think of with new
keywords and macros, since I believe struct already matches $id:ident, etc.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-372001706,
or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABivSCaWeuum2fbbUN8NfuZ4nqFYnxNPks5tc1Y_gaJpZM4SVdSg
.

Manishearth on 10 Mar 2018

Ah, yeah, that's a problem.

Lymia on 10 Mar 2018

The problem is that someone could have written a macro that expects parameters to be given to it as $a:ident # $b:ident and it won't work anymore.

This was checked in the RFC. r#foo won't match such a pattern, because the r# prefix is currently only allowed as the start of a raw string.

macro_rules! foo {
    ($a:ident # $b:ident) => (())
}
fn main() {
    foo!(r#foo);
}

error: found invalid character; only `#` is allowed in raw string delimitation: f

cuviper on 10 Mar 2018

Ah, perfect.

९ मार्च, २०१८ ९:२९ म.उ. रोजी, "Josh Stone" notifications@github.com ने
लिहिले:

The problem is that someone could have written a macro that expects
parameters to be given to it as $a:ident # $b:ident and it won't work
anymore.

This was checked in the RFC. r#foo won't match such a pattern, because it
the r# prefix is currently only allowed as the start of a raw string.

macro_rules! foo {
($a:ident # $b:ident) => (())
}fn main() {
foo!(r#foo);
}

error: found invalid character; only # is allowed in raw string delimitation: f

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-372004565,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABivSAUiiWyXL1dAItPBCUfvUAsxGMTaks5tc2TEgaJpZM4SVdSg
.

Manishearth on 10 Mar 2018

I hesitate about treating self and r#self as distinct things, because it seems like this would allow overlapping names in the same scope, like:

That does look confusing. We also currently do treat self like any other identifier with respect to hygiene iirc, which is kind of annoying (i.e., if you generate an expression that references self, it will fail to match against a self in scope). I'm not sure which way that precedent points. =)

nikomatsakis on 12 Mar 2018

Yeah I think self and Self conceptually count as identifiers which are
magic, unlike dyn and catch which are not identifiers.

Also super though that's less problematic.

On Mar 12, 2018 10:30 AM, "Niko Matsakis" notifications@github.com wrote:

I hesitate about treating self and r#self as distinct things, because it
seems like this would allow overlapping names in the same scope, like:

That does look confusing. We also currently do treat self like any other
identifier with respect to hygiene iirc, which is kind of annoying (i.e.,
if you generate an expression that references self, it will fail to match
against a self in scope). I'm not sure which way that precedent points. =)

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-372329086,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABivSJxiRjXqmdxpbYsgFjrxPZ7nNMqXks5tdoaPgaJpZM4SVdSg
.

Manishearth on 12 Mar 2018

Isn't self also a keyword in some ways? fn foo(&self) is special syntax beyond self simply being an identifier (and it's implemented that way, currently, fn foo(&r#self) doesn't work)

Lymia on 13 Mar 2018

Isn't self also a keyword in some ways?

self, Self and friends are keywords in the literal sense of the word - they are not permitted in arbitrary identifier position (e.g. you can't write struct Self {}), only in some limited contexts.
However, they are permitted in more contexts (namely, in path segments) than some other keywords like struct or else, so it may create and impression that they are less "keywordy".

petrochenkov on 13 Mar 2018

Maybe the best approach for now would actually be to forbid r#self, r#Self and r#super entirely? It looks like neither fully supporting r#self as fully identical to self or r#Self as fully different to Self is trivial. With no special care towards this, it's being treated as a weird hybrid.

Lymia on 13 Mar 2018

Yeah, that's what I was suggesting -- we can be conservative here.

Manishearth on 13 Mar 2018

@Lymia

Maybe the best approach for now would actually be to forbid r#self, r#Self and r#super entirely?

Seems reasonable as a starting point. =)

nikomatsakis on 18 Mar 2018

@Lymia how goes your impl efforts?

nikomatsakis on 18 Mar 2018

@nikomatsakis
The implementation (https://github.com/rust-lang/rust/pull/48942) is almost ready to be merged, a few nits remain to be fixed.

petrochenkov on 20 Mar 2018

👍1

Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*

That's more a docs question (of what we want to display) than a rustdoc question (of how it's implemented). It's probably simple enough to print a r# on an identifier if it had one originally (if that information makes it into the AST/HIR), but anything on top of that will probably require filtering any identifiers through a known list of keywords before printing.

That list probably only makes sense in the context of a given edition, though, so it may not be worth pursuing. Since a crate can be compiled in one edition and used from another, it's possible that some item would need the r# in one edition but not in another. Printing it universally may be misleading, especially if the feature isn't made available to the 2015 edition once it lands.

That's just my thoughts on it, though. @rust-lang/docs What do you think?

QuietMisdreavus on 3 Apr 2018

I didn't see any mention of it in this issue, nor could I find it in the project, but having this stable for the 2018 edition seems like a requirement, if any new keywords will also be introduced (which seems to be a certainty).

Without this, people couldn't update to the new epoch if they used any crates with identifiers like async, try, delegate, etc.

seanmonstar on 4 May 2018

Yes, that is the plan.

On Fri, May 4, 2018, 1:00 PM Sean McArthur notifications@github.com wrote:

I didn't see any mention of it in this issue, nor could I find it in the
project https://github.com/rust-lang/rust/projects/3, but having this
stable the 2018 edition seems like a requirement, if any new keywords will
also be introduced (which seems to be a certainty).

Without this, people couldn't update to the new epoch if they used any
crates with identifiers like async, try, delegate, etc.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/48589#issuecomment-386717294,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABivSItuxwcfMbjRZF6BhOSNTrMsNhdpks5tvLNugaJpZM4SVdSg
.

Manishearth on 4 May 2018

I believe this feature is ready for consideration for stabilization (in the runup to the Edition). It's a very simple feature, and widely understood to be necessary for Editions in general (to support adding keywords). I don't believe any of the listed unresolved questions are blockers to stabilization per se (though they should be addressed for the Edition itself).

If you haven't had a chance to try this out and think you may have an opinion, please give it a whirl and leave feedback.

@rfcbot fcp merge

aturon on 18 Jul 2018

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

[x] @aturon
[x] @cramertj
[x] @eddyb
[x] @joshtriplett
[x] @nikomatsakis
[x] @nrc
[ ] @pnkfelix
[ ] @scottmcm
[x] @withoutboats

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot on 18 Jul 2018

I found the other day that proc_macro::quote!(r#foo) will not do anything specific about raw identifiers, although I'm not sure if it's broken. Also, it's unstable still.

eddyb on 18 Jul 2018

I have a small concern which I don't think has been addressed yet; quote, the library that most proc macros use, uses #foo for interpolation, because it is unable to use $foo (I forget why, maybe we can fix this).

This seems to have the potential to break serde and a lot of other crates, have we had enough time to test that out?

Since the macro uses syn it's only a problem if syn gains support for raw identifiers, but it will have to eventually, so we should perhaps ensure that the dollar sign thing works for syn/quote before pushing this out?

cc @dtolnay @mystor

Manishearth on 18 Jul 2018

I don't know about any tests, but r# was chosen initially because it should not be ambiguous with anything in macros.

withoutboats on 18 Jul 2018

IIRC, r# was already formerly consumed by the parser as the start of a raw string, so r#foo in a macro would have been a syntax error, whereas r #foo would have been and should still be legal. So I would expect that now r# is greedily a raw string or raw identifier, but never split into r #foo otherwise.

So quote!(r#foo) parses as a raw identifier, and quote!(r #foo) leads quote to try and interpolate foo.

cuviper on 18 Jul 2018

@withoutboats quote from the quote crate is a different flavor of macro.

I think if we can be sure syn can disambiguate, this is fine

Manishearth on 18 Jul 2018

@Manishearth I don't know what you mean, but quote is a macro_rules macro. It should work just like any other macro_rules macro. If syn lexes r# the same way rustc does, anything using syn should also work correctly.

withoutboats on 18 Jul 2018

Oh, I see what you mean.

I meant that #foo carried meaning in quote that does not in regular macro-defining syntax, but that wasn't your point.

Looking at the implementation it seems to be fine.

Manishearth on 18 Jul 2018

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot on 24 Jul 2018

You can't detect $foo properly in a macro_rules macro, but if you have a proc macro, it should work (at least now, maybe it has had issues in the past).

eddyb on 31 Jul 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

rfcbot on 3 Aug 2018

I've nominated this mostly so that someone (lang team, maybe) can find or select someone to write up docs and stabilize this feature.

Mark-Simulacrum on 8 Aug 2018

This is in need of a stabilization PR! There's a stabilization guide on the forge. This feature is already documented in the edition guide, so much of that documentation can probably be reused in the stabilization PRs. Please post here if you plan to take this issue! (I'll circulate it around and see if anyone wants to take it as a good first or third PR)

cramertj on 9 Aug 2018

I'll have a go. This looks pretty straightforward.

alexreg on 9 Aug 2018

👍1

When it comes to new documentation, is there anything that should be updated besides the Reference? I'm not sure it belongs in the Book.

alexreg on 9 Aug 2018

Merged! I think some boxes can be ticked off now, @Centril. :-)

alexreg on 21 Aug 2018

As for the unresolved questions:

Do macros need any special care with such identifier tokens?
No, I can't see how this would affect macros. Perhaps @petrochenkov could second this, however.
Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
Yes, I think so, although possibly we should not do this if we're using an old edition and the keyword was only introduced in a later one.
Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
I'm not sure of this. What do you mean by pub use old_epoch::*, @Centril?

alexreg on 21 Aug 2018

Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
I'm not sure of this. What do you mean by pub use old_epoch::*

For instance, if the old_epoch crate had a fn catch() which it could declare without bothering with raw identifiers, and a new_epoch crate used and re-exported it. The new crate would have had to use raw identifiers if it had declared that function itself. Should this affect the way rustdoc presents it?

cuviper on 21 Aug 2018

@cuviper Right, makes sense. I think the rules should be the same as what I proposed for diagnostics, in that case.

alexreg on 21 Aug 2018

Was this page helpful?

0 / 5 - 0 ratings