Rust: Streamline `Symbol`, `InternedString`, and `LocalInternedString`.

Created on 16 May 2019  路  33Comments  路  Source: rust-lang/rust

We currently have three closely-related symbol types.

Symbol is the fundamental type. A Symbol is an index. All operations work on that index. StableHash is not implemented for it, but there's no reason why it couldn't be. A Symbol can be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.

InternedString is a thin wrapper around Symbol. You can convert a Symbol to an InternedString. It has two differences with Symbol.

  • Its PartialOrd/Ord/Hash impls use the chars, rather than the index.
  • Gensym-ness is ignored/irrelevant.

LocalInternedString is an alternative that contains a &str. You can convert both Symbol and InternedString to LocalInternedString. Its PartialOrd/Ord/Hash impls (plus PartialEq/Eq) naturally work on chars. Its main use is to provide a way to look some or all of the individual chars within a Symbol or InternedString, which is sometimes necessary.

I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between Symbol and InternedString is subtle and has caused
bugs.

Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.

Here's what I would like as a final state.

  • Symbol exists.
  • InternedString does not exist.
  • LocalInternedString perhaps exists, but is only used temporarily when code needs access to the chars within a Symbol. Alternatively, Symbol could provide a with() method (like InternedString currently has) that provides access to the chars, and then LocalInternedString wouldn't be needed.
  • Symbol's impl of Hash uses the index, and its impl of StableHash uses the chars.
  • Not sure about Symbol's impl of PartialOrd/Ord. If a stable ordering is really needed (perhaps for error messages?) we could introduce a StableOrd trait and use that in the relevant places, or do a custom sort, or something.
  • Gensyms don't really exist. They are simulated: when you call gensym(), it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. And Interner could keep a counter. So "foo" would gensym to something lke "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, #49300 talks about other ways of dealing with gensyms.
  • All this should also help performance, because we'd end up with more operations on indexes, and only the necessary ones on chars (which require TLS lookups).

I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.

Thoughts?

CC @petrochenkov @Zoxc @eddyb @Mark-Simulacrum @michaelwoerister

C-enhancement T-compiler

Most helpful comment

Gensyms are now gone, that should unblock the other clean up here.

All 33 comments

StableHash is not implemented for it, but there's no reason why it couldn't be.

Small correction: StableHash is implemented for Symbol, see impl<'a> HashStable<StableHashingContext<'a>> for ast::Name in impls_syntax.rs.

(ast::Name is an alias of Symbol conventionally used for identifier symbols (in AST/HIR/etc. coming from token::Ident, lowered from ast::Ident.))

In general, I'm in favor of simplifying the setup here (and properly documenting it in the process). The main problem here, from the point of view of incremental compilation, is that gensym depends on the state of the entire crate, so changing something in one source file will change gensyms in (potentially) all other source files. As long as gensym indices are generated from a global counter, every tiny change will look to incremental compilation as if you'd renamed half things in your code base.

I think that:

  • the gensym stuff should be factored out of the string interning infrastructure entirely (there's no fundamental reason to intertwine the two), and
  • gensym index generation should use a more stable scheme rather than a global counter.

I'm all for reducing the number of types if that's possible.
I'd rather use a separate "stable" set of operations/traits used by incremental (and query?) infra, than introduce separate types that have to leak into all other parts of the compiler, if that's possible.

I'm not sure the scheme with simply incrementing a session-local counter will work for gensyms.
Gensyms can come from metadata from other crates, encoded as strings and "foo$3" from other crate may mean entirely different thing than "foo$3" based on the local counter.
Perhaps appending long enough hashes will work better.
Anyway, I think the proper solution is to use hygiene and move gensymness from Symbol to Ident, so I'm not sure any intermediate solutions would improve significantly enough on the current situation to implement them. (But you can certainly try.)

StableHash is not implemented for it, but there's no reason why it couldn't be.

Small correction: StableHash is implemented for Symbol, see impl<'a> > HashStable> for ast::Name in impls_syntax.rs.

(ast::Name is an alias of Symbol conventionally used for identifier symbols (in AST/HIR/etc. coming from token::Ident, lowered from ast::Ident.))

Ah, interesting. I think the real problem is actually that StableHash must be equivalent to PartialEq, that is:

  • x == y implies stable_hash(x) == stable_hash(y), and
  • x != y implies stable_hash(x) != stable_hash(y).

That second condition makes it more strict than Hash (i.e. no hash collisions are allowed). I going to open PRs trying to document this.

I tried doing the "foo$3" approach but bootstrapping failed very early on with blatant bugs in name resolution. I suspect it's a dead end for the reasons mentioned above.

I agree that moving the gensym stuff from the symbol/interning level to the ident level sounds like the right idea. (Plenty of the interned strings are not identifiers.) What would "use hygiene" look like for dealing with gensyms? I know that Spans can have a hygiene element, but I don't know what data would be added to the hygiene element.

What would "use hygiene" look like for dealing with gensyms? I know that Spans can have a hygiene element, but I don't know what data would be added to the hygiene element.

"Gensym span" == Span::def_site() of a unique freshly introduced macro.
This means a unique fresh outer_mark in SyntaxContextData specifically.

Most of gensyms used in the compiler (those introduced by built-in derives or desugarings) don't even need the "unique fresh" part, they can just use Span::def_site() of the macro/desugaring they are introduced by.

A relevant recent thread - https://github.com/rust-lang/rust/pull/60106.

I think the proper solution is to use hygiene and move gensymness from Symbol to Ident

60903 is a first step: it moves the gensym operations from Symbol to Ident. (The gensym implementation details are still within Interner.)

So I've had a look through the code and done some experimentation with replacing gensyms with hygiene. To give a summary

Don't need to be unique identifiers at all

As far as I can tell, these don't escape into the AST, so the gensym call is unnecessary:

https://github.com/rust-lang/rust/blob/50a0defd5a93523067ef239936cc2e0755220904/src/libsyntax/ext/tt/macro_rules.rs#L255-L256

I also can't see the point of the module declared here:

https://github.com/rust-lang/rust/blob/50a0defd5a93523067ef239936cc2e0755220904/src/libsyntax/diagnostics/plugin.rs#L124

Refactor out of existence

This gensym avoids an assert, before it get converted to an interned string. It could be removed now, but there's probably a larger set of refactorings around type/lifetime parameters so that we aren't using the name of a parameter to determine whether it's Self or not.

https://github.com/rust-lang/rust/blob/50a0defd5a93523067ef239936cc2e0755220904/src/librustc/hir/lowering.rs#L2712-L2716

Once the async fn desugaring can be moved to HIR lowering, we can also avoid gensyms there.

Could be replaced with _

This doesn't really avoid the gensym, so much as make someone else do it.

Subject to a decision on #61019, we could use _ as the rename instead.

https://github.com/rust-lang/rust/blob/50a0defd5a93523067ef239936cc2e0755220904/src/libsyntax/std_inject.rs#L72

Likewise, we could use _ here as well:

https://github.com/rust-lang/rust/blob/50a0defd5a93523067ef239936cc2e0755220904/src/librustc_resolve/build_reduced_graph.rs#L317

Can use hygiene

Making built in derives have def-site hygiene appears to work (from some simple testing). format_args! appears to be similar.

Don't have an associated macro

Global allocators, tests and proc macros have AST passes that use a gensym to hide a global module. We could use hygiene, but we don't have a mark associated to a macro expansion, so the compiler will ICE here:

https://github.com/rust-lang/rust/blob/50a0defd5a93523067ef239936cc2e0755220904/src/librustc_resolve/build_reduced_graph.rs#L761

Tests

Here we run in to trouble with because the obvious way to add hygiene runs into the problem with this code:

#![feature(decl_macro)]

macro a() {
    pub struct A;

    mod module {
        use super::A; // Fails to resolve, since the the context of `A` is stripped when we try to resolve it in the root.
    }
}

a!();

Resolve and _

Finally we use gensyms to allow multiple items called _ in the same module. I'm not sure what the motivation for using gensyms rather than a special case in resolve is. But this seems to be the case that is most likely to need "gensym spans"

Wow, nice write-up, @matthewjasper!

Global allocators, tests and proc macros have AST passes that use a gensym to hide a global module. We could use hygiene, but we don't have a mark associated to a macro expansion, so the compiler will ICE here:

Could we get the Mark from the fact that in all of these cases there is an attribute?
And perform all the work as expanding that attribute?

There's one attribute per test. Since a user isn't obligated to write at least one test we don't necessarily have a mark but we still have to hide the main test runner function.

Global allocators could probably use the attribute.

I global allocators can be formulated in terms of a proc macro, that would be great (one less special case).

If not, then we can create fresh marks for them (Mark::fresh, which is already used in various non-macro places, even too liberally perhaps), it just needs to be "registered" in resolve, so it has a module parent and other associated data, and doesn't ICE on attempts to access them.

@matthewjasper You can make --test inject a #![test] on the crate (or #![test_harness] etc.)
I don't remember who was working in this area, but there were some plans to make it all less magical, for various reasons (fixing bugs and allowing custom test harnesses) - cc @Manishearth

#[global_allocator] is turned into a macro in https://github.com/rust-lang/rust/pull/62735.
I'm pretty sure now that the gensym in it can be removed together with the whole generated allocator_abi module and some imports.

Gensyms are now gone, that should unblock the other clean up here.

LocalInternedString perhaps exists, but is only used temporarily when code needs access to the chars within a Symbol. Alternatively, Symbol could provide a with() method (like InternedString currently has) that provides access to the chars, and then LocalInternedString wouldn't be needed.

64141 and #65426 greatly reduced the usage and capability of LocalInternedString, getting it pretty close to this desired state. I also tried adding Symbol::with()... it works in principle, but in practice it's pretty annoying, and there are hundreds of use sites that would need changing. So I gave up on that.

I'm looking now into removing InternedString. To do this, we must effectively merge it with Symbol. There are two obvious possibilities:

  • Change Symbol::{Ord,Hash} to work with the symbol chars. This appears to work, but I get some performance regressions of up to 3%. The regressions come much more from Hash than from Ord; the char-based hashing is more expensive, plus we have to access TLS to get the chars in the first place (which is also bad for the parallel compiler).
  • Change InternedString::{Ord,Hash} to work with the symbol index. I get a few test errors this way, on tests relating to stability of names. This would be my preferred option if I can get it to work. I suspect only a handful of places actually need the char-based operations.

Here are the first test failures I get when I change InternedString::Ord to work with the symbol index instead of the chars:

---- [run-make] run-make-fulldeps/reproducible-build stdout ----

error: make failed
status: exit code: 2
command: "make"
stdout:
------------------------------------------
rm -rf /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build && mkdir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  linker.rs -O
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build-aux.rs
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build.rs -C linker=/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build.rs -C linker=/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker
diff -u "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments1" "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments2"
--- /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments1   2019-10-18 09:31:45.248924701 +1100
+++ /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments2   2019-10-18 09:31:45.456923894 +1100
@@ -5,7 +5,7 @@
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.0.rcgu.o: 9024332235029870339
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.1.rcgu.o: 8252971569739609253
-/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.2.rcgu.o: 3534221490709993710
+/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.2.rcgu.o: 1155913470043241769
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.3.rcgu.o: 10237418006570950210
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.4.rcgu.o: 15369966276953066318
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.5.rcgu.o: 320128326741088814

------------------------------------------
stderr:
------------------------------------------
make: *** [Makefile:21: smoke] Error 1

------------------------------------------


---- [run-make] run-make-fulldeps/reproducible-build-2 stdout ----

error: make failed
status: exit code: 2
command: "make"
stdout:
------------------------------------------
rm -rf /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 && mkdir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build-aux.rs
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build.rs -C lto=fat
cp /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build.rs -C lto=fat
cmp "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a" "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build" || exit 1
/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build differ: byte 781, line 1

------------------------------------------
stderr:
------------------------------------------
make: *** [Makefile:17: fat_lto] Error 1

------------------------------------------

failures:
    [run-make] run-make-fulldeps/reproducible-build
    [run-make] run-make-fulldeps/reproducible-build-2

test result: FAILED. 0 passed; 2 failed; 199 ignored; 0 measured; 0 filtered out

Here is a representative failure I get when I change InternedString::Hash to work with the symbol index instead of the chars:

---- [ui] ui/symbol-names/basic.rs#legacy stdout ----
diff of stderr:

-   error: symbol-name(_ZN5basic4main17hd72940ef9669d526E)
+   error: symbol-name(_ZN5basic4main17h0e00d3497edec60eE)
2     --> $DIR/basic.rs:7:1
3      |
4   LL | #[rustc_symbol_name]

5      | ^^^^^^^^^^^^^^^^^^^^
6   
-   error: demangling(basic::main::hd72940ef9669d526)
+   error: demangling(basic::main::h0e00d3497edec60e)
8     --> $DIR/basic.rs:7:1
9      |
10  LL | #[rustc_symbol_name]

AFAICT the symbol suffix changed in this test is produced by get_symbol_hash(), which is interesting because it appears to only be using stable hashing, which still uses the chars:
https://github.com/rust-lang/rust/blob/5a8fb7c24d85d0bd911ca465c4a077c7cee608ae/src/librustc_codegen_utils/symbol_names/legacy.rs#L71-L136

But maybe some non-stable hashing is sneaking into that computation somehow?

It would be worrisome for non-stable hashing to be sneaking in, but I suspect not impossible. I'm not sure how to try and track that down. (Maybe @michaelwoerister has thoughts?).

If the new hash is still stable though it seems fine to switch over to that? i.e., if the test just needs updating but not in a constant manner.

Here is a representative failure I get when I change InternedString::Hash to work with the symbol index instead of the chars:

I changed the InternedStrings within DefPathData to Symbols and this failure reproduces, so that narrows it down a lot. I think def_path_hash() is involved somehow, and maybe Definitions::next_disambiguator. I think the new hash value is stable. So if these suffixes aren't supposed to be stable across different rustc versions then updating the test seems ok.

I have confirmed that changing Symbol to work with chars (for ordering and hashing) works: #65543 has the code. That would be the easiest path toward eliminating InternedString, but I don't think it's the best one, because it's pre-emptively giving up on the performance advantages of Symbol.

I am also working on converting InternedString occurrences to Symbol in chunks, in order to work out which conversions are problematic.

Here are the first test failures I get when I change InternedString::Ord to work with the symbol index instead of the chars:

I can reproduce these just by changing SymbolName::name from InternedString to Symbol.

Should either of the types even implement Ord? As for stable hashing, try removing the Hash impl from InternedString and see what errors, only StableHash should be used.

Should either of the types even implement Ord?

It's used in various places, mostly sorting things for error messages.

try removing the Hash impl from InternedString and see what errors, only StableHash should be used.

Definitions::next_disambiguator seems to be the only use.

Sounds like next_disambiguator should be using StableHash.

It's used in various places, mostly sorting things for error messages.

Might make more sense to use String for sorting errors. cc @nikomatsakis @estebank

I looked more closely, and my description of how Ord and Hash are used was woefully incomplete. I will give a more comprehensive description on Monday.

Here is how the various impls are used. These lists may be incomplete.

Symbol::Hash needed for:

  • Ident::Hash
  • Stability::Hash
  • StabilityLevel::Hash
  • RustcDeprecation::Hash
  • BUILTIN_ATTRIBUTE_MAP
  • edition_enabled_features
  • probably other things too

Symbol::{PartialOrd,Ord} needed for:

  • ProjectionElem::{PartialOrd,Ord}
  • BoundNameCollector::regions: BTreeSet

InternedString::Hash needed for:

  • DefKey::compute_stable_hash calls name.hash() on a name obtained from a
    DisambiguatedDefPathData
  • InternedString is used within DefPathData, which derives Hash

    • Required for Definitions::next_disambiguator

  • CompileCodegenUnit/codegen_unit queries

    • Because QueryConfig::Key requires Hash, for QueryCache

InternedString::{PartialOrd,Ord} needed for:

  • ToStableHashKey impl uses Ord, for various sorts

    • Perhaps it should really use OrdStable, like it uses HashStable?

  • Symbol's Hash should probably be kept as-is, for O(1) hashing
  • InternedString's Hash can probably be replaced by Symbol's StableHash

Why does ProjectionElem need Ord? Sounds like uses of Symbol's Ord actually should be using InternedString's semantics, especially if user-facing.

OrdStable on Symbol and derived on containing types sounds good, if we can't get away with manually computing contents-based Orderings in a couple places (or using String in perf-uncritical code).

I have a patch stack that eliminates InternedString entirely by converting to Symbol and using LocalInternedString in the handful of places where we need stability. I will file a PR once I do some more clean-ups and perf measurements.

After #65657, I have pretty much everything I want:

  • InternedString is gone.
  • LocalInternedString is very limited.
  • Symbols's Hash, PartialOrd and Ord impls all use the index.

65657 finished this off.

65776 is a final follow-up that just cleans some stuff up.

Was this page helpful?
0 / 5 - 0 ratings