Rust: Use section/symbol ordering files for compiling rustc

Created on 11 May 2018 · 9 comments · Source: rust-lang/rust

The order in which code is located in binaries has an influence on how fast the binary executes because (as I understand it) it affects instruction cache locality and how efficiently the code is paged in from disk. Many linkers support specifying this order (e.g. LLD via --symbol-ordering-file and MSVC via -ORDER). The hard part, though, is finding an order that will actually improve things. The Chromium project has a tool for this, and somewhere else I've read that Valgrind could be used for this too. The expected speedups are a few percent.
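As a rough sketch of what the LLD side looks like, the ordering file is just a list of mangled symbol names, hottest first. The symbol names below are invented placeholders (not real rustc symbols), and the rustc invocation in the comment assumes an LLD-capable toolchain:

```shell
# Create a symbol ordering file: one mangled symbol per line, hottest first.
# These names are made-up placeholders, not real rustc symbols.
cat > order.txt <<'EOF'
_ZN4main9hot_inner17h0123456789abcdefE
_ZN4main9hot_outer17hfedcba9876543210E
EOF

# With LLD one would then link with something like (not run here):
#   rustc -C link-arg=-fuse-ld=lld \
#         -C link-arg=-Wl,--symbol-ordering-file=order.txt main.rs

echo "ordering file lists $(wc -l < order.txt) symbols"
```

LLD lays out the listed symbols in file order and warns about names it can't find, so a stale ordering file degrades gracefully rather than breaking the link.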

Prerequisites:

  • [ ] Support function instrumentation in rustc (if using the chromium tool) similar to what GCC's -finstrument-functions does.
  • [ ] Compile an instrumented version of the compiler
  • [ ] Run the instrumented version of the compiler for a realistic test program (this should be less sensitive than full PGO)
  • [ ] Use the generated ordering file for building release artifacts

The first point shouldn't be too hard. The rest, however, would be a big infrastructure investment. I hope that we'll get PGO support for our CI at some point. This symbol ordering business could then be part of that.
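To illustrate the step from an instrumentation trace to an ordering file: the trace records functions in first-execution order (with repeats), and the ordering file is that trace deduplicated with order preserved. The function names and log format below are invented for illustration, not what an instrumented rustc would actually emit:

```shell
# Made-up trace of function names in first-execution order, roughly the
# kind of log that -finstrument-functions-style hooks could produce
# after symbolization.
cat > trace.log <<'EOF'
parse_crate
lex_token
parse_crate
typecheck
lex_token
codegen
EOF

# Keep only the first occurrence of each symbol, preserving order:
awk '!seen[$0]++' trace.log > order.txt
cat order.txt
```

This prints `parse_crate`, `lex_token`, `typecheck`, `codegen` — each symbol once, in the order it first executed, which is the layout that front-loads the cold-start working set.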

cc @glandium @rust-lang/wg-compiler-performance @rust-lang/infra

A-rustbuild C-enhancement I-compiletime T-compiler T-infra WG-compiler-performance


All 9 comments

For your reference, Git uses their integration tests as a source of PGO.

The link to cygprofile is missing a trailing slash (it should be https://cs.chromium.org/chromium/src/tools/cygprofile/); without it I get an error.

@ishitatsuyuki Interesting!

As an alternative to the Google tool, there is BOLT by Facebook (github link).

Great find, @est31!

(This was originally typed in response to https://github.com/rust-lang/rust/issues/55137 which has been closed as a duplicate of this issue)

I think the blocker historically for BOLT/PGO/LTO has been finding CI time, especially in the case of BOLT and PGO for gathering profile data. I think if the answer to "Can BOLT be applied to a different binary from the one we gathered data on? (e.g., the stage1 compiler is profiled while building the stage2 compiler, and then the stage2 compiler is optimized)" is yes -- and there's still a benefit from this -- then my next question is "how long does BOLT take?"

If someone were willing to do the research to answer these questions, then I think integrating this into CI would become more feasible. One good thing is that we likely don't need to worry about implementing this for all platforms at once, since AFAICT BOLT is "just" an optimization.

@Mark-Simulacrum I don't think this necessarily needs to involve CI at all. I envision these tools as useful for the artifacts that we distribute to users, rather than as an aid to rustc developers. Seems like it could just be the final step on the build servers while we're doing releases.

Well, our CI is Rust's build server, so that's exactly why time is especially important.

I tried BOLT with my own build, and it performed 3% better on average. This was a rough benchmark on my laptop, though, so it might just be noise. (I'm probably not going to run this again until I get a workstation.)

BOLT has some caveats:

  • You need a linker flag (e.g. --emit-relocs) to keep relocation information for BOLT.
  • BOLT uses an enormous amount of memory (about 6 GB in my case). The run time itself is not bad; it was below 30 s, I think.
  • perf needs to be run with LBR support; LBR is almost always unavailable in VMs (which means you don't want to run the measurement inside CI).
  • perf records tend to be big. Watch out for disk space.
  • You can reuse data from previous runs without issues, but for releases (presumably stable/beta) fresh data is recommended.

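For reference, here is a hedged outline of the BOLT workflow the caveats above refer to. The tool names (perf, perf2bolt, llvm-bolt) are real, but the paths, the workload, and the exact flags are illustrative rather than a verified recipe, and the profiling step needs bare-metal Linux with LBR, so the commands are shown as comments only:

```shell
# Illustrative only -- these commands need Linux perf with LBR support
# and a binary linked with relocations kept, so they are not run here.
#
#   # 1. Keep relocations at link time so BOLT can rewrite the binary:
#   #      ... -C link-arg=-Wl,--emit-relocs
#   # 2. Profile with last-branch-record sampling (-j enables LBR):
#   #      perf record -e cycles:u -j any,u -o perf.data -- ./rustc <workload>
#   # 3. Convert the profile and rewrite the binary:
#   #      perf2bolt -p perf.data -o perf.fdata ./rustc
#   #      llvm-bolt ./rustc -data=perf.fdata -o ./rustc.bolt

pipeline="perf record -> perf2bolt -> llvm-bolt"
echo "$pipeline"
```

Step 2 is the part that can't run in most CI or VM environments (the LBR caveat above); steps 1 and 3 are plain build-machine work.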
As for gathering data, maybe running them on rustc-perf is another option? We can make use of its perf support.
