Rust: Tracking Issue for stabilizing Profile-Guided Optimization support

Created on 12 Apr 2019 · 11 Comments · Source: rust-lang/rust

Profile-guided optimization (PGO) is a common optimization technique for ahead-of-time compilers. It works by collecting data about a program's typical execution (e.g. probability of branches taken, typical runtime values of variables, etc.) and then using this information during program optimization for things like inlining decisions, machine code layout, or indirect call promotion.
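
To make "probability of branches taken" concrete, here is a minimal sketch of the kind of code where a profile could help. All names (`is_rare`, `handle_rare`, `handle_common`, `process`) and the input distribution are made up for illustration; nothing here is taken from rustc or LLVM.

```rust
// Hypothetical example: names and data are illustrative only.

/// Decides which branch an input takes. With the inputs used in `main`,
/// this is true for roughly one value in a thousand.
fn is_rare(v: u32) -> bool {
    v % 1000 == 0
}

/// Rarely executed path. A profile would show this call site as cold,
/// which lets the optimizer keep it out of the hot code layout.
fn handle_rare(v: u32) -> u32 {
    v.wrapping_mul(31).rotate_left(5)
}

/// Frequently executed path. A profile would show this call site as hot,
/// which makes it a more attractive inlining candidate.
fn handle_common(v: u32) -> u32 {
    v.wrapping_add(1)
}

/// Sums the results of both paths. Without profile data the compiler has
/// to guess how often each side of the `if` is taken.
fn process(values: &[u32]) -> u64 {
    let mut sum = 0u64;
    for &v in values {
        if is_rare(v) {
            sum += u64::from(handle_rare(v));
        } else {
            sum += u64::from(handle_common(v));
        }
    }
    sum
}

/// Counts how many inputs take the rare branch. An instrumented binary
/// records comparable counts automatically.
fn rare_count(values: &[u32]) -> usize {
    values.iter().filter(|&&v| is_rare(v)).count()
}

fn main() {
    let values: Vec<u32> = (1..=10_000).collect();
    println!("rare inputs: {} of {}", rare_count(&values), values.len());
    println!("checksum: {}", process(&values));
}
```

An instrumented build of a program like this would record how skewed the branch is, and a later optimized build could use that record instead of a static guess.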

LLVM supports different forms of PGO, the most effective of which relies on generating instrumented binaries that collect the profiling information. The Rust compiler has had experimental support for this via the -Z pgo-gen and -Z pgo-use flags for a while. This issue tracks the progress of making this functionality available in stable Rust.

Steps:

  • [x] Add run-make test that makes sure that inline and cold attributes are added to LLVM IR by LLVM's PGO passes. (#60262)
  • [x] Add codegen test making sure that instrumentation is generated as expected. (#60038)
  • [x] Add run-make test that makes sure PGO works in a mixed Rust/C++ codebase using ThinLTO done via lld. (#61036)
  • [x] Document the command line flags in the rustc book.
  • [x] Document the implementation in the rustc-guide. (https://github.com/rust-lang/rustc-guide/pull/318)
  • [x] Fix suboptimal compilation of the profiling runtime (#59531, PR fixing it is #60402)
  • [x] Fix instrumentation related linker errors on MSVC (#59812)
  • [x] Clarify role of the pre-inlining pass optionally done by LLVM when compiling with PGO.

    • I verified that rustc already behaves like Clang here: Pre-inlining is enabled except for -Copt-level=s and -Copt-level=z.

  • [x] Triage PGO-related issues reported so far

    • [x] #49409 (support for Windows GNU)

    • [x] #57257 (support for Aarch64, fixed)

    • [x] #57258 (linker error with rather specific setup, not a blocker)

  • [x] Clarify whether reported problems related to SEH on Windows affect the stabilization of this feature. (see #61002 and #61005)
  • [x] Adapt PGO CLI flags to how Clang and GCC handle them. (Implemented in https://github.com/rust-lang/rust/pull/59874)
  • [x] Make PGO available via stable -C profile-generate and -C profile-use flags (mirroring the corresponding Clang flags).
  • [x] Clarify if PGO support should be restricted to tier 1 platforms (because we need to provide the profiling runtime). cc @rust-lang/release

Non-Goals:

Further Information:

  • Some further information is provided at the WG-profile-guided-optimization page.
  • As far as I can tell, PGO subsumes "ordering file" related optimizations (see https://github.com/rust-lang/rust/issues/50655). UPDATE: This should be verified. PGO might only affect basic block ordering, not the order of entire functions.

cc @rust-lang/core @rust-lang/compiler

A-codegen C-tracking-issue T-compiler WG-codegen metabug

All 11 comments

Thanks for opening this up @michaelwoerister!

One thing I might request as well is a solid idea of how PGO could be integrated into a Cargo subcommand. I don't think we should add support to Cargo itself immediately, nor do I think a proof-of-concept is required to actually exist, but rather I think we should make sure that there's a solid idea of how this could be used by general folks. I suspect that such a command may not actually be used by codebases like Firefox, but for general optimization and usage I think it would be pretty valuable to have a solid "PGO in Rust story" in at least an MVP state which doesn't require tons of voodoo and working around various aspects of the build.

Clarify if PGO support should be restricted to tier 1 platforms (because we need to provide the profiling runtime).

Why restricted -- shouldn't we welcome anyone to implement the other platforms? But it's understandable if it's only supported on tier 1 at first.

Why restricted -- shouldn't we welcome anyone to implement the other platforms? But it's understandable if it's only supported on tier 1 at first.

Yeah, that's what I meant really: Stabilization would only entail support for tier 1 platforms. If other platforms work too that would be fine of course :)

One thing I might request as well is a solid idea of how PGO could be integrated into a Cargo subcommand.

Yes, that's a good idea. What form would you expect this to take exactly?

Heh well I've never actually done PGO myself, but knowing at least the basics of it, in the very end it'd be cool to have something like:

$ cargo build --release --pgo foo

where foo is a binary in the project which is the PGO test suite and executes everything that needs to be profiled. That would build everything in release mode with PGO instrumentation, run the binary, and then rebuild everything afterwards without PGO instrumentation but using the PGO data. (magically passing all the right flags everywhere, handling build scripts and procedural macros that aren't PGO'd, etc).

In the meantime though I'd be fine with:

$ cargo pgo

which maybe runs the test suite by default? Something like cargo pgo --bin foo as well if you only wanted to execute one binary.

That sounds like it should be straightforward to implement. The only complication I see is that the llvm-profdata tool needs to be available to Cargo. The workflow is something like:

# Make sure the output directory exists
mkdir -p target/release
# Compile with instrumentation
rustc -Cprofile-generate=target/pgo -o target/release/proggy src/main.rs
# Run the instrumented binary
target/release/proggy
# Run llvm-profdata to bring profiling data into usable state
llvm-profdata merge -output=target/pgo/proggy.profdata target/pgo
# Run the compiler again
rustc -Cprofile-use=target/pgo/proggy.profdata -o target/release/proggy src/main.rs
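
For a tool that wants to drive this workflow, the steps above can be sketched as plain data. This is only a sketch under assumptions: `Step`, `step`, and `pgo_steps` are hypothetical names, the `merged.profdata` file name is arbitrary, and nothing is executed; it only assembles the command lines using the `-Cprofile-generate`, `-Cprofile-use`, and `llvm-profdata merge` invocations shown above.

```rust
/// One external command: the program to run plus its arguments.
#[derive(Debug)]
struct Step {
    program: String,
    args: Vec<String>,
}

/// Small convenience constructor (hypothetical helper).
fn step(program: &str, args: &[&str]) -> Step {
    Step {
        program: program.to_string(),
        args: args.iter().map(|a| a.to_string()).collect(),
    }
}

/// Returns the four workflow steps for a single-file crate.
/// Assumption: `pgo_dir` is where the instrumented binary writes its raw
/// profiles, and the merged profile is stored next to them.
fn pgo_steps(source: &str, binary: &str, pgo_dir: &str) -> Vec<Step> {
    let profdata = format!("{}/merged.profdata", pgo_dir);
    let generate = format!("-Cprofile-generate={}", pgo_dir);
    let output = format!("-output={}", profdata);
    let use_flag = format!("-Cprofile-use={}", profdata);
    vec![
        // 1. Compile with instrumentation.
        step("rustc", &[generate.as_str(), "-o", binary, source]),
        // 2. Run the instrumented binary to collect raw profiles.
        step(binary, &[]),
        // 3. Merge the raw profiles into a single usable file.
        step("llvm-profdata", &["merge", output.as_str(), pgo_dir]),
        // 4. Compile again, this time using the merged profile.
        step("rustc", &[use_flag.as_str(), "-o", binary, source]),
    ]
}

fn main() {
    // Print the command lines instead of running them.
    for s in pgo_steps("src/main.rs", "target/release/proggy", "target/pgo") {
        println!("{} {}", s.program, s.args.join(" "));
    }
}
```

Each `Step` could be handed to `std::process::Command` to actually run it; the sketch stops short of that so it has no dependency on `rustc` or `llvm-profdata` being installed.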

Indeed yeah I suspect it's straightforward :)

I just want to make sure that the default use case isn't "oh be sure to install the compiler from source because we don't provide you everything", and shipping llvm-profdata with the llvm-tools component seems reasonable (or maybe even renaming that to be rustc-specific rather than LLVM-specific).

The other thing I'd want to make sure is that it works reasonably well for dependencies as well, where if you have rlib dependencies you don't have to do anything special and it "just works" or works somehow. (also FWIW I haven't used PGO before, this may already be the case!)

The other thing I'd want to make sure is that it works reasonably well for dependencies as well, where if you have rlib dependencies you don't have to do anything special and it "just works" or works somehow. (also FWIW I haven't used PGO before, this may already be the case!)

Yes, that should already work if dependencies are built with instrumentation too (i.e. -Cprofile-generate and -Cprofile-use are passed to rustc when compiling them).
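
One way to get the same flag onto every crate in a Cargo build is the `RUSTFLAGS` environment variable, which Cargo passes to the compiler invocations it performs. The sketch below assumes that approach; `Phase`, `rustflags`, and `cargo_build` are hypothetical names, and the commands are only constructed, not run.

```rust
use std::process::Command;

/// The two compilation phases of a PGO build.
enum Phase<'a> {
    /// Instrumented build; raw profiles are written to `profile_dir`.
    Generate { profile_dir: &'a str },
    /// Optimized build that reads the merged profile at `profdata`.
    Use { profdata: &'a str },
}

/// Builds the RUSTFLAGS value for a phase.
/// Assumption: the paths contain no spaces, because RUSTFLAGS is
/// split on spaces.
fn rustflags(phase: &Phase) -> String {
    match phase {
        Phase::Generate { profile_dir } => format!("-Cprofile-generate={}", profile_dir),
        Phase::Use { profdata } => format!("-Cprofile-use={}", profdata),
    }
}

/// Prepares (but does not run) `cargo build --release` for a phase, so
/// the dependencies are compiled with the same PGO flag as the main crate.
fn cargo_build(phase: &Phase) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(&["build", "--release"])
        .env("RUSTFLAGS", rustflags(phase));
    cmd
}

fn main() {
    let generate = Phase::Generate { profile_dir: "target/pgo" };
    let optimize = Phase::Use { profdata: "target/pgo/merged.profdata" };
    // Show what would be executed for each phase.
    println!("{:?}", cargo_build(&generate));
    println!("{:?}", cargo_build(&optimize));
}
```

Between the two builds, the instrumented program still has to be run and its raw profiles merged with llvm-profdata.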

Is it possible to generate a PGO-optimized rustc using ./x.py? Ideally it would use the rustc-perf benchmarks as a source, and not rustc tests, for generating the profiles.

@gnzlbg x.py does not have any builtin support for this at the moment. It would be possible but quite a bit of work to implement, I suspect.

PGO has been stabilized in https://github.com/rust-lang/rust/pull/61268 🎉
