It'd be really nice if there were an easy way to break down compilation time into a few categories, and to get a summary of reuse:
Something that we could conveniently copy and paste into bug reports to get an overall sense of what is going on. This would replace and/or build on the existing profiling options. (Even better would be if we could then have an option to get per-query data later on.)
cc @nnethercote @michaelwoerister
Excluding the reuse part, isn't this what -Ztime-passes is for?
I've been told that query profiling is meant to replace -Ztime-passes, but when I followed the instructions at https://forge.rust-lang.org/profile-queries.html I couldn't get it to work -- it produced empty files.
Here are some ways that I think -Ztime-passes could be improved:
A lot of improvements could be made by recording data and then having a post-processing step, which could aggregate insignificant buckets, for example. I don't know whether that post-processing should happen within rustc itself or be a separate tool.
@nnethercote I think that time-passes and the profile-queries concepts probably ought to be merged, and renamed to something like -Zself-profile. One problem with time-passes is that the compiler is moving towards a model where it doesn't have passes, but we're not there yet. This means that the implementation model of time-passes is pretty broken: it is based around some top-level calls, but in fact those calls can unreliably include a bunch of unrelated work. For example, invoking the MIR borrow checker may also construct MIR, but it's probably not a good idea to account those the same. This is what profile-queries was meant to fix, but as you observed, the impl is broken. I think we should probably fix profile-queries and simplify it. For example, it doesn't seem like we've really had much need for the HTML-based output it produces; we'd rather just get a flat list of which queries are taking up time. (And yes, it'd be nice to group those queries, because often we don't need that much detail.)
I think my preference would be that when we declare queries, we also declare a "phase" or something that is supposed to be used for a coarse-grained dump (and then you could use -Zfull-profile or something if you want the full details).
I'd like to work on this.
It would be great if we had a solution that can deal well with parallelism, potentially also showing points of contention. We have the (possibly bitrotted) -Ztrans_time_graph option, which generates a diagram that is rather crude but was indispensable for implementing the scheduler correctly. There was also some discussion about generalizing this and using a known output format that can be visualized in existing tools here: https://github.com/rust-lang/rust/issues/43804
That being said, I don't think we should try to fix any of the existing solutions; we should build a new one. It should support at least a pure text summary format that can easily be collected by users and pasted into bug reports. An additional, more detailed output format that can be easily processed by other tools would be great too.
@nnethercote, are there any lessons to be learned from FF's profiling support? Or the valgrind-based tools?
Also cc the whole @rust-lang/wg-compiler-performance
Apologies for the slow reply.
Text-based output is indeed extremely convenient for pasting into bugs, etc., so I recommend it whenever it's appropriate. The simplest thing to do is just dump out the final text directly from rustc, as -Ztime-passes currently does, but that's inflexible.
The alternative is to write data to a file and then have a separate viewing tool. If the data has a tree-like structure (as -Ztime-passes already does) then the ability to expand/collapse subtrees is very useful. Firefox's about:memory is a good example of this. Here's some sample output:
397.53 MB (100.0%) -- explicit
├──108.74 MB (27.35%) ── heap-unclassified
├───63.97 MB (16.09%) -- js-non-window
│   ├──39.85 MB (10.02%) -- zones
│   │   ├──35.94 MB (09.04%) -- zone(0x114a80000)
│   │   │   ├──18.46 MB (04.64%) ++ (36 tiny)
│   │   │   └──17.48 MB (04.40%) -- realm([System Principal], shared JSM global)
│   │   │       ├───8.81 MB (02.22%) ++ classes
│   │   │       └───8.67 MB (02.18%) ++ (11 tiny)
It's all text so it can be copied and pasted easily. (The lines are Unicode light box-drawing chars U+2500, U+2502, U+2514, U+251C, which usually render nicely in terminals and text boxes.) But it's also interactive: when viewing it within Firefox you can click on any of those lines that have "++" or "--" to expand or collapse a sub-tree. Also, by default it only shows as expanded the sub-trees that exceed a certain threshold; this is good because you can collect arbitrarily detailed data (i.e. as many buckets, sub-buckets, sub-sub-buckets, etc., as you like) without it being overwhelming.
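The threshold behaviour described above can be sketched in a few lines: buckets below some fraction of the total get folded into a single "(N tiny)" entry, the way about:memory does. A minimal sketch; the bucket names and the 2% cutoff below are invented for illustration:

```rust
// Fold buckets below `threshold` (a fraction of the total) into one
// "(N tiny)" entry. Illustrative sketch, not Firefox's actual code.
fn collapse_tiny(buckets: &[(&str, u64)], threshold: f64) -> Vec<(String, u64)> {
    let total: u64 = buckets.iter().map(|&(_, n)| n).sum();
    let mut out = Vec::new();
    let (mut tiny_count, mut tiny_sum) = (0u64, 0u64);
    for &(name, n) in buckets {
        if n as f64 / total as f64 >= threshold {
            out.push((name.to_string(), n));
        } else {
            tiny_count += 1;
            tiny_sum += n;
        }
    }
    if tiny_count > 0 {
        out.push((format!("({tiny_count} tiny)"), tiny_sum));
    }
    out
}

fn main() {
    // Hypothetical buckets; only "classes" clears the 2% threshold.
    let buckets = [("classes", 8_810_000u64), ("scripts", 120_000), ("shapes", 90_000)];
    for (name, bytes) in collapse_tiny(&buckets, 0.02) {
        println!("{bytes:>9} B  {name}");
    }
}
```

The nice property is that the collapsing happens at display time, so the collected data can stay arbitrarily detailed.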
I would recommend using a web browser as your display tool. One possibility is to generate an HTML page (or pages) directly from rustc, as profile-queries does. Another possibility is to generate some other intermediate format (JSON?) and then write a webpage/website that reads it in, as rustc-perf's site does. The latter approach might be more flexible.
I've got a very basic version of this here. Here's a sample of the output:
wesley@endurance:~/code/rust/rustc-serialize> RUSTFLAGS="-Zself-profile" cargo build
Compiling rustc-serialize v0.3.24 (file:///home/wesley/code/rust/rustc-serialize)
Self profiling results for rustc_serialize:
Parsing 27
Expansion 132
TypeChecking 2199
    1597268 hits 1666449 queries
BorrowChecking 150
    10064 hits 14436 queries
Codegen 2098
    44011 hits 54580 queries
Linking 17
    4724 hits 5660 queries
Other 601
    2389409 hits 2461645 queries
Optimization level: No
Incremental: on
Finished dev [unoptimized + debuginfo] target(s) in 5.36 secs
It might be nice to have a category for "incremental overhead" such as hashing & serializing/deserializing the graph? Perhaps Parsing could also output the number of lines parsed to give a sense for the size of the crate? Are there any other rustc options that would be useful to include in the output?
That looks very cool! Some feedback:
- Are the hits cache hits? It might be more readable to display a ratio instead of absolute numbers.
- It would be good to state the unit for the timings (ms?).
- Some rustc options take the --verbose flag into account. This could be used here to control how much data this provides.
- -Zincremental-info could also be shown here (like CGU re-use, red vs green nodes, etc).

This is awesome work @wesleywiser! 💯
I agree with @michaelwoerister's suggestion that we include ms or some other unit. I also think it'd be nice to display the data more "tabularly". Something like this:
| Phase | Time (ms) | Queries | Hits (%) |
| ----- | --------- | ------- | -------- |
| Parsing | 27 | | |
| Expansion | 132 | | |
| TypeChecking | 2199 | 1666449 | 95.8 |
| BorrowChecking | 150 | 14436 | 69.7 |
| Codegen | 2098 | 54580 | 80.6 |
| Linking | 17 | 5660 | 83.5 |
| Other | 601 | 2461645 | 97.1 |
I've gone ahead and made it valid Markdown, too, so that it can easily be pasted for nice display =)
```
| Phase | Time (ms) | Queries | Hits (%) |
| ----- | --------- | ------- | -------- |
| Parsing | 27 | | |
| Expansion | 132 | | |
| TypeChecking | 2199 | 1666449 | 95.8 |
| BorrowChecking | 150 | 14436 | 69.7 |
| Codegen | 2098 | 54580 | 80.6 |
| Linking | 17 | 5660 | 83.5 |
| Other | 601 | 2461645 | 97.1 |
```
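Incidentally, the hit percentages in the table can be reproduced from the raw counts in the sample -Zself-profile output. A quick sketch using those numbers:

```rust
fn main() {
    // (phase, cache hits, total queries) from the sample run above
    let phases = [
        ("TypeChecking", 1_597_268u64, 1_666_449u64),
        ("BorrowChecking", 10_064, 14_436),
        ("Codegen", 44_011, 54_580),
        ("Linking", 4_724, 5_660),
        ("Other", 2_389_409, 2_461_645),
    ];
    for (name, hits, queries) in phases {
        // e.g. TypeChecking: 1597268 / 1666449 ≈ 95.8%
        println!("{name}: {:.1}%", hits as f64 * 100.0 / queries as f64);
    }
}
```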
Finally, I think we should display the time as a percentage of the total for each phase.
As a last point, I think it'd be great if, to determine the total time, we did a single measurement from the "outside". This will help us ensure we don't have "leakage", that is, time that is not accounted for. Or at least it will give us an idea of how much leakage there is.
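A minimal sketch of that outside measurement (not the actual rustc implementation; run_compilation is a placeholder and the phase durations are the ones from the sample run):

```rust
use std::time::{Duration, Instant};

fn run_compilation() {} // stand-in for the real work

fn main() {
    // One wall-clock measurement taken from the outside...
    let start = Instant::now();
    run_compilation();
    let total = start.elapsed();

    // ...compared against the sum of the per-phase timings.
    let phases: &[(&str, Duration)] = &[
        ("Parsing", Duration::from_millis(27)),
        ("Expansion", Duration::from_millis(132)),
        ("TypeChecking", Duration::from_millis(2182)),
    ];
    let accounted: Duration = phases.iter().map(|&(_, d)| d).sum();

    // Whatever is left over is "leakage": time no phase accounted for.
    let leakage = total.checked_sub(accounted).unwrap_or_default();
    println!("total: {total:?}, accounted: {accounted:?}, leakage: {leakage:?}");
}
```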
Thanks for the feedback @nikomatsakis and @michaelwoerister! Those are all great ideas and I'll start working on those.
So it looks like the output is being produced directly by rustc, is that right?
@nnethercote Yes, that's correct.
Progress update: I've implemented much of the above feedback. The output now looks like this:
wesley@endurance:~/code/rust/rustc-serialize> RUSTFLAGS="-Zself-profile -Zprofile-json" cargo build
Compiling rustc-serialize v0.3.24 (file:///home/wesley/code/rust/rustc-serialize)
Self profiling results for rustc_serialize:
| Phase | Time (ms) | Queries | Hits (%) |
| ---------------- | -------------- | -------------- | -------- |
| Parsing | 27 | | |
| Expansion | 132 | | |
| TypeChecking | 2182 | 3321830 | 96.17 |
| BorrowChecking | 144 | 28872 | 69.71 |
| Codegen | 2065 | 110571 | 79.61 |
| Linking | 17 | 11320 | 83.46 |
| Other | 590 | 4922566 | 97.07 |
Optimization level: No
Incremental: on
Finished dev [unoptimized + debuginfo] target(s) in 5.29 secs
In addition, -Zprofile-json causes a file containing the timing data to be created:
wesley@endurance:~/code/rust/rustc-serialize> cat self_profiler_results.json
[
  {
    "category": "Parsing",
    "time_ms": 27
  },
  {
    "category": "Expansion",
    "time_ms": 132
  },
  {
    "category": "TypeChecking",
    "time_ms": 2182
  },
  {
    "category": "BorrowChecking",
    "time_ms": 144
  },
  {
    "category": "Codegen",
    "time_ms": 2065
  },
  {
    "category": "Linking",
    "time_ms": 17
  },
  {
    "category": "Other",
    "time_ms": 590
  }
]
One suggestion: the JSON file should contain enough data to exactly recreate the -Zself-profile output. Currently it appears to lack the information to recreate:
- the query counts and hit percentages
- the optimization level and incremental setting
If it's possible to get the full command line that rustc was invoked with, I suggest including that in the JSON output too. That way if you end up with a bunch of different JSON files you have full data on how each one was created.
Finally, whenever you have coarse categories like this, the natural inclination is for users to want more detail. "Codegen takes 2065ms... can we break that down some more?" (I experienced this back in the early days of Firefox's about:memory.) This naturally takes us back to the idea of a tree-like structure, where you have categories, sub-categories, sub-sub-categories, etc.
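That tree-shaped breakdown could look something like this (a sketch; the sub-category names and numbers are illustrative, not real rustc data):

```rust
// Each category can carry sub-categories; each line shows its share
// of the grand total.
struct Node {
    name: &'static str,
    ms: u64,
    children: Vec<Node>,
}

fn print_tree(node: &Node, total: u64, depth: usize) {
    let pct = node.ms as f64 * 100.0 / total as f64;
    println!("{:indent$}{} {} ms ({pct:.1}%)", "", node.name, node.ms, indent = depth * 2);
    for child in &node.children {
        print_tree(child, total, depth + 1);
    }
}

fn main() {
    let codegen = Node {
        name: "Codegen",
        ms: 2065,
        children: vec![
            Node { name: "LLVM passes", ms: 1200, children: vec![] }, // made-up split
            Node { name: "(other)", ms: 865, children: vec![] },
        ],
    };
    print_tree(&codegen, 2065, 0);
}
```

Combined with a display threshold like about:memory's, the detail stays available without overwhelming the default view.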
A while ago I did some experiments with replacing time-passes with Chrome tracing events. I have updated my branch of rustc now and have some results posted on this page.
One downside is that the generated pages only work in Chrome.
For some of the events I attach extra data describing the event, so it is possible to see why some parts take more time.
Maybe something similar could be used for the verbose mode.
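For reference, Chrome's Trace Event Format is just a JSON array of events; "complete" events use ph = "X" with timestamps and durations in microseconds, and the result loads in chrome://tracing (Perfetto also reads it). A sketch of emitting the sample phases that way, pretending they ran back to back:

```rust
fn main() {
    let phases = [("Parsing", 27u64), ("Expansion", 132)]; // (name, ms)
    let mut ts = 0u64; // event start time in microseconds
    let mut events = Vec::new();
    for (name, ms) in phases {
        let dur = ms * 1000; // ms -> us
        // One "complete" event per phase (ph = "X").
        events.push(format!(
            r#"{{"name":"{name}","ph":"X","ts":{ts},"dur":{dur},"pid":0,"tid":0}}"#
        ));
        ts += dur;
    }
    println!("[{}]", events.join(","));
}
```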
@wesleywiser this looks really nice.
I believe this can be closed now. We have the -Zself-profile option, tools to process that data, and documentation on how to do so.