This issue is intended to track support for splitting a `rustc` invocation that ends up invoking a system linker (e.g. `cdylib`, `proc-macro`, `bin`, `dylib`, and even `staticlib` in the sense that everything is assembled) into two different `rustc` invocations. There are a number of reasons to do this, including:
- This can improve pipelined compilation support. The initial pass of pipelined compilation explicitly did not pipeline linkable compilations because the linking step needs to wait for codegen of all previous steps. By literally splitting it out, build systems could synchronize with previous codegen steps and only execute the link step once everything is finished.
- This makes more artifacts cacheable with caching solutions like `sccache`. Anything involving the system linker cannot be cached by `sccache` because it pulls in too many system dependencies. The output of the first half of these linkable compilations, however, is effectively an `rlib`, which can already be cached.
- This can provide build systems which desire more control over the linker step with, well, more control over the linker step. We could presumably extend the second half here with more options eventually. This is a somewhat amorphous reason to do this; the previous two are the most compelling ones so far.
This is a relatively major feature of rustc, and as such this may even require an RFC. This issue is intended to get the conversation around this feature started and see if we can drum up support and/or more use cases. To give a bit of an idea about what I'm thinking, though, a strawman for this might be:
- Add two new flags to `rustc`: `--only-link` and `--do-not-link`.
- Build systems would first compile the `bin` crate type by passing the `--do-not-link` flag, passing all the flags it normally does today.
- Build systems would then finish the compilation by invoking `rustc` again, only this time passing the `--only-link` flag.

These two flags would indicate to `rustc` what's happening, notably:
- `--do-not-link` indicates that rustc should be creating a linkable artifact, such as one of the ones mentioned above. This means that rustc should not actually perform the link phase of compilation; rather, it's skipped entirely. In lieu of this a temporary artifact is emitted in the output directory, such as `*.rlink`. Maybe this artifact is a folder of files? Unsure. (maybe it's just an rlib!)
- The converse of `--do-not-link`, `--only-link`, is then passed to indicate that the compiler's normal phases should all be entirely skipped except for the link phase. Note that for performance this is crucial: it does not rely on incremental compilation, nor on queries, or anything like that. Instead the compiler forcibly skips all this work and goes straight to linking. Anything the compiler needs as input for linking should either be in command line flags (which are reparsed and guaranteed to be the same as the `--do-not-link` invocation) or be an output of the `--do-not-link` invocation. For example, maybe the `--do-not-link` invocation emits a file that indicates where to find everything to link (or something like that).
The general gist is that `--do-not-link` says "prepare to emit the final crate type, like `bin`, but only do the crate-local stuff". This step can be pipelined, doesn't require upstream objects, and can be cached. This is also the longest step for most final compilations. The gist of `--only-link` is that its execution time is 99% the linker. The compiler should do the absolute minimal amount of work to figure out how to invoke the linker, it then invokes the linker, and then exits. To reiterate, this will not rely on incremental compilation, because engaging all of the incremental infrastructure takes quite some time, and additionally the "inputs" to this phase are just object files, not source code.
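As a rough illustration, the strawman workflow for a build system might look like this (flag names are the hypothetical ones above):

```sh
# Phase 1: compile the crate but skip the link step entirely; a
# temporary artifact (e.g. *.rlink) is left in the output directory.
rustc --crate-type bin --do-not-link main.rs

# ...the build system waits here for codegen of all dependencies...

# Phase 2: identical flags plus --only-link; the compiler goes
# straight to invoking the system linker using the phase-1 output.
rustc --crate-type bin --only-link main.rs
```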
In any case this is just a strawman; I think it'd be best to prototype this in rustc, learn some requirements, and then perhaps open an RFC asking for feedback on the implementation. This is a big enough change that we'd want to get a good deal of buy-in! That said, I believe (without data at this time, but with a strong hunch) that the improvements to both pipelining and the ability to use `sccache` would be quite significant and worth pursuing.
cc @cramertj and @jsgf, we talked about this at RustConf and figured y'all would want to be aware of this.
Would `--only-link` literally just invoke `ld`? If so, it might be useful to just be able to extract whatever extra libraries/options it's adding to the link line, so we can independently regenerate the link line.
This would be useful for linking hybrid Rust/C++/(other) programs where the final executable is non-Rust. In other words, we could have C++ depending on Rust without needing to use staticlib/cdylib.
I don't think it'd just be a thin wrapper around `ld`, no; it would also prepare files to get passed to the linker. For example, when producing a `dylib`, rustc will unpack an rlib and make a temporary `*.a` without bytecode/metadata. Additionally, if performing LTO, this'd probably be the time we'd take out all the bytecode and process it. (Maybe LTO throws a wrench in this whole thing.) Overall, though, I don't think it's safe to assume that it'll just be `ld`.
Firstly, we'd want the final linker doing LTO in order to get it cross-language, regardless of whatever language the final target is in and what mix of languages went into the target.
Secondly, since Buck has full dependency information, including Rust dependencies on C/C++, it will arrange for all the right libraries to be on the final link line. As a result we never want to use or honor the `#[link]` directive, and our `.rlib`s don't contain anything to be unpacked.

(Even if that weren't true, at least on Unix systems, the `.rlib` is just a `.a` and could be used directly, except perhaps for the extension.)
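(As a concrete aside, an `.rlib` can be inspected with the stock `ar` tool; member names below are illustrative:)

```sh
# List the members of an rlib: per-CGU object files plus metadata.
ar t libfoo.rlib
#   foo.foo.3a1fbbbh-cgu.0.rcgu.o
#   lib.rmeta
```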
I like this proposal because it allows us to factor out the Rust-specific details from the language-independent ones. For example, there's no reason for rustc to implement LTO if we're already having to solve that for other languages, especially when that solution is pretty infrastructure-specific (distributed ThinLTO, for example). There's also no real reason for us to use `staticlib`/`cdylib` if we can arrange for all the Rust linker parameters to be on the final link line, even if the final executable is C++, and it would be a significant code duplication reduction (unless LTO sees it and eliminates it, but that's still a compile-time cost).
Ultimately, Rust lives in the world of linkable object files, and a final artifact is generated by calling the linker with a certain set of inputs. Since Rust doesn't have unusual requirements that make it incompatible with C/C++ linkage (eg special linkage requirements or elaborate linker scripts), the final linker stage could be broadly language agnostic.
+1 to wanting the ability to turn off / disable `#[link]`.
I'm generally in favor of this. Some thoughts:
- [...] the `--do-not-link` step and store it in the `.rlink` output.
- [...] `rustc`? That would make things simpler for `rustc` but probably has some overhead. Also, as far as I know, `llvm-ar` does not support doing LTO, but for staticlibs one might want to have it (Firefox does this at least).

I don't disagree that y'all's sorts of projects don't want to use the rustc-baked-in LTO, but I don't think we can remove it, because many other projects do use it (and rightfully want to). Also, this is still just a sort of high-level concept, but if a lot of feature requests are piled onto this it's unfortunately unlikely to happen.
Hi, my name is Victor and I'm working with @tmandry.

I am very interested in this issue as it potentially enables many other cool things. Therefore, I plan to make a prototype and ultimately implement it. @alexcrichton, do you have any suggestions/tips on where to start?
Great @0dvictor! The steps I'd recommend for doing this would probably look like: [...]
As for the actual change itself I haven't looked too much into this, so I wouldn't know where best to start there.
Some implementation notes:
The main `run_compiler` function calls `Compiler::link`, which ensures there is a codegen step running. It then dispatches to the trait method `CodegenBackend::join_codegen_and_link`.
For the LLVM backend (which is the only one right now), this method is implemented here. `join_codegen_and_link` joins all the threads running codegen, which save their results to an object file per thread (often named `foo.bar.<hash>-cgu.<threadno>.rcgu.o` on Linux). The names of these object files are saved in the `CodegenResults` struct (specifically, look in `CompiledModule`).
Finally, it calls `link_binary`, which has the main logic for invoking the linker.
Obviously, we need to split apart all the code that assumes codegen and linking happen at the same time. This starts with the `join_codegen_and_link` trait method. Thankfully, it doesn't seem like there is too much code that assumes this, but there's still a question of what to do in the new code.
For the flags, we can start with unstable options (`-Z no-link` and `-Z only-link`) and later stabilize them via the RFC process. When the `no-link` flag is passed, we should not invoke `link_binary` anymore. When the `only-link` flag is passed, we need a way of recovering the information that was in our `CodegenResults` struct so we can call `link_binary`.
> In lieu of this a temporary artifact is emitted in the output directory, such as `*.rlink`. Maybe this artifact is a folder of files? Unsure. (maybe it's just an rlib!)
I don't much like using a directory as output, since some build systems might not support this.
Probably the best thing to do is to make an ar file (which I should note is what an `rlib` is today). I don't much care what the extension is (I like `.rlink`, since it probably gets handled differently than rlibs or staticlibs do today).

That said, the choice of extension should be up to whoever is invoking rustc, and we should use the `-o` flag to decide where our output goes.
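Something like this hypothetical invocation, where the caller picks the path and extension (a sketch of the proposal, not implemented behavior):

```sh
# The build system, not rustc, decides where the pre-link artifact goes.
rustc --crate-type bin -Zno-link main.rs -o main.rlink
```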
I think there might be details we need to pay attention to regarding the linker's default behavior when linking object files vs archive files (like symbol visibility), but not sure what those details are. cc @petrhosek
Bundling it together in a single ar file would do unnecessary work (both IO and CPU). Object files are always written to the disk. When building an ar file, they are then copied to the ar file and a symtab is created (ranlib). Creating a symtab can't be avoided if you don't want to unpack the ar file again before linking, as the linker requires a symtab to be present.
I suppose that in build systems that need it, we can wrap invocations with a script that tars and untars the files, for example. I'm fine with having an output directory in that case. Ideally the compiler would support both, but that doesn't seem realistic for an initial implementation.
I finally got a prototype working for the first part: generate a linkable object or bitcode file, as well as the linking command to invoke manually to finish linking. In addition, I also successfully ran LTO linking with a native lib.

While starting on the second stage, I found Rust is moving to LLD per https://github.com/rust-lang/rust/issues/39915. Can I assume that I only need to support LLD or `clang/gcc -fuse-ld=lld`?
While moving to LLD is nice, it's unlikely to happen any time soon, so it's best to not make the assumption of LLD.
@0dvictor As a suggestion, you may want to file a PR that includes only the `-Z no-link` option while you work on finishing the implementation of `-Z link-only`. That way you can start getting feedback sooner and stay more in sync with mainline. But you may decide it's not worth the trouble.

rustc flags are in `src/librustc/session/config.rs`. See e.g. this PR for an example.
> @0dvictor As a suggestion, you may want to file a PR that includes only the `-Z no-link` option while you work on finishing the implementation of `-Z link-only`.
Good idea, let me polish my changes and create a PR.
Sorry about the delay.
After studying the code and making some experiments, I found all linker arguments come from the following four sources:

- command-line arguments of `rustc`: e.g. `-L`, `-C relocation-model`, `-C link-args`, `-Z pre-link-args`, etc.
- attributes such as `#[link_args = "-foo -bar -baz"]`
- [...]
At the linking stage, assume the user always passes the required CLI arguments:

- `--emit=object` or `--emit=llvm-bc`
- `--emit=metadata`

Therefore, in my experiment, compiling without linking is basically `--emit=metadata,object` or `--emit=metadata,llvm-bc`. The user can choose to generate either an object file or an LLVM bitcode file.
I had to make the following three changes to get it to work [PR #67195]:

- Generate and save the `.bc/.o/.ll/.s` file of the allocator and metadata (if needed) when the user requests `--emit=llvm-bc/object/llvm-ir/asm`. [I feel this should be the expected behavior instead of completely ignoring allocator and metadata. I was very confused before knowing the `.bc/.o/.ll/.s` file does not contain all code compiled from the Rust source code.]
- Write the metadata to file when the user requests `--emit=metadata` even if the `OutputType` is `OutputType::Exe`. [I also feel this should be the expected behavior.]
- Skip linking.

To minimize the impact on existing code, all changes are guarded by `-Z no-link`.
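Concretely, the first stage from my experiment is an invocation along these lines (a sketch; note the emit kind is spelled `obj` on the actual command line, and file names are illustrative):

```sh
# Compile without linking, emitting crate metadata plus a native object:
rustc --crate-type bin -Z no-link --emit=metadata,obj main.rs

# ...or LLVM bitcode instead of a native object:
rustc --crate-type bin -Z no-link --emit=metadata,llvm-bc main.rs
```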
I have not included (5) yet. My plan is to save it in either the rmeta file, or the bitcode/object file using LLVM's `!llvm.linker.options`. I prefer the latter as we can get it for free for targets using LLD. (Of course, we still have to generate corresponding linker args for targets that do not use LLD.)
If we want one single `.rlink` file, we can `ar` the `.rmeta` and `.bc/o` files generated in the compiling stage.
Then for the linking stage, I plan to insert code here to read the `.rmeta` or `.rlink` file, resolve all dependencies, then reconstruct the `CodegenResults`, so that we can create and execute the linker.
Finally, some thoughts on LTO: once this issue is finished, we should be able to do LTO easily when we use LLD (either directly or via `clang/gcc -fuse-ld=lld`). I have successfully run LTO linking a Rust crate and an LLVM bitcode file generated by `clang -flto -c`. However, LLD only takes uncompressed bitcode files (either by themselves or residing inside a `.a` file). A Rust rlib only contains compressed ones, so we would have to extract and uncompress them before sending them to LLD.
Out of curiosity, why does an `rlib` contain both a native object file and an LLVM bitcode file? Is it because the time cost of "LLVM bitcode => native object" is too expensive? Otherwise, we would only need to save LLVM bitcode in an `rlib` and generate native objects when needed.
> Out of curiosity, why does an rlib contain both a native object file and an LLVM bitcode file? Is it because the time cost of "LLVM bitcode => native object" is too expensive? Otherwise, we would only need to save LLVM bitcode in an rlib and generate native objects when needed.
> - Generate and save the `.bc/.o/.ll/.s` file of the allocator and metadata (if needed) when the user requests `--emit=llvm-bc/object/llvm-ir/asm`. [I feel this should be the expected behavior instead of completely ignoring allocator and metadata. I was very confused before knowing the `.bc/.o/.ll/.s` file does not contain all code compiled from the Rust source code.]
> - Write the metadata to file when the user requests `--emit=metadata` even if the `OutputType` is `OutputType::Exe`. [I also feel this should be the expected behavior.]
> - Skip linking.
>
> To minimize the impact on existing code, all changes are guarded by `-Z no-link`.
Yeah, those should probably be the default. Would you mind opening an issue to track this?
> Yeah, those should probably be the default. Would you mind opening an issue to track this?
Here they are: #67292 and #67293
I also created #67294 to track a different issue found while I am working on splitting compiling and linking.
The rustc part of this has been merged. This is waiting on cargo support now.
> Bundling it together in a single ar file would do unnecessary work (both IO and CPU). Object files are always written to the disk. When building an ar file, they are then copied to the ar file and a symtab is created (ranlib). Creating a symtab can't be avoided if you don't want to unpack the ar file again before linking, as the linker requires a symtab to be present.
Note that for large projects this overhead adds up. GNU `ar` added a "thin archive" mode a while back which just stores the paths to the object files instead of copying them all in. We hand-rolled something like this for Firefox as well. It may not be as bad as the C/C++ case, since rlibs don't have as many object files in them, but it's worth looking into. From `sccache`'s perspective it'd probably be fine if the `rlink` file just contained paths to the rlibs to use and maybe specific object file names or byte ranges or whatever.
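For reference, a thin archive is just the `T` modifier to GNU `ar` (file names illustrative):

```sh
# Store paths to the members rather than copying their contents;
# "s" still builds the symbol index that linkers expect.
ar rcsT libfoo.rlink foo.0.rcgu.o foo.1.rcgu.o
```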
Also, I just wanted to clarify slightly this bit from the original comment:

> Anything involving the system linker cannot be cached by sccache because it pulls in too many system dependencies.

The reason sccache can't cache compiles that invoke the system linker is that it doesn't have code to interrogate linkers about what their inputs and outputs for a specific link would be. sccache needs to know all the inputs and outputs of a compilation in order to generate correct cache keys and store the results in the cache. I looked into this a little bit a while ago and it's probably doable, just complex. (It'd probably wind up having to parse linker scripts for `ld`, for example.)
It doesn't look like this emits an artifact notification for `.rlink` files. It might also be useful if it emitted notifications for all the object files as well; otherwise a build tool which is trying to keep track of all the emitted objects would need to parse the JSON (presumably not a stable format at this point) to work out what they all are.
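For comparison, the existing artifact notifications look roughly like this (a sketch; exact gating of the `--json` values may differ):

```sh
# Ask rustc to report each emitted artifact as a JSON line on stderr.
rustc --error-format=json --json=artifacts --emit=link main.rs
# e.g. {"artifact":"main","emit":"link"}
```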
@0dvictor and I were discussing how to collect the input files. We decided the easiest way was in fact to archive them, and that we would profile the performance impact in Fuchsia to see if it was a problem.
This approach has the nice property that you know exactly how many output files there will be and what their names are, which makes integrating into existing systems easier. Given that this unblocks using Rust in our distributed build system, it should still be a win for performance.
That said, it sounds from @luser's comment like the overhead might be more than I had originally thought, and we might want support for N output files at some point. We may end up with a solution that supports both archived and non-archived outputs.
Emitting the names in artifact notifications was another possibility that came up. It seems correct to me that they would be emitted, but I haven't thought about it too carefully. We don't use these in our build system today, though we could probably teach it to use them with a bit of effort. Still, it will require more work to handle in our distributed build system, which is why we didn't jump for that option first.
Yeah, a heavyweight `ar` would add a lot of unnecessary cost in extra copying. A thin archive would be fine though.
Is there a plan to stabilize/document the format of `.rlink` files? It looks like it has everything needed to directly generate a linker line, which might be preferable for us.
`rustc -Zlink-only thing.rlink` deletes its input files as part of its execution. That seems... strange.
`-Zlink-only` invokes the linker code like normal. Only when `-Csave-temps` is used does the linker code not clean up object files.
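So, assuming the behavior just described, a two-step flow that preserves its inputs would look something like this (sketch; file names illustrative):

```sh
rustc -Zno-link main.rs                    # compile, emit main.rlink
rustc -Zlink-only -Csave-temps main.rlink  # link without deleting objects
```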
I don't think we want to stabilize the rlink format anytime soon. It's basically a JSON representation of an internal Rust data structure.
Thin archives or json directives seem like a better bet. Or a distributed build worker can scan the output directory for any new files and send them to the client, rsync-style.
@bjorn3 Sure, but they're not really internal tmp files in the `no-link`/`link-only` flow - they're outputs of one build step and inputs of the next, and build steps don't generally trash their inputs. You'd expect to be able to re-run `link-only` without needing to redo `no-link`.
Consider, for example, where the input files are paths into a shared cache where deleting them removes them from or corrupts the cache. The current behaviour would require them to be copied to avoid this.
@tmandry Nope to scanning a dir, but a thin archive is a good starting point.
However, I was interested in the other info in that file, like the paths to other libraries which need to be linked in - though TBH I'm not sure that's the best way to receive that information.
> Sure, but they're not really internal tmp files in the no-link/link-only flow - they're outputs of one build step and inputs of the next, and build steps don't generally trash their inputs.

After the linking phase, when there is no shared cache, you will likely want to delete those intermediate files. Those intermediate files don't have predictable names, so cargo or other programs won't be able to remove them. Simply searching for `.o` files won't work, as other kinds of files are created and new kinds may be created in the future.
> they're outputs of one build step and inputs of the next

It is very unlikely that you want to run either of the parts without the other. If any input file is changed, you need to run `-Zno-link` to handle the changes. If you run `-Zno-link`, you will also want to run `-Zlink-only` to perform the linking. If you didn't, you could just have used `--emit metadata` instead, like `cargo check` does. The only reason `-Zno-link` and `-Zlink-only` are separated is that you would otherwise need to notify rustc that all dependencies are fully built.
> However, I was interested in the other info in that file, like the paths to other libraries which need to be linked in - though TBH I'm not sure that's the best way to receive that information.

Use `-Zbinary-dep-depinfo` to put all crate dependencies in the `--emit dep-info` file.
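That is, something like (illustrative file names):

```sh
# dep-info that also records binary dependencies (rlibs etc.),
# not just source files:
rustc --crate-type bin -Zbinary-dep-depinfo --emit=dep-info,link main.rs
# main.d now lists every crate artifact the build read.
```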
@bjorn3 In a distributed build environment, the `no-link` and `link-only` steps are likely happening on different machines, with the intermediates being copied between them (or via some shared content store).
> After the linking phase, when there is no shared cache, you will likely want to delete those intermediate files.
No, they're still useful for subsequent links if you're rebuilding the target executable because something else (eg non-Rust) changed.
You may just want to generate multiple versions of the same executable from the same inputs (eg built with different linker scripts, or static vs dynamic for system libraries).
> In a distributed build environment, the no-link and link-only steps are likely happening on different machines, with the intermediates being copied between them (or via some shared content store).

Many of the paths in the `.rmeta` file are absolute, so you will need to copy everything to the exact same location on the other machine. Also, you don't know what files you have to copy. Even with `-Zbinary-dep-depinfo`, the names of the intermediate files are not mentioned in either the depinfo or the artifact messages, and they are merely an implementation detail. In theory rustc could even put those intermediate files in `/tmp` (for example, if it detects that `/tmp` is a `tmpfs` and the files fit in memory).
> No, they're still useful for subsequent links if you're rebuilding the target executable because something else (eg non-Rust) changed.
>
> You may just want to generate multiple versions of the same executable from the same inputs (eg built with different linker scripts, or static vs dynamic for system libraries).

When changing linker flags (this includes static vs dynamic for system libraries), you will need to call `-Zno-link` again to generate a new `.rlink` file with those linker flags.
> We decided the easiest way was in fact to archive them, and that we would profile the performance impact in Fuchsia to see if it was a problem.
I'm generally in favor of implementing the straightforward thing and profiling to see if it's actually a problem. I just wanted you to be aware of existing work in this area.
> Many of the paths in the `.rmeta` file are absolute, so you will need to copy everything to the exact same location on the other machine. Also, you don't know what files you have to copy. Even with `-Zbinary-dep-depinfo`, the names of the intermediate files are not mentioned in either the depinfo or the artifact messages, and they are merely an implementation detail.

Given that `sccache` was mentioned in the initial comment here, if this is going to be cache-friendly we do need a way to know what the full set of inputs and outputs will be for a specific command, otherwise we can't reliably store a result in cache that we can use later.
@bjorn3 I see this work as a good start, but an intermediate step to:

1. the `.rlink` being a well-defined file with a documented schema
2. being able to construct the final link line ourselves, linking `rlib`s directly

Step 1 is necessary because we need to be able to extract the path info, and rewrite it to remove/restructure absolute paths. The whole value in splitting linking is so we can run compilation of bin/dylib/procmacro crates and their linkage as separate build steps. "Splitting" them but then still requiring them to be run 1:1 on the same machine doesn't help all that much.
Step 2 is necessary so we can avoid using the `staticlib` and `cdylib` crate types, as they're painful to use in hybrid Rust/C++ programs where `main()` is not Rust (ie, we want to put the path to libstd on the final link line, not link it in multiple times for each C++ -> Rust dependency).
> the .rlink is a well-defined file with a documented schema

This will never happen. There are many implementation details written to `.rlink`. For example:
```json
[...]
"compiler_builtins" : 3,
[...]
"missing_lang_items" : {
    "14" : [],
    "8" : [
        "EhPersonalityLangItem"
    ],
    [...]
},
"panic_runtime" : 14,
"profiler_runtime" : null,
[...]
"metadata_module" : null,
"allocator_module" : {
    "bytecode" : null,
    "name" : "33dyzt1ekirinwy8",
    "kind" : "Allocator",
    "object" : "rust_out.33dyzt1ekirinwy8.rcgu.o",
    "bytecode_compressed" : null
},
[...]
"modules" : [
    {
        "name" : "rust_out.7rcbfp3g-cgu.0",
        "bytecode" : null,
        "object" : "rust_out.rust_out.7rcbfp3g-cgu.0.rcgu.o",
        "bytecode_compressed" : null,
        "kind" : "Regular"
    },
[...]
```
`compiler_builtins` is an implementation detail and could technically be merged with `libcore`. Lang items are an implementation detail. `panic_runtime` and `profiler_runtime` are implementation details and could maybe in the future be replaced with normal deps. The exact things known about modules could change; for example, `bytecode_compressed` may be removed once bytecode for LTO is no longer emitted alongside regular object files. `metadata_module` and `allocator_module` may in the future be folded into `modules`. ... (note: all things except for `bytecode_compressed` are purely hypothetical)
> Step 1 is necessary because we need to be able to extract the path info, and rewrite it to remove/restructure absolute paths. The whole value in splitting linking is so we can run compilation of bin/dylib/procmacro crates and their linkage as separate build steps. "Splitting" them but then still requiring them to be run 1:1 on the same machine doesn't help all that much.

The splitting is mostly meant to enable pipelined compilation of the final artifact, not to perform codegen and linking on different machines. What is even the benefit of performing codegen and linking on different machines, by the way? It doesn't increase parallelism, as the linking has to wait on codegen anyway.
> Step 2 is necessary so we can avoid using the staticlib and cdylib crate types, as they're painful to use in hybrid Rust/C++ programs where main() is not Rust (ie, we want to put the path to libstd on the final link line, not link it in multiple times for each C++ -> Rust dependency).

That would require making the linker arguments stable, as changes could break such a setup. Also, why not create one Rust library that depends on all Rust dependencies and then depend on that Rust library from C++?
> The splitting is mostly meant to enable pipelined compilation of the final artifact, not to perform codegen and linking on different machines. What is even the benefit of performing codegen and linking on different machines, by the way? It doesn't increase parallelism, as the linking has to wait on codegen anyway.

Sure, but from our point of view we want to break the build up into atomic actions with well-defined inputs and outputs and then be able to freely schedule them across a build cluster. I don't want to have to treat Rust build + link as a special case; adding a constraint that they have to execute on the same machine would make it much harder to schedule.

But I think if we can use a thin ar to logically bundle them all up, it will be manageable.
> This will never happen. There are many implementation details written to .rlink. For example:

OK, so I think there's too much stuff in the `.rlink` file then. If we assume that the `no-link` and `link-only` invocations are passed an identical set of flags (aside from the `-Z` option itself), then all the other details are regeneratable and don't need to be in the `.rlink`. The `.rcgu.o` files are the only thing that's uniquely useful to pass from the first invocation to the second.
> Also, why not create one Rust library that depends on all Rust dependencies and then depend on that Rust library from C++?

That doesn't scale. There could be hundreds of C++ libraries linked in, any of which could be using some combination of Rust libraries.
> then all the other details are regeneratable and don't need to be in the .rlink

No, link args can come from `#[link]` attributes on `extern` blocks. Because of proc-macros, those can be generated non-deterministically.
> No, link args can come from #[link] attributes on extern blocks. Because of proc-macros, those can be generated non-deterministically.

Fair enough - they can be encoded in the `.rlink` too. But as they won't work in our environment, we'll want to ignore them or complain if they're present (that would effectively be a proc-macro making up a dependency the build system doesn't know about, which is a big no-no - and if the build system does know about it, we don't need the `#[link]`).
> What is even the benefit of performing codegen and linking on different machines, by the way? It doesn't increase parallelism, as the linking has to wait on codegen anyway.

Linking, in theory, depends on a lot more artifacts than codegen does. Codegen should only require source code and the rmeta files from any crates you depend on. Linking requires all the generated code. In our sccache-like environment, this would mean uploading many rlib files and possibly system libraries to the worker. Network bandwidth becomes a bottleneck. So it's much better to send compile steps to the workers, hitting cache when possible, and do linking locally.
> That would require making the linker arguments stable, as changes could break such a setup.

Link args don't need to be stable, just the file format which contains them. I don't think the fact that the file contains _references_ to implementation details like compiler-builtins is a problem, actually. As long as those details can change without changing the _schema_, a well-written tool should be able to consume them without breakage.

That said, stabilizing rlink seems more ambitious than having a rustc option which spits out the final linker line, allowing you to run it yourself.
Apologies for my late reply.

After experimenting with the archiving approach (using `ar`) to collect the linker's input files, I found several drawbacks, which made me believe `ar` is not the correct approach:

- Creating the `.a` file requires extra storage, CPU, and IO resources.
- When the `crate-type` is an `rlib` or `staticlib`, the `.a` file must be extracted before passing to the linker.
- It is unclear whether `link.exe` supports it.
supports itSimilarly, simply archiving the files with tar
or zip
or even somehow embedding them into a .rlink
file would not be ideal either.
Therefore, I am experimenting with a different approach: after finishing LLVM's optimizations, linking all CGUs into a combined one using `llvm::Linker`, so that only one `.o` file is generated for the crate. (Of course, there will potentially be another two `.o` files: the allocator and the metadata.) The distributed build system then only has to deal with a known number of `.o` files: one for the main crate, zero or one for the allocator, and zero or one for the metadata. If `-Z human-readable-cgu-names` is also enabled, these `.o` files can be easily identified.
The main benefits compared to the archiving approach would be:

- Little impact on `rustc`'s peak memory usage: linking with `llvm::Linker` is expected to use much less memory than optimizations; therefore, performing linking after optimizations is unlikely to exceed the memory footprint of the optimization stage.
- Enables parallelism for `--emit [llvm-bc|llvm-ir|asm|obj]` (see the sketch below).
WDYT?
> Therefore, I am experimenting with a different approach: after finishing LLVM's optimizations, linking all CGUs into a combined one using llvm::Linker

That won't work for non-LLVM-based backends.
> Linking with llvm::Linker is expected to use much less memory than optimizations; therefore, performing linking after optimizations is unlikely to exceed the memory footprint of the optimization stage.

What about a crate with many codegen units? During optimizations, only a few codegen units are in memory at any time, while during linking with `llvm::Linker`, all of them have to be in memory. Also, `llvm::Linker` links bitcode I believe, which means that all passes after the user-facing LLVM IR are forced to be done in a single thread after the linking. Especially in debug mode I expect this to be a significant performance regression.
> That won't work for non-LLVM-based backends.

Correct, but we can implement a similar, if not the same, way of combining the CGUs for a non-LLVM backend.
> What about a crate with many codegen units? During optimizations, only a few codegen units are in memory at any time, while during linking with llvm::Linker, all of them have to be in memory. Also, llvm::Linker links bitcode I believe, which means that all passes after the user-facing LLVM IR are forced to be done in a single thread after the linking. Especially in debug mode I expect this to be a significant performance regression.

Great points. I should've been clear that this proposed feature would be guarded by an option, say `-Z combine-cgus`, and even if the feature is stabilized, the user should be able to turn it on/off, so they can pick the behavior that best fits their use case. In other words, the new feature should not introduce regressions to existing workloads.
Regardless of splitting out the linker invocation, being able to generate a single (or defined number of) `.o` files eases many tasks. The distributed compiling tool is one example: the tool would not have to parse the `.rlink` and/or other files to figure out the undefined number of files that need to be copied from the server to the client. This also means `rustc` does not have to provide a stable `.rlink` file format.
Other examples include tasks that run with `--emit [llvm-bc|llvm-ir|asm|obj]`. As these workloads require a single `.[bc|ll|asm|o]` output file, currently they are done in a single thread. Enabling parallelism for these tasks should yield a good performance boost.
Can `llvm::Linker` not link both bitcode and object code? IIUC, it would indeed regress performance quite a bit in that case, since LLVM IR -> object code is one of the slowest compilation steps.

@0dvictor, can we compare the build time using the flag in your working branch versus `-Ccodegen-units=1`, versus a baseline? That would help indicate if this flag is worth maintaining.
> Can llvm::Linker not link both bitcode and object code? IIUC, it would indeed regress performance quite a bit in that case, since LLVM IR -> object code is one of the slowest compilation steps.

Unfortunately, `llvm::Linker` is only for linking bitcode.

> can we compare the build time using the flag in your working branch versus -Ccodegen-units=1, versus a baseline? That would help indicate if this flag is worth maintaining.

Good idea, let me do that.
@jsgf The project I'm working on seems to share similar properties to yours:

- Using a non-Cargo build system based on static dependency resolution (and has 20000+ targets)
- Final linking performed by an existing C++ toolchain
- A few Rust `.rlib`s scattered throughout a very deep dependency tree, which may eventually roll up into one or multiple binaries

We can't:

- Switch from our existing linker to `rustc` for final linking. C++ is the boss in our codebase; we're not ready to make the commitment to put Rust in charge of our final linking.
- Create a Rust static library for each of our `.rlib`s. This works if we're using Rust in only one place. For any binary containing several Rust subsystems, there would be binary bloat and often violations of the one-definition rule.
- Create a Rust static library for each of our output binaries. The build directives for the Rust `.rlib`s don't know what final binaries they'll end up in; and the build directives for the final binaries don't know what `.rlib`s they've pulled in from deep in their dependency tree. Our build system forbids that sort of global knowledge, or it would be too slow across so many targets.
- Create a single Rust static library containing all our Rust `.rlib`s. That monster static library would depend on many C++ symbols, so each binary would become huge (and in fact not actually link properly).

Therefore we want to link `rlib`s directly into our final linker invocation. This is, in fact, what we're doing, but we have to add some magic:

- The final C++ linker needs to pull in all the Rust stdlib `.rlib`s, which would be easy apart from the fact they contain the symbol metadata hash in their names.
- We need to remap `__rust_alloc` to `__rdl_alloc` etc.
- In future the final `rustc`-driven linker invocation might add extra magic.

Therefore one of the solutions you and @tmandry suggest would be awesome: either stabilizing `.rlink` (which sounds unlikely) or having some official way to build a linker command line which is not under the control of rustc (`-Zbinary-dep-depinfo` helps a bit).

I think, though, that this request is a little bit orthogonal to this issue. I wonder if we should submit a new issue? I think it's quite a big request.
That would be useful and it's also orthogonal. A new issue sounds like a good idea.
This issue came up in a discussion about @dtolnay's cxx, which wants to be able to describe callback relationships between Rust and C++ code - eg Rust calls C++, which then calls back into Rust. If you're using linker symbols to resolve these calls (vs runtime indirect function calls via pointers), then that effectively means you have a rust.o and cxx.o with mutual references.

AFAIK the only way to correctly link this is with something like `--start-group rust.o cxx.o --end-group` on the linker command line. If `rustc` is driving the linking process, then it would need to understand this concept, which I think is out of scope for it. Alternatively, if the build system understands these kinds of relationships, then it already knows how to invoke the linker this way. (Moot for Cargo, since it doesn't have a way to model C++ dependencies on Rust, and certainly not cyclic ones.)
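(A sketch of that link line with static archives, where the grouping actually matters; file names are illustrative:)

```sh
# GNU ld rescans everything inside the group until no new undefined
# symbols get resolved, which handles the mutual references.
g++ -o app main.o -Wl,--start-group librust.a libcxx.a -Wl,--end-group
```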
I'm interested in the outcome of this. Is it currently in a state where cargo could potentially implement a useful subset of it? Or is there further work needed on the final link step before cargo can do anything?
@joshtriplett Kernel?
I apologize for the long delay in continuing this work. PR #75094 was created to allow generating a single object file for a crate.

Performance was measured by compiling `rustc` itself:
| | `-Ccodegen-units=1` | [DEFAULT] | `-Zcombine-cgu` |
| --- | ---: | ---: | ---: |
| raw comp time (second) | 987 | 294 | 303 |
| relative perf to `-Ccodegen-units=1` (%) | 100 | 29.79 | 30.70 |
| relative perf to default (%) | 335.71 | 100 | 103.06 |
| relative perf to `-Zcombine-cgu` (%) | 325.74 | 97.03 | 100 |
Though `-Zcombine-cgu` is around 3% slower than the default configuration, it is still a significant improvement compared to the single-threaded case, and would significantly improve the performance of `--emit [llvm-bc|llvm-ir|asm|obj]`.