I couldn't find a previous issue on this, so I'd like to open a tracking issue. We've known this for a long time, but the metadata format for the compiler is _far_ too large, and there are surely ways to shrink its size and impact. Today when I compile librustc, I get the following numbers:
- librustc.rlib - 64MB
- rustc.o - 12MB
- rust.metadata.bin - 32MB
- rustc.0.bytecode.deflate - 21MB

This means that the metadata is three times as large as the code we're generating. Another statistic is that 36% of the binary data of the nightly is metadata.
There are, however, a number of competing concerns around metadata:
There are a few open issues on this already, but none of them are necessarily a silver bullet. Here's a smattering of wishlist ideas or various strategies.
More will likely be added to this over time as it's a metabug.
For your information, I wrote a custom metadata decoder in Python for debugging #15309 (the table is a bit out of date, but still works), and out of curiosity I've also made some basic efforts to reduce the metadata overhead. The rewrite code does the job and prints the following output for the 2015-01-20 nightly's lib/libstd-4e7c5e5c.so:
# original compressed size, the total size of given binary (*.so).
2117863 5081718
# uncompressed size of various encoding strategies.
# - orig: the original (unaltered)
# - relax: optimized size fields. the original metadata uses lots of
# 4-byte-long sizes even when fewer bytes are sufficient;
# reclaiming them will require some work.
# - no-label: relax + no `Label` node. used for debugging purpose but
# never disabled afaik.
# - size-elide: relax + one-byte tag. all tags are <0x100, so ignoring EBML
# (requires 2-byte encoding for tags >=0x80) gives some gains.
# also do not add sizes for known fixed-size tags (e.g. `U64`).
# - size-elide-2-or-4: one-byte tag + another relaxation strategy.
# uses different size encoding algorithm: 2 bytes (big endian)
# for sizes <0x8000, 4 bytes with MSB set otherwise.
# trade-off between size and performance.
# - size-elide-4: one-byte tag + fixed 4-byte-long size.
orig              16084126
relax             13577526
no-label           8654851
size-elide        13019868
size-elide-2-or-4 14418657
size-elide-4      17563943
# recompressed (zlib -9) size of above.
# note that the original compressed size is *not* optimal.
orig              2087004
relax             1991335
no-label          1747192
size-elide        1966400
size-elide-2-or-4 2014455
size-elide-4      2123731
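
As an illustration, the `size-elide-2-or-4` size encoding described above could look roughly like this in Rust. This is a minimal sketch, not the actual rewrite code: the function names `encode_size` and `decode_size` are hypothetical, and rejecting sizes of 2^31 or more is an assumption of the sketch rather than something stated in the thread.

```rust
/// Encode a size with the "2-or-4" scheme: two big-endian bytes for sizes
/// below 0x8000, otherwise four big-endian bytes with the most significant
/// bit set. Returns `None` when the size needs more than 31 bits
/// (an assumption of this sketch).
fn encode_size(size: u32) -> Option<Vec<u8>> {
    if size < 0x8000 {
        // Short form: the top bit of the first byte is always clear.
        Some((size as u16).to_be_bytes().to_vec())
    } else if size < 0x8000_0000 {
        // Long form: set the MSB so the decoder can tell the forms apart.
        Some((size | 0x8000_0000).to_be_bytes().to_vec())
    } else {
        None
    }
}

/// Decode a size from the front of `buf`.
/// Returns the size and the number of bytes consumed, or `None` if `buf`
/// is too short for the form announced by its first byte.
fn decode_size(buf: &[u8]) -> Option<(u32, usize)> {
    let first = *buf.first()?;
    if first & 0x80 == 0 {
        // Short form: two bytes, big endian.
        let b = buf.get(..2)?;
        Some((u16::from_be_bytes([b[0], b[1]]) as u32, 2))
    } else {
        // Long form: four bytes, big endian, with the marker bit masked off.
        let b = buf.get(..4)?;
        let raw = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
        Some((raw & 0x7FFF_FFFF, 4))
    }
}
```

The decoder only has to test one bit of the first byte to pick a form, which is the performance side of the trade-off; the cost is that every size takes at least two bytes, even when one would do.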
Wow, those are some nice wins! I had no idea that existed! If we could implement some of those optimizations today that would be awesome.
@alexcrichton Does any breaking modification to metadata need a new snapshot? I'm worried that such modifications cannot easily be done incrementally.
Thankfully you shouldn't need a snapshot. The stage N compiler conveniently only ever reads metadata generated by itself, so there are no bootstrapping issues.
storing metadata uncompressed in rlibs to allow LLVM to mmap it directly into the address space and page it in for reading.
Is this true? I was under the impression that:
@michaelwoerister Rustc uses LLVM's memory map abstraction to mmap the executable. LLVM itself does not use the metadata.
@lifthrasiir Oh, that refers to this: https://github.com/rust-lang/rust/blob/master/src/librustc/metadata/loader.rs#L270. All clear now :)
I'm currently working on two temporary but public branches:
- compact-metadata
- metadata-reform, which is intended to be rebased after the former

Any suggestions or patches would be appreciated.
Given https://github.com/rust-lang/rust/pull/22971 was merged, is this fixed? It's hard to tell from
Fixes #2743, #9303 (partially) and #21482.
which implies the first and last were total, and the middle one was partial?
@steveklabnik #2743 is fully fixed. I think I said #9303 was fixed partially because the PR does not really fix the naming issue ("Rename it from ebml to atom_trees, change all internal naming"), but in retrospect you can safely close that. I guess this metabug needs to stay open, since the metadata reduction is ongoing work (my PR was a collection of low-hanging fruit) and we probably need a central place to discuss it.
Some updates pertaining to this issue:
- `#[inline]`d functions are no longer stored as ASTs - The metadata of libcore was more than halved!

rustup update saw quite an improvement:
info: downloading component 'rustc'
38.2 MiB / 38.2 MiB (100 %) 1.8 MiB/s ETA: 0 s
info: downloading component 'rust-std'
46.1 MiB / 46.1 MiB (100 %) 1.8 MiB/s ETA: 0 s
Before, it looked like this:
info: downloading component 'rustc'
49.0 MiB / 49.0 MiB (100 %) 1.8 MiB/s ETA: 0 s
info: downloading component 'rust-std'
61.9 MiB / 61.9 MiB (100 %) 1.8 MiB/s ETA: 0 s
So, at what point is metadata small enough that this bug can be considered fixed?
1 byte, tops
This is a super old and much less relevant issue now, so closing.