Cargo: How to effectively clean target folder for CI caching

Created on 13 Aug 2018 · 12 Comments · Source: rust-lang/cargo

I have a workspace repository with several crates in it (> 10). A clean build of all of them takes about 10 minutes on Travis, which is why I want to cache the target/ folder.

The problem is that the target folder gets quite big (~1.7 GB), so it also takes about 3 minutes to upload the cache to S3 after the build.

The question is: How can I clean the target folder from any artifacts generated by my own code?

If I can achieve that, then the target folder would only have the artifacts of all the dependencies in it. As long as they don't change, Travis would not have to re-upload the cache. At the same time, rebuilding only the workspace crates takes only 30 seconds.

I have already tried several things:

  • cargo clean -p for every workspace package
  • Delete all files in target that mention a workspace crate's name

Neither of these was enough to get the target folder into a state where NOTHING changes between two builds.
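One way to check whether a cleanup attempt really leaves the folder unchanged is to fingerprint the tree before and after a build. This is only a sketch: `tree_fingerprint` is a hypothetical helper, not part of Cargo or Travis, and it assumes "nothing changes" means "same relative paths and same file contents" (timestamps are ignored, and file names containing newlines are not handled).

```shell
# Print one fingerprint for a directory tree: checksum every file, sort the
# list so the order is stable, then checksum the list itself.
# (Hypothetical helper; not part of Cargo or Travis.)
tree_fingerprint() {
    (
        cd "$1" || exit 1
        find . -type f -exec cksum {} + | LC_ALL=C sort | cksum
    )
}

# Usage sketch: compare the tree before and after a build.
#   before=$(tree_fingerprint target)
#   cargo build --all
#   after=$(tree_fingerprint target)
#   [ "$before" = "$after" ] && echo "target/ unchanged"
```

If the two fingerprints differ, diffing the sorted per-file lists (the output before the final `cksum`) shows which files the build touched.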

I couldn't really understand the layout of the target folder: The artifacts of dependencies seem to be mixed up with those of the workspace crates. Is there some documentation available on how the target folder is structured?

A-caching C-feature-request

All 12 comments

Thanks for the report! Right now there's not a great answer here in that Cargo doesn't have anything implemented to do something like this nor does it internally support the ability to know which artifacts are super old.

Currently I'd recommend using something like sccache for CI instead of caching the entire target folder in the meantime, but work on this would definitely be appreciated!

Thanks for the quick answer!
We are using sccache now which helps already quite a bit!

For posterity: in my .travis.yml, in before_cache, I try to explicitly delete artifacts for the current workspace, while keeping artifacts from the deps intact:

https://github.com/rust-analyzer/rust-analyzer/blob/9a7db8fa009c612168ef16f6ed72315b5406ed09/.travis.yml#L2-L4

This seems to greatly speed up CI builds. I wonder if we can make caching easier by adding more structure to the target dir? If, for example, we group artifacts into folders based on source_id, then it should be easy to purge anything not from crates.io.

Thanks for sharing! I've tried something like this already.

The problem is that our project is a workspace, and things are not that easy then. Many crates depend on each other and thus cause many workspace-related artifacts in target/, which are impossible to delete without poisoning the cache.

For now, we use sccache with the local directory cache and only put that into the Travis cache. We don't cache the target folder at all.

Works reasonably well so far.
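A minimal sketch of that setup, assuming sccache is already installed on the CI machine: `RUSTC_WRAPPER` is the Cargo environment variable for wrapping rustc, and `SCCACHE_DIR` selects sccache's local disk cache. The function name and the default path are made up for illustration and do not come from our actual config.

```shell
# Route rustc through sccache and keep its local disk cache in one directory,
# so that CI caches only that directory instead of target/.
# The default path is an assumption, not taken from the thread.
configure_sccache() {
    cache_dir=${1:-"$HOME/.cache/sccache"}
    mkdir -p "$cache_dir"
    export RUSTC_WRAPPER=sccache      # Cargo then invokes `sccache rustc ...`
    export SCCACHE_DIR="$cache_dir"   # location of sccache's local disk cache
}
```

The CI cache setting would then list only the directory passed to `configure_sccache`.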

So, I've been using the rm ./target/debug/deps/changing-stuff trick with rust-analyzer for the past couple of months, and it really makes a huge difference. Here's what a typical build looks like:

https://travis-ci.org/rust-analyzer/rust-analyzer/jobs/463540847

One interesting bit is compilation:

$ cargo test
   Compiling test_utils v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/test_utils)
   Compiling tools v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/tools)
   Compiling ra_syntax v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_syntax)
   Compiling gen_lsp_server v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/gen_lsp_server)
   Compiling ra_editor v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_editor)
   Compiling ra_db v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_db)
   Compiling ra_cli v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_cli)
   Compiling ra_hir v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_hir)
   Compiling ra_analysis v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_analysis)
   Compiling ra_lsp_server v0.1.0 (/home/travis/build/rust-analyzer/rust-analyzer/crates/ra_lsp_server)
warning: unused `#[macro_use]` import
 --> crates/ra_lsp_server/src/main.rs:5:1
  |
5 | #[macro_use]
  | ^^^^^^^^^^^^
  |
  = note: #[warn(unused_imports)] on by default

warning: unused `#[macro_use]` import
 --> crates/ra_lsp_server/src/main.rs:5:1
  |
5 | #[macro_use]
  | ^^^^^^^^^^^^
  |
  = note: #[warn(unused_imports)] on by default

    Finished dev [unoptimized + debuginfo] target(s) in 1m 32s

This is awesome, considering the fact that rust-analyzer has 145 dependencies, two of which are syns.

Another interesting bit is cache uploading:

before_cache.1
$ find ./target/debug -type f -maxdepth 1 -delete
find: warning: you have specified the -maxdepth option after a non-option argument -type, but options are not positional (-maxdepth affects tests specified before it as well as those specified after it).  Please specify options before other arguments.
before_cache.2
$ rm -fr ./target/debug/{deps,.fingerprint}/{*ra_*,*test*,*tools*,*gen_lsp*}
before_cache.3
$ rm -f  ./target/.rustc_info.json
cache.2
store build cache
nothing changed, not updating cache (yay!)
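The three before_cache steps in the log can be folded into one reusable function. This is a rough sketch rather than the exact script from the repository: `prune_workspace_artifacts` is a hypothetical name, the debug profile is hard-coded, and crate names are matched as plain substrings, as in the log. It relies on the same non-POSIX find options as the original (`-maxdepth`, `-delete`).

```shell
# Remove workspace-owned artifacts from target/debug and leave dependency
# artifacts in place. Usage: prune_workspace_artifacts <target-dir> <name>...
# (Hypothetical helper; mirrors the three before_cache steps in the log.)
prune_workspace_artifacts() {
    target_dir=$1
    shift
    profile_dir=$target_dir/debug

    # Step 3 of the log: drop Cargo's cached rustc info file.
    rm -f "$target_dir/.rustc_info.json"

    [ -d "$profile_dir" ] || return 0

    # Step 1: delete files sitting directly in debug/. -maxdepth comes before
    # -type, which avoids the find warning shown in the log.
    find "$profile_dir" -maxdepth 1 -type f -delete

    # Step 2: delete entries in deps/ and .fingerprint/ whose names contain
    # one of the given substrings.
    for name in "$@"; do
        for sub in deps .fingerprint; do
            [ -d "$profile_dir/$sub" ] || continue
            find "$profile_dir/$sub" -mindepth 1 -maxdepth 1 \
                -name "*${name}*" -exec rm -rf {} +
        done
    done
}

# Usage sketch, with the substrings from the log:
#   prune_workspace_artifacts ./target ra_ test tools gen_lsp
```

Because matching is by substring, a pattern such as `tools` would also remove a dependency whose name merely contains it (itertools, for example). That only costs a rebuild of that dependency, but narrower patterns keep more of the cache stable.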

I think such a caching strategy may make a huge difference for projects which build/gate on stable (nightly is trickier, because the cache will die with the next nightly).

In terms of cost/benefit, I think a little help from Cargo's side (dumping all deps from the crates.io source to a separate cacheable dir) would be very welcome here.

cc @rust-lang/cargo: perhaps I am overly enthusiastic here, but it does look like a low-hanging watermelon :)

I don't think there's any disagreement that Cargo can do a lot here, just needs someone to help push through a design!

Made a rough POC here: https://github.com/matklad/cargo/commit/b3566b168ee035c1d0652210e9125ae1aa4bece0

It was harder than anticipated, mainly because we have three different dirs we need to account for: .fingerprint, deps and build. The POC deals only with the first two, by introducing .fingerprint/crates-io and deps/crates-io for dependencies which never change. It successfully avoids rebuilding crates.io dependencies that do not have build scripts.

The real implementation should probably transpose the order of directories:

target/
  debug/
    .fingerprint/
    deps/
    build/
    cacheable/
      .fingerprint/
      deps/
      build/

so that you can point the CI cache to target/debug/cacheable and be done with it. Probably, the cacheable dir should even come before profile/target_triple.

However, with this design, if we then allow overriding the location of the cacheable dir via an env var, we pretty much get the "share common dependencies across projects" behavior, which is also a sought-after feature.

Does this also account/work for workspaces? We have had the biggest problems with workspaces because local crates depend on each other and that causes a weird structure inside the target folder.

Nice that this is being worked on! :)

I don't know how the files are currently laid out, so my opinion should come with a big helping of salt, but it would be lovely to have the Rust version in the path for artifacts that can only be used by that version of Rust. That would make it straightforward to write a GC that deletes artifacts for versions of Rust that are no longer installed. This is not mutually exclusive with other information being encoded in the path, or stored in some other format (like a timestamp file).
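To make the GC idea concrete, here is a sketch under a purely hypothetical layout in which each Rust version gets its own top-level directory (`<root>/<rust-version>/...`). Cargo does not lay out target/ this way today; the function name and the version numbers in the usage line are illustrative only, and the caller is assumed to know which versions are still installed.

```shell
# HYPOTHETICAL layout: <root>/<rust-version>/..., which Cargo does not use.
# Delete every version directory under <root> that is not in the keep list.
# Usage: gc_stale_versions <root> <installed-version>...
gc_stale_versions() {
    root=$1
    shift
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        version=$(basename "$dir")
        keep=no
        for installed in "$@"; do
            [ "$version" = "$installed" ] && keep=yes
        done
        [ "$keep" = yes ] || rm -rf "$dir"
    done
}

# Usage sketch (illustrative version numbers):
#   gc_stale_versions ./target 1.30.0 1.31.0
```

With the version in the path, the GC never has to inspect artifact contents; a directory listing is enough to decide what is stale.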

cc crates.io having trouble with this.

Found something that might be useful in this regard:

https://github.com/ustulation/cargo-prune

So, we at clap tried to optimize the cache for CI since we kept making the cache bigger every time we ran on Travis. We took inspiration from rust-analyzer and ended up with the following: https://github.com/clap-rs/clap/pull/1658/files#diff-354f30a63fb0907d4ad57269548329e3R5-R16

It has multiple packages and also has trybuild tests. I am not sure if I was overzealous a bit. But I thought you guys might want to know.
