Bazel: Bazel leaves behind too many old files in "install"

Created on 18 Nov 2016 · 25 Comments · Source: bazelbuild/bazel

$ du -sh /private/var/tmp/_bazel_camillol/install
2.3G /private/var/tmp/_bazel_camillol/install

There are 50 directories in there. The oldest dates back to November 24, 2015; the newest is from September 1, 2016. May 12 has five different folders.

Even though I ran "blaze clean" in all of my blaze clients, lots of these folders are left behind. They should be cleaned up somehow.

Labels: P2, help wanted, team-Local-Exec, feature request

All 25 comments

bazel clean removes output files; bazel clean --expunge removes the whole output base (which contains the install directory). See https://bazel.build/versions/master/docs/bazel-user-manual.html#clean for more info.

The output base does not contain the install directory, but only a symlink to it. The actual install directory is not removed by bazel clean --expunge.

Oh, that's true. I guess we could add an option, but you can just delete the directories, too.

An option doesn't really help. If you know to use the option, you know enough to delete the directories manually. The problem is that we leave behind a separate installation directory for every single build of Blaze that the user has ever used, and they never get cleaned up, until the user starts running low on space and goes looking for things to delete. We should not burden the user with that; we should just clean up old installations periodically.

That doesn't seem like Bazel's responsibility: if you wanted you could put the bazel directories on a filesystem that will delete things that haven't been used in a while. "We now slow down your build to delete some files taking up a little disk space" doesn't seem like a tradeoff most developers would want to make.

Is it safe to delete the entire user tree, i.e. $ rm -rf /private/var/tmp/_bazel_camillol? Mine is 14G and my disk is 99% full. This is kind of an issue for people who work with Bazel across multiple repos rather than a monorepo.

It's perfectly safe to delete the install directories (as long as you aren't running Bazel in parallel). Bazel just re-creates them on the next run if necessary. Deleting the entire _bazel_ tree is also safe, but will cause Bazel to rebuild everything on the next run, and the bazel-* symlinks will all be dead links.
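Before deleting anything, it can help to see which install directories are large and old. The following is a minimal Python sketch, not part of Bazel: the function names are made up for illustration, and it uses each directory's modification time, which is only a rough proxy for when that install directory was last used.

```python
import os
import time
from pathlib import Path
from typing import List, Optional, Tuple


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def list_install_bases(
    install_root: Path, now: Optional[float] = None
) -> List[Tuple[str, int, float]]:
    """Return (name, size_bytes, age_days) per subdirectory, oldest first."""
    now = time.time() if now is None else now
    rows = []
    for entry in install_root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            age_days = (now - entry.stat().st_mtime) / 86400
            rows.append((entry.name, dir_size_bytes(entry), age_days))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Calling list_install_bases(Path("/private/var/tmp/_bazel_camillol/install")) would return one row per install directory for the path from the original report. As noted above, only delete directories while no Bazel instance is running.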

@kchodorow: in general, ensuring that temporary files get deleted is the responsibility of whoever creates them. If this were a 10 MB cache, you could say "eh, whatever, it's not going to hurt to just leave it there", but dumping 14 GB of old temporary files on @pcj's disk is not a reasonable thing to do.

We don't need to slow down builds at all, either. Bazel runs as a daemon, it can easily do the cleanup as an idle task when it's not building anything.

BTW, if you are not seeing this problem on your own machine it's probably because your company has set up a cron job to clean up old bazel directories automatically. (Which could, in theory, slow down your build if it runs at the same time as bazel... have you ever noticed that issue?) But this is really a responsibility that Bazel should take on, so that it works by default for everyone.

+1 to this. Ran into the same thing recently myself: Bazel had used 18GB of my disk for its caching, on a VM with 60GB, which caused me to run out of space and go hunting for my gigabytes.

If bazel is going to cache GBs of files, it should be responsible for doing some basic tracking of their usage and deleting them when they haven't been accessed in a while. I don't mind giving a few GB to bazel to use as a cache, but it needs to be respectful of my disk and not cause me to run out of space.

@camillol I disagree. Google has a separate tool that basically takes care of this problem for you, which is why it isn't built into Bazel. Bazel is huge and complicated, we'd like to keep it focused on doing one thing (building) well.

That's just the thing, though. Dumping tens of gigabytes of stale temp files in the course of normal operation is not doing things "well".

And Bazel is an open source project. We can't say "well, to use Bazel properly you need to get a copy of this internal tool, which we haven't released, and install it".

Even if it were released, it still doesn't make sense for a Blaze installation to behave incorrectly by default, and to require the installation of a separate program to clean up after it. Things should work out of the box. Defaults should be reasonable. This is a basic product excellence issue.

I think this should be done: either have the launcher clean the install dir, or have the installer install a service that does it. It is probably better to have this simple code as part of the launcher.

I agree with @camillol that the defaults of bazel should give an awesome user experience and this is not it.

I just stumbled on this issue and would like to vote in favor of what @camillol is saying: Bazel created the garbage, so it's Bazel's responsibility to clean it up automatically. These whole "installation directories" are a very strange concept after all, so when Bazel abandons them, Bazel has to destroy them. And as @damienmg says, the current behavior is far from a great user experience.

Gentle ping. I keep pruning it, but it grows back quickly. This is mostly within the last two months:

$ du -sh /private/var/tmp/_bazel_camillol/install/
883M /private/var/tmp/_bazel_camillol/install/

I would suggest reclassifying this from "feature request" to "bug".

I'd also like to bump this ... our build agents have output bases of over 100G pretty quickly. Some notion of being more careful about leaving behind garbage would be great.

@jgavris The output base is different than the install base. The output base is specific to your project and you are responsible for getting rid of it via bazel clean if you want to (it's your data, so Bazel shouldn't get rid of it automatically). The install base is what's an artifact of how Bazel works today... and I think it's pretty hard to pile up 100GB of such data...

@jmmv my bad ... you're right. We actually quickly hit 100GB of artifacts in CI in one day doing about 10 variant builds of a medium / large codebase (debug and release configs for 5 different architectures).

What's the actual proposal here? I suppose it should still be possible to use several bazel versions on one machine without having to extract them on each new invocation.

One option is to handle this in https://github.com/philwo/bazelisk and look at the timestamp of the install base.

I have an old design document that addressed this (which was internal only because it contained some non-public numbers IIRC). I didn't bother to externalize it because many of the comments suggested to go for a much simpler approach. I'll try to distill those proposals and add them here.

Alright. I won't bother much with my proposal because it was rather long and wasn't well received. (The summary is that it was about creating lock files and using advisory locks on them on each install base, and then some complex logic getting the locks to determine which install bases were not in use. Overkill and many of the corner cases wouldn't be regularly exercised, thus rendering a fragile algorithm.)

An alternative proposal from @ulfjack had better reception and it goes like this. I don't think it deserves a full-blown design document as it's quite straightforward. Quoting more or less verbatim:


  1. On startup, touch a file last-used under the install base (say, in the client).
  2. In the server, list install bases and get mtime of install-base/last-used.
  3. If mtime is old (say >1 week), rename install base atomically by appending .delete (if you're uncomfortable comparing mtime with system time, we can get the mtime of last-used of the currently running server to compare with).
  4. If there are any *.delete directories, delete them; no need to block shutdown: if we get killed / shut down too quickly we'll just pick up on the next run.

Cons:

  • There is a chance that we could end up deleting an install base underneath a running Bazel that has been running for more than a week (not just an idle server, but an active command), but it seems sufficiently unlikely.
  • mtime may be wrong.
  • File system may be on the network.
  • Race between 2 and 3 could cause a binary to use an install dir that's about to get deleted.
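The four steps of the proposal above could be sketched in Python roughly as follows. This is an illustration of the idea only, not Bazel's implementation (which would live in the client and server): the function names are hypothetical, the file name last-used, the .delete suffix, and the one-week cutoff come from the proposal, and skipping install bases that have no marker file is an assumption made here to stay conservative. The sketch does nothing about the listed cons, in particular the race between steps 2 and 3.

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

LAST_USED = "last-used"          # marker file name from the proposal
DELETE_SUFFIX = ".delete"        # rename suffix from the proposal
MAX_AGE_SECONDS = 7 * 24 * 3600  # "say >1 week"


def touch_last_used(install_base: Path) -> None:
    """Step 1: record that this install base is in use right now."""
    (install_base / LAST_USED).touch()


def mark_stale(
    install_root: Path, current: Path, now: Optional[float] = None
) -> List[Path]:
    """Steps 2-3: rename install bases whose marker is older than the cutoff."""
    now = time.time() if now is None else now
    marked = []
    for entry in sorted(install_root.iterdir()):
        if not entry.is_dir() or entry.name.endswith(DELETE_SUFFIX):
            continue
        if entry.resolve() == current.resolve():
            continue  # never collect the install base we are running from
        try:
            mtime = (entry / LAST_USED).stat().st_mtime
        except FileNotFoundError:
            continue  # assumption: leave install bases without a marker alone
        if now - mtime > MAX_AGE_SECONDS:
            target = entry.with_name(entry.name + DELETE_SUFFIX)
            try:
                os.rename(entry, target)  # atomic on one POSIX filesystem
                marked.append(target)
            except OSError:
                pass  # someone else renamed it first, or the target exists
    return marked


def sweep(install_root: Path) -> int:
    """Step 4: delete *.delete directories; safe to interrupt and rerun."""
    removed = 0
    for entry in install_root.iterdir():
        if entry.is_dir() and entry.name.endswith(DELETE_SUFFIX):
            shutil.rmtree(entry, ignore_errors=True)
            if not entry.exists():
                removed += 1
    return removed
```

Splitting the rename from the delete is what lets step 4 be interrupted: a half-deleted directory keeps its .delete name and is picked up again by the next sweep.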

Looks like, on the server side, we can probably put all this logic in its own BlazeModule so as not to pollute any core logic. Thus it might be nice if even step 1 happens on the server as well.

Someone else asked for a flag to disable this feature in case we really have a server that is alive for more than a week. I really don't like adding a long-term flag just for this: we can fix this problem by having a thread that touches last-used once an hour to cope with this case. However, we should have a flag to disable this feature in case it causes trouble during initial rollout; but I'd like it removed after a release cycle or so.
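The hourly "touch last-used" thread mentioned above might look something like this in Python. It is a sketch under assumptions: start_heartbeat is a hypothetical name, and the real server is not written in Python. It only illustrates how a long-lived process could keep its marker fresh so that its install base is never seen as stale.

```python
import threading
from pathlib import Path
from typing import Tuple


def start_heartbeat(
    marker: Path, interval_seconds: float = 3600.0
) -> Tuple[threading.Event, threading.Thread]:
    """Touch marker every interval_seconds until the returned event is set."""
    stop = threading.Event()

    def run() -> None:
        # Event.wait returns False on timeout and True once stop is set, so
        # the loop touches the marker each interval and exits promptly on stop.
        while not stop.wait(interval_seconds):
            marker.touch()

    thread = threading.Thread(target=run, name="last-used-heartbeat", daemon=True)
    thread.start()
    return stop, thread
```

A server would call start_heartbeat(install_base / "last-used") at startup and set the returned event on shutdown; the daemon flag keeps the thread from blocking process exit.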

I am looking at this - I will deal with signing the contributor license agreement and all that soon.

I doubt this is a product question. The fact that this mentions "install" files doesn't mean it's an installation issue. Also, given that #1035 is very related to this and that that bug is even more out of scope, I think it makes sense to reclassify this as team-Local-Exec.

Any work on this? I was wondering why my du command was taking so long on my machine... turns out bazel had 704920 files in the cache totaling 23 GB.

I think even a warning message would be a good start so users don't need to debug issues like this.
