Bazel: Error while building tensorflow 0.11.0 - cache (directory not empty)

Created on 20 Oct 2016  ·  15 Comments  ·  Source: bazelbuild/bazel

I'm trying to install tensorflow 0.11.0 by running

./configure

I'm getting an error saying:

ERROR: /home/abc/.cache/bazel/_bazel_abc/235fe154e0/server (Directory not empty).

I'm not sure if they are related, but before the error message, I also get a warning saying:

WARNING: Output base '/home/abc/.cache/bazel/_bazel_abc/235fe154e0' is on NFS.     
This may lead to surprising failures and undetermined behavior.

I have no clue what the error message means, but if I try running ./configure right after this error message, I get another message saying:

/home/rkohli1/.cache/bazel/_bazel_rkohli1/235fe154e0a4c7e0c0527cd185fe6b6b/server/.nfs00000000820050bd00000e9e (Device or resource busy).

At this point, I just tried deleting the entire .cache folder (I had to first kill a process which was preventing me from deleting it). I tried running configure with the --expunge_async flag as well but it doesn't help. It takes me back to the first error message.

Not sure if it's relevant, but I'm trying to install tensorflow with GPU support, using CUDA 8.0 and cuDNN 5.

I raised this issue on stackoverflow (http://stackoverflow.com/questions/40144776/tensorflow-installation-error-directory-not-empty), and someone pointed out that it's due to a bug in bazel. Please advise me if I'm wrong.

Most helpful comment

I attempted both solutions suggested by @sfincke and @yselivonchyk, but without luck. Finally, I managed to change the cache location by running TEST_TMPDIR=/tmp/bazel/ ./configure, which solved the issue.

This environment variable sets the overall cache directory, as described here: https://bazel.build/versions/master/docs/output_directories.html.

All 15 comments

I am also having the same issue at the moment.

Same error message also for me.

Also having the same issue.

Not sure if this is correct, but after I make the following change in the configure file:

function bazel_clean_and_fetch() {
  # bazel clean --expunge currently doesn't work on Windows
  # TODO(pcloudy): Re-enable it after bazel clean --expunge is fixed.
  if ! is_windows; then
    #bazel clean --expunge
    bazel clean --expunge_async
  fi
  bazel fetch //tensorflow/...
}

I can install tensorflow 0.11 from source, with

  • bazel 0.3.1
  • cuDNN 5
  • Cuda 8.0

Jian

+meteorcloudy

Wait, there are several separate issues conflated in this one:

  1. NFS mount points are known to be problematic. Use --output_base to move Bazel's cache dir off the NFS mount point.
  2. This is a known issue on Windows, and @meteorcloudy is working on fixing it IIUC.

Anyway, removing (rm) the Bazel cache should always fix the issue.

Closing this issue. Please open a specific one for your use case if it is not one of these.
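
A minimal sketch for checking whether a candidate cache directory sits on NFS before pointing Bazel at it. It assumes GNU coreutils `stat` (the `-f -c %T` form prints the filesystem type name; BSD/macOS `stat` uses different flags). The helper names `fs_type` and `is_nfs_type` and the `CANDIDATE` variable are hypothetical, not part of Bazel or TensorFlow.

```shell
# Print the filesystem type name of the given path (GNU stat only).
fs_type() {
  stat -f -c %T "$1"
}

# Succeed if the given type name is the one GNU stat reports for NFS.
is_nfs_type() {
  case "$1" in
    nfs) return 0 ;;
    *)   return 1 ;;
  esac
}

# Hypothetical candidate directory; replace with the path you plan to use.
CANDIDATE="${CANDIDATE:-/tmp}"
TYPE="$(fs_type "$CANDIDATE")"

if is_nfs_type "$TYPE"; then
  echo "$CANDIDATE is on NFS; pick a local directory instead"
else
  echo "$CANDIDATE is on '$TYPE', not NFS"
fi
```

If the check reports NFS, choose a directory on a local disk before setting --output_base.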

Hi @damienmg, I ran into the same issue. As you suggested, I used ./configure --output_base=/temp/cache_bazel; however, I still get the warning below during configuration:

WARNING: Output base '/home/AIJ/.cache/bazel/_bazel_AIJ/aa61f742fcd63eed03445cc6cf85534c' is on NFS. This may lead to surprising failures and undetermined behavior.

Does this mean the output base has not changed to my specified folder, i.e. /temp/cache_bazel? And what should I do to move Bazel's cache dir to a local disk?

Thanks!

After having the same problems as @AIROBOTAI, I finally hacked ./configure into submission. In 'bazel_clean_and_fetch', I added '--output_base TARGET_DIRECTORY' to both 'bazel ... clean' and 'bazel ... fetch'.
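
A rough sketch of what such an edit might look like. In Bazel, --output_base is a startup option, so it goes between `bazel` and the command name. The `OUTPUT_BASE` default path and the `run`/`DRY_RUN` wrapper are assumptions added here so the command lines can be previewed without Bazel installed; the Windows check from the original function is omitted for brevity.

```shell
# Hypothetical local (non-NFS) directory; adjust for your machine.
OUTPUT_BASE="${OUTPUT_BASE:-/tmp/bazel_output_base}"

# Echo the command when DRY_RUN is non-empty, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

bazel_clean_and_fetch() {
  # Startup options such as --output_base come before the command name.
  run bazel --output_base="$OUTPUT_BASE" clean --expunge_async
  run bazel --output_base="$OUTPUT_BASE" fetch //tensorflow/...
}

# Preview the exact command lines without invoking Bazel.
DRY_RUN=1
bazel_clean_and_fetch
```

Clearing `DRY_RUN` makes the same function run the commands for real. Any later bazel invocation would need the same --output_base value to reuse that directory.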

I was trying to build TF 0.12 from source with bazel on NFS.

Neither of the suggestions above worked for me:
1) Editing the .config file and adding --output_base did not work for fetch.
2) Everything else from this thread resulted in the same NFS warning and the same issue with bazel's cache.

This solution seems to help:
http://stackoverflow.com/questions/40144776/tensorflow-installation-error-directory-not-empty

Solution:
edit the .config file and replace
bazel clean --expunge
with
bazel clean --expunge_async

Update,

I tried the same thing again: compiling the latest version of TF with bazel on an NFS file system after changing .config to use "bazel clean --expunge_async".
It did not work. After some time, at a random step, the server just hangs. NFS consumes the full network capacity while the process does nothing. Most astonishingly, kill -9 does not terminate the process.

So I would not recommend doing that on NFS unless you are free to restart your servers.

I tried some bazel commands to use a custom cache location, but that did not work either.

I confirm https://github.com/bazelbuild/bazel/issues/1970#issuecomment-282177022 works for me as well!

For anyone still having issues with this: note that TEST_TMPDIR=/tmp/bazel/ should be set before every command related to compilation. For instance, bazel clean --expunge_async should also be run as TEST_TMPDIR=/tmp/bazel/ bazel clean --expunge_async, using your chosen temp directory.

@PiranjaF Should I use TEST_TMPDIR before bazel build as well?

It'd be better to run "export TEST_TMPDIR=/tmp/bazel" once before installing TF.
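
A small illustration of the difference between the two forms discussed above, using plain `sh -c` child processes in place of bazel so it runs anywhere. A `VAR=value command` prefix applies to that one command only, while `export` makes the variable visible to every later command in the same shell session. The /tmp/bazel path is the one used in this thread; any local directory would do.

```shell
unset TEST_TMPDIR

# Prefix form: the variable is set for this single command only.
TEST_TMPDIR=/tmp/bazel sh -c 'echo "prefixed command sees: ${TEST_TMPDIR:-unset}"'

# The next command no longer sees it.
sh -c 'echo "next command sees: ${TEST_TMPDIR:-unset}"'

# Export form: every later command in this shell inherits the variable,
# e.g. ./configure, bazel clean and bazel build.
export TEST_TMPDIR=/tmp/bazel
sh -c 'echo "after export, commands see: ${TEST_TMPDIR:-unset}"'
```

With the export in place, there is no need to repeat the prefix in front of each bazel command in that session.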
