Running bazel build on a large Java/Scala project (several thousand targets) fails on Debian Linux when user namespaces are enabled.
User namespaces are enabled like this before running bazel build:
$ sysctl kernel.unprivileged_userns_clone=1
The build starts fine, but at some point it crashes with a strange memory error:
ERROR: <target-path>/BUILD:35:1: error executing shell command: '
rm -rf bazel-out/local-fastbuild/bin/<package>/<target>.jar_temp_resources_dir
set -e
mkdir -p bazel-out/local-fastbuild/bin/<target>' failed: Process terminated by signal 6 [sandboxed].
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f094606874b, pid=5, tid=0x00007f09472e0700
#
# JRE version: (8.0_131-b11) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x96874b] PerfMemory::alloc(unsigned long)+0x7b
#
# Core dump written. Default location: /home/builduser/.cache/bazel/_bazel_builduser/bc0e462ab01ac9379d22ad058ca1cb1f/bazel-sandbox/4864102460254154064/execroot/__main__/core or core.5
#
# An error report file with more information is saved as:
# /home/builduser/.cache/bazel/_bazel_builduser/bc0e462ab01ac9379d22ad058ca1cb1f/bazel-sandbox/4864102460254154064/execroot/__main__/hs_err_pid5.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
The machine is a Docker container based on a Debian image:
$ uname -a
Linux 167-docker99 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2 (2017-04-30) x86_64 GNU/Linux
builduser@167-docker99:~/ws/bazel-port-isolation$ cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Bazel version
specs2 versions and test runner env preparation)
unprivileged_userns_clone=0 (but clearly - that's not a solution)
Can you please send the output of "cat /proc/mounts", "free -m", "df -h" and "df -i" from inside the Docker container where the SIGBUS happened?
I think this happens when /tmp does not have enough space or is a tmpfs and there's not enough free RAM to back all files that the JVM wants to create via mmap there, but I had a hard time tracking it down exactly.
@philwo - The machine should be crazy strong... Maybe even too strong? :-)
# cat /proc/mounts
rootfs / rootfs rw 0 0
none / aufs rw,relatime,si=bd2d63d83e471179,dio,dirperm1 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,nosuid,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs rw,nosuid,nodev,noexec,relatime,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
/dev/xvda2 /etc/resolv.conf ext4 rw,relatime,data=ordered 0 0
/dev/xvda2 /mnt/tmp ext4 rw,relatime,data=ordered 0 0
/dev/xvda2 /tmp/config ext4 rw,relatime,data=ordered 0 0
/dev/xvda2 /opt/intel ext4 rw,relatime,data=ordered 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
/dev/xvda2 /opt/bin ext4 rw,relatime,data=ordered 0 0
/dev/xvda2 /etc/hostname ext4 rw,relatime,data=ordered 0 0
/dev/xvda2 /opt/private ext4 rw,relatime,data=ordered 0 0
/dev/xvda2 /etc/hosts ext4 rw,relatime,data=ordered 0 0
/dev/xvda2 /home/builduser/.npmrc ext4 ro,relatime,data=ordered 0 0
tmpfs /run/docker.sock tmpfs rw,nosuid,relatime,size=52934720k,mode=755 0 0
/dev/xvda2 /home/builduser/.pypirc ext4 ro,relatime,data=ordered 0 0
/dev/xvda2 /home/builduser/.ssh/google_compute_engine ext4 rw,relatime,data=ordered 0 0
/dev/xvda2 /home/builduser/.jfrog/jfrog-cli.conf ext4 ro,relatime,data=ordered 0 0
/dev/xvda2 /home/builduser/.ssh/google_compute_engine.pub ext4 rw,relatime,data=ordered 0 0
devpts /dev/console devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
# free -m
             total       used       free     shared    buffers     cached
Mem:        258470      43545     214924         32       4771      28253
-/+ buffers/cache:      10520     247950
Swap:            0          0          0
# df -h
Filesystem      Size  Used Avail Use% Mounted on
none             99G   36G   60G  38% /
tmpfs           127G     0  127G   0% /dev
tmpfs           127G     0  127G   0% /sys/fs/cgroup
/dev/xvda2       99G   36G   60G  38% /mnt/tmp
tmpfs           127G   16M  127G   1% /dev/shm
tmpfs            51G   17M   51G   1% /run/docker.sock
# df -i
Filesystem       Inodes   IUsed    IFree IUse% Mounted on
none            6553600 1186838  5366762   19% /
tmpfs          33084198     135 33084063    1% /dev
tmpfs          33084198      14 33084184    1% /sys/fs/cgroup
/dev/xvda2      6553600 1186838  5366762   19% /mnt/tmp
tmpfs          33084198      16 33084182    1% /dev/shm
tmpfs          33084198     966 33083232    1% /run/docker.sock
Whoa, yes, resources are not an issue on that machine. :)
Still, the only thing I could ever find about JVMs crashing with SIGBUS is this: http://bugs.java.com/view_bug.do?bug_id=6563308
I wonder if it's the same issue. Do you use any --sandbox_* flags for your build? Maybe --sandbox_tmpfs_path=...?
No... I have some test targets where I set the java_flags with -Djava.io.tmpdir=/tmp but it fails on the build phase.
@philwo any ideas? The problem is that network isolation on our machine (a Docker container) doesn't work without enabling user namespaces. With them enabled, it fails for a different reason on 0.5.1, and the build fails on HEAD.
We were able to verify on a very small Java repo that network isolation works for us on HEAD with user namespaces enabled.
If we could just get past this, we could parallelize our tests...
I'm trying to find out why this is happening now.
Can you please try using the flag "--sandbox_tmpfs_path=/tmp" in your "bazel build" (or "bazel test") command and see if the error still happens? This will mount an empty tmpfs on /tmp for each running action. It's generally not a bad idea, because it increases hermeticity (otherwise /tmp is a writable directory shared between all actions of a build, so they could create conflicting files or accidentally keep state there.)
An alternate idea might be to mount a tmpfs on /tmp inside the Docker container before running the bazel command, but I'd like to try the first one first, because I remember that it helped a different user.
It would be interesting to see if this makes the problem disappear.
A process gets SIGBUS in one of two conditions:
1) It tries to read an address that no longer exists from an mmap'd file (e.g. because the file was truncated by a different process).
2) It tries to write more bytes to an mmap'd file than the underlying device can hold.
The first thing might happen when multiple JVM processes try to use the same file via mmap and one of them truncates it, I guess. Is the failing action always the same kind, e.g. a Scala compilation?
The second possibility is more likely, but you have 60G free on / and /tmp is not a mount point, so it's hard to see how you can run out of disk space during a build... OTOH you have almost four times as much RAM as free disk space on that machine - maybe some process uses a heuristic like "let's allocate a temp file with the size of 1/4 the RAM, this should always be a reasonable number"?
I also don't understand why this only happens when you use the linux-sandbox and not when you use the standalone strategy. Do I understand correctly that the build works fine then?
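To make condition 1 concrete, here is a minimal sketch in Python rather than Java. It assumes a Linux machine; the helper name run_demo and the 4096-byte file size are made up for illustration and have nothing to do with Bazel or the JVM. A child process maps a file, the parent truncates it to zero bytes, and the child's next read of the mapping should be killed by SIGBUS.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child: map the file, report readiness, wait for a go-ahead,
# then touch the first byte of the mapping.
CHILD = r"""
import mmap, sys
with open(sys.argv[1], "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)
    print("ready", flush=True)
    sys.stdin.readline()
    m[0]
"""


def run_demo():
    """Return the child's exit status after its mapped file is truncated."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * 4096)
        path = tmp.name
    try:
        with subprocess.Popen(
            [sys.executable, "-c", CHILD, path],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        ) as child:
            child.stdout.readline()   # wait until the child has mapped the file
            os.truncate(path, 0)      # a different process truncates the file
            child.stdin.write("\n")   # let the child touch the mapping
            child.stdin.flush()
            return child.wait()
    finally:
        os.unlink(path)


if __name__ == "__main__":
    status = run_demo()
    # On POSIX, a negative return code means "killed by that signal".
    print("child killed by SIGBUS:", status == -signal.SIGBUS)
```

The child never writes past the end of the device, so this only illustrates the first condition; the second one (running out of backing space) would need a full filesystem to reproduce.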
@philwo using --sandbox_tmpfs_path=/tmp indeed passes the build, thanks!
It seems that it slows down the build a bit. Am I right? Didn't have time yet to run multiple benchmarks.
Also, this should be solved right? Shouldn't it be a release blocker?
> using --sandbox_tmpfs_path=/tmp indeed passes the build, thanks!
Nice! :)
> It seems that it slows down the build a bit.
It shouldn't.. actually it should make things a bit faster, because /tmp is now backed by RAM instead of disk. On the other hand, it might be possible that mounting the tmpfs incurs some overhead, too.
If you measure a noticeable difference, I'd be quite interested in it.
> this should be solved right? Shouldn't it be a release blocker?
Absolutely! The problem is, I cannot reproduce this on my machine and the cause is completely unknown. :( It also doesn't seem to affect many people. If we had a clear repro case or someone more familiar with JVM internals, it might be easier to get to the bottom of this issue.
What I'd be really interested in is if simply mounting a tmpfs on /tmp and not using the --sandbox_tmpfs_path flag also helps, or if the issue then happens again. What I mean is:
# I'm in my Docker container!
$ mount -t tmpfs tmpfs /tmp
$ bazel build //...
If the issue reoccurs, I believe that we're seeing a race condition here, maybe related to this: https://stackoverflow.com/questions/76327/how-can-i-prevent-java-from-creating-hsperfdata-files
Maybe the JVMs in a highly parallel build accidentally create hsperfdata files with the same name and when one truncates the file of another running JVM, that JVM gets a SIGBUS because the file underlying its mmap went away. But this is really just a guess.
More info: http://www.evanjones.ca/jvm-mmap-pause.html
OMG, wait, I got it
@aehlig @ulfjack FYI.
root@ubuntu:~# strace -f -- java HelloWorld
[pid 1432] open("/tmp/hsperfdata_root", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 1432] fchdir(4) = 0
[pid 1432] open("1431", O_RDWR|O_CREAT|O_NOFOLLOW, 0600) = 5
[pid 1432] ftruncate(5, 0) = 0
[pid 1432] mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x7f54cad93000
Every JVM creates a temporary performance instrumentation file in /tmp/hsperfdata_$USERNAME/$PID. When we use sandboxing, we use PID namespaces, which means that the PIDs are virtualized and all running JVMs believe they are PID 2.
This means that they all open/ftruncate/mmap the same file and that gives you SIGBUS eventually, due to case 1 I mentioned above: "It tries to read an address that no longer exists from an mmap'd file".
When you use --sandbox_tmpfs_path=/tmp, each running sandbox gets its own /tmp, so the files don't conflict.
This means the solution is quite simple and I'll come up with something on Monday. For now, I'd recommend using the --sandbox_tmpfs_path=/tmp flag.
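The collision can be sketched with a few lines of Python. This only models the path layout described above (/tmp/hsperfdata_$USERNAME/$PID); the function name, the user name "builduser" and the per-sandbox directories are assumptions for illustration, not anything Bazel or the JVM actually exposes.

```python
import posixpath


def hsperfdata_path(tmp_dir, username, pid):
    """Perf file path as described above: <tmp>/hsperfdata_<user>/<pid>."""
    return posixpath.join(tmp_dir, "hsperfdata_" + username, str(pid))


# Two sandboxed actions, each in its own PID namespace, so both JVMs see PID 2
# while sharing the same /tmp.
shared = [hsperfdata_path("/tmp", "builduser", 2) for _ in range(2)]
print(len(set(shared)))   # 1 distinct path: both JVMs map the same file

# With a private /tmp per sandbox, the path looks identical from the inside
# but resolves to a different file. Modelled here with hypothetical
# per-sandbox directories.
private = [hsperfdata_path(f"/sandbox-{i}/tmp", "builduser", 2) for i in range(2)]
print(len(set(private)))  # 2 distinct paths: no shared file
```

This is why the tmpfs flag helps even though nothing about the JVM changes: the file name stays the same, only the directory behind /tmp differs per action.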
Well done!
Does it help if we set TMPDIR to a unique path for each action?
Is that targeted to me? If so how can I do that?
I don't think there is any way to change this. "/tmp" seems to be the hard-coded location.
philwo@ubuntu:~$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
philwo@ubuntu:~$ strace -f \
-E TMP=/home/philwo/tmp \
-E TMPDIR=/home/philwo/tmp -- \
java -Djava.io.tmpdir=/home/philwo/tmp HelloWorld 2>&1 | \
fgrep hsperfdata
[pid 1589] open("/tmp/hsperfdata_philwo", O_RDONLY|O_NOFOLLOW) = 3
[pid 1589] open("/tmp/hsperfdata_philwo", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 1589] mkdir("/tmp/hsperfdata_philwo", 0755) = -1 EEXIST (File exists)
[pid 1589] lstat("/tmp/hsperfdata_philwo", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 1589] open("/tmp/hsperfdata_philwo", O_RDONLY|O_NOFOLLOW) = 3
[pid 1589] open("/tmp/hsperfdata_philwo", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
[pid 1589] unlink("/tmp/hsperfdata_philwo/1588") = 0
This is consistent with what I found on the Java bugtracker:
No further action or intent to change this behavior since then.
Some ideas:
1) Always mount tmpfs on /tmp. Problem: Some users might not want this behavior, e.g. on memory constrained environments or if they store huge temporary files in /tmp.
2) Make /tmp/hsperfdata_$USERNAME read-only inside the sandbox. This will prevent the JVM from creating the conflicting file. Problem: Adds JVM specific knowledge to our sandbox implementation.
3) Call Java with -XX:-UsePerfData or -XX:+PerfDisableSharedMem flag. Problem: Unclear how we can make sure that this will always be used, even when users manually launch Java from their own Skylark actions or something.
Any other ideas? I think I like 2) the best and it would be simple to implement.
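For idea 3, a wrapper could add the flag to Java command lines it controls. The following is only a sketch under that assumption: the helper name with_perf_data_disabled is hypothetical, and it does nothing for actions that launch Java without going through it, which is exactly the problem noted above.

```python
# The two JVM flags named in idea 3.
PERF_FLAGS = ("-XX:-UsePerfData", "-XX:+PerfDisableSharedMem")


def with_perf_data_disabled(argv):
    """Return a copy of a `java ...` command line with -XX:-UsePerfData added.

    The flag is placed right after the launcher so that it is parsed as a JVM
    option and not as an argument to the main class. Command lines that
    already carry one of the two flags are returned unchanged.
    """
    if any(arg in PERF_FLAGS for arg in argv):
        return list(argv)
    return [argv[0], "-XX:-UsePerfData", *argv[1:]]


print(with_perf_data_disabled(["java", "-cp", "app.jar", "HelloWorld"]))
```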
I don't know what the constraints are here, but instead of mounting a tmpfs instance, could you mount --bind a fresh temporary directory onto /tmp? Should avoid the memory issues, though you'd still be paying the cost to mount the extra fs. (Given that this is a problem caused by PID namespaces, which are Linux-specific, I'm assuming this unportable solution would be acceptable.)
@jmmv In theory yes, but unfortunately this will again break the people who put their workspace or output base inside /tmp, because the directory mounted on top of /tmp will hide their input or output files and directories.
I still have no clue why anyone would do that, but it comes up every single time I accidentally break it.
We could detect that case and construct a sequence of bind mounts that make the workspace / output base visible even though we're bind mounting an empty dir to /tmp. Alternatively, we could bind mount an empty directory to /tmp/hsperfdata_
I'll give the "empty dir on /tmp" idea a try today and if that doesn't work out go for the make "/tmp/hsperfdata_$USERNAME read-only in sandboxes" version.
Any update?
@ittaiz Unfortunately I got sick just after writing that comment and was out of office the entire week :| I'm fine again now and will be back on Monday. Current plan is to mkdir that directory and then make it read-only unless a tmpfs is mounted on /tmp (because that also solves the problem in a different way).
It should be a rather simple fix that I can get easily done on Monday.
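The decision in that plan ("read-only hsperfdata directory unless a tmpfs is mounted on /tmp") could look roughly like the sketch below. It is written in Python purely for illustration and is not the sandbox's actual code; both function names are hypothetical, and the sample text reuses a few lines from the /proc/mounts output posted earlier in this thread.

```python
def is_tmpfs_mounted_on(mounts_text, mount_point):
    """Return True if the last mount listed for mount_point is a tmpfs."""
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]  # later entries shadow earlier ones
    return fs_type == "tmpfs"


def hsperfdata_dir_to_protect(mounts_text, username):
    """Directory to mkdir and make read-only, or None if tmpfs already covers /tmp."""
    if is_tmpfs_mounted_on(mounts_text, "/tmp"):
        return None
    return "/tmp/hsperfdata_" + username


SAMPLE = """\
none / aufs rw,relatime,si=bd2d63d83e471179,dio,dirperm1 0 0
tmpfs /dev tmpfs rw,nosuid,mode=755 0 0
/dev/xvda2 /mnt/tmp ext4 rw,relatime,data=ordered 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
"""

print(hsperfdata_dir_to_protect(SAMPLE, "builduser"))
print(hsperfdata_dir_to_protect(SAMPLE + "tmpfs /tmp tmpfs rw 0 0\n", "builduser"))
```

In the container from this report /tmp is not a mount point, so the first call names the directory to protect, while the second (with a tmpfs line appended) reports that nothing needs to be done.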
Thanks! Glad to hear you're better.
I was just reminded by a colleague that this is still open - I'll try to fix it (once and for all!) tomorrow.
Hey @philwo ! Just confirmed that the issue persists on 0.5.3 (still requires the workaround).
Do we have plans to push a fix for that on 0.5.4?
Thanks!
I found out that the performance instrumentation directory hsperfdata_userid created by the JVM is poisoning the build on Windows and conflicting with Bazel's zipper utility: #3542.
My workaround is to add this to the java_binary target:
jvm_flags = ["-XX:-UsePerfData"]
This disables the hsperfdata directory, in case you don't need it during a build.
Hi,
Any news?
Hi Ittai and others,
really sorry for the lack of action here. :(
The recommended fix for this is to use --sandbox_tmpfs_path=/tmp. On top of providing a fix for this issue, it also generally improves performance and hermeticity of builds. The only known reason why someone would not want to use that flag is if your actions produce so much output in /tmp that it would overflow your RAM, or that your workspace or output_base is located inside /tmp (I don't understand why people would do that, but it was a common complaint when I initially made that flag the default behavior).
All the other possible fixes have some other drawbacks. That said, I will still implement a "permanent" fix for this, even for the case where --sandbox_tmpfs_path cannot be used. If this is really blocking you, please let me know and I can prioritize this.
Philipp
We chatted about this offline. This issue is made worse by the fact that Bazel 0.7 now ships with turbine as part of the Java toolchain, and running it in the sandbox can cause SIGBUS. We talked about easy fixes to this issue such as:
Both will work, but (2) is a bit intrusive if the user has pointed hsperfdata to some other location.
Fix is out: https://bazel-review.googlesource.com/c/bazel/+/23070
Does this look OK to you @ittaiz and @hhclam?
Hi,
will the fix be committed soon, now that half a year has passed? I have been running into this problem quite often recently. It would be great if a proper fix could be merged soon. Thanks!
We are frequently encountering this issue with 0.18.0 and JDK 9. I can open a new issue if that is more helpful.
Not sure in which version it was fixed, but today we run Bazel 0.24.0 without the --sandbox_tmpfs_path workaround and the build works well.
@sdqali - you may want to open a new issue if it still happens to you
This is not fixed and it probably just worked due to luck, because this is a race condition. It is not safe to run Java based tools in a sandboxed Bazel build on Linux without this flag.
Sorry that it isn't fixed yet. :(
Any updates on this? I started hitting this issue very often on all linux machines in CI.
My current workaround is to add the following to my .bazelrc
build --enable_platform_specific_config
build:linux --sandbox_tmpfs_path=/tmp
I'm so sorry that we don't have a good automatic fix for this yet.
@nkoroste Your method is great, I would recommend using that to everyone for now.