Bazel: Make the sandboxed file system more strict

Created on 31 Jan 2019  ·  35 Comments  ·  Source: bazelbuild/bazel

Description of the problem / feature request:

Make it possible to configure the sandbox to whitelist local directories. The sandbox will have read access to only these directories (and its execroot). No other local directories will be available.

Today it is possible to blacklist directories with option --sandbox_block_path=<directory>. This feature request adds the possibility to whitelist directories instead.
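As a rough illustration of the existing blacklist approach, the sketch below generates one `--sandbox_block_path` flag per directory to hide. The helper name `block_path_flags` and the directories `/proj` and `/home` are placeholders invented for this example; only the flag itself comes from the request above.

```shell
# Print one --sandbox_block_path flag per directory that should be hidden.
# Usage: block_path_flags DIR...
block_path_flags() {
  for dir in "$@"; do
    printf '%s\n' "--sandbox_block_path=$dir"
  done
}

# /proj and /home are placeholder directories, not taken from the issue.
block_path_flags /proj /home
```

The output could then be passed to a build, for example `bazel build $(block_path_flags /proj /home) //...`. With a blacklist, every directory to hide has to be listed explicitly, which is the limitation a whitelist would remove.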

Feature requests: what underlying problem are you trying to solve with this feature?

The current sandbox has read permissions to its execroot and almost everything in /. If a rule reads a file with absolute path, bazel assumes it is a file provided by the operating system. Bazel will not rebuild the target if this file is updated.

My work group needs more hermetic builds. We had a bad experience with a previous build system (IBM ClearCase), which did not track file accesses outside of the workspace (VOB). This is almost exactly the same limitation as in the current sandbox: rules can read any file on our distributed file systems with an absolute path, but the target will not be rebuilt if this file is updated. This limitation forced us to turn off the remote cache in ClearCase and to avoid incremental builds in CI, since they were not reliable.

Any other information, logs, or outputs that you want to share?

This has been discussed in the bazel-discuss Google group.

Design Document: Bazel Sandboxing 2.0 describes the current sandbox well, and the reason for allowing read access to everything in /.

My work group is willing to implement this feature.

P2 team-Local-Exec feature request

All 35 comments

@benjaminp Could you share the link to your tweaked sandbox that can mount an image instead of the host / again? I remember that you posted it on an issue recently, but can't find it.

I wonder if that would already be enough to support this use case. We could spend some time to polish it and get it into Bazel mainline :)

I think that this feature request and mounting an image solve two different use cases.

As a first step we want to use parts of the local system, but we want to make sure that no absolute paths to our distributed file systems are available. This means that our builds will not be fully hermetic. That's OK for us. Our assumption is that the Linux distributions in our environment are similar enough, and that the differences in the local systems do not matter.

However, in the long run we want to migrate to fully hermetic builds. Then the possibility to mount a Docker image as / sounds very interesting.

Even with a rootfs, you can mount whatever you like with --sandbox_add_mount_pair.

Aha, I see. With your tweaked sandbox we can mount a dummy image as /, and then use --sandbox_add_mount_pair to mount our white-listed local directories. This will solve our use case!

It would be great to get the --sandbox_rootfs option into bazel baseline. This problem basically stops my work group from adopting bazel.

Would it be possible to clean it up and create a pull request? If so, do you have a rough estimate of how long this will take?

I would be happy to get this into Bazel mainline and review a PR. :)

While the rootfs is a useful feature, I think it's worth taking a step back and deciding what the strategy for (Linux) sandboxing is before adding it. Maybe Bazel should switch to using a real container runtime like runc instead of slowly reinventing all container features in linux-sandbox. We'd also want to make sure whatever we do is compatible with sandboxfs.

We just want to use --sandbox_rootfs to avoid mounting /. A --[no]sandbox_mount_root option would work just as well for us.

Do you think that we can add the latter option instead? I agree that the number of sandbox tweaking options is getting large, and that we should settle on a long-term strategy for Linux sandboxing, but perhaps a --[no]sandbox_mount_root option is acceptable anyway.

I agree with @benjaminp on rethinking how sandboxing should work if we are going to make it more strict. I can't tell yet if adding an extra option is a good idea though, but maybe it's fine in the interim. From what @emusand says in the last comment, it sounds "simple", so if you could share a PR, maybe we could take it from there? :)

Ok, then we will implement the --[no]sandbox_mount_root option and share a PR!

@emusand any update on this option?

I hope to get time to implement the option in the next few weeks.

emusand has been a bit busy the last couple of months, but others in the team have started to have a look at this. So maybe we will have something ready during August.

I'm the one who is looking at this, at least for the next week while the others are away on vacation.

Please correct me if I am mistaken, but it looks like the agreed upon solution is to implement the --[no]sandbox_mount_root option in order to avoid mounting /. We'd then use a whitelisting option, perhaps the pre-existing --sandbox_add_mount_pair, to add visibility to the tools/directories of our choice.

Are the changes to implement --[no]sandbox_mount_root likely to be limited to the files in the sandbox directory (such as LinuxSandboxedSpawnRunner.java and AbstractSandboxSpawnRunner.java)? Or are there other files I need to know about?

Would examining the code for the older (blacklisting) sandbox be helpful, and is release 0.5.2 a good representation of it?

Where is / mounted? Is that done, for instance, at the start of the AbstractSandboxSpawnRunner.getWritableDirs method (I'm looking at release 0.25.0)? Does the local variable sandboxExecRoot contain the reference to /? If not, what holds that?

You can't just "not mount root" - you have to mount something on /. :) The question is only what do you mount there - your real "/" from the host, or an empty directory (this will usually not work, because shared libraries and certain tools are just assumed to exist), or a chroot that contains a minimum set of files that you need.

Next, you will have to completely change how the mounting is done in the linux-sandbox code. What we currently do is just remount things in-place to read-only except for the paths that we want to write to, then optionally bind mount whatever the user specified via --sandbox_add_mount_pair=source:target.

What this would have to look like if you want to implement mounting an alternate root is that you create a container directory, then mount all paths that are needed there, then pivot_root or chroot correctly into the container directory and run the command.
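The steps described above can be sketched as a dry run that only prints the commands involved. This is an illustration under assumptions, not the linux-sandbox implementation: the function name `plan_container`, the container path, and the choice of `chroot` over `pivot_root` are made up for the example, and really executing these commands needs root or a user and mount namespace.

```shell
# Dry run: print (do not execute) the steps for an alternate-root sandbox.
# Usage: plan_container CONTAINER_DIR WHITELISTED_DIR...
plan_container() {
  root="$1"
  shift
  for dir in "$@"; do
    # Create the mount point inside the container directory.
    echo "mkdir -p $root$dir"
    # Bind mount the host directory, then remount it read-only.
    echo "mount --bind $dir $root$dir"
    echo "mount -o remount,bind,ro $root$dir"
  done
  # Enter the container directory; /bin/sh stands in for the real command.
  echo "chroot $root /bin/sh"
}

# /tmp/container is a hypothetical container directory.
plan_container /tmp/container /bin /lib /usr
```

Only the directories passed as arguments end up visible under the new root, so the list has to include everything the command needs, such as shared libraries.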

The Linux sandboxing code that does the actual mounts is implemented in C. The relevant code is here:

It's not much code and should be easy to understand.

The old sandbox code is here (this is the last revision before the rewrite): https://source.bazel.build/bazel/+/774553eea688338caae754c49fbfc66d9a3475b7:src/main/tools/linux-sandbox.c;bpv=;bpt=0

The old sandbox did use pivot_root and the container directory approach. It might be interesting to look at how it did it.

As mentioned in earlier comments on this issue, it might be a better approach to investigate adding a new SandboxedStrategy that uses "runc" instead.

Hi Mark,

Thank you for giving this a shot.

As @philwo explained in his excellent reply, the task is a bit more intricate than my previous posts might have indicated.

The old sandbox tried to give read access to the local system but nothing else, by only mounting a hard-coded list of local directories. Then users complained that bazel did not find tools installed in other directories. The new sandbox gives read access to everything under /.

For us the old sandbox worked better than the new. With the new sandbox, a user can add an implicit dependency to a file in our distributed file systems, for instance by adding a genrule(...) that reads a file in /proj with an absolute path. Bazel will not rebuild if this file is updated. Note that we still can, and need to, read files in /proj, but we should add these files to a bazel workspace and avoid reading them explicitly with absolute paths.
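A minimal sketch of such an implicit dependency, written in the same heredoc style used elsewhere in this thread. The path `/proj/shared/version.txt`, the target name, and the variable `demo_dir` are hypothetical and only serve the illustration.

```shell
# Create a throwaway workspace whose genrule reads a file by absolute path.
demo_dir="$(mktemp -d)"
touch "$demo_dir/WORKSPACE"
cat > "$demo_dir/BUILD" <<'EOF'
genrule(
    name = "version",
    outs = ["version.txt"],
    # /proj/shared/version.txt is a made-up path outside the workspace.
    # It is not a declared input, so Bazel does not notice when it changes.
    cmd = "cat /proj/shared/version.txt > \"$@\"",
)
EOF
echo "Wrote $demo_dir/BUILD"
```

With the current sandbox this action can read the file, but changing the file later does not cause `//:version` to be rebuilt.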

My idea was to add a --[no]sandbox_mount_root option, which basically toggles between the old and new sandbox behavior. If false, the sandbox would just mount a minimal part of the local system, like the old sandbox did. If true, everything will be mounted as read-only, like the new sandbox does. This means that we will have to bring back most of the code from the old linux-sandbox.c file that @philwo linked to, and place it in linux-sandbox-pid1.cc or in a separate file.

I notice when I run my bazel build on a simple c file, it uses the ProcessWrapper strategy instead of the "Linux" strategy (which seems to be a fallback). What is the ProcessWrapper strategy? Should I be making changes to that instead?

Ah, the process wrapper seems to be an old strategy, not used since Bazel 0.4.5. I will ignore this.

@emlrsua That's not correct - the ProcessWrapperSandboxedSpawnRunner is used in all cases where we run on a POSIX operating system (macOS, Linux, FreeBSD, ...), but can't use a better platform-specific sandboxing mechanism like the DarwinSandboxedSpawnRunner or the LinuxSandboxedSpawnRunner.

This can happen more often than you might think, for example running Bazel in a Docker container without --privileged means we can't use the Linux Sandbox (because it uses user, PID and mount namespaces, which isn't allowed without --privileged).

On macOS, the sandbox-exec mechanism that we're using doesn't support nesting, so if for example you sandbox the entire Bazel invocation, Bazel itself can no longer use the Darwin Sandbox for sandboxing individual build actions.

If you notice that you're using the ProcessWrapper strategy when building something on Linux, it usually means that Bazel determined that the Linux Sandbox isn't working (most likely case is that your Linux distribution prevents the use of user namespaces or you're running in Docker without --privileged).

For example, Debian and Arch Linux block the use of namespaces by non-root users by default, unless you set the sysctl kernel.unprivileged_userns_clone to 1 (see https://unix.stackexchange.com/questions/303213/how-to-enable-user-namespaces-in-the-kernel-for-unprivileged-unshare).
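A small sketch for checking that setting by reading the corresponding file under /proc/sys. The helper name `userns_status` is invented for this example, and the knob only exists on distributions that carry this patch, so a missing file is reported as "unknown" rather than as an error.

```shell
# Report whether a sysctl file allows unprivileged user namespaces.
# Prints "enabled" for a non-zero value, "disabled" for 0, and "unknown"
# when the file is missing or unreadable.
userns_status() {
  file="$1"
  if [ ! -r "$file" ]; then
    echo "unknown"
  elif [ "$(cat "$file")" = "0" ]; then
    echo "disabled"
  else
    echo "enabled"
  fi
}

# Same setting as the sysctl kernel.unprivileged_userns_clone.
userns_status /proc/sys/kernel/unprivileged_userns_clone
```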

Hope that helps and explains things a bit. :)

Yes, that helps. So, what c-code is this using? Is it using the "process-wrapper..." code in src/main/tools?

The "linux-sandbox" strategy (default on Linux) uses the linux-sandbox.cc binary, which is driven by this Java code.

The "processwrapper-sandbox" (fallback strategy) uses the process-wrapper.cc binary, which is driven by this Java code.

I think you can ignore the process-wrapper stuff, because it won't be possible to implement the "sandbox root image" idea with it.

@philwo ,
I am working with @emlrsua on this feature.
What we have seen is that Bazel selects ProcessWrapperSandbox instead of LinuxSandbox in our development environment, which is RedHat 6.10 (kernel ver. 2.6.32) where the user namespaces are not supported.
This means we cannot avoid supporting ProcessWrapperSandbox, at least for the current version of RedHat. Could you explain why the sandbox root image idea won't work with ProcessWrapperSandbox, and why it could with the LinuxSandbox?

@egechir By all means, if you have an idea how to make it work with the ProcessWrapperSandbox, please go for it. It would be really nice if that feature also worked on systems where we can't use the LinuxSandbox. :)

Here's why I think it won't work though: When the "ProcessWrapperSandbox" wants to run a spawn, it simply creates an empty directory, creates symlinks for all the input files of the spawn inside that directory and then runs the spawn (using the process-wrapper wrapper, which just ensures that the process and its child processes get somewhat reliably killed), with its working directory set to that directory. That's it. I just don't see a mechanism how to bring the "sandbox image" into play here.
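The symlink approach described above can be approximated in a few lines of shell. This is a simplified sketch, not the process-wrapper code: the function name `run_in_symlink_dir` is hypothetical, inputs are flattened to their base names, and process cleanup is left out.

```shell
# Run a command in a fresh directory that contains only symlinks to the
# given input files.
# Usage: run_in_symlink_dir "COMMAND" INPUT_FILE...
run_in_symlink_dir() {
  cmd="$1"
  shift
  sandbox_dir="$(mktemp -d)"
  for input in "$@"; do
    # Link by absolute path so the symlink resolves from the new directory.
    abs="$(cd "$(dirname "$input")" && pwd)/$(basename "$input")"
    ln -s "$abs" "$sandbox_dir/$(basename "$input")"
  done
  # Only the working directory changes; the host file system stays visible.
  status=0
  (cd "$sandbox_dir" && sh -c "$cmd") || status=$?
  rm -rf "$sandbox_dir"
  return "$status"
}
```

For example, `run_in_symlink_dir 'ls' /etc/hostname` lists only `hostname`, yet the command could still open any absolute path on the host. Nothing in this flow touches mount points, which is why there is no obvious place to plug in a root image.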

The LinuxSandbox runs the spawn using a different wrapper, the linux-sandbox wrapper, which uses the above-mentioned namespaces so that it can modify mount points and could run chroot or pivot_root even though we're not root.

We are trying to evaluate sandboxfs on RHEL 7.3 with kernel version 3.10, where the user has sudo access. However, we got the following error in sandboxfs.log:
sandboxfs: Failed to mount /home/egechir/.cache/bazel/_bazel_egechir/57066e830434106ff94272c4669f2f6b/sandbox/sandboxfs: Caught signal 15

Although we have set the Rust environment variable RUST_LOG='info,debug,error ', we do not see any info or debug logs from sandboxfs. Why is that?
Do you have any advice on how to go forward?

@egechir I would recommend to file your bug in the sandboxfs repository: https://github.com/bazelbuild/sandboxfs. You could also try to run sandboxfs manually and see if it works or also crashes. That way you should be able to get all the output of the command.

@philwo ,
Thank you for the recommendation and I will post the issue there.
As you suggested I ran the sandboxfs manually and this time I managed to get more logs. The problem is that the sandboxfs hangs when trying to mount. This is the sandboxfs.log.

The last message is DEBUG 2019-08-29T15:48:30Z: fuse::request: INIT(1) response: ABI 7.8, flags 0x1, max readahead 131072, max write 16777216.
I am wondering if the fuse ABI 7.8 is incompatible with the rust::fuse ?

@egechir Thanks for filing the issue there about your specific problem. Let's continue the discussion about it there.

We've gotten sandboxfs to work, although so far only independently of Bazel.

We have tested the Bazel sandbox feature on Red Hat Linux 7.3 and 7.5, and the processwrapper sandbox is always chosen, not the linux-sandbox that we would like. Investigation shows that the reason is that the "clone" system call does not work on our Linux test machines. It appears that certain flags used with clone are not working: CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWPID. We get either EINVAL or EPERM errors when we manually run the linux-sandbox command line generated by Bazel with all or some combinations of those flags.
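One way to probe for this outside of Bazel is to request the same kinds of namespaces with unshare(1) from util-linux. This is only an approximation of what linux-sandbox does, and the function name `check_namespaces` is made up for the example.

```shell
# Try to create user, mount, IPC and PID namespaces as the current user.
# Prints a one-line verdict instead of the raw error from unshare.
check_namespaces() {
  if unshare --user --mount --ipc --pid --fork true 2>/dev/null; then
    echo "namespaces available"
  else
    echo "namespaces unavailable"
  fi
}

check_namespaces
```

If this reports "namespaces unavailable" for a regular user, Bazel is likely to fall back to the processwrapper sandbox on that machine as well.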

What are the requirements for redhat to support the linux sandbox? For instance, what version number, kernel version number, permissions (root? sudo?), and the like. We are puzzled that redhat 7.5 didn't simply support linux-sandbox.

@emlrsua It seems like RHEL 7 / CentOS 7 disable user namespaces by default.

I tested with RHEL 7.7 and CentOS 7.6 and got it working like this (no other kernel parameters or anything required):

# Allow up to 10000 user namespaces (default is 0).
echo 10000 | sudo tee /proc/sys/user/max_user_namespaces

# Run a little test with Bazel.
curl -LO https://github.com/bazelbuild/bazel/releases/download/0.29.0/bazel-0.29.0-linux-x86_64
chmod +x bazel-0.29.0-linux-x86_64
mkdir test
cd test
touch WORKSPACE
cat > BUILD <<'EOF'
genrule(
    name = "sleep",
    outs = ["sleep.txt"],
    cmd = "sleep 60; date > \"$@\"",
)
EOF
./bazel-0.29.0-linux-x86_64 build :sleep

# You should now see Bazel using the "linux-sandbox" strategy:
[1 / 2] Executing genrule //:sleep; 2s linux-sandbox

Tested successfully as a regular user with

Linux philwo-centos-7 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

and

Linux philwo-rhel-7 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I couldn't find older versions of CentOS 7 on Google Cloud. It might be that 7.3 and 7.5 are too old or require additional tweaks to the kernel command-line or sysctl. You can find quite a lot of relevant hints / blog posts about this by searching for "centos user namespaces", e.g. https://github.com/opencontainers/runc/issues/1906 and https://luppeng.wordpress.com/2016/07/08/user-namespaces-with-cent-os-7-rhel/.

When you ran that test, did you run it in a shell with root privileges? Or was that unnecessary?

@emlrsua I ran it as a normal user. Running with “sudo” didn’t seem to change things.

Hello @philwo ,

We are trying to get hold of a RHEL 7.6 machine to continue our work. Meanwhile, we came across a statement in the Bazel guide about the kernel versions needed for sandboxed execution. It says that the Linux sandbox currently works on kernel 3.12 or newer. Now we are wondering whether RHEL 7.6, whose kernel version is 3.10-xxx, will be good enough. On the other hand, you were able to use the sandbox on RHEL 7.7, whose kernel version is 3.10-yyy. How should we interpret the documentation on sandboxing?
Do you know whether RHEL 7.6/7.7 are good enough to use the Linux sandbox?

Can everyone interested in using sandbox on RHEL move to another issue? That's not what this issue is about.

Hi everyone,

After a long pause on our efforts regarding this issue, we are picking it up again.
Has any work been done in this area in the last two years? @philwo @benjaminp

We want to extend the linux-sandbox to make it more hermetic, by allowing users to whitelist directories to be bind mounted (such as /bin) and not exposing other directories in the sandbox.
