Nix: Chroot builds are slow

Created on 2 Dec 2013 · 99 comments · Source: NixOS/nix

Chroot builds have a significant overhead. For instance, this expression:

with import <nixpkgs> {};
with lib;
let deps = map (n: runCommand "dep-${toString n}" {} "touch $out") (range 1 100);
in runCommand "foo" { x = deps; } "touch $out"

(i.e. a trivial build with 100 trivial dependencies) takes 4.7s to build on my laptop without chroot, but 39.6s with chroot.

The main overhead seems to be in setting up the chroot (i.e. all the bind mounts), but the various CLONE_* flags also have significant overhead.

Unfortunately, there is not much we can do about this since it's all in the kernel, but it does mean we can't enable chroot builds by default on NixOS.

This is on Linux 3.4.70.

Label: improvement

All 99 comments

Oh, not good. Are there some other standard sandboxing options? (except for LD_PRELOADing some libc hooks)

I can imagine a cheaper chroot that just bind-mounts the entire Nix store. Maybe we could even put an ACL on /nix/store to deny "r" but not "x" permission to nixbld users. That way builds can only access store paths that they already know.

Also, maybe it's faster on newer kernels.
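
Very roughly, that cheaper setup could look something like this (a sketch only — the sandbox directory, mount flags and ACL are illustrative, not what Nix does today):

chroot=/tmp/nix-sandbox                       # hypothetical sandbox root
mkdir -p "$chroot/nix/store"
mount --bind /nix/store "$chroot/nix/store"   # one mount for the whole store...
mount -o remount,ro,bind "$chroot/nix/store"  # ...made read-only
# Deny listing ("r") but keep traversal ("x") for the nixbld group, so builders
# can only open store paths whose exact names they already know:
setfacl -m g:nixbld:--x /nix/store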

Well, I don't think packages try finding something by listing /nix/store. In general, maybe we could deny "r" on it for everyone, but I fail to see any significant gain.

Maybe providing some cheap variant of chroot by default could be a good compromise (with possibility to switch to stronger guarantees).

The benefits of chrooted builds are _far_ more significant than the performance cost. Chroot builds should be totally enabled on NixOS by default!!!

Are the bind mounts done in parallel?

No.

@edolstra I'd still prefer purity/determinism over performance and enable chroots on Linux by default.

On my machine (SSD, running kernel 3.14):

real 0m27.129s
user 0m0.139s
sys 0m0.038s

@iElectric: are you sure your measurement is correct? It shows mostly waiting and no real work. Or is that because the work is in fact done in another process?

:+1: for a mini chroot that has all of nix store. This could be reused, no? Same chroot for all builds?

@vcunat i'd say most of the time it's waiting for nix-daemon IO

Computers have become faster in the past 2 years. We should re-evaluate whether the speed gain is really worth the significant impurities.

Note that the default NixOS Hydra not using chroot leads to packages "randomly" failing to build locally for those who do use it.

So at least Hydra should enable it.

Hydra does use it. It's the other way around, users like me might not have it enabled and think that a package builds properly when it doesn't. (Happened today with a PHP update, which turns out to do a download during its build.)

Yes, leaving our promise of determinism aside for the sake of avoiding some small overhead.

When nix sets up chroots, is most of the time spent setting up bind mounts? Or does it do a lot of file copying too? If the latter, have you considered using something like Cowdancer (http://www.netfort.gr.jp/~dancer/software/cowdancer.html.en) to get copy-on-write bind mounts? It's low-overhead and fast to set up. Debian uses it in cowbuilder/pbuilder, which makes for an excellent ephemeral-chrooted build system.

@benley: COW isn't needed, as everything accessible in the chroot is read-only anyway. From the comments it seems no one has analyzed precisely what the main cost is, but bind mounts are suspected (and they probably were never meant to be used this massively).

Has anyone looked into proot for this purpose?

"PRoot uses the CPU emulator QEMU user-mode to execute transparently guest programs." I doubt that's faster than bind mounts :-)

Yeah, PRoot might be faster to _setup_, but it sounds significantly slower to _run_ longer builds (which happen a lot). Various other preloading solutions might also slow down system calls, although probably not so much.

@edolstra oh sorry, my understanding was that it only used QEMU when the guest was of a different architecture

I believe proot only uses qemu when it's running binaries from a non-native architecture. The proot website is fairly clear about that, unless I'm badly misinterpreting it: http://proot.me/

It does still intercept system calls in userland, and it's going to have some unavoidable speed overhead.

Isn't it documented to use ptrace? If so it will signal the controlling process and wait for a command on every syscall that is intercepted.


My sophisticated web searches (i.e. "proot benchmark") didn't turn up anything. Has anybody tried it yet?


Hello all,

I confirm that PRoot uses QEMU to run non-native binaries only, and that it is currently based on ptrace, which is known to cause a significant slowdown. However, in order to decrease this slowdown as much as possible, PRoot uses process_vm_{readv,writev} (available on Linux 3.2+) and seccomp mode 2 (available on Linux 3.5+). For reference, here are the figures I published when I enabled seccomp mode 2 in PRoot:

https://github.com/cedric-vincent/PRoot/blob/v5.1.0/doc/proot/changelog.txt#L510

My suggestion is to give PRoot a try if your kernel version is equal to or greater than 3.5, and if it's not too difficult to replace the calls to "chroot" and "mount --bind" in your scripts with a call to "proot". If PRoot is not fast enough, this will likely be fixed in the future using kernel namespaces (available on Linux 3.8+).

Regards,
Cedric.

Seems like using Linux namespaces would dovetail with the pure Darwin stdenv work. All the better if they are faster than chroots.

They actually already use Linux namespaces. chroot is a bad name for them.

Heh, in that case NixOS should call them Containers and pick up some buzzword publicity points. "Build all the things in containers!" containers containers containers containers containers. ;-)

@benley good point :smile:

@edolstra the chroot naming has bugged me for a while, especially given that neither of our platforms that support the feature actually use chroot. Sandbox/container builds seems like a more neutral term for the concept, and chroot could remain a deprecated alias for a few releases. I realize naming is the least of our concerns, but the naming also suggests that they're less secure/isolated than they actually are, and I think that's detrimental to perception. I do think trustability/auditability of builds and build machines is a key selling point for Nix, and one we should probably push harder (on the front page) than we currently do.

It's unshare + chroot. It's not only unshare, and not only chroot. Chroot is a valid term, when applicable.

_Sandbox_ is a good general term for this, IMO.

I agree that sandbox is a fine term. But that's really something for a separate issue.

Do we use unshare in such a way that this will work without root? Is the implementation shared with nix-container at all?

@Ericson2314 I currently have a ticket for that (I think it's what you mean): https://github.com/NixOS/nix/issues/625

@Ericson2314 no it needs root because NEWUSER is not used apparently, but @edolstra can tell better: http://lxr.devzen.net/source/xref/nix/src/libstore/build.cc#2131

It _should_ be possible to use NEWUSER and set a mapping for the build users. With NEWUSER it's also possible to chroot(). That means it would be possible to have sandboxing when Nix is running as an ordinary user without the nix-daemon.
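
For illustration, the unprivileged variant can be approximated with the stock util-linux tools like this (a sketch that assumes a sandbox tree already exists at /path/to/sandbox; it is not what Nix currently does):

# Create user + mount namespaces as an ordinary user, mapping our uid to root inside:
unshare --user --map-root-user --mount bash -c '
  mount --bind /nix/store /path/to/sandbox/nix/store   # allowed: we are "root" in the user namespace
  chroot /path/to/sandbox /bin/sh
'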

Thanks, I'll continue this in the other thread.

(extracted from #12596) I understand that the issue is not solved yet, but how about making nix.useChroot enabled by default on NixOS nevertheless? This is _de facto_ required to be sure that nothing is broken when packaging new things, so disabling it is useful only if a user has made some patch that is both (a) local to them, so it doesn't need extensive testing, and (b) mass-rebuild-y. In other cases (typical PRs or just regular use) either the number of built (not fetched) derivations would be too small to notice, or the user would need to enable chroot for testing prior to submitting a PR anyway.

We might get the best of both by default with a compromise: use chroots everywhere except for those tiny NixOS builds that just create a couple of config files.

I think that would be achieved by setting build-use-sandbox = relaxed in nix.conf, and having __noChroot = true; in those trivial builds. (The option is described in man nix.conf.)
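
Concretely, that would amount to something like the following (the option and attribute names are the ones documented in man nix.conf for this era of Nix; everything else is illustrative):

# Keep sandboxing on globally, but let explicitly marked builds skip it:
echo 'build-use-sandbox = relaxed' >> /etc/nix/nix.conf
# A trivial builder then opts out by setting `__noChroot = true;` in its derivation
# attributes; with "relaxed", only such marked derivations bypass the sandbox.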

When a package takes 5 minutes to compile, saving 36 seconds seems like a micro-optimization. Is it worth losing the purity for that?

Users should be getting binaries from the cache. The only slow thing would then be buildEnv, which we can probably safely unsandbox.

Developers should be running with the sandbox enabled anyway. Right now we have a checkbox in our PRs that asks: did you run with sandbox enabled? That would become one less thing to worry about.

@zimbatm: note that the increase is about 0.36 seconds per build. That seems bearable for larger buildEnv invocations, as they can require lots of I/O anyway. It's not good for loads of nixos derivations like unit-*-.service.drv.

:+1: on enabling by default

Any ideas on how to improve the performance? It would be cool if we could prove that a derivation is safe and avoid the sandbox in that case.


For what it's worth in the calculus of "how much is our time worth", I lost ~4 hours yesterday debugging a problem in a build that was masked by the existence of /tmp/.git on my machine.

Also related, even though it wouldn't fix @chris-martin's problem, is https://github.com/NixOS/nix/issues/907. I.e., we can get a fair amount of isolation for free. The filesystem isolation would still be more expensive but turning off network access would be trivial.

Another thought for how to make the bind mounts faster would be a compromise. We whitelist all of /nix/store (rather than all the individual inputs, because it's honestly pretty rare for people to try reaching into unrelated store paths; we could also catch those on the Nix side by expanding the dependency scanning code), a couple of /dev nodes, the build directory, and that's it. I feel like that handful of bind mounts would get us 90% of the purity benefit for a lot less overhead. Am I missing something?

Oh, @edolstra already proposed that ages ago, I see:

I can imagine a cheaper chroot that just bind-mounts the entire Nix store. Maybe we could even put an ACL on /nix/store to deny "r" but not "x" permission to nixbld users. That way builds can only access store paths that they already know.

Also, maybe it's faster on newer kernels.

Anyway, I think that's the path forward. My main goal is to stop builds from reaching out to standard FHS paths on non-NixOS linux distros, or from doing shady things with temporary directories as @chris-martin described. We can get that easily! It would be lovely if the next Nix release had sandbox builds turned on by default.

we could also catch those on the Nix side by expanding the dependency scanning code

Just to elaborate on this point: currently Nix scans store outputs for store hashes of potential inputs based on dependency tracking on the Nix language side. An alternate implementation could be to scan for anything that looks like a store path/hash, and if that is a subset of the known dependencies from the Nix side, we register those as runtime dependencies. Otherwise, Nix barfs and says there was an illegal dependency. We could also reserve a sample Nix store path to use in documentation that would be allowed.

Edit: although I guess that wouldn't catch illegal uses of dependencies that don't end up in the outputs...
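
To make that concrete, a crude userland version of the scan could look like this (a sketch only; the real scanner works on the raw output bytes inside Nix, and the regex below only approximates valid store path names):

declared=inputs.txt    # hypothetical file: one declared input store path per line
found=$(grep -haroE "/nix/store/[0-9a-df-np-sv-z]{32}-[[:alnum:]+._?=-]+" "$out" | sort -u)
# Anything that looks like a store path but is not a declared input is an illegal reference:
comm -13 <(sort -u "$declared") <(echo "$found")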

For sure, a cheap 90% solution is better than no solution…

Hopefully it won't end up as just us talking...

@vcunat hey hey now, you surely can't be proposing that we actually do work to improve our favorite tool can you?

A side question that popped into my mind: why do we have +x on /nix/store at all? Suppose we get rid of it -- we immediately solve some performance problems of chroot (by mounting the whole /nix/store as @edolstra proposed) and, as a bonus, get a (half-assed but not bad!) solution for secret Nix derivations!

Surely it was discussed somewhere and there is rationale for keeping it visible, but I can't find any immediate references. What are the bad points?

EDIT: I can remember myself actually using +x several times by running find on the store, but I, for one, won't mind doing it under sudo...

@abbradar I assume you mean removing +r (rather than +x) on /nix/store. We had that for a while, but it caused problems with KVM builds (https://github.com/NixOS/nixpkgs/commit/a38f130126c5d81a961b26ecaafb845a186a6d91). However @aszlig's recent QEMU patch might make this a non-issue.

It's not a solution for secret Nix derivations though, since it doesn't stop users from doing "nix-store --export".

Yes, you are right, I messed up +r and +x for directories.

--export needs a path, correct? That way one needs to know the exact hash of the secret derivation (which is difficult to guess unless you can produce the very same derivation). Not much security, but I wouldn't call it security by obscurity - it can help.

Of course it won't help with system-wide secrets like SSL certs (because one can trace the file from the root nixos derivation).

Nikolay.

I just did some benchmarks on a 4.9 kernel. (4.9 appears to be a bit faster than 4.4 in namespace operations.) It turns out the real killer is the private network namespace, not the bind-mounting.

For example, building the following expression:

with import <nixpkgs> {};
with lib;
let deps = map (n: runCommand "dep-${toString n}" { 
    x = [ pkgs.firefox pkgs.perlPackages.CatalystRuntime ]; 
  } "touch $out") (range 1 100);
in runCommand "foo" { x = deps; } "touch $out"

With -j8 and sandboxing disabled:

real    0m1.751s
user    0m3.029s
sys     0m1.244s

With full sandboxing:

real    0m7.607s
user    0m2.660s
sys     0m2.038s

With the network namespace disabled:

real    0m3.682s
user    0m3.017s
sys     0m2.638s

I also experimented with overlayfs (i.e. creating /nix/store in the sandbox as an overlay with the host's /nix/store as the lower directory - this allows removing read permission and redirecting outputs). This gives an okay improvement if the network namespace is disabled:

real    0m2.679s
user    0m3.083s
sys     0m1.657s

But with the network namespace enabled, the improvement is less:

real    0m7.230s
user    0m2.602s
sys     0m1.344s

So there appears to be some weird interaction between network and mount namespaces. In any case, overlayfs is probably a good idea because it makes the time to set up the sandbox O(1) rather than O(n) in the size of the input closure.
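
For reference, the overlayfs variant described above boils down to something like this (directories are placeholders; the mount options are the standard overlayfs ones, not Nix's actual implementation):

mkdir -p /tmp/sandbox/nix/store /tmp/sandbox/upper /tmp/sandbox/work
# The host store is the read-only lower layer, so the sandbox needs one mount
# regardless of how many paths are in the input closure:
mount -t overlay overlay \
  -o lowerdir=/nix/store,upperdir=/tmp/sandbox/upper,workdir=/tmp/sandbox/work \
  /tmp/sandbox/nix/store
# Writes under /nix/store inside the sandbox land in the upper layer, which could
# then be moved into the real store after the build.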

BTW, the slowness of network namespaces can be demonstrated easily:

$ for i in $(seq 1 100); do time unshare -U -n true; done 2>&1 | grep real
real    0m0.002s
real    0m0.061s
real    0m0.001s
real    0m0.002s
real    0m0.001s
real    0m0.001s
real    0m0.001s
real    0m0.001s
real    0m0.001s
real    0m0.254s
real    0m0.002s
real    0m0.001s
real    0m0.001s
real    0m0.001s
real    0m0.002s
real    0m0.001s
real    0m0.001s
real    0m0.268s
...

So creating the network namespace is usually fast (~1 ms) but sometimes much slower (> 200 ms). It might be worthwhile to poke around in the kernel to see where this delay comes from.

A workaround might be to reuse network namespaces between builds, using setns().

@edolstra still slightly confused: in your tests of network namespaces, you see delays of ~200-300ms, whereas it seems to be in the realm of a few seconds in a real Nix build. Any idea what causes the difference there?

Because the Nix test is running 100 derivations (though with -j8).

Aha, okay!

It seems interesting that the real time is getting larger than user and sys together (with network namespaces). It's as if it were waiting for some I/O.

A workaround might be to reuse network namespaces between builds, using setns().

That actually got me thinking, why _wouldn't_ we reuse it? Our derivation builds are completely offline so the namespace doesn't need any sort of fanciness to initialize it or cache across builds. We'd just have one persistent namespace that can speak to the internet for fixed-output derivations (perhaps they wouldn't even detach from the parent netns), and another namespace with no interfaces in it for regular derivations.
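
As a rough sketch of that reuse (the namespace name and builder path are made up; Nix itself would presumably call setns() directly rather than shelling out to iproute2):

ip netns add nix-no-net                      # one persistent, empty namespace, created once
# Regular builds run inside it instead of unsharing a fresh network namespace each time:
ip netns exec nix-no-net runuser -u nixbld1 -- bash /path/to/builder.sh
# Fixed-output derivations (fetchurl and friends) would simply stay in the parent namespace.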

@edolstra I'm using kernel 4.10.12 (NixOS 17.03) and I don't see the outliers when running your "the slowness of network namespaces can be demonstrated easily" snippet: https://gist.github.com/anonymous/9c29444acb17c2bf5b8270f5252fb092

Running your 100 derivations with $ time nix-build foo.nix -j 8 --option use-binary-caches false :

sandboxing on:

real    0m4.995s
user    0m0.312s
sys     0m0.033s

sandboxing off:

real    0m1.805s
user    0m0.307s
sys     0m0.031s

With "finger in the air" statistics, that's 3s extra per 100 derivations. My system closure has ~1000 derivations, so even if I did build all my derivations from source, that would be 30s added on top of a few hours of compile time.

EDIT: a reminder, this issue was opened when the difference was 4.7s vs 39.6s

I just did a test on 4.9 and 4.10, and unfortunately 4.10 shows a dramatic slowdown in the Nix expression shown in https://github.com/NixOS/nix/issues/179#issue-23604548:

| Kernel  | -j  | Sandbox | Elapsed time (s) |
|---------|-----|---------|-----------------:|
| 4.9.20  | -j1 | -       |  3.24 |
| 4.9.20  | -j1 | ✓       | 11.45 |
| 4.9.20  | -j4 | -       |  1.56 |
| 4.9.20  | -j4 | ✓       |  5.27 |
| 4.10.15 | -j1 | -       |  3.15 |
| 4.10.15 | -j1 | ✓       | 36.55 |
| 4.10.15 | -j4 | -       |  1.59 |
| 4.10.15 | -j4 | ✓       | 30.07 |

This is on an m4.xlarge EC2 instance (Xeon E5-2686, 4 cores).

Network namespaces seem to have become much slower:

# for i in $(seq 1 100); do time unshare -U -n true; done 2>&1 | grep real
real    0m0.018s
real    0m0.301s
real    0m0.002s
real    0m0.002s
real    0m0.002s
real    0m0.002s
real    0m0.002s
real    0m1.392s
real    0m0.002s
real    0m0.002s
real    0m0.002s
real    0m0.002s
real    0m0.002s
real    0m1.659s
...

Linux 4.11 shows similar results.

This is with Nix 1.11.8 and --option build-use-substitutes false.

I noticed the same thing with 4.10 and 4.11, but it looks like 4.12 has improved things fairly significantly, although it's still 1.5x slower than 4.9.

Anyone have any idea what's going on behind the scenes here?

Based on https://github.com/moby/moby/issues/26435#issuecomment-247235726, it appears that network namespaces have been hacked together in Linux. The solution appears to be to reimplement the feature completely.

Looking a bit further, someone has made it two times faster: https://lkml.org/lkml/2017/4/24/339, but the root cause is still present.

I apologize if I'm missing something, but as far as I can tell the problem encountered in the linked moby issue results from their use of iptables (since the namespaces need to be routed but also isolated). I think in Nix it's sufficient to create an empty namespace connected to nothing, so there's no iptables involvement?

Whether or not that's the case depends on the internals, but in the extreme case, nix could fork the drivers responsible for linux namespaces and make it such that only empty namespaces connected to nothing are supported.

It seems more useful to fix the Linux kernel, but perhaps there is no political interest in making Docker suck less.

Since I was curious, found myself looking into gathering network namespace numbers.
Here's a one-liner that uses "bench" to make this easy and produce nice graphs:

$ nix run -f channel:nixos-unstable bench -c bench "unshare -U -n true" -o unshare.html
[1 copied (3.5 MiB), 10.7 MiB DL]
benchmarking unshare -U -n true
time                 45.35 ms   (41.69 ms .. 48.89 ms)
                     0.975 R²   (0.942 R² .. 0.991 R²)
mean                 41.26 ms   (36.43 ms .. 43.99 ms)
std dev              6.663 ms   (3.749 ms .. 11.28 ms)
variance introduced by outliers: 63% (severely inflated)

Which also creates a report 'unshare.html' which looks like this: https://pste.eu/p/p6Pg.html

Above is my laptop on 4.9; here's a server running 4.14.15: https://pste.eu/p/T9s4.html (~17ms)

Curiously I'm not seeing the crazy outliers reported earlier, including a number of runs of the posted "run this 100 times" script on various machines. Not sure what that's about.


Are network namespaces still the most likely/significant cause of slowness?

I use https://github.com/edolstra/benchmark for that :-)

I get ~13ms on kernel 4.14.32:

$ nix run -f ~/dev/nixpkgs bench -c bench "unshare -U -n true" -o unshare.html
benchmarking unshare -U -n true
time                 13.45 ms   (12.03 ms .. 15.07 ms)
                     0.941 R²   (0.890 R² .. 0.978 R²)
mean                 13.32 ms   (12.71 ms .. 13.96 ms)
std dev              1.663 ms   (1.301 ms .. 2.134 ms)
variance introduced by outliers: 60% (severely inflated)

On 4.15.15, I get:

$ nix run -f channel:nixos-unstable bench -c bench "unshare -U -n true" -o unshare.html   
benchmarking unshare -U -n true
time                 24.22 ms   (20.45 ms .. 27.83 ms)
                     0.923 R²   (0.852 R² .. 0.975 R²)
mean                 27.06 ms   (24.77 ms .. 29.90 ms)
std dev              5.522 ms   (3.768 ms .. 7.606 ms)
variance introduced by outliers: 77% (severely inflated)
$ nix run -f channel:nixos-unstable bench -c bench "nsenter -U -n -t $unsharedPid --preserve-credentials true" -o unshare.html                                                  
benchmarking nsenter -U -n -t 5440 --preserve-credentials true
time                 4.566 ms   (4.465 ms .. 4.673 ms)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 4.530 ms   (4.475 ms .. 4.606 ms)
std dev              195.9 μs   (151.5 μs .. 259.5 μs)
variance introduced by outliers: 23% (moderately inflated)

Where $unsharedPid is the pid of a persistent process that was created with unshare -U -n bash

So it looks like entering existing namespaces is about 6x faster than creating new ones.

Can somebody explain to me what these numbers mean in practice?

If one unshare takes 24 ms on average, how many of those do we typically do to get to a bad total time as written in the issue description? Do we have to do one unshare per derivation build?

Yes, it's per derivation.

Then I'm missing the problem somehow; doesn't building a derivation usually take much longer than the 24 ms measured here?

It depends on the type of derivation. writeText and writeScript for example are fast and the overhead is not negligible. If Nix wants to compete with project-level build systems like Bazel then this is going to be a limitation.

@nh2 Currently, most derivations are large, but there are many situations where breaking things up more finely would be good. For example, I use nix for precompressing assets to be served in my company's web apps. To do this incrementally, it makes sense to have a derivation per file (or even per (file, compression method) pair), but for small files, that's quite slow. I doubt the sandboxing is the only overhead there, but it is definitely a non-trivial factor, given that, e.g. gzipping a 3kb file is ~1 ms.

It would be good to look at how Bazel does it as they are facing similar problems.

Thanks for the explanations!

Maybe we should use a selective approach until Linux namespaces are very fast. While sandboxing every derivation is certainly desirable, it would already be a huge benefit if we could, for starters, sandbox "the average build" of nixpkgs libraries and applications. For example, I'd be very happy to pay a 24 ms overhead if in turn my 5 hour Chromium build is guaranteed to be pure. But right now it's full sandboxing or no sandboxing.

Another point: the nsenter benchmark at https://github.com/NixOS/nix/issues/179#issuecomment-383395434 measures a 4 ms mean time. However, that is already so close to Linux's process startup overhead that the number is probably not very meaningful. For example, just printing the help text with time nsenter --help > /dev/null takes anything between 1 and 4 ms on my computer.

We should probably benchmark whatever nsenter does in a loop in C to get meaningful numbers for that.

FWIW, here are the results on my machine (the same one I used for the prior benchmarks), for nsenter --help >/dev/null:

» nix run -f channel:nixos-unstable bench -c bench "nsenter --help >/dev/null" -o unshare.html 
[4 copied (3.8 MiB), 11.5 MiB DL]
benchmarking nsenter --help >/dev/null
time                 3.108 ms   (3.088 ms .. 3.126 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 3.222 ms   (3.188 ms .. 3.289 ms)
std dev              161.1 μs   (93.24 μs .. 275.8 μs)
variance introduced by outliers: 31% (moderately inflated)

And here it is for true:

benchmarking true
time                 2.470 ms   (2.452 ms .. 2.492 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 2.456 ms   (2.445 ms .. 2.469 ms)
std dev              37.67 μs   (31.60 μs .. 46.29 μs)

So sandboxing is about an order of magnitude slower than running a minimal command. I definitely agree that this amount of time is not important for most use cases today.

And building a stdenv.mkDerivation is also going to execute bash, which stat(2)s rc and profile files all over the place.

@zimbatm Yes, but Nix does not require the use of stdenv.mkDerivation.

BTW on Linux 4.17 I get a 37% slowdown in the test mentioned in https://github.com/NixOS/nix/issues/179#issuecomment-273178956. That's a big improvement over the 742% slowdown in 2013...

Any idea of a good threshold for acceptable? I doubt it'll ever be zero cost, but purity-by-default is a big win IMO and I'd be willing to pay a slight cost on it. Especially since the benchmark you're citing mostly affects tiny derivations and not big builds. One even smallish build will completely eclipse a ton of small slowdowns on unit files and NixOS configuration files.

Having sandboxing turned on by default would be great. It would reduce the number of issues with nixpkgs submissions that don't compile and user reports. We'll be able to trim the PR and issue templates. That being said, if Nix is running inside a Docker container it won't work, as Docker containers don't support cgroups by default.

Back on the subject of sandboxing, is it possible to re-use sandboxes between runs? If sandboxes could be re-used then they could also be pooled, where the pool size = maxJobs.

We must be sound now. We must compete with the likes of Bazel on granularity soon. That's how I see it.

@copumpkin I think the 37% slowdown is okay-ish, though obviously not ideal.

@zimbatm No, I don't think sandboxes can be reused. The main overhead seems to be setting up the mount namespace, which is necessarily different for each build (since they have different input closures). Of course, you could bind-mount all of /nix/store, but that reduces security a bit (since even if it has -wx permission, builders would be able to access path contents if they know the store path).

The main overhead seems to be setting up the mount namespace, which is necessarily different for each build (since they have different input closures).

Setting up the namespace or setting up the mounts? I.e., would reusing the namespace, and mounting/unmounting only the paths that differ (keeping the stdenv paths untouched) help in any appreciable way?

@7c6f434c Good question! I think we would need to benchmark bind mounting to see.
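
A quick way to measure just that part, as root (the loop count and target paths are arbitrary):

# Set up 100 bind mounts of the store in a throwaway mount namespace and time it;
# the mounts disappear when the namespace exits.
time unshare --mount bash -c '
  for i in $(seq 1 100); do
    mkdir -p /tmp/bindbench/$i
    mount --bind /nix/store /tmp/bindbench/$i
  done
'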

Another motivation to enforce the sandboxing is that we could get rid of the nixbld\d+ users. Each sandbox gets its own pid namespace, so they could all run with the same uid/gid. That would be great for limiting the footprint Nix has on non-NixOS systems.

We-ell, some namespace bugs (kernel-level) could be mitigated by the fact that the real UID outside of namespace is different, so for maximum isolation we might still want to have these users…

In https://github.com/NixOS/nix/commit/ad1c827c0d7c96075e8e820fc66be5ea849497c9 I implemented automatic UID allocation from a range. You would still like to ensure that the UID range doesn't clash with any existing accounts, though it's unlikely people have UIDs in the range 872415232+...

Well, if the range is configurable it should be easy to move outside the ranges used by other tools; definitely simpler than listing eight build users in global passwd. Thanks.

@edolstra I wanted to make sandboxing on by default for Docker after @garbas's talk, but really that road leads back to Nix doing it by default for an overall good experience. Given that we're at the okay-ish threshold now, and kernel 4.19, which will be the next LTS, has been released, can we make sandboxing on by default? :)

Why docker? I missed the talk but intuitively it feels like a step backwards

@copumpkin just to enable sandboxing for https://github.com/NixOS/docker, since it helps sandbox networking during Nix builds.

Does the Nix sandboxing work inside of Docker now?

I believe it's worked for quite some time, but requires permissions/capabilities that aren't added by default. Don't remember the appropriate flags, sorry; others might be able to say more.


Oh sorry, I misread and thought you wanted to change our sandboxing mechanism to _use_ docker, rather than get docker to work from inside one of our sandboxes 😄 sorry!

Based on the talk (IIRC) I think it's actually neither of those xD, but rather getting our sandbox (and the Nix "story") to work in Docker (which to some extent is how people expect to "get started" these days).


oh, I see, thanks!

:tada:

https://github.com/NixOS/nix/issues/2759 A wildly different idea for maybe-even-faster sandboxing.
