Linux: Enable user namespaces and seccomp

Created on 19 Oct 2015 · 22 Comments · Source: raspberrypi/linux

Hello.

I am trying to use Firejail[0] on my Raspberry Pi running Raspbian, but it shows two warnings:

Warning: user namespaces not available in the current kernel.
Warning: seccomp disabled, it requires a Linux kernel version 3.5 or newer.

Can you enable these features in the next kernel release?

My current uname -a is:
Linux RaspberryPi 4.1.7+ #817 PREEMPT Sat Sep 19 15:25:36 BST 2015 armv6l GNU/Linux

Thanks in advance,
César

[0] https://l3net.wordpress.com/projects/firejail/

Waiting for internal comment

All 22 comments

It is also worth noting that both of these are required for Chromium and Google Chrome to properly implement sandboxing of their various sub-components.

@CTassisF has your issue been resolved? If so, please close this issue. Thanks.

User namespaces are disabled in the current kernel:

root@pi4:~# uname -r
4.4.13-v7+
root@pi4:~# modprobe configs
root@pi4:~# zgrep -E 'CONFIG_(USER_NS|SECCOMP)' /proc/config.gz
# CONFIG_USER_NS is not set
CONFIG_SECCOMP=y
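
To repeat this check for several options without reading the raw grep output, a small helper along these lines may be useful. This is only a sketch: the function name `kconfig_state` is hypothetical, and it assumes the kernel config text is piped in on standard input (for example from `zcat /proc/config.gz` after `modprobe configs`, as above).

```shell
# kconfig_state OPTION: read kernel config text on stdin and print the value
# of CONFIG_<OPTION> ("y", "m", ...), or "unset" if it is not set.
# (Hypothetical helper name, not part of any existing tool.)
kconfig_state() {
    awk -v opt="CONFIG_$1" '
        # A set option appears at the start of a line as CONFIG_FOO=value.
        index($0, opt "=") == 1 { print substr($0, length(opt) + 2); found = 1; exit }
        # Lines such as "# CONFIG_FOO is not set" never match the rule above.
        END { if (!found) print "unset" }
    '
}

# Usage on a running system:
#   zcat /proc/config.gz | kconfig_state USER_NS
#   zcat /proc/config.gz | kconfig_state SECCOMP
```

Matching on `CONFIG_<OPTION>=` at the start of the line keeps `SECCOMP` from also matching `SECCOMP_FILTER`.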

With all kernel options that aren't simply loadable modules we are concerned about increased kernel size and reduced performance. If you were to present "before and after" comparisons of free memory and performance (using some standard benchmarks) it may strengthen your case.

@pelwell what kind of benchmark are you after? Disk block IO, file system IO, CPU cycles, etc.?

I want seccomp for Docker on the Raspberry Pi; it is an important protection. I want user namespaces as well: they allow mapping a user ID within a container to a different user ID on the host, which matters most for user ID 0 (aka root). This is not limited to Docker: LXC and LXD also require user namespaces to run unprivileged containers.
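
As an illustration of the ID mapping described above: the kernel exposes a process's mapping in `/proc/<pid>/uid_map`, one line per range, with the first ID inside the namespace, the first ID outside it, and the range length. The sketch below only formats such lines; the function name `describe_uid_map` is hypothetical, and the usage line assumes a kernel with CONFIG_USER_NS=y, the util-linux `unshare` tool, and a system that permits unprivileged user namespaces.

```shell
# describe_uid_map: read uid_map-style lines ("inside outside length") on
# stdin and print each mapped range in a readable form.
# (Hypothetical helper name.)
describe_uid_map() {
    awk 'NF == 3 {
        printf "container uid %d..%d -> host uid %d..%d\n",
               $1, $1 + $3 - 1, $2, $2 + $3 - 1
    }'
}

# Usage, only where user namespaces are available:
#   unshare --user --map-root-user cat /proc/self/uid_map | describe_uid_map
# For an unprivileged caller this should show uid 0 inside the namespace
# mapped to the caller's own uid on the host.
```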

So I'm ready to measure memory before and after, but what do you expect for the benchmark?

How about running dd from /dev/zero to /dev/null with three block sizes - 1, 4k and 1024k - and a count set to make each one take at least 10 seconds? I'm concerned about performance for non-sandboxed processes, but I would be curious to compare with sandboxed as well, so a 3x3 grid of results (MB/s for each of three sizes in a kernel without seccomp, with seccomp, and sandboxed) would be great.
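
A minimal sketch of the proposed dd runs follows. The function name `run_dd` is hypothetical, and the block counts in the comments are placeholders rather than values from this thread; they need tuning so that each run lasts at least 10 seconds on the board under test.

```shell
# run_dd BS COUNT: copy COUNT blocks of size BS from /dev/zero to /dev/null
# and print the last line dd writes to stderr (its transfer summary).
# (Hypothetical helper name.)
run_dd() {
    dd if=/dev/zero of=/dev/null bs="$1" count="$2" 2>&1 | tail -n 1
}

# One run per block size; the counts are assumptions to be adjusted:
#   run_dd 1     5000000
#   run_dd 4k    3000000
#   run_dd 1024k 12000
```

Running the same three commands on each kernel, and once more inside the sandbox, would fill the 3x3 grid described above.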

Ok @pelwell I can do this. Give me a week in order to build the kernel and do the benchmarks. I'm a young dad so I need time! ;-)
If time allows I will try to activate the AppArmor LSM as another possible benchmark.

Hi @pelwell

I haven't finished testing, but I can already give a status update on the following:

  • Recompilation of the kernel with seccomp and user namespace was a success;
  • There was no noticeable increase in memory (I've been using dmesg | grep Memory, and the kernel code, rwdata, etc. all differ by less than 1%, on the order of a few KB);
  • Early tests with dd are identical for 1 and 1024k, but show a slight decrease in performance with 4k (from 1.1GB/s to 0.9GB/s). However, using apache bench (from my desktop) against an nginx static site (on the Raspberry Pi), there was either no difference or improved performance (up to 22% in requests/s);
  • BUT (and it is a big one), seccomp support for Docker is only possible with the user space tool at version 2.2.1 or above, which excludes Debian Jessie. Docker provides a static binary for Debian Jessie with seccomp support (probably the seccomp lib is statically linked into this binary), but only for x86 (32 and 64 bit), not for ARMv7. So I would need to get the source and compile the seccomp user space binaries myself, and then the Docker binaries. Too much work for me right now :-(
  • I will not drop this issue. Before this weekend, I will try to get a new kernel with only user namespaces and AppArmor compiled in. That should already be a big improvement in terms of container security. With this new kernel I can run the proposed benchmark again.

PS: I've added a small benchmark which serves a static web page (a simple HTML page) with nginx. The benchmark is run from my desktop using apache bench; my desktop is powerful enough to saturate the rpi if necessary ;-) I will also try to test with a dynamically generated website, maybe something like ghost or wordpress, whichever is easier to install on the rpi and in a container. I will give more details when I'm done with testing.

That's looking good so far. The dd test is almost a worst case, with very large numbers of userspace-to-kernel round-trips. As such, a 10-20% performance drop doesn't sound too bad, but others may disagree - that isn't a final decision.

It seems I cannot test further for now. I succeeded in compiling and running a new kernel with AppArmor support. It works well, but not with Docker (https://github.com/docker/docker/issues/27351); I'm investigating with some support from Docker.

Anyway, enabling AppArmor has decreased performance a bit further. Now with AppArmor, seccomp filtering and user namespaces activated, I am getting the following results:

  • The dd '1 byte' test:

    • Default kernel: 521 kB/s

    • My Kernel: 411 kB/s (-21%)

  • The dd '4kB' test:

    • Default kernel: 991 MB/s

    • My kernel: 854 MB/s (-16%)

  • The dd '1MB' test:

    • Default: 1.1GB/s

    • Mine: 1.1GB/s

My changes to the default config are in this branch on a fork I made: https://github.com/jcberthon/linux/tree/rpi-sec-apparmor-seccomp-userns

I will write later to describe the other tests I have conducted, and to report any update on Docker with AppArmor on ARM.

Just a quick update. The problem with Docker when AppArmor is active on ARM has been solved and the fix has been merged. It will be available in Docker 1.12.3. I've installed the patch and can now run Docker successfully on the Raspberry Pi with my improved kernel.

In the coming days I'll provide a pull request with the changes. So if it is decided that this issue should be fixed, it will simply be a matter of reviewing and possibly merging my changes.

I now have to find the time to do some benchmarking inside the container with the official kernel and with mine. Although this is trivial, I'm short on time, so do not expect much feedback from me in the next 10-20 days.

Thanks for the update - take your time, we'll still be here.

Just for information: CONFIG_USER_NS=y has been set in the rpi-4.6.y and newer branches since Jul 28.

See these commits on 4.6 and 4.7 by @popcornmix: 4.6 https://github.com/raspberrypi/linux/commit/39f02ddd541cf79a7baba2c9c3a0a7fd64dde270#diff-d578de903015b334ab3f9f22d7055058 and 4.7 https://github.com/raspberrypi/linux/commit/c2b66ab6c9239e7aedc9946e16696f5b8ab669ca#diff-d578de903015b334ab3f9f22d7055058

And it is in the baseline config from 4.8 on.

My other proposed changes are not included in newer branches (4.5 to 4.9)

Hi

I have concluded the benchmarking using dd as suggested by @pelwell.

I tested dd in 3 settings, with 1 byte (test1), 4kB (test2) and 1MB (test3) blocks, and configured it so that each test runs for 20-30s. Each test was run 3 times and I computed the average. My platform was a headless Raspberry Pi 2 (using SSH, no monitor, keyboard or X11).

The tests were run in 5 different environments. There were 3 different kernels: the vanilla Raspbian kernel (4.4.27-v7+), a kernel with the Raspbian config plus user namespaces and SECCOMP filters, and a kernel that additionally enables AppArmor. On the vanilla kernel and on the kernel with UserNS+SECCOMP-filters+AppArmor, I also ran the benchmark in a Docker container. So in total that's 5 environments.

For Docker, I used 1.12.3-rc1 which contains a patch allowing it to run on ARM with AppArmor, and I used the Debian:jessie image from armhf (https://hub.docker.com/r/armhf/debian/). Note that since yesterday the final 1.12.3 has been published, but I did not retest it.

TL;DR: The performance impact is within ±7% when using UserNS+SECCOMP-filters compared to the vanilla kernel, but it reaches up to -23% when using UserNS+SECCOMP-filters+AppArmor compared to the vanilla kernel. Within Docker, the benchmark is always about 2% faster than on the host itself.

Detailed results:

| Bench | Vanilla | Vanilla Docker | Impact | +UserNS +SECCOMP | Impact | +AppArmor | Impact | +AA Docker | Impact wrt host | Impact wrt Vanilla Docker |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Test1 (kB/s) | 507,33 | 516,33 | 1,77 % | 508,00 | 0,13 % | 395,00 | -22,14 % | 399,33 | 1,10 % | -22,66 % |
| Test2 (MB/s) | 1.014,13 | 1.026,13 | 1,18 % | 935,67 | -7,74 % | 771,00 | -23,97 % | 860,67 | 11,63 % | -16,13 % |
| Test3 (MB/s) | 1.058,13 | 1.092,27 | 3,23 % | 1.126,40 | 6,45 % | 1.126,40 | 6,45 % | 1.126,40 | 0,00 % | 3,13 % |

Each "Impact" column gives the percentage change of the column to its left relative to a baseline, which is the vanilla Raspbian kernel unless the column header says otherwise.
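
The "Impact" figures can be reproduced as (value - baseline) / baseline × 100. The sketch below assumes that formula; the function name `pct_change` is hypothetical. The table uses a decimal comma, but awk expects a decimal point, so values must be entered as `507.33` rather than `507,33`.

```shell
# pct_change BASELINE VALUE: percentage change of VALUE relative to BASELINE,
# printed with two decimals. (Hypothetical helper name.)
pct_change() {
    awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%.2f\n", (val - base) / base * 100 }'
}

# Test1 row: vanilla kernel 507.33 kB/s, +AppArmor kernel 395.00 kB/s
pct_change 507.33 395.00   # prints -22.14
```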

Conclusion: including UserNS and SECCOMP filters does not seem to have much impact. Activating AppArmor can produce up to 22% performance impact in worst-case scenarios, but in normal use the impact should not be felt. Using these flags has not affected kernel stability: my Raspberry Pi has been up and running for the last few weeks with the self-built kernels, and I have not had a single application or system crash or unexpected stop.

@popcornmix @pelwell Not a huge impact to performance, do we want to include this?

Yes, please? Why the heck not?

It seems this issue can be closed since the requested changes are present in the latest kernel:

$ cat /etc/issue; uname -r; apt-cache policy raspberrypi-kernel; zgrep 'SECCOMP\|_NS=' /proc/config.gz
Raspbian GNU/Linux 8 \n \l

4.9.35-v7+
raspberrypi-kernel:
Installed: 1.20170703-1
Candidate: 1.20170703-1
Version table:
* 1.20170703-1 0
500 http://archive.raspberrypi.org/debian/ jessie/main armhf Packages
100 /var/lib/dpkg/status
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=m

Hi @iam-TJ

Very strange as it is not visible in the config file: https://github.com/raspberrypi/linux/blob/rpi-4.9.y/arch/arm/configs/bcm2835_defconfig

Perhaps it is now automatically included by other flags.

I have been maintaining and running my own kernel build for almost a year now, so I can't really check.

I believe that this stuff is now included in our standard kernel. Closing.

Hi @JamesH65

I just checked again: I now have another Raspberry Pi and I installed a clean Raspbian on it. Not all the flags in my Pull Request (PR) are there, but it is true that the SECCOMP ones are now activated.

Here is the output:

$ cat /etc/issue; uname -r; sudo modprobe configs; zegrep "SECCOMP|_NS=|CG_|CGROUP|APPARMOR" /proc/config.gz
Raspbian GNU/Linux 9 \n \l
4.14.34-v7+
CONFIG_CGROUPS=y
# CONFIG_MEMCG_SWAP is not set
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_SLUB_MEMCG_SYSFS_ON is not set
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_SECCOMP=y
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
# CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
CONFIG_NET_CLS_CGROUP=m
# CONFIG_CGROUP_NET_PRIO is not set
CONFIG_CGROUP_NET_CLASSID=y
# CONFIG_TCG_TPM is not set

Compared to my PR, we can see that the SECCOMP and NS (namespaces) options are now set. However, most control group options (CGROUP or MEMCG) are still not set.

For instance, when running Docker, many resource controls cannot be supported. docker info ends with these warnings:

WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
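
To see which cgroup controllers a running kernel actually offers, one option is to read `/proc/cgroups`, whose columns are subsystem name, hierarchy, number of cgroups, and an enabled flag. The sketch below assumes that layout; the function name `cgroup_enabled` is hypothetical, and treating the memory-related warnings above as a missing or disabled `memory` controller is an assumption, not something confirmed in this thread.

```shell
# cgroup_enabled NAME: read /proc/cgroups-style text on stdin and exit 0 only
# if controller NAME is listed with its "enabled" column (4th field) set to 1.
# (Hypothetical helper name.)
cgroup_enabled() {
    awk -v name="$1" '
        $1 == name && $4 == 1 { ok = 1 }
        END { exit !ok }
    '
}

# Usage on a running system:
#   cgroup_enabled memory < /proc/cgroups && echo "memory controller enabled"
```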

With my PR, Docker is happy. But this is not limited to Docker, other container technologies (rkt, cri-o, Kubernetes, etc.) make use of them and even things like systemd can use them (and potentially, but I could be mistaken, snap and flatpak could use them).

I could use issue #1605 to update my PR and get it merged. Or you could re-open this issue and I would update my PR. But before I put effort into this PR, will it be considered? (I'm asking because I have 4 very young kids and my free time is often very limited.)

Probably best on another PR/Issue, this specific issue (NS and SECCOMP) on Firejail appears to be solved.

Alright, and #1605 is also solved w.r.t. systemd 231. So I will create a new issue and PR. Thank you for the feedback and advice.
