Firejail: Document the high-level architecture of firejail

Created on 17 Dec 2016 · 8 Comments · Source: netblue30/firejail

I mean it'd be nice to have a doc describing:
0 what threat model we have;
1 what isolation technologies and their components are (not) used and why: lxc, lxd, docker, cgroups, different filesystems, firewalls;
2 how exactly they are used and what the goals to use each of them are;
3 in what sequence they are used, what is nested in what;
4 plans on what technology it should/could/would be nice to use and what the gains are;
5 what the alternatives to the technologies already used are and what their pros and cons for firejail are;
6 why we use C instead of bash scripts and cli tools.

enhancement

Most helpful comment

0 what threat model we have;

I can't comment on this specifically.

1 what isolation technologies and their components are (not) used and why: lxc, lxd, docker, cgroups, different filesystems, firewalls;
2 how exactly they are used and what the goals to use each of them are;
3 in what sequence they are used, what is nested in what;

LXD/LXC and Docker aren't sandbox technologies; they're container software that happens to use sandboxing technology. Containers require sandbox technology, but sandboxes (like firejail) aren't inherently containers because they don't require partitioning of resources (and firejail explicitly does not partition system resources).

Last I checked, firejail uses a combination of:

  • Seccomp-BPF. This is what does the system call filtering, and explaining how it works is well beyond the scope of such a document. It's used because it's the only good option (almost anywhere in fact).
  • Linux namespaces. These provide most of the actual isolation. Without namespaces, almost none of what firejail does would be possible, including the network isolation, IPC isolation, and filesystem isolation.
  • Independent filesystems. Pretty self explanatory.
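To make the namespace bullet a bit more concrete, here is a minimal sketch that lists which namespaces the current process belongs to by reading the symlinks under /proc/self/ns. It assumes a Linux system with /proc mounted. The helper name `current_namespaces` is made up for this example and is not part of firejail.

```python
import os
import re

NS_DIR = "/proc/self/ns"  # Linux-specific; assumed to be mounted


def current_namespaces(ns_dir=NS_DIR):
    """Return {entry_name: inode} for the calling process.

    Hypothetical helper for illustration only. Returns an empty dict
    when ns_dir cannot be read (for example on a non-Linux system).
    """
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in names:
        try:
            target = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
        # Link targets look like "mnt:[4026531840]".
        match = re.fullmatch(r"([a-z_]+):\[(\d+)\]", target)
        if match:
            result[name] = int(match.group(2))
    return result


if __name__ == "__main__":
    for name, inode in current_namespaces().items():
        print(f"{name:20s} {inode}")
```

Two processes share a namespace when the corresponding links show the same inode number, so running this once on the host and once inside a sandbox should show different numbers for whichever namespaces the sandbox unshared.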

Notably missing from this list are cgroups, and they're not involved for a few reasons:

  1. They aren't usable by non-root users and require writing to arbitrary files that may be located almost anywhere on the system. Using them would therefore make it easier to screw up the system itself with a badly configured profile.
  2. Systemd expects to be the only thing manipulating them.
  3. They don't provide isolation in the same sense that most people would expect out of something like firejail. Firejail is oriented towards limiting access, whereas cgroups are oriented towards limiting resource usage.

As far as the nesting, I'm pretty sure the order is:

  1. Namespaces.
  2. Independent filesystems.
  3. Seccomp-BPF.

Namespaces have to come first because you need an independent mount namespace to be able to manipulate the filesystems without affecting the rest of the system. Seccomp-BPF has to come last, since it may be filtering some of the system calls needed to set everything up.
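The ordering argument above can be sketched as a toy model. This does not touch the kernel at all. The step names, the syscalls attached to each step, and the set of blocked syscalls are assumptions chosen for illustration, not firejail's actual code paths.

```python
# Toy model only. Which syscalls each step needs is an assumption.
STEPS = {
    "namespaces": {"unshare"},   # create a private mount/net/IPC view
    "filesystems": {"mount"},    # build the sandbox's filesystem layout
    "seccomp": {"prctl"},        # install the syscall filter
}

# Syscalls the final filter is assumed to deny to the sandboxed program.
BLOCKED_BY_FILTER = {"unshare", "mount"}


class SetupError(Exception):
    """Raised when a step cannot work in the requested order."""


def run_setup(order):
    """Simulate the setup steps in the given order; raise if one is impossible."""
    filter_installed = False
    in_private_mount_ns = False
    for step in order:
        denied = STEPS[step] & BLOCKED_BY_FILTER
        if filter_installed and denied:
            raise SetupError(f"{step}: filter already blocks {sorted(denied)}")
        if step == "filesystems" and not in_private_mount_ns:
            raise SetupError("filesystems: mounts would affect the host")
        if step == "namespaces":
            in_private_mount_ns = True
        elif step == "seccomp":
            filter_installed = True
    return list(order)


if __name__ == "__main__":
    print(run_setup(["namespaces", "filesystems", "seccomp"]))
```

In this model only the order namespaces, filesystems, seccomp completes. Any order that installs the filter earlier fails because a later step needs a syscall the filter denies, and any order that mounts before unsharing fails because the mounts would land in the host's mount namespace.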

4 plans on what technology it should/could/would be nice to use and what the gains are;

I can't exactly comment on this either.

5 what the alternatives to the technologies already used are and what their pros and cons for firejail are

From a practical perspective, when discussing Linux, there really aren't any alternatives to any of the things Firejail uses if you want it to be usable in the same way. In theory, it's possible with the VServer functionality in iptables to do the same kind of network isolation, but that's way more difficult to set up and provides less complete isolation. There really aren't any practical alternatives to Seccomp-BPF or the other stuff that firejail uses though.

6 why we use C instead of bash scripts and cli tools.

This is actually really easy to explain, but it warrants a discussion all its own. Among other reasons:

  1. About half the stuff that firejail does is impractical (or completely impossible) to do from shell script. At least some of it (Seccomp-BPF) isn't even possible without a lot of work from most other interpreted languages.
  2. Shell script is horribly inefficient. For something like firejail to be practical, it needs to be as efficient as possible.
  3. It's somewhat easier to write secure C code than secure shell script (mostly because of better verification tooling).
  4. C code is a lot more portable than shell script. Bash has had a lot of odd and subtle changes over the years, and you can't be sure what version a given system will have. C, on the other hand, almost never changes, and you can easily keep using old standard versions without any effort.

All 8 comments

6 why we use C instead of bash scripts and cli tools.

Man I'm a great fan of bash but firejail in bash script? O_o

I was wondering about the same thing in a related issue; there's a need for some architecture documentation.

@Ferroin : Thanks - that's a very good explanation, IMHO!

Regarding

It's somewhat easier to write secure C code than secure shell script

Do you think that rewriting Firejail in Rust would be beneficial in order to make it even more secure? Rewriting it from scratch would be a lot of work, of course, and certainly not reasonable for @netblue30. I wonder, though, if it would be worth considering once corrode is mature enough.

I really don't know. I have very limited knowledge of Rust. I'm mostly a Python person myself; I learned C primarily for my resume, although it has been useful on quite a few occasions. Firejail has (from what I can see) very well written code in terms of secure design, so I don't think that much could be gained by using Rust. The two big things to really worry about with firejail or any sandbox tool for Linux are the kernel (if the kernel side has bugs, you generally can't do anything about it from userspace, and it's generally a much bigger security issue) and the high level structure.

@Ferroin : Thanks for your reply!

Firejail has (from what I can see) very well written code in terms of secure design, so I don't think that much could be gained by using Rust.

That's good to read!

The two big things to really worry about with firejail or any sandbox tool for Linux are the kernel (if the kernel side has bugs, you generally can't do anything about it from userspace,

Yes, agreed, but seccomp-bpf mitigates this problem to some extent as it reduces the attack surface of the kernel (unless that part of the code is broken itself).

(EDIT: I think there was once a kernel bug involving the futex syscall)

Independent filesystems. Pretty self explanatory.

I meant "Why was overlayfs chosen from a bunch of different cow FSs (unionfs, aufs)?"

@KOLANICH Ah, I misunderstood what you meant. That's a pretty easy one too, though: OverlayFS is supported upstream as part of the mainline kernel, so you can depend on it being in pretty much every distro with a new enough kernel. AUFS and UnionFS are both out-of-tree modules, so they lag a bit relative to kernel releases and aren't guaranteed to be available on any arbitrary distro.

Also, just to clarify, the term you're looking for is 'overlay filesystem', not 'COW filesystem'. Overlay filesystems are inherently COW, but not all COW filesystems are overlay filesystems (both ZFS and BTRFS, among others, are COW, but not overlay filesystems).
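As a rough illustration of why an overlay filesystem is inherently COW, here is a toy model with a read-only lower layer and a writable upper layer. The class `OverlayView` and its methods are hypothetical and only mimic the lookup rules in memory; this is not how OverlayFS is implemented in the kernel.

```python
class OverlayView:
    """Toy overlay: a writable upper layer stacked on a read-only lower one.

    Files are modeled as dict entries (path -> bytes). Illustration only.
    """

    _WHITEOUT = object()  # marks a path as deleted in the upper layer

    def __init__(self, lower):
        self.lower = dict(lower)  # never modified after construction
        self.upper = {}

    def read(self, path):
        # The upper layer shadows the lower layer.
        if path in self.upper:
            value = self.upper[path]
            if value is self._WHITEOUT:
                raise FileNotFoundError(path)
            return value
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Copy-up: the first modification copies the file into the upper
        # layer, and the lower copy stays untouched.
        self.upper[path] = self.read(path) + data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path is absent
        self.upper[path] = self._WHITEOUT


if __name__ == "__main__":
    view = OverlayView({"/etc/hosts": b"127.0.0.1 localhost\n"})
    view.append("/etc/hosts", b"10.0.0.1 test\n")
    print(view.read("/etc/hosts"))
    print(view.lower["/etc/hosts"])
```

The copy happens per file across layers, which is what makes it an overlay. ZFS and Btrfs do their copy-on-write at the block level inside a single filesystem, so they are COW without being overlays.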

@curiosity-seeker Yeah, seccomp-BPF does help, but like you said, it's not bulletproof. There have been many bugs over the years in system calls that functionally can't be blocked by firejail simply because too many things need them (such as futex(), which is used all over the place by the glibc threading implementation).
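To show what "can't be blocked" means in practice, here is a minimal sketch of a seccomp-BPF deny list built as classic BPF instructions, plus a tiny interpreter that evaluates it in user space. Nothing is installed into the kernel. The opcode and return-value constants come from the Linux UAPI headers, the syscall numbers are for x86_64 only, and the helper names (`build_denylist`, `pack`, `evaluate`) are made up for this example. Firejail's real filters are generated differently.

```python
import struct

# Classic BPF opcodes (linux/bpf_common.h) and seccomp return values.
BPF_LD_W_ABS = 0x20    # BPF_LD | BPF_W | BPF_ABS
BPF_JMP_JEQ_K = 0x15   # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06       # BPF_RET | BPF_K
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E
EPERM = 1
OFF_NR, OFF_ARCH = 0, 4  # field offsets in struct seccomp_data

# x86_64 syscall numbers; other architectures differ.
SYSCALLS = {"ptrace": 101, "mount": 165, "futex": 202}


def build_denylist(blocked):
    """Return a list of (code, jt, jf, k) tuples denying the named syscalls."""
    prog = [
        (BPF_LD_W_ABS, 0, 0, OFF_ARCH),
        (BPF_JMP_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),  # skip the kill on match
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL),
        (BPF_LD_W_ABS, 0, 0, OFF_NR),
    ]
    for name in blocked:
        # On match fall through to the deny; otherwise skip over it.
        prog.append((BPF_JMP_JEQ_K, 0, 1, SYSCALLS[name]))
        prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    return prog


def pack(prog):
    """Serialize to the 8-byte struct sock_filter layout."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in prog)


def evaluate(prog, nr, arch=AUDIT_ARCH_X86_64):
    """Interpret the filter in user space for one syscall number."""
    data = {OFF_NR: nr, OFF_ARCH: arch}
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = data[k]
        elif code == BPF_JMP_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")
```

With `prog = build_denylist(["mount", "ptrace"])`, evaluating `SYSCALLS["mount"]` yields `SECCOMP_RET_ERRNO | EPERM`, while `SYSCALLS["futex"]` yields `SECCOMP_RET_ALLOW`. Adding futex to the deny list would be trivial in the filter, but threaded programs would then stop working, which is why a kernel bug in such a syscall stays reachable from inside the sandbox.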

I think this can be wrapped into #2090.
