This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/good-filesystem-for-the-nix-store/3566/12
How is the compression enabled? Is it not done transparently on the filesystem level?
A mount option. That's how most people use it, I believe.
Normally it's transparent, so Nix most likely does something unusual, possibly around the read-only remounts and mount namespaces. I'm affected even though nix.readOnlyStore = false;, but I suppose the inner read-write remount was a more likely culprit anyway.
How can Nix disable/ignore the compression, if it happens at the filesystem level?
Nix most likely does something unusual, possibly around the read-only remounts and mount namespaces.
Ah I see! Do you have an idea where in the code this might be happening?
I recently had an issue that was maybe related: I enabled zstd compression on /nix/store, which meant the kernels were compressed, but I was on grub 2.02, so in the end I couldn't boot anymore. Trying to force decompression via btrfs CLI didn't work, and I'm wondering if Nix was doing something weird with the mounts.
Seems like that only triggers when the store is read-only though, which wasn't my case (and seems like it wasn't @vcunat's case either).
Anyway, I wonder if the call to mount should preserve the mount options the filesystem is mounted with, but I don't see a way to do that with mount(2), so I must be missing something.
I suppose the inner read-write remount was a more likely culprit anyway
With nix.readOnlyStore = false there is no additional inner mount point for /nix/store. And newly substituted store paths still have no compression. Proof:
$ mount | grep "on / type btrfs"
/dev/mapper/vg-fs_tree on / type btrfs (rw,noatime,compress-force=zstd,ssd,space_cache,subvolid=274,subvol=/fs_tree)
$ mount | grep -c /nix/store
0
$ nix-shell -p adbfs-rootless
these paths will be fetched (3.78 MiB download, 21.67 MiB unpacked):
/nix/store/mkhq81nll904clsx020p42mqvsdawixd-platform-tools-28.0.1
/nix/store/v4y99snaxs97aahmqzw6048557g536xk-adbfs-rootless-2016-10-02
copying path '/nix/store/mkhq81nll904clsx020p42mqvsdawixd-platform-tools-28.0.1' from 'https://cache.nixos.org'...
copying path '/nix/store/v4y99snaxs97aahmqzw6048557g536xk-adbfs-rootless-2016-10-02' from 'https://cache.nixos.org'...
$ sudo compsize /nix/store/mkhq81nll904clsx020p42mqvsdawixd-platform-tools-28.0.1
Processed 781 files, 778 regular extents (778 refs), 0 inline.
Type Perc Disk Usage Uncompressed Referenced
TOTAL 100% 23M 23M 23M
none 100% 23M 23M 23M
$ sudo compsize /nix/store/v4y99snaxs97aahmqzw6048557g536xk-adbfs-rootless-2016-10-02
Type Perc Disk Usage Uncompressed Referenced
TOTAL 100% 92K 92K 92K
none 100% 92K 92K 92K
Right. I assumed the remount was done anyway, but that link to code seems to indicate otherwise.
Anyway, I wonder if the call to mount should preserve the mount options the filesystem is mounted with, but I don't see a way to do that with mount(2), so I must be missing something.
Constructing the new mount options from stat.f_flag seems easy enough, and I expect it will be better than the current code, but from looking at the docs I don't think advanced stuff like compression is expressed in this bitmap (needs confirmation). ATM I can't see how to manipulate that through C calls.
EDIT: man mount.8 says
It's also possible to change nosuid, nodev, noexec, noatime, nodiratime and relatime VFS entry flags by "remount,bind" operation. The another (for example filesystem specific flags) are silently ignored.
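To make the f_flag idea concrete, here is a minimal sketch in C that maps the VFS-level bits reported by statvfs(3) onto the MS_* flags accepted by mount(2). The function names vfs_flags_to_mount_flags and remount_rw_preserving_flags are hypothetical and are not taken from the Nix source. The sketch assumes glibc with _GNU_SOURCE, which exposes ST_NODEV, ST_NOEXEC, ST_NOATIME, ST_NODIRATIME and ST_RELATIME. As the man page excerpt says, filesystem-specific options such as compress-force are not part of this bitmap, so this cannot carry them over.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mount.h>
#include <sys/statvfs.h>

/* Hypothetical helper: translate the VFS-level bits of statvfs.f_flag
 * into the matching MS_* flags. Filesystem-specific options (for
 * example btrfs compression) are not represented in f_flag at all. */
unsigned long vfs_flags_to_mount_flags(unsigned long f_flag)
{
    static const struct { unsigned long st; unsigned long ms; } map[] = {
        { ST_RDONLY,     MS_RDONLY     },
        { ST_NOSUID,     MS_NOSUID     },
        { ST_NODEV,      MS_NODEV      },
        { ST_NOEXEC,     MS_NOEXEC     },
        { ST_NOATIME,    MS_NOATIME    },
        { ST_NODIRATIME, MS_NODIRATIME },
        { ST_RELATIME,   MS_RELATIME   },
    };
    unsigned long ms = 0;

    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (f_flag & map[i].st)
            ms |= map[i].ms;
    return ms;
}

/* Hypothetical helper: bind-remount `path` read-write while keeping the
 * other VFS flags it currently has. Returns 0 on success, -1 on error. */
int remount_rw_preserving_flags(const char *path)
{
    struct statvfs st;

    if (statvfs(path, &st) == -1)
        return -1;
    unsigned long flags = vfs_flags_to_mount_flags(st.f_flag);
    flags &= ~(unsigned long)MS_RDONLY;  /* drop read-only, keep the rest */
    return mount(NULL, path, "none", MS_REMOUNT | MS_BIND | flags, NULL);
}
```

The remount helper needs root (or a private mount namespace) to succeed; the flag translation itself is a pure function and can be checked without privileges.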
I enabled zstd compression on /nix/store
By which means? There are several.
which meant the kernels were compressed
Seems not to be as clear. In my case having compression=zstd property on /nix/store and compress-force=zstd mount option still did not affect NixOS _stock_ kernel. Only when I added a custom out-of-tree kernel module to boot.extraModulePackages grub2 2.02 failed booting NixOS.
Trying to force decompression via btrfs CLI didn't work
How exactly did you force this? Basically you need to recreate the affected files again. I'm not aware of other ways. Even btrfs filesystem defragment is unable to uncompress files:
-c[<algo>]
compress file contents while defragmenting. Optional argument selects the compression algorithm, zlib (default), lzo or zstd. Currently it’s not possible to select no compression.
Also, twiddling the compression attribute (chattr +c / chattr -c) of non-empty files makes no sense for existing data. Only new extents will be affected. I guess the same applies to btrfs compression property (btrfs property).
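For checking the per-file attribute programmatically, the following is a rough C sketch of what lsattr does: it reads the inode flags with the FS_IOC_GETFLAGS ioctl and looks at FS_COMPR_FL (the chattr +c bit) and FS_NOCOMP_FL. The names read_inode_flags, classify_compression_attr and the enum are made up for this example. It only reports the attribute; as noted above, the attribute says nothing about whether existing extents are actually compressed (compsize is the tool for that).

```c
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical classification of the compression-related inode flags. */
enum compression_attr { ATTR_UNSET, ATTR_COMPRESS, ATTR_NOCOMPRESS };

/* Interpret a flags word as returned by FS_IOC_GETFLAGS.
 * NOCOMPRESS is checked first because it overrides the compress bit. */
enum compression_attr classify_compression_attr(int flags)
{
    if (flags & FS_NOCOMP_FL)
        return ATTR_NOCOMPRESS;
    if (flags & FS_COMPR_FL)
        return ATTR_COMPRESS;
    return ATTR_UNSET;
}

/* Read the inode flags of `path`. Returns 0 on success, -1 on error
 * (including filesystems that do not support the ioctl). */
int read_inode_flags(const char *path, int *flags)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    int rc = ioctl(fd, FS_IOC_GETFLAGS, flags);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```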
How is the compression enabled? Is it not done transparently on the filesystem level?
A mount option. That's how most people use it, I believe.
How can Nix disable/ignore the compression, if it happens at the filesystem level?
The logic is not quite easy. There are several conditions for compression to happen or not:
Compression to newly written data happens:
- always -- if the filesystem is mounted with -o compress-force
- never -- if the NOCOMPRESS flag is set per-file/-directory
- if possible -- if the COMPRESS per-file flag (aka chattr +c) is set, but it may get converted to NOCOMPRESS eventually
- if possible -- if the -o compress mount option is specified
Note that mounting with -o compress will not set the +c file attribute.
So it looks like the most reliable way to force btrfs compression is the compress-force mount option. This is what I use, and I still get no compression for substituted /nix/store paths.
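The conditions listed above can be written down as a small decision function. This is only a model of the list as given in this comment, checked in the same order; the enum and function names are invented for the sketch, and it is not a copy of the kernel code. It also does not account for preallocated files, which turn out to matter later in this thread.

```c
#include <stdbool.h>

/* Hypothetical model of the mount-level compression option. */
enum mount_compress { MOUNT_NONE, MOUNT_COMPRESS, MOUNT_COMPRESS_FORCE };

/* Hypothetical outcome for newly written data. */
enum compress_decision { DECIDE_NEVER, DECIDE_IF_POSSIBLE, DECIDE_ALWAYS };

/* Apply the rules in the order they are listed in the comment:
 *   1. -o compress-force           -> always
 *   2. NOCOMPRESS flag on the file -> never
 *   3. COMPRESS flag (chattr +c)   -> if possible
 *   4. -o compress                 -> if possible
 * Anything else is left uncompressed. */
enum compress_decision decide_compression(enum mount_compress opt,
                                          bool file_nocompress,
                                          bool file_compress)
{
    if (opt == MOUNT_COMPRESS_FORCE)
        return DECIDE_ALWAYS;
    if (file_nocompress)
        return DECIDE_NEVER;
    if (file_compress || opt == MOUNT_COMPRESS)
        return DECIDE_IF_POSSIBLE;
    return DECIDE_NEVER;
}
```

Under this model, compress-force should win regardless of per-file flags, which is why the uncompressed store paths above are surprising.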
How can Nix disable/ignore the compression ...?
So one way to ignore the mount compression option (i.e. compress=zstd) is the (abstract?) NOCOMPRESS flag, which can be set with chattr -c or btrfs property set <FILE> compression none. Maybe nix first creates empty files and then clears the compression file attribute before writing to them. That may be the reason for the lack of compression. But I still do not understand how compress-force gets ignored.
Maybe it is related to the download -> decompression streaming used when getting /nix/store paths.
To see whether the remount of the /nix/store bind mount point changes the mount flags, you could try:
$ nix-build -E 'with import <nixpkgs> {}; runCommand "foo" {} "grep /nix/store /proc/mounts"'
...
/dev/disk/by-uuid/f44f9cf9-b347-4bd8-ba99-854fb560bec6 /nix/store ext4 rw,relatime 0 0
...
$ nix-build -E 'with import <nixpkgs> {}; runCommand "foo" {} "grep /nix/store /proc/mounts"'
these derivations will be built:
/nix/store/h2nq2gxz3dv57i15lkh5zy426pg67shq-foo.drv
building '/nix/store/h2nq2gxz3dv57i15lkh5zy426pg67shq-foo.drv'...
/dev/mapper/vg-fs_tree / btrfs rw,noatime,compress-force=zstd,ssd,space_cache,subvolid=460,subvol=/fs_tree/nix/store/h2nq2gxz3dv57i15lkh5zy426pg67shq-foo.drv.chroot 0 0
/dev/mapper/vg-fs_tree /bin/sh btrfs rw,noatime,compress-force=zstd,ssd,space_cache,subvolid=460,subvol=/fs_tree/nix/store/zcbx2vf7zj9ih0fzvpm80cn8bb3lbb94-busybox-1.30.1/bin/busybox 0 0
/dev/mapper/vg-fs_tree /nix/store/3kqc2wmvf1jkqb2jmcm7rvd9lf4345ra-coreutils-8.31 btrfs rw,noatime,compress-force=zstd,ssd,space_cache,subvolid=460,subvol=/fs_tree/nix/store/3kqc2wmvf1jkqb2jmcm7rvd9lf4345ra-coreutils-8.31 0 0
/dev/mapper/vg-fs_tree /nix/store/4l35nqpaiwzhfafrpby1xf7kfik7ai7c-gcc-8.3.0-lib btrfs rw,noatime,compress-force=zstd,ssd,space_cache,subvolid=460,subvol=/fs_tree/nix/store/4l35nqpaiwzhfafrpby1xf7kfik7ai7c-gcc-8.3.0-lib 0 0
...
There are many more similar lines. All of them retain my compress-force=zstd btrfs mount option.
What caught my eye is:
- subvol=/fs_tree/nix/store/<path> option added for every store path listed
- /nix/store mount point contrary to the ext4 provided example
@edolstra: actually the problem does not happen for local builds but only for files from binary cache, so I don't think this is a good approach to debug that part. I expect we got carried away by ::makeStoreWritable().
Also, it's worth noting that /nix/store/*.drv files do honor btrfs compression.
And I think the same holds for all other files that are direct descendants of /nix/store/ (i.e. find /nix/store -maxdepth 1 -type f), as long as their size can be reduced by the active compression algorithm. (I cannot prove it quickly.) btrfs heuristics can leave a file uncompressed if compression would make it bigger.
Currently, I have to remount /nix/store read-write and then execute:
# btrfs filesystem defragment -r -czlib /nix/store
What's the meaning of mounting each store path individually as subvolume when /nix/store is on btrfs (when inside nix-build)?
The meaning is that only explicit dependencies are visible. (Or at least that's one I know about.)
And I think all other files do [honor compression] that are direct descendants of /nix/store/ (i.e. find /nix/store -maxdepth 1 -type f).
Very likely true. True for paths fetched with nix-prefetch-url and /nix/store/*.drv.
nix-prefetch-url results in a compressed file; however, the compression attribute is unset:
$ nix-prefetch-url file:///tmp/test.tar
[3.5 MiB DL]
path is '/nix/store/jmagpnxvlgww7f5hbmwmpfrpg1iwxw9a-test.tar'
1nic4fvhfxnpa0f9hvrk9wv9imhmh1sdvj38fvg62g38nmj7nz5h
$ sudo compsize /nix/store/jmagpnxvlgww7f5hbmwmpfrpg1iwxw9a-test.tar
Type Perc Disk Usage Uncompressed Referenced
TOTAL 26% 960K 3.5M 3.5M
zstd 26% 960K 3.5M 3.5M
$ lsattr /nix/store/jmagpnxvlgww7f5hbmwmpfrpg1iwxw9a-test.tar
-------------------- /nix/store/jmagpnxvlgww7f5hbmwmpfrpg1iwxw9a-test.tar
nix-prefetch-url --unpack results in a directory tree without compression; the compression attribute is not set:
$ nix-prefetch-url --unpack file:///tmp/test.tar
unpacking...
[3.5 MiB DL]
path is '/nix/store/xfmqixlxbw16z31nz4lfri1cbzm2lxx0-test.tar'
1kvdcbr78bhbw2zngwi69h8y9dsfm3kvlcxym8lfp9k25c19f0a5
$ sudo compsize /nix/store/xfmqixlxbw16z31nz4lfri1cbzm2lxx0-test.tar
[sudo] password for alex:
Processed 799 files, 796 regular extents (796 refs), 0 inline.
Type Perc Disk Usage Uncompressed Referenced
TOTAL 100% 5.1M 5.1M 5.1M
none 100% 5.1M 5.1M 5.1M
$ lsattr /nix/store/jmagpnxvlgww7f5hbmwmpfrpg1iwxw9a-test.tar
-------------------- /nix/store/jmagpnxvlgww7f5hbmwmpfrpg1iwxw9a-test.tar
Also, nix-prefetch-url creates a temporary directory in /tmp with the fetched contents, all of which do honor btrfs compression (both the attribute and the actual compression) (boot.tmpOnTmpfs = false).
The value of nix.readOnlyStore does not affect results. btrfs mounted with compress-force=zstd.
I have come to the conclusion that only "dumped" (as opposed to locally built) directories inside /nix/store are affected by the issue. But it's strange that the behavior differs depending on whether the path is a directory or not.
nix-prefetch-url and nix-store --realise behave differently when the fixed store path is a file, with the latter ignoring compression anyway:
$ nix-store --realise /nix/store/d2l3h1aj1s0pfvvimm802lmg4jm34qs0-qtbase-openssl_1_1.patch
these paths will be fetched (0.02 MiB download, 0.17 MiB unpacked):
/nix/store/d2l3h1aj1s0pfvvimm802lmg4jm34qs0-qtbase-openssl_1_1.patch
copying path '/nix/store/d2l3h1aj1s0pfvvimm802lmg4jm34qs0-qtbase-openssl_1_1.patch' from 'https://cache.nixos.org'...
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/d2l3h1aj1s0pfvvimm802lmg4jm34qs0-qtbase-openssl_1_1.patch
$ sudo compsize /nix/store/d2l3h1aj1s0pfvvimm802lmg4jm34qs0-qtbase-openssl_1_1.patch
Type Perc Disk Usage Uncompressed Referenced
TOTAL 100% 172K 172K 172K
none 100% 172K 172K 172K
Perhaps, we can narrow down to addToStoreFromDump.
Hi there,
I just started using btrfs myself and this has been annoying me as well.
Like everyone else I started doubting the remount done to make /nix/store rw but the answer is (unfortunately) much simpler than that: btrfs doesn't compress if you fallocate the file early and... we do :/
Tested with this program (change the first #if 0 to #if 1 if you want to use a path in /nix/store; change the second #if 1 to #if 0 to confirm that the file is compressed without fallocate):
#define _GNU_SOURCE
#include <sys/mount.h>
#include <sched.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/* Print a message and exit with a failure status. */
#define ERROUT(fmt, args...) do { printf(fmt, ##args); exit(1); } while (0)
#define BUFSIZE (1024*100)

int main(int argc, char *argv[]) {
    char *path;
    if (argc < 2)
        ERROUT("usage: %s filename\n", argv[0]);
    path = argv[1];

#if 0
    /* For a path in /nix/store: enter a private mount namespace and
     * bind-remount the store read-write. */
    if (unshare(CLONE_NEWNS) == -1)
        ERROUT("unshare %d\n", errno);
    if (mount(0, "/nix/store", "none", MS_REMOUNT | MS_BIND, 0) == -1)
        ERROUT("mount %d\n", errno);
#endif

    /* Create the test file; O_EXCL refuses to clobber an existing one. */
    int fd;
    fd = open(path, O_CREAT|O_WRONLY|O_EXCL, 0644);
    if (fd < 0)
        ERROUT("open %d\n", errno);

    /* 100 KiB of zeros, which any compressor shrinks easily. */
    ssize_t n;
    char buf[BUFSIZE];
    memset(buf, 0, BUFSIZE);

#if 1
    /* Preallocate the file before writing; this is the step under test. */
    if (fallocate(fd, 0, 0, BUFSIZE) < 0)
        ERROUT("fallocate %d\n", errno);
#endif

    if ((n = write(fd, buf, BUFSIZE)) != BUFSIZE)
        ERROUT("write %zd %d\n", n, errno);
    fsync(fd);  /* make sure the extents reach the disk before compsize runs */
    close(fd);

    /* Report how the file was actually stored, then clean up. */
    char *compsize;
    if (asprintf(&compsize, "compsize %s", path) < 0)
        ERROUT("asprintf %d\n", errno);
    (void)system(compsize);
    free(compsize);
    if (unlink(path) < 0)
        ERROUT("unlink %d\n", errno);
    return 0;
}
I'm not sure what the way forward is from there; with that info in hand it's fairly obvious where in the code it is (not so many fallocate in there, it's most likely https://github.com/NixOS/nix/blob/3fcbe30eea1c7a9ea7c2431fcb68b7fdff0dcf0e/src/libstore/local-store.cc#L138 ); but that fallocate is there to improve performance as we know the file size beforehand.
OTOH I can understand why btrfs doesn't compress if we fallocated the space: it just does what it's being told, so it's not really a btrfs bug either, just a non-obvious behaviour pattern.
Do we have a way to add options to control nix behaviour and inhibit that fallocate only if requested? Or should we check the filesystem type and disable on btrfs?
By the way how does zfs behave with that?
(just answering myself for zfs: its fallocate implementation is ... well... rather lazy, it doesn't actually preallocate anything, and compression works even with fallocate if it is supported (currently only in 2.0 rc, before that you'd just get ENOTSUP) -- so that'd explain nobody complained about a similar behaviour with zfs and we can focus on btrfs only for now)
We could add an option to disable preallocation, though I guess it would be better to detect btrfs compression and automatically disable preallocation.
It's also possible preallocation isn't really worth it anymore on modern filesystems so maybe we can get rid of it altogether.
@martinetd The preallocation is done in preallocateContents() in src/libutil/archive.cc. A new setting to control preallocation could be added to struct ArchiveSettings at the top of that file.
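As a rough illustration of what such a setting would gate, here is a small C sketch of a write helper with a preallocate switch. It is not the code in archive.cc; write_contents and its preallocate parameter are hypothetical names, and the choice to treat EOPNOTSUPP and ENOSYS from fallocate as non-fatal is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes from `buf` to `path`,
 * optionally reserving the space up front with fallocate(2).
 * Returns 0 on success, -1 on error. */
int write_contents(const char *path, const void *buf, size_t len,
                   bool preallocate)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* fallocate rejects a zero length, so skip it for empty files.
     * Filesystems without fallocate support are tolerated here. */
    if (preallocate && len > 0 &&
        fallocate(fd, 0, 0, (off_t)len) < 0 &&
        errno != EOPNOTSUPP && errno != ENOSYS) {
        close(fd);
        return -1;
    }

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted, retry the same chunk */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return close(fd);
}
```

Calling it once with preallocate set to true and once with false on a btrfs mount, then running compsize on each result, would reproduce the comparison made by the test program earlier in the thread.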
Ah, right it must be that one; thanks.
I agree autodetection would be better than a plain setting, but I'm not sure the settings struct can be initialized with a function call? (It'd be good to check only once when nix-daemon starts and not every time a file needs to be written... In LocalStore::LocalStore's init it would be every time the store is solicited; I guess it's a start if it's legal from an abstraction point of view?)
Anyway; I'll try to just add a plain setting first but I need to figure out how to override the nix package in my environment first -- Or could I just kill the existing nix-daemon and start a new one from a nix-shell as described in the hacking page?.. Well, I'll give it a shot.
Thanks for the pointer, I've opened #4057 for the option. I don't consider that a fix for this issue unless the fallocate is automatically toggled off for btrfs when the option isn't specified (or we get rid of it altogether) -- I honestly think the option makes sense for non-CoW filesystems: knowing the file size in advance generally lets the filesystem reduce fragmentation, so I'd say it's a good thing.
At the very least, if we merge the option it'll allow for easier testing to see whether toggling it makes any difference to performance in more scenarios, and provide some data points to decide; I wouldn't be comfortable just deciding to remove it without a benchmark (but well, it's just my opinion :P)
Thanks for the quick merge!
So as a workaround folks will be able to set nix.extraOptions = "preallocate-contents = false"; in their configuration when that commit gets there.
I'm copying the relevant part from the PR to keep the discussion about autodetection here; that can wait a bit longer.
It's easy to call statfs(realStoreDir.c_str(), &statfsbuf) and check if statfsbuf.f_type == BTRFS_SUPER_MAGIC ; the question is how to make the setting "auto" ?
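Spelled out as a standalone C function, the check might look like the sketch below. The name is_btrfs is hypothetical; statfs(2) and BTRFS_SUPER_MAGIC (from linux/magic.h) are the real interfaces. The casts to unsigned int are an assumption made to keep the comparison correct on platforms where f_type is a signed 32-bit field.

```c
#include <linux/magic.h>
#include <sys/vfs.h>

/* Hypothetical helper: returns 1 if `path` lives on btrfs, 0 if it
 * lives on another filesystem, and -1 if statfs(2) fails. */
int is_btrfs(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) == -1)
        return -1;
    /* Compare as unsigned 32-bit values so a sign-extended f_type
     * cannot cause a mismatch. */
    return (unsigned int)st.f_type == (unsigned int)BTRFS_SUPER_MAGIC ? 1 : 0;
}
```

Note that this only identifies the filesystem type; it does not tell whether compression is actually enabled on that mount.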
I'm thinking of:
- changing from a Bool to a tristate true/false/auto enum; but would that work with the command-line --option switch? (I'm not too comfortable about introducing such a new setting type...)
- if the option is set to auto, check in localStore next to makeStoreWritable (e.g. some new checkFsQuirks method?), and if btrfs force the option somehow.. globalConfig.set? not really good with code compartmentalization :/
Needless to say I don't like what I just suggested very much, so suggestions or comments welcome.
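To show the shape of the tristate idea without touching the Nix settings machinery, here is a self-contained C sketch. Everything in it (the tristate enum, parse_tristate, resolve_preallocate) is hypothetical and written in C only to match the test program above; a real implementation would live in Nix's C++ settings code. The resolution rule for auto (preallocate unless the store is on btrfs) is the behaviour proposed in this comment, not something that has been merged.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical three-valued setting. */
enum tristate { TRI_FALSE, TRI_TRUE, TRI_AUTO };

/* Parse "true", "false" or "auto". Returns 0 on success and stores
 * the value in *out; returns -1 for anything else. */
int parse_tristate(const char *s, enum tristate *out)
{
    if (strcmp(s, "true") == 0)
        *out = TRI_TRUE;
    else if (strcmp(s, "false") == 0)
        *out = TRI_FALSE;
    else if (strcmp(s, "auto") == 0)
        *out = TRI_AUTO;
    else
        return -1;
    return 0;
}

/* Decide whether to preallocate: explicit values win, and "auto"
 * disables preallocation only when the store is on btrfs. */
bool resolve_preallocate(enum tristate setting, bool store_is_btrfs)
{
    switch (setting) {
    case TRI_TRUE:
        return true;
    case TRI_FALSE:
        return false;
    case TRI_AUTO:
        return !store_is_btrfs;
    }
    return true;  /* unreachable for valid enum values */
}
```

Keeping the filesystem probe outside resolve_preallocate means the decision stays a pure function, and the probe can be done once at daemon start rather than per file.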
For an example of a tristate enum, have a look at the SandboxMode enum in globals.hh (and its related implementation): https://github.com/NixOS/nix/blob/8ee779da7dabeef935ec61667120aa26743e472a/src/libstore/globals.hh#L14
And for new enums also check out C++ enum struct, which has the advantage of namespacing and fewer implicit coercions. :)
Thanks for the pointers. I've opened #4094 as a draft last week but I assume drafts just don't get as much attention -- would you be able to answer on where to put a generic tristate option if it's desirable, or should I just make the option specific to preallocation and roll with it with code as currently written (would rename variables a bit then)?
Cheers.
Thanks for the pointers. I've opened #4094 as a draft last week but I assume drafts just don't get as much attention -- would you be able to answer on where to put a generic tristate option if it's desirable, or should I just make the option specific to preallocation and roll with it with code as currently written (would rename variables a bit then)?
It's been a couple of weeks -- I had honestly forgotten about it until it got mentioned on discourse today :P
Given the lack of answer I'll just try to make the option generic somewhere possibly appropriate and remove the draft tag; should update within a couple of days.
Now that this is in nixos-unstable, should we do a clean install to have everything compressed, or would running a btrfs command be good enough?
I think a btrfs filesystem defragment -r -czstd /nix/store should be enough. You may have to mount the FS /nix lives on again onto a separate mountpoint; not sure how defrag interacts with our read-only mount.
Right. You can't run the command on a RO mount. You could remount RW temporarily (store is normally root-owned anyway) or use nix.readOnlyStore = false... many ways.
Right. You can't run the command on a RO mount. You could remount RW temporarily (store is normally root-owned anyway) or use
nix.readOnlyStore = false... many ways.
(even easier if the mountpoint is /nix and not /nix/store in the first place)
Right. You can't run the command on a RO mount. You could remount RW temporarily (store is normally root-owned anyway) or use
nix.readOnlyStore = false... many ways.
Or just mount it rw on a separate mountpoint in addition ?
I just did sudo mount /nix/store -o remount,rw. The btrfs command took 1h9m20s.
For the curious, here's the compsize before and after:
before
❯ sudo compsize -x /nix/store
Processed 1500696 files, 1481452 regular extents (1481474 refs), 23053 inline.
Type Perc Disk Usage Uncompressed Referenced
TOTAL 99% 78G 79G 79G
none 100% 78G 78G 78G
zstd 27% 281M 1.0G 1.0G
after
❯ sudo compsize -x /nix/store
Processed 1500696 files, 1062024 regular extents (1062024 refs), 918726 inline.
Type Perc Disk Usage Uncompressed Referenced
TOTAL 41% 31G 77G 77G
none 100% 5.9G 5.9G 5.9G
zstd 36% 26G 71G 71G
I might have cleaned a bit and optimized the store too.