There are many tests in NixOS that test some functionality, but only a few of them test activating or deactivating that functionality. Because of that, NixOS tends to be a "don't forget to reboot the machine after nixos-rebuild switch" distribution.
Specifically, the avahi service works, but enabling it doesn't: https://github.com/NixOS/nixpkgs/issues/19034 . A fix has been proposed (https://github.com/NixOS/nixpkgs/pull/20871), but I think there should be an automated way to test enabling/disabling/upgrading a service.
I tried some time ago to write such a test (for https://github.com/NixOS/nixpkgs/pull/15815), but was hit by the immutable /nix/store in the test machine, so I have no idea how to do that properly.
cc @kampfschlaefer for suggestions
(Of course, not everything can be enabled/disabled on switch. Perhaps this can be documented as a test too.)
Hey @danbst, what about https://nixos.org/nixos/manual/index.html#sec-writing-nixos-tests where it says:
virtualisation.writableStore
By default, the Nix store in the VM is not writable. If you enable this option, a writable union file system is mounted on top of the Nix store to make it appear writable. This is necessary for tests that run Nix operations that modify the store.
Isn't that what you are looking for in terms of a writable store?
Do you have your trials for this kind of test available somewhere?
Thanks for the pointer. Looks like I skipped a large part of the documentation. Also, looking at the firewall test, I now understand that my intent can be implemented. But first, let's consider two cases:
1. Ship /etc/nixos/configuration.nix into the VM and apply changes with $machine->succeed("NIXOS_CONFIG=/etc/nixos/configuration.nix nixos-rebuild switch"). Here virtualisation.writableStore should come in handy.
2. Build several configurations up front and run switch-to-configuration instead of nixos-rebuild switch. This is very much the same, because we care mostly about the activation step.
So, the first case is closer to the user experience, but hard to implement (how would you know what the configuration.nix content is in the virtual environment? And how would you express the change?)
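A minimal sketch of the first case, assuming the new configuration is shipped into the VM via environment.etc (the file contents and the nat option are purely illustrative):

```nix
let
  # Hypothetical target configuration, baked into the image so that
  # nixos-rebuild inside the VM can pick it up via NIXOS_CONFIG.
  newConfig = builtins.toFile "configuration.nix" ''
    { ... }: { networking.nat.enable = true; }
  '';
in import <nixpkgs/nixos/tests/make-test.nix> {
  nodes.machine = {
    # Mount a writable union file system over /nix/store, as the
    # manual section quoted above describes.
    virtualisation.writableStore = true;
    environment.etc."nixos/configuration.nix".source = newConfig;
  };
  testScript = ''
    $machine->waitForUnit("default.target");
    # The hard part: this rebuild only succeeds if the VM can evaluate
    # and build the new system from the store paths it has available.
    $machine->succeed("NIXOS_CONFIG=/etc/nixos/configuration.nix nixos-rebuild switch");
  '';
}
```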
As for the second case, it is doable. Here is an example that shows the problem with networking.nat:
let
  execSwitchTo = cfg: "${cfg.config.system.build.toplevel}/bin/switch-to-configuration test";
in import <nixpkgs/nixos/tests/make-test.nix> rec {
  nodes.machine = {
    networking.firewall.enable = false;
  };
  nodes.enableNat = {
    networking.firewall.enable = false;
    networking.nat.enable = true;
  };
  testScript = { nodes, ... }: ''
    $machine->start;
    $machine->waitForUnit("default.target");

    # This shows that NAT is started automatically after boot
    $enableNat->start;
    $enableNat->waitForUnit("default.target");
    $enableNat->succeed("systemctl status nat");

    # This shows that NAT isn't started after nixos-rebuild switch; it
    # has to be started manually
    $machine->succeed("${execSwitchTo nodes.enableNat}");
    $machine->waitForUnit("default.target");
    $machine->fail("systemctl status nat");
    $machine->succeed("systemctl start nat");
    $machine->succeed("systemctl status nat");
  '';
}
But it looks hacky.
Regarding your second solution: I don't think it's as hacky as you think. There is no difference between switching to a modified version of your current profile and switching to a completely different profile. Both are treated as a new profile, and nix/nixos only cares about the changes as a convenience, to avoid restarting the whole system.
Of course your current implementation of 2) does the IP-address change, but that is because of the magic the test system does with the defined nodes. If there is a way to define the same node with your changes, maybe within the 'let' block, without the IP-address change, then it's perfectly valid. Maybe you can mimic the stuff the test system does to the nodes (basically its configuration for eth0). If we found a way to define several states for one node, we could even add something like $machine->switchTo() to the test API to make this more official.
Maybe you are onto something here; maybe all tests should be of this kind:
I really like the idea of always starting a minimal NixOS ISO, then switching to the intended config. This should really be baked into the testing procedure. This would be tremendous!
Related: when I was trying to fix Xen, a nixos-rebuild switch would restart all Xen-related services, but messed something up.
About two minutes after that, a watchdog within the hypervisor would trip, and the machine would hard-reset.
It made it rather difficult to debug stuff until I had figured that out and switched to nixos-rebuild boot && shutdown -r now.
But automatically testing that with NixOS's QEMU framework would be even more difficult.
Proposal:
Some more insights on this issue.
There is nesting.clone, which actually solves the problem with IPs and network devices. It just needs to be nicely integrated into the test-runner API.
Currently nesting.clone accepts a list. attrsOf nixosSubmodule would be better, because you can specify a name (which can be used as the configuration name later). Do you agree, @nbp?
One more datapoint for such tests: https://stackoverflow.com/questions/44220338/taskserver-fails-to-start-on-nixos
The test should:
(cc @aszlig btw ---^)
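For reference, the nesting.clone approach mentioned above can be sketched as follows. The base node and the clone only differ in one option (networking.nat here, reusing the earlier example); the clone's switch script ends up under /run/current-system/fine-tune/, numbered in list order:

```nix
import <nixpkgs/nixos/tests/make-test.nix> {
  nodes.machine = {
    networking.firewall.enable = false;
    # Alternative configurations of the same node; no extra VM node is
    # created, so IPs and network devices stay unchanged.
    nesting.clone = [
      { networking.nat.enable = true; }
    ];
  };
  testScript = ''
    $machine->waitForUnit("default.target");
    # Activate the first (and only) clone
    $machine->succeed("/run/current-system/fine-tune/child-1/bin/switch-to-configuration test");
  '';
}
```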
Testing the switch to a new configuration is already done in the Taskserver test, but the stackoverflow issue could have been triggered more easily if we had a test that upgrades from an older version to a newer one, not just stable/unstable.
I personally have a similar use-case for something like this in my deployments, where I test whether database migrations work successfully. Unfortunately, this needs a lot of imports from derivations and I usually have eval times of several minutes on these deployments :-/
So I guess testing between releases should probably end up in a dedicated jobset, which has two inputs like nixpkgs_old and nixpkgs_new, so we have a low eval time. Going further, this could probably be automated as well, if we just save the disk state for each test in nixpkgs_old and run the same test with that disk state against nixpkgs_new.
Unfortunately, I think this won't work for more complicated tests, so I'd propose something like a snapshot command in the Perl test driver, which would save the current machine state (including a memory dump).
Having such a snapshot command would also help for speeding up boot times in tests, because all that's needed is to restore the state instead of going through the full boot. The idea of starting with a basic machine would also have that advantage.
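A rough sketch of what such a two-input jobset entry point could look like (the input names follow the proposal above; "some-test.nix" is a placeholder for whichever test is being run, and a real setup would also thread the saved disk state from the old run into the new one):

```nix
{ nixpkgs_old, nixpkgs_new }:
let
  # Evaluate the same test expression against each nixpkgs tree, so
  # only the two inputs need to be fetched and evaluated.
  testFor = nixpkgs: import "${nixpkgs}/nixos/tests/some-test.nix" {};
in {
  test-on-old = testFor nixpkgs_old;
  test-on-new = testFor nixpkgs_new;
}
```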
Also, I've been thinking about making it easier to switch between system configurations, but didn't yet come up with a very good solution, but these are the ones I had in mind:
Use generation numbers, switching via something like $machine->switch(10);.
Use a separate attribute, which in turn contains an attribute set mapping from generation name to the corresponding configuration.
Pros:
Cons:
Use the key attribute to define conditions for specific generations, for example:
{
  machine = { lib, ... }: {
    imports = lib.singleton {
      key = "disable-foo";
      services.foo.enable = lib.mkForce false;
    };
    services.foo.enable = true;
  };
}
So the initial generation has service foo enabled, but if you do $machine->switch('disable-foo'); the foo service is disabled.
Use config.generations, which references the configuration values to merge. One con: you can't filter lists by priority (there is no lib.filterList <prio> <fun>), so you need to mkOverride whole lists:
{
  machine = { generation, lib, ... }: {
    generations = [ "enabled-foo" ];
    services.foo.enable = generation.enabled-foo;
  };
}
Had some more ideas, but I'm currently writing-impaired (broken shoulder and typing with one hand) so I won't list all of them as I didn't come up with a good solution that's both simple and declarative yet.
Another case that should have been caught by rebuild-switch tests: https://github.com/NixOS/nixpkgs/issues/56134
I tried the suggested methods to write a test that exercises switch-to-configuration, to aid fixing #60180, but with no success.
The problem is that both webserver and webserver2 add a DNS entry for a.example.com, so
when requesting a Let's Encrypt cert, letsencrypt sometimes tries to contact webserver2, which
is only there to build the switchToNewServer command and isn't actually running,
which makes the test fail.
I want a way to build a config which I can switch to, without creating a bogus node for that config, as that
is actually undesired behaviour in this case and screws with the tests.
let
  commonConfig = ./common/letsencrypt/common.nix;
in import ./make-test.nix {
  name = "acme";
  nodes = rec {
    letsencrypt = ./common/letsencrypt;
    webserver = { config, pkgs, ... }: {
      imports = [ commonConfig ];
      networking.firewall.allowedTCPPorts = [ 80 443 ];
      networking.extraHosts = ''
        ${config.networking.primaryIPAddress} a.example.com
      '';
      services.nginx.enable = true;
      services.nginx.virtualHosts."a.example.com" = {
        enableACME = true;
        forceSSL = true;
        locations."/".root = pkgs.runCommand "docroot" {} ''
          mkdir -p "$out"
          echo hello world > "$out/index.html"
        '';
      };
    };
    webserver2 = { config, pkgs, ... }: {
      imports = [ webserver ];
      networking.extraHosts = ''
        ${config.networking.primaryIPAddress} b.example.com
      '';
      services.nginx.virtualHosts."b.example.com" = {
        enableACME = true;
        forceSSL = true;
        locations."/".root = pkgs.runCommand "docroot" {} ''
          mkdir -p "$out"
          echo hello world > "$out/index.html"
        '';
      };
    };
    client = commonConfig;
  };
  testScript = { nodes, ... }:
    let
      newServerSystem = nodes.webserver2.config.system.build.toplevel;
      switchToNewServer = "${newServerSystem}/bin/switch-to-configuration test";
    in ''
      $client->waitForUnit("default.target");
      $letsencrypt->waitForUnit("default.target");
      $letsencrypt->waitForUnit("boulder.service");
      $webserver->waitForUnit("nginx.service");

      # This step already fails, as "a.example.com" points to two servers
      $webserver->waitForUnit("acme-a.example.com.service");
      $client->succeed('curl https://a.example.com/ | grep -qF "hello world"');

      $webserver->succeed("${switchToNewServer}");
      $webserver->waitForUnit("acme-b.example.com.service");
      $client->succeed('curl https://b.example.com/ | grep -qF "hello world"');
    '';
}
The suggested methods don't work in this particular case, because both nodes register their domain names, and then I have two nodes with the same domain name running, which obviously breaks requesting certificates. So the tests already fail before doing switch-to-configuration.
Is there any way to have a "new config" and "old config" without creating two nodes?
@danbst thanks for pointing me to nesting.clone; that very much seems to be what we're looking for here.
However, it doesn't seem to work in this case. build-vms.nix injects the evaluated nodes argument
into the NixOS modules it evaluates (https://github.com/NixOS/nixpkgs/blob/master/nixos/lib/build-vms.nix#L27),
whilst nesting.clone doesn't preserve this argument (it calls eval-config.nix manually, as build-vms.nix does too, but passes different arguments):
https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/activation/top-level.nix#L13-L23
So when you try to use it in a NixOS test, you'll get the following error:
while evaluating the module argument `nodes' in "/home/arian/Projects/nixpkgs/nixos/tests/common/letsencrypt/common.nix":
while evaluating the attribute '_module.args.nodes' at undefined position:
attribute 'nodes' missing, at /home/arian/Projects/nixpkgs/lib/modules.nix:163:28
Example test here:
let
  commonConfig = ./common/letsencrypt/common.nix;
in import ./make-test.nix {
  name = "acme";
  nodes = {
    letsencrypt = ./common/letsencrypt;
    webserver = { config, pkgs, ... }: {
      imports = [ commonConfig ];
      networking.firewall.allowedTCPPorts = [ 80 443 ];
      networking.extraHosts = ''
        ${config.networking.primaryIPAddress} a.example.com
      '';
      services.nginx.enable = true;
      services.nginx.virtualHosts."a.example.com" = {
        enableACME = true;
        forceSSL = true;
        locations."/".root = pkgs.runCommand "docroot" {} ''
          mkdir -p "$out"
          echo hello world > "$out/index.html"
        '';
      };
      nesting.clone = [
        ({ config, pkgs, ... }: {
          networking.extraHosts = ''
            ${config.networking.primaryIPAddress} b.example.com
          '';
          services.nginx.virtualHosts."b.example.com" = {
            enableACME = true;
            forceSSL = true;
            locations."/".root = pkgs.runCommand "docroot" {} ''
              mkdir -p "$out"
              echo hello world > "$out/index.html"
            '';
          };
        })
      ];
    };
    client = commonConfig;
  };
  testScript = ''
    $client->waitForUnit("default.target");
    $letsencrypt->waitForUnit("default.target");
    $letsencrypt->waitForUnit("boulder.service");
    $webserver->waitForUnit("nginx.service");
    $webserver->waitForUnit("acme-a.example.com.service");
    $client->succeed('curl https://a.example.com/ | grep -qF "hello world"');

    # Incrementally enabling virtual hosts should just work
    $webserver->succeed("/run/current-system/fine-tune/child-1/bin/switch-to-configuration test");
    $webserver->waitForUnit("acme-b.example.com.service");
    $client->succeed('curl https://b.example.com/ | grep -qF "hello world"');
  '';
}
I just created a PR that fixes these issues.
Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.