Nixpkgs: VM tests should not depend on any specific VM technology

Created on 6 Dec 2014 · 23 comments · Source: NixOS/nixpkgs

I was trying to run some of the tests, which run some virtual machines with a virtual network automatically, but since the virtualbox module is already loaded, those don't work.

The way this should work is that the system should detect which virtualization technologies are supported on the host system (which _might_ be NixOS) and then it should use whatever candidate works best.

I am thinking of Xen, Virtualbox, and KVM.

Labels: enhancement, stale, nixos, testing


All 23 comments

This happened to me as well. By the way, I would even be fine with manually
specifying which virtualization backend to use from the command line.


+1, and also add lxc/docker as test backends

What would be the benefit? This would add quite some maintenance overhead to the tests (which are already sometimes hard to debug/fix).

In software, features mean code complexity, and we can't just implement every feature. We have to be selective and think not only about the value but also (and mostly) about QA (maintenance, tests, documentation).

@iElectric evidently it has never happened to you that, with four Vagrant instances open in VirtualBox, you had to shut all of them down to run the nixpkgs tests. It's annoying.

@lethalman I did. The issue can be circumvented with documentation; it's more of a tedious limitation than a blocker for using qemu. Maybe it can even be addressed in another way than implementing another backend for tests.

cc @edolstra

Another possibility is to use libvirt, which would enable all of the virtualization backends for free.

If someone wants to tackle this, then start here: http://wiki.libvirt.org/page/QEMUSwitchToLibvirt

We have a fair amount of infrastructure based on the NixOS test driver, so this will involve quite some work.

I plan in the _future_ to work on libvirt porting if @edolstra agrees.

@goodwillcoding I think you lack a sense of timing with your comment. Additionally, you have no clue whether or not you should interfere in a discussion to which you have contributed nothing.

@aragnon Insulting other developers (e.g. calling them trolls or "stupid") is not acceptable on this project. Please refrain from doing so in the future.

Regarding the issue: obviously it would be nice to have other VM backends, but as @iElectric says, it would also be a rather substantial amount of work. Also, we already have enough non-determinism issues in VM tests without having to worry about different VM backends being selected on different build machines.

One thing is that QEMU has proven hard to debug several times in the past when we've upgraded it, and people (like me) like those upgrades, so that's a bit unfortunate. But I'm not sure another backend would fare any better here: it seems reasonable to assume that e.g. Dom0 upgrades could just as easily break tests, and VirtualBox always seems to break something somewhere. In general these tests are always going to be brittle to some degree; adding more complexity is madness.

We should at least document that you'll need the QEMU/KVM drivers and that other virtualization modules are incompatible.

And on that note, I'll expand on the madness:


You have the people who have issues, the people who prioritize those issues, and the people who solve issues. You are not in the position to discuss the "benefit" at this point anymore, because there are already three people who do see value in it. The only option you have at this point is to give this issue a lower priority (for example by not working on it). Any other action is just delaying the process. Please, now explain to me why I had to explain this to you. Assuming you can write software, you are not stupid, right? (This naturally only leaves as a possibility that you were trolling me, although I cannot really figure out why you would want to do that.)

That's not software development. That's called 'insanity'. Here is an actual lesson in software development: For every component X in your system, if you add an alternative component Y, you must also add a third component Z at minimum: Z allows you to switch between X and Y. Now count the moving parts: you now have three moving parts instead of one. But you forgot the other three new moving parts: not just X, Y, and Z, but also the _interactions_ between them: X to Z, Y to Z, and X to Y. This, in essence, increases the failure rate _by a factor of six_ versus the original setup.

But you may say "Y and Z are not that big", but that's beside the point: ignore the constant factors, and play this scenario out with three choices instead of two. And then four instead of three. You should pretty quickly see a pattern, one that has unmanageable amounts of complexity.
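The growth described above can be made concrete with a small sketch. The counting model here (each backend is a component, alternatives force a switching layer Z, and every pair of components can interact) is an assumption for illustration, not anything taken from the test driver itself:

```python
from math import comb

def moving_parts(backends: int) -> int:
    """Count components plus the pairwise interactions between them.

    A single backend is one moving part; adding alternatives also adds
    a switching layer Z, and every pair of components can interact.
    """
    if backends == 1:
        return 1  # one backend: no switch, no interactions
    components = backends + 1            # X, Y, ... plus the switch Z
    interactions = comb(components, 2)   # X-Y, X-Z, Y-Z, ...
    return components + interactions

for n in range(1, 5):
    print(f"{n} backend(s): {moving_parts(n)} moving parts")
```

With two backends this yields 3 components plus 3 interactions, i.e. the factor of six from the argument above; with three and four backends it grows to 10 and 15, quadratically in the number of choices.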

So, in short, you're basically asking us to - instead of shipping a set of _working_ tests - ship an unmanageable set of tests to users, and rather than perhaps having some glitches, instead let a user decide how it all explodes in flames so they can come complain to us like you have. The only difference being the software is now several times more complex, and thus even more difficult to fix. Considering the developer effort we have currently, this is not only impractical, it is pretty much _laughable_.

The cure is worse than the disease.

@edolstra what about my proposal of using libvirt?

_I have changed some of the comments and removed some that are non-productive. Please keep the discussion to the original issue from here on out._

The issue seems valid, so we can keep it open, as @lethalman has stated he might spend some time working on it.

Personally, I agree with the commenters who think this might introduce too much complexity or extra non-determinism into the tests. However, perhaps whoever implements this will prove us wrong.

:+1: :+1: :+1:

My approach wouldn't be to auto-detect based on host, since as others have mentioned it could introduce nondeterminism. However, even the ability to explicitly state that I'd like to run the suite using one particular backend would be a big plus, or even doing so on a per-test basis.

Ultimately, I'd probably just want to run many of my tests in NixOS containers, since I'm using a lot of AWS and I can't run full-fledged virtualization software efficiently inside an EC2 instance.

A lot of these, like the installer tests, inherently depend on being inside a VM or some kind of mutable container and would be specific to the tech. Networking tests rely on being able to add kernel devices for testing network configuration. Otherwise, this could probably work for most other test cases.

Yeah, I was just thinking a related (though independent) feature would be to have isContainer = true configs throw errors if you try to do silly things in them, like change kernels or bootloaders.


Now that https://github.com/NixOS/hydra/issues/201 is starting to work, having this is even more appealing!

If it's useful to anyone else, here's a patch that lets you SSH into NixOS test VMs to debug their state:

diff --git a/nixos/lib/test-driver/test-driver.pl b/nixos/lib/test-driver/test-driver.pl
index 8ad0d67..838fbdd 100644
--- a/nixos/lib/test-driver/test-driver.pl
+++ b/nixos/lib/test-driver/test-driver.pl
@@ -34,7 +34,7 @@ foreach my $vlan (split / /, $ENV{VLANS} || "") {
     if ($pid == 0) {
         dup2(fileno($pty->slave), 0);
         dup2(fileno($stdoutW), 1);
-        exec "vde_switch -s $socket" or _exit(1);
+        exec "vde_switch -tap tap0 -s $socket" or _exit(1);
     }
     close $stdoutW;
     print $pty "version\n";

Even without considering lxc/xen/containers and tests in the cloud, switching to libvirt could make the test driver simpler. libvirt-qemu already has a good half of the test driver's functionality:

  • it maintains the monitor socket (used for sendkey) and a backdoor shell over the serial port
  • it has its own bridge and can automatically connect VMs to Open vSwitch if more advanced networking is required, so VDE might be deprecated
  • it has a Perl API richer than the test driver Machine.pm's sendkey/takescreenshot
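To sketch what a libvirt-backed driver could look like, here is a minimal Python mock-up of a Machine-style wrapper that delegates to libvirt's `virsh` CLI. The class, domain name, and the choice of wrapping the CLI rather than the libvirt Perl/Python bindings are illustrative assumptions; `virsh send-key` and `virsh screenshot` are real virsh subcommands:

```python
import shlex

class LibvirtMachine:
    """Hypothetical stand-in for the test driver's Machine.pm,
    delegating to virsh instead of a hand-rolled QEMU monitor socket."""

    def __init__(self, domain: str):
        self.domain = domain  # the libvirt domain (VM) name

    def send_key_cmd(self, key: str) -> list[str]:
        # `virsh send-key` covers Machine.pm's sendkey functionality
        return ["virsh", "send-key", self.domain, key]

    def screenshot_cmd(self, path: str) -> list[str]:
        # `virsh screenshot` covers takescreenshot
        return ["virsh", "screenshot", self.domain, path]

m = LibvirtMachine("machine1")
print(shlex.join(m.send_key_cmd("KEY_ENTER")))
print(shlex.join(m.screenshot_cmd("/tmp/machine1.png")))
```

Only the command construction is shown; a real driver would execute these via subprocess and would still need the serial-port backdoor shell that libvirt provides.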

I've added vm test debugging machinery in https://github.com/NixOS/nixpkgs/pull/47418

@edolstra Would libvirt be an option? Any other way to fix this issue?

Or should we just close this issue?

Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the
     related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.