LXD: Add virtual machine support

Created on 15 Sep 2019 · 20 Comments · Source: lxc/lxd

As some may have figured out by now from looking at the database and device refactoring we've been doing over the past few months, we'll be adding support for virtual machines directly in LXD.

This isn't going to be some kind of virtual-machine/container hybrid, but instead fully fledged virtual machine support, using the same API and user experience that LXD currently provides for containers.

This will be done by adding direct support for virtual machines as an instance and image type in LXD. The runtime will drive qemu directly.

Many more details will become available as we go through this work, but some initial limitations will be:

Modern machine type only (q35, virtio, UEFI with Secure Boot)
Limited initial architecture support (x86_64 and aarch64)
Limited initial support for agent-enabled actions (lxc exec only)
No initial live migration support
Limited set of supported devices

As for the workflow, the two main options at release time will be:

lxc launch ubuntu:18.04 vm1 --vm => Ubuntu 18.04 VM
lxc launch vm2 --empty => PXE booted VM

We expect to start adding VM images to the community image server, eventually reaching parity with container images, but this may take quite a few months to go through them all.

Documentation Feature

Source

stgraber


Most helpful comment

@karlfiabeschi we've found that many users are running both VMs and system containers on the same systems. This is a bit annoying because you end up having to define network and storage twice. It also means using two distinct sets of tools, and it prevents small teams using clustering from getting a good, complete solution.

For a while we were hoping we'd see libvirt or other solutions provide a cloud-like user experience for virtual machines, with a good REST API and tooling similar to what we did for containers through LXD. After 3 years, this hasn't happened, and given that our API and design are very well aligned with virtual machines, we decided to finally take that step and implement running virtual machines directly.

This will allow for a single configuration for storage and network, for profiles applying to both containers and VMs, and for API clients to drive both instance types through the same API.

stgraber on 16 Sep 2019


All 20 comments

Current status of master is:

New /1.0/instances API (will be the unified endpoint for containers and VMs)
New Go API functions for instance management
Updated CLI and internal codebase to use the new functions
Support for downloading and storing VM images
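As a rough illustration of what a client of the unified endpoint might do, here is a minimal Go sketch that decodes a `GET /1.0/instances` reply and lists instance names without caring whether each one is a container or a VM. It assumes LXD's usual JSON response envelope (`type`, `status_code`, `metadata`) and that a non-recursive listing returns instance URLs; the function name `instanceNames` and the sample payload are hypothetical, not captured from a real server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// response holds the fields of LXD's standard JSON envelope that this sketch
// needs (assumed layout); encoding/json ignores any other fields.
type response struct {
	Type       string          `json:"type"`
	StatusCode int             `json:"status_code"`
	Metadata   json.RawMessage `json:"metadata"`
}

// instanceNames decodes a sync response from GET /1.0/instances and returns
// the instance names, regardless of instance type.
func instanceNames(body []byte) ([]string, error) {
	var resp response
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Type != "sync" || resp.StatusCode != 200 {
		return nil, fmt.Errorf("unexpected response: type=%q status_code=%d", resp.Type, resp.StatusCode)
	}
	var urls []string
	if err := json.Unmarshal(resp.Metadata, &urls); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(urls))
	for _, u := range urls {
		names = append(names, path.Base(u)) // "/1.0/instances/vm1" -> "vm1"
	}
	return names, nil
}

func main() {
	// Illustrative payload only.
	body := []byte(`{"type":"sync","status":"Success","status_code":200,"metadata":["/1.0/instances/c1","/1.0/instances/vm1"]}`)
	names, err := instanceNames(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [c1 vm1]
}
```

Because containers and VMs share one listing, the client code has no per-type branch; an instance's type would only matter once it fetches that instance's details.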

Next steps are:

Replace internal container interface with a new instance interface
Re-shuffle functions in the instance interface to suit both containers and VMs
Update any function that breaks as a result of losing access to container-only data
Introduce a new vmQemu struct, similar to our existing containerLXC struct
Add support for basic lifecycle functions on vmQemu
Get a VM to work with plain PXE boot
Get a VM to work from an image (and get it to behave with all backends)
Introduce an agent in the VM and integrate it with the exec API
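To picture the refactoring described in the list above, here is a hypothetical Go sketch of a shared instance interface implemented by both a container type and a VM type. Only the names `containerLXC` and `vmQemu` come from the list; the method set (`Name`, `Type`, `Start`), the fields, and the `startAll` helper are illustrative assumptions, not LXD's actual internal interface, and the `Start` methods only record the call instead of driving liblxc or qemu.

```go
package main

import "fmt"

// instanceType distinguishes the two kinds of instance served by one API.
type instanceType string

const (
	typeContainer instanceType = "container"
	typeVM        instanceType = "virtual-machine"
)

// instance is a hypothetical interface that shared code paths (API handlers,
// profiles, storage, network) would depend on instead of a container type.
type instance interface {
	Name() string
	Type() instanceType
	Start() error
}

// containerLXC would drive liblxc; here Start only records the call.
type containerLXC struct {
	name    string
	running bool
}

func (c *containerLXC) Name() string       { return c.name }
func (c *containerLXC) Type() instanceType { return typeContainer }
func (c *containerLXC) Start() error       { c.running = true; return nil }

// vmQemu would drive qemu directly; here Start only records the call.
type vmQemu struct {
	name    string
	running bool
}

func (v *vmQemu) Name() string       { return v.name }
func (v *vmQemu) Type() instanceType { return typeVM }
func (v *vmQemu) Start() error       { v.running = true; return nil }

// startAll is written once against the interface and works for both types.
func startAll(insts []instance) error {
	for _, inst := range insts {
		if err := inst.Start(); err != nil {
			return fmt.Errorf("start %s: %w", inst.Name(), err)
		}
	}
	return nil
}

func main() {
	insts := []instance{&containerLXC{name: "c1"}, &vmQemu{name: "vm1"}}
	if err := startAll(insts); err != nil {
		panic(err)
	}
	for _, inst := range insts {
		fmt.Println(inst.Name(), inst.Type())
	}
}
```

The "re-shuffle functions" step corresponds to deciding which methods belong on such a shared interface and which stay specific to one implementation.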

We'll no doubt find a number of issues and needed refactoring along the way, but a lot of the roadblocks are now gone and we expect to make rapid progress on this over the next few weeks.

stgraber on 15 Sep 2019

I am so excited about this feature!
Can I also suggest renaming the lxc command to lxd or something like that? ;)

tomposmiko on 16 Sep 2019

I am so excited too! Would you mind explaining why you are taking this step (adding VM support)?

karlfiabeschi on 16 Sep 2019

This will allow for a single configuration for storage and network, for profiles applying to both containers and VMs, and for API clients to drive both instance types through the same API.

stgraber on 16 Sep 2019


Thanks a lot for the explanation, I'm looking forward to trying it in version 3.19.

karlfiabeschi on 16 Sep 2019

@stgraber Can this be made optional somehow? Not everyone wants, needs, or is allowed to install qemu on servers with LXD.

s3rj1k on 17 Sep 2019

@s3rj1k yeah, the snap will bundle qemu, but for those manually building or packaging LXD, if qemu isn't around, we'll just skip VM support.

stgraber on 17 Sep 2019

> For a while we were hoping we'd see libvirt or other solutions provide a cloud-like user experience for virtual machines, with a good REST API and tooling similar to what we did for containers through LXD. After 3 years, this hasn't happened, and given that our API and design are very well aligned with virtual machines, we decided to finally take that step and implement running virtual machines directly.

I'm not sure why you expected libvirt to develop a REST API out of nowhere... The closest right now would probably be the Cockpit API interface for libvirt that uses libvirt-dbus underneath it.

If you wanted one that badly, why not use libvirt with LXD instead of reimplementing everything?

Conan-Kudo on 17 Sep 2019

libvirt has been playing with remote access in the past and so could have decided to adopt a more modern remote API, similar to what LXD and cloud providers provide. Instead, the project decided to remain a low-level abstraction over multiple hypervisors. That's perfectly fine and even reasonable, though it is surprising that we have seen so much development around improved user experience for containers and their management at scale while, short of deploying OpenStack, there hasn't been much of this going on in the VM space.

My understanding of Cockpit is that it's primarily a system management interface which does let you manage local virtual machines. Correct me if I'm wrong, but it doesn't provide a way of managing VM workloads across multiple systems, handling image publishing, centralized configuration, ... as users are now accustomed to from their use of public cloud offerings.

For the past few years, we haven't really needed this that much ourselves. The addition of LXD clustering changed that somewhat, as we're seeing more and more users who want to deploy a hybrid environment: containers whenever possible and VMs for the few remaining workloads, deployed on the same machines, using the same storage and network, and sharing configuration profiles between the two.

Our existing API maps pretty naturally to virtual machines, so we figured it was time to give it a go.

As for using libvirt for this, we looked into it, as we've all been using libvirt for our virtual machine needs until now, but the fact is, we don't actually need any of the abstraction that libvirt provides.
We have no intention of supporting ESX, Xen, ...; we only care about Linux KVM, and even then, only about running modern virtual machines.

LXD aims to be a tiny local cloud with pre-made images that just work. As we're only building this now, we don't need to care about legacy VM hardware, older firmware, old device emulation, or even older qemu versions. With that in mind, the benefits offered by using libvirt would be pretty limited, and there would be quite a lot of pain caused by having to integrate LXD's view of networking, storage, and ownership (projects) with libvirt's own concepts.

The bulk of the work for this feature is around re-shuffling our API, database and internal models to support both containers and VMs side by side. The logic to actually start and operate the VM is a very tiny part of this, so generating libvirt XML rather than a qemu config file wouldn't make things much easier (if at all), but it would come with all the complexity of aligning the storage/network/ownership models between the two, plus the added external dependency chain.

stgraber on 18 Sep 2019


Any plans for something lightweight such as Kata Containers or similar?

nakatadim on 4 Oct 2019

No. LXD focuses on running full Linux distributions. When running inside regular containers, we achieve that by running an unmodified image of that distro but using the host's kernel, as that's what containers are. When running virtual machines, we're taking the same approach of running as much of the unmodified Linux distribution as possible, which in this case involves running their bootloader, kernel, initrd, ... just as one would expect from a VM.

When providing containers to our users, we want to provide the fastest and most featureful experience that we can on modern Linux; similarly, for virtual machines, we will be providing what we consider to be the fastest and most featureful VM experience one can get today.

While Kata may make sense for some users of application containers, who traditionally have a much more limited feature set and scope, to us it feels like a bad compromise for both of our intended targets. It's not a container runtime (despite its confusing name), as it uses its own kernel and requires virtualization extensions, which prevents a lot of the resource and device sharing we get for free with containers. It's also not a full-fledged VM that can be operated like a physical machine.

stgraber on 4 Oct 2019


> libvirt has been playing with remote access in the past

Can you please shed more light on what you mean by this? I don't remember seeing any patches that would change our view on remote access.

> and so could have decided to adopt a more modern remote API, similar to what LXD and cloud providers provide. Instead, the project decided to remain a low-level abstraction over multiple hypervisors. That's perfectly fine and even reasonable, though it is surprising that we have seen so much development around improved user experience for containers and their management at scale while, short of deploying OpenStack, there hasn't been much of this going on in the VM space.

> My understanding of Cockpit is that it's primarily a system management interface which does let you manage local virtual machines. Correct me if I'm wrong, but it doesn't provide a way of managing VM workloads across multiple systems, handling image publishing, centralized configuration, ... as users are now accustomed to from their use of public cloud offerings.

> For the past few years, we haven't really needed this that much ourselves. The addition of LXD clustering changed that somewhat, as we're seeing more and more users who want to deploy a hybrid environment: containers whenever possible and VMs for the few remaining workloads, deployed on the same machines, using the same storage and network, and sharing configuration profiles between the two.

> Our existing API maps pretty naturally to virtual machines, so we figured it was time to give it a go.

> As for using libvirt for this, we looked into it, as we've all been using libvirt for our virtual machine needs until now, but the fact is, we don't actually need any of the abstraction that libvirt provides.
> We have no intention of supporting ESX, Xen, ...; we only care about Linux KVM, and even then, only about running modern virtual machines.

As I've learned recently, this is a fairly common misconception about libvirt. The fact that we support multiple hypervisors doesn't mean you have to use them all. Even the RPMs are split (and now the monolithic daemon too), so you can install only the parts you want. Also, the backward compatibility that libvirt offers doesn't stand in the way of new features. It's fairly common that libvirt has to wait for qemu to push patches first.

But I learned some time ago that apps that try to avoid libvirt and implement everything themselves usually end up with a compatibility layer for talking to qemu that resembles libvirt more and more.

zippy2 on 5 Oct 2019

For remote libvirt access, I was referring to the various methods available to make the libvirt socket available over the network. Most commonly, people seem to use the ssh+netcat method, though a direct TLS socket is also supported by libvirtd, using PKI-based authentication from what I remember.

I'm indeed aware that libvirt can be split such that only the relevant backends are included. I believe we looked into using that some time ago to ship the xen backend as a separate package in Ubuntu, which would then allow us to move xen to universe rather than keeping everything in main. Not sure where that effort went, though.

I do expect us to have to do some amount of feature detection and adjust our qemu configuration accordingly. It's something we're already quite used to doing for LXC and the kernel: we run a wide variety of feature checks on startup, inform our users when they try to use an unsupported feature, and include that data in our internal migration protocol to properly handle live and cold migrations between LXD versions.

We have already had to solve a lot of those issues for containers. We have what we think is a pretty solid model around storage, networking and containers, which spans multiple hosts (clustering), supports creating pools of resources (projects), and does access control on top of those. We've recently gone through some effort to redesign the way we handle all devices attached to containers, and we did this work with virtual machines in mind. This allows us to easily start rendering qemu config from it, sharing 90% of the code with what we do for LXC today and keeping the user-facing API identical.
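The idea of one device model rendered for two runtimes can be sketched in Go as follows. Everything here is hypothetical: the `diskDevice` type, the `renderLXC`/`renderQemu` functions, and the choice of 9p for sharing a host directory with a VM are illustrative assumptions. `lxc.mount.entry` and qemu's `-virtfs` option are real configuration surfaces, but the values shown are simplified and are not what LXD actually generates.

```go
package main

import (
	"fmt"
	"strings"
)

// diskDevice is a hypothetical, runtime-neutral description of a host
// directory shared with an instance; the user-facing config stays the same.
type diskDevice struct {
	Name   string // device name, reused as the 9p mount tag for VMs
	Source string // host path
	Path   string // mount point inside the instance
}

// renderLXC turns the device into an LXC bind-mount config line.
// LXC expects the target path relative to the container rootfs.
func renderLXC(d diskDevice) string {
	target := strings.TrimPrefix(d.Path, "/")
	return fmt.Sprintf("lxc.mount.entry = %s %s none bind,create=dir 0 0", d.Source, target)
}

// renderQemu turns the same device into qemu arguments sharing the directory
// over 9p; the guest would then mount it by tag at d.Path.
func renderQemu(d diskDevice) []string {
	return []string{
		"-virtfs",
		fmt.Sprintf("local,path=%s,mount_tag=%s,security_model=passthrough", d.Source, d.Name),
	}
}

func main() {
	d := diskDevice{Name: "data", Source: "/srv/data", Path: "/mnt/data"}
	fmt.Println(renderLXC(d))
	fmt.Println(strings.Join(renderQemu(d), " "))
}
```

In this split, parsing and validating the device happen once, and only the final rendering step differs per runtime, which is where the shared-code percentage would come from.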

stgraber on 6 Oct 2019

I'm surprised this interesting discussion is coming so late in the development process.

As a user of both libvirt and lxd, I'd expect libvirt to drive lxd rather than lxd to drive qemu...
It's not clear to me what's wrong with the libvirt API design.

As a user of libvirt, I'm following the development process. I see how hard it can be to maintain an API on top of virtual machines, including qemu. AFAICT, the libvirt community is very active upstream and close to qemu. With this in mind, I don't understand how the lxd community hopes to support qemu/VMs in a better way. IMHO, lxd is going to discover the joy of handling qemu hardware/drivers and the clustering of qemu machines without all the flexibility that libvirt provides.

I'd expect lxd to remain container-based only and to focus on a better user experience for containers, e.g. improved user XP around uid/gid maps for containers, easier bind mounts, mixed storage drivers in a pool (LVM/BTRFS/dir/etc. at container level in the same pool), more choices for snapshots, improved exports/imports between clusters, etc. Also, I'm missing some basic features like lxc snapshot list CONTAINER_NAME showing the dependency tree (if any), the date, and the type of snapshot (hot vs cold).

At least, I'm not going to trust lxd with my qemu machines any time soon. Over the past few years, I've had more issues with lxd than with libvirt.

Of course, there will be duplicated efforts. OTOH, it might be a good thing to have overlapping features between libvirt and lxd, since they are not direct competitors. We'll see what works best in the long run. Also, this might be the path lxd needs to improve its reliability, especially regarding breakages from release to release.

I wish you the best!

nicolas33 on 7 Oct 2019

@stgraber I know it's far-fetched, but any chance this will support Windows VMs?

Silentphantom62 on 9 Oct 2019

@Silentphantom62 it will, though not in an easy way initially. The initial pass will not have VGA output, which makes a lot of the Windows installation process a bit tricky. Also, Windows would need virtio drivers to be exposed to it to install properly, which we're not planning on doing in the initial pass.

But it is absolutely our goal to eventually have a pretty straightforward way of installing Windows, including having our agent work on it.

stgraber on 9 Oct 2019


That's great news, looking forward to this new direction!

Silentphantom62 on 9 Oct 2019

Maybe https://cloudbase.it/cloudbase-init/ can help a lot here unless a full installation is supported.

jkroepke on 12 Oct 2019

I think that would be useful once you've done a manual install and added it. At that point, you can publish the VM as an image, and VMs created from it would then be able to use it for provisioning.

We're unlikely to ever be allowed to distribute prebuilt Windows images so instructions on building them is probably the best we can do.

stgraber on 13 Oct 2019

Thank you so much for this feature. Finally, I can get rid of virt-manager and VBox!