Lxd: ceph storage driver

Created on 1 Nov 2015 · 7 Comments · Source: lxc/lxd

Using ceph thinly provisioned block devices as container rootfs would be awesome, and another step forward for container migration due to ceph's distributed nature.
Ceph also has full openstack Swift and glance support, making it a good fit for lxd.

Documentation Feature

jsimonetti

👍15

Most helpful comment

I posted this on #2875, but I didn't realise that issue was closed.

I believe ceph is the most popular storage backend used in openstack, for cinder and glance.
If nova-lxd wants to become a first class citizen in openstack, I would say that ceph support for lxd is a must!
+1 from me :)

jocado on 14 Mar 2017

👍3

All 7 comments

ceph feature would be really appreciated.

nlienard on 24 Oct 2016

@jocado, check out this, https://github.com/OpenNebula/addon-lxdone. It's a project I'm currently working on. It's something like nova-lxd but for OpenNebula instead of OpenStack. It supports Ceph as storage backend, for rootfs and extra devices. The thing is LXD is not aware it's using Ceph, it's a workaround.

dann1 on 14 Mar 2017

In my opinion, if nova-lxd is to be more than a playground, then support for ceph volumes should be a must, at least for cinder volume attachments.
We have ceph serving cinder and glance.
I recently deployed a compute node with ubuntu (16.04) to try nova-lxd (the rest of our OS infra is centos7, and the others are qemu/kvm), and was disappointed when I realized I could not use ceph to attach volumes or to boot from ceph block devices.
I understand that quite a large number of OS deployments are using ceph, so I had guessed that supporting it in all (most) nova drivers would be a high priority.
I can't evaluate the best strategy to do that, and I don't want this to sound like negative criticism; I just want to help push this to the light of day.

mariojmdavid on 23 Mar 2017

@mariojmdavid yes, though this has nothing to do with LXD's own ceph storage backend. To be able to attach OpenStack cinder volumes or similar OpenStack managed (as opposed to LXD managed) volumes, what you need is support for ceph rbd and mounts in nova-lxd itself rather than any change in LXD.

I believe the nova-lxd team is actively working on this for the reasons you described.

stgraber on 23 Mar 2017

I won't be able to start working on this properly until 25th May I assume.

brauner on 9 May 2017

Ceph storage driver

Preconditions

These are steps that LXD will assume have been taken by the user. LXD will not take care of these since most of them actually require sensitive and often interactive modifications to the underlying system:

- Ceph tools must be installed and functional
- A ceph-deploy user must have been created
- ceph-deploy must have been installed
- An admin node must have been configured
- The admin node must have password-less sudo access to all nodes
- The admin node must have password-less ssh access to all nodes
- The ceph daemons must be able to communicate over the network and must be brought up during boot
- Ceph ports must be open
- If SELinux is used, it must be in permissive mode (setenforce 0)
- A key with sufficient privileges (e.g. the admin keyring) must have been copied to the node where the LXD daemon is running (/etc/ceph is the default location that ceph expects) and must be readable.

In general LXD will not be concerned with any ceph-deploy setup steps! Specifically, LXD will not be concerned with creating ceph clusters or with deploying and configuring new nodes. LXD will only make use of existing clusters and nodes.
The LXD daemon will thus start interacting with ceph on the osd pool level:

- Request a key from ceph
- Create new osd pools

    ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] [crush-ruleset-name] [expected-num-objects]

- Create new (rados) block devices

    rbd create --size {megabytes} {pool-name}/{image-name}

- Create new storage volumes in those block devices

    rbd create --pool {pool-name} {rbd-name} --size {megabytes}
    rbd map --pool {pool-name} {rbd-name}
    mkfs.{type} /dev/rbd/{pool-name}/{rbd-name}

Not using any ceph-deploy level administration will enable LXD to run on any arbitrary node in the ceph cluster and still create storage pools and volumes.
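
The command sequence above can be strung together roughly as follows. This is only a sketch, not LXD's implementation: `run` and `provision_volume` are hypothetical helper names, the "request a key" step is left out, and the example values (image `c1`, 10240 MB, xfs) are assumptions. It prints the commands by default and only executes them when `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run sketch: print each command instead of executing it unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: hypothetical helper, echoes or executes the command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# provision_volume CLUSTER POOL PG_NUM IMAGE SIZE_MB FSTYPE (hypothetical helper)
# Creates the osd pool, creates and maps an rbd image, then puts a filesystem on it.
provision_volume() {
    cluster="$1"; pool="$2"; pg_num="$3"; image="$4"; size_mb="$5"; fstype="$6"
    run ceph --cluster "$cluster" osd pool create "$pool" "$pg_num" &&
    run rbd --cluster "$cluster" create --size "$size_mb" "$pool/$image" &&
    run rbd --cluster "$cluster" map "$pool/$image" &&
    run "mkfs.$fstype" "/dev/rbd/$pool/$image"
}

# Example values are assumptions: pool "lxd", 100 placement groups, 10240 MB image "c1".
provision_volume ceph lxd 100 c1 10240 xfs
```

Keeping the sequence behind a single `run` wrapper makes it easy to review what would be issued against the cluster before anything is actually created.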

Ceph Storage Pool Properties

Applicable configuration keys that already exist:

- volume.block.filesystem (default=“xfs”)
- volume.block.mount_options (default=“discard”)
- volume.size (default=10GB)

New configuration keys

- ceph.cluster_name (string, default=“ceph”, the default value is identical to ceph’s)
  This property, once set, is immutable.
- ceph.osd.pool_name
  This property, once set, will be immutable for now. This might change later.
- ceph.osd.pg_num (int, default=“100”, number of placement groups)
  This property, once set, will be immutable.
- ceph.sparse (bool, default=“true”, whether space-efficient but dependency-introducing copies should be used. Similar to the “zfs.clone_copy” property.)
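
To illustrate how the defaults listed above would apply, here is a minimal sketch of a lookup that prefers a user-supplied `key=value` argument and otherwise falls back to the proposed default. `pool_config_value` is a hypothetical helper, not part of LXD; no default is stated for `ceph.osd.pool_name`, so the sketch returns an error for it when unset.

```shell
#!/bin/sh
# pool_config_value KEY [key=value ...] (hypothetical helper)
# Prints the user-supplied value for KEY if present, else the default proposed in the text.
pool_config_value() {
    key="$1"; shift
    for kv in "$@"; do
        case "$kv" in
            "$key"=*) echo "${kv#*=}"; return 0 ;;
        esac
    done
    case "$key" in
        ceph.cluster_name)          echo "ceph" ;;
        ceph.osd.pg_num)            echo "100" ;;
        ceph.sparse)                echo "true" ;;
        volume.block.filesystem)    echo "xfs" ;;
        volume.block.mount_options) echo "discard" ;;
        volume.size)                echo "10GB" ;;
        *) return 1 ;;   # no default stated in the text (e.g. ceph.osd.pool_name)
    esac
}

# Usage: the second call overrides the default placement group count.
pool_config_value ceph.osd.pg_num
pool_config_value ceph.osd.pg_num ceph.osd.pg_num=64
```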

Ceph Storage Volume Properties

Applicable configuration keys that already exist:
- block.mount_options (string, default=“discard”, mount options for block devices)
- block.filesystem (string, default=“xfs”, filesystem to use for this volume)

New configuration keys do not seem to be necessary for now.

User Experience

ubuntu@xenial:~$ lxc storage create pool1 ceph

- Create a new osd pool named “lxd” in which we create rados block devices for its storage entities (images, containers, snapshots).
- The “source” property will be set to “pool1”.
- The “ceph.osd.pool_name” property will be set to “pool1”.
- The “ceph.cluster_name” property will be set to “ceph”.

This will issue the following commands behind the scenes:

ceph osd pool create lxd 100

ubuntu@nuturo:~$ ceph osd pool create --cluster ceph lxd 100
pool 'lxd' created
ubuntu@nuturo:~$ ceph osd lspools
0 rbd,1 .rgw.root,2 default.rgw.control,3 default.rgw.data.root,4 default.rgw.gc,5 default.rgw.log,6 lxd,

ubuntu@xenial:~$ lxc storage create pool1 ceph ceph.osd.pool_name=foo



ubuntu@xenial:~$ lxc storage create pool1 ceph source=foo

- Use existing osd pool “foo”.
- The “source” property will be set to “foo”.
- The “ceph.osd.pool_name” property will be set to “foo”.
- The “ceph.cluster_name” property will be set to “ceph”.

Combinations of the “source” property and “ceph.osd.pool_name” currently do not make sense. This behavior is different from the zfs driver, where “lxc storage create pool1 zfs source=/dev/sdX zfs.pool_name=foo” means to create a new zfs pool named “foo” on block device /dev/sdX. However, as mentioned above, LXD is not concerned with setting up block devices or storage nodes for ceph itself. This is up to the user.
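
The three invocation forms could be told apart along these lines. This is a sketch under assumptions: `resolve_osd_pool` is a hypothetical helper, the text does not spell out what `ceph.osd.pool_name=foo` does (the sketch assumes it creates a new osd pool with that name), and because the text is not fully consistent about the default osd pool name, the caller passes it in as the third argument.

```shell
#!/bin/sh
# resolve_osd_pool SOURCE OSD_POOL_NAME DEFAULT_NAME (hypothetical helper)
# Prints "<action> <osd-pool-name>", where action is "create" or "use-existing".
# Pass an empty string for a property that was not given.
resolve_osd_pool() {
    source_prop="$1"; pool_name_prop="$2"; default_name="$3"
    if [ -n "$source_prop" ] && [ -n "$pool_name_prop" ]; then
        # The text says this combination currently does not make sense.
        echo "error: source and ceph.osd.pool_name cannot be combined" >&2
        return 1
    elif [ -n "$source_prop" ]; then
        echo "use-existing $source_prop"
    elif [ -n "$pool_name_prop" ]; then
        echo "create $pool_name_prop"      # assumption: creates a new osd pool
    else
        echo "create $default_name"        # assumption: default name chosen by the caller
    fi
}

# Usage: corresponds to "lxc storage create pool1 ceph source=foo".
resolve_osd_pool foo "" pool1
```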

Notes on LXD internals

The “source” property for ceph storage pools will be set to the name of the osd pool if we have created it or been given its name (e.g. “ceph”). Note that we always record the name of the cluster in “ceph.cluster_name”. The default value for ceph.cluster_name will be “ceph”. If we are given an existing pool whose name is ambiguous, or the name of an osd pool that doesn't exist in the default “ceph” cluster, users need to give us the name of the cluster it belongs to.

Notes on ceph internals

- On a new cluster only the “rbd” pool exists.
- You can have multiple clusters on the same host. In order to address a specific cluster, the “--cluster” argument must be passed to all “ceph osd” and “rbd” commands.
- rbd images are conceptually identical to lvm logical volumes and provide at least feature parity with them.
- rbd images support snapshotting in case the origin has been marked as “protected”. The “protected” property is an explicit marker set on a given rbd image to mark it as having dependent datasets. In contrast to (non-thinpool) logical volumes and zfs clones, where this is implicitly enforced behavior, rbd explicitly requires you to enable this behavior.
- rbd images seem to support full copy behavior. Issuing “rbd cp /dev/rbd/lxd/bla /dev/rbd/lxd/bla-copy” seems to create a new rbd image with identical contents without introducing dependencies between images. We should thus give ceph pools a Boolean property ceph.sparse which determines whether users want to use space-efficient but dependency-introducing copies or not.
- When you try to create an already existing osd pool via
  ceph --cluster {cluster-name} osd pool create {pool-name} {pg-num}
  then ceph will give a warning that the given osd pool already exists but exit with 0.
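
Two of the notes above lend themselves to a short sketch. Since `ceph osd pool create` exits with 0 even for an existing pool, existence has to be checked separately, here by matching against the output of `ceph osd pool ls`. The copy helper shows one way the proposed ceph.sparse property could pick between a protected-snapshot clone and a full `rbd cp`. The helper names and the `copy-<dst>` snapshot name are hypothetical, image names are given as pool/image specs, and this is not LXD's actual implementation. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: hypothetical helper, echoes or executes the command.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# pool_in_list NAME: reads pool names (one per line) on stdin and
# succeeds only on an exact, whole-line match.
# Intended use: ceph --cluster "$cluster" osd pool ls | pool_in_list "$pool"
pool_in_list() {
    grep -Fxq -- "$1"
}

# copy_image CLUSTER POOL SRC DST SPARSE (hypothetical helper)
# SPARSE=true: snapshot, protect, clone (space-efficient, introduces a dependency).
# Otherwise:   full, independent copy via "rbd cp".
copy_image() {
    cluster="$1"; pool="$2"; src="$3"; dst="$4"; sparse="$5"
    if [ "$sparse" = "true" ]; then
        snap="$pool/$src@copy-$dst"   # snapshot name is an assumption
        run rbd --cluster "$cluster" snap create "$snap" &&
        run rbd --cluster "$cluster" snap protect "$snap" &&
        run rbd --cluster "$cluster" clone "$snap" "$pool/$dst"
    else
        run rbd --cluster "$cluster" cp "$pool/$src" "$pool/$dst"
    fi
}

# Usage: prints the three commands for a sparse copy in the "lxd" pool.
copy_image ceph lxd bla bla-copy true
```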