Lxd: cluster: Proper handling of ceph storage for containers

Created on 7 Mar 2018 · 8 Comments · Source: lxc/lxd

A container which is backed by ceph doesn't really depend on its host, so moving it from one host to another is effectively just a matter of updating some database records.

We therefore need to make sure that this is the case (bypass of migration code) and that we also allow you to move a container from one host to another even if the source host is offline.
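As a rough illustration of what "just updating some database records" could look like, here is a minimal sketch. The `containers` table, its columns, and the `move_container` helper are all hypothetical and are not LXD's actual schema or API; the point is only that the move never contacts the source host.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE containers ("
        "name TEXT PRIMARY KEY, node TEXT NOT NULL, storage_driver TEXT NOT NULL)"
    )
    return db


def move_container(db, name, target_node):
    """Reassign a container to target_node without copying any data.

    Only ceph-backed containers qualify, because their storage is reachable
    from every node. The source node is never contacted, so this also works
    when that node is offline.
    """
    row = db.execute(
        "SELECT storage_driver FROM containers WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    if row[0] != "ceph":
        # Locally stored containers still need the regular migration path.
        raise ValueError("container is not ceph-backed; a real migration is required")
    with db:  # commits on success, rolls back on error
        db.execute("UPDATE containers SET node = ? WHERE name = ?", (target_node, name))
    return True
```

In this sketch a container on any other storage driver is rejected rather than silently moved, since its data would still live on the source host.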

Bug

All 8 comments

I don't know if this is related, but would it be feasible to have a container config key like "ceph.auto.migrate = 1", so that if a system goes down the container automatically gets rescheduled onto another node? E.g. LXD would just pick one at random that has available resources and start the container back up there. (The caveat being that you have to explicitly state that you want it auto-migrated if the system goes down.)

Also, are there any plans to add a ceph live migration feature?

I've been testing the clustering and it's a bit rough around the edges, but seems to work quite well. E.g. adding ceph cluster storage via the CLI isn't really documented (or I wasn't able to find it), but I sort of just faffed my way into getting it working, so it must be somewhat intuitive.

@Mantwon re the live migration feature see the PR's description and the linked issue. TL;DR: it will come in follow-up branches.

We can surely add docs specific to ceph. Note that the workflow for creating a pool is basically the same for all storage drivers, as described in https://github.com/lxc/lxd/blob/master/doc/clustering.md

Hm, I might have actually misinterpreted the question re ceph live migration. AFAICT, no, for now that won't be supported, but @stgraber will be able to clarify that better.

So I think we'll be staying away from attempting to automatically move containers when a node goes down as there is a serious risk of data loss there which would be better handled by an operator.

The case I'm concerned about is the one where LXD thinks the node is gone, potentially due to network/routing failures blocking the internal LXD communication. However this could happen without that node being actually shut down and without it having lost access to ceph.

If LXD was to automatically restart those containers somewhere else, you might then end up with the same block device mounted on two different machines at the same time, which is a pretty good recipe for data corruption.
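The reasoning above can be sketched as a small decision rule. Everything here is hypothetical (`NodeView`, its fields, and `safe_to_restart_elsewhere` are illustration only, not anything LXD implements): a missed heartbeat alone says nothing about whether the node still has the ceph volume mounted, so the sketch only allows a restart elsewhere once an operator has confirmed the node is really down.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """What the rest of the cluster knows about one node (hypothetical model)."""
    heartbeat_ok: bool      # can other LXD nodes reach it over the cluster network?
    confirmed_down: bool    # has an operator verified it is off or cut off from ceph?


def safe_to_restart_elsewhere(view):
    """Return True only if restarting the node's containers elsewhere cannot
    lead to the same block device being mounted on two machines."""
    if view.heartbeat_ok:
        # The node is reachable, so its containers are presumably still running.
        return False
    # An unreachable node may still be running and still have its volumes
    # mapped; only an explicit operator confirmation rules that out.
    return view.confirmed_down
```

A network partition maps to `NodeView(heartbeat_ok=False, confirmed_down=False)`, which this rule deliberately treats as unsafe.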

I agree, I was just a little too hot4clustering :) I've been playing with this a lot, and I really like it.. great work.

AFAIK, what we still have left in here is:

  • Make moving containers within the cluster a DB-only change (extension to the rename API)
  • Make it possible to join a new cluster node to a cluster which is using a CEPH storage pool

@freeekanayaka am I missing anything?

@stgraber not sure what you mean with the second point. It should now be possible to join a cluster which has ceph storage pools, see the code and integration tests in the branch that got merged.

Oh right, indeed.
