LXD: Use migration sink/source mechanism for local copies/moves

Created on 17 May 2019 · 20 Comments · Source: lxc/lxd

Required information

Distribution: CentOS 7
lxc-info

...
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  driver: lxc
  driver_version: 3.1.0
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.0.11-1.el7.elrepo.x86_64
  lxc_features:
    mount_injection_file: "true"
    network_gateway_device_route: "false"
    network_ipvlan: "false"
    network_l2proxy: "false"
    seccomp_notify: "false"
  project: default
  server: lxd
  server_clustered: false
  server_name: killer-queen
  server_pid: 31367
  server_version: "3.13"
  storage: btrfs
  storage_version: "4.4"
...

I have two storage volumes, both using btrfs. I moved an existing container from one to the other using lxc move container_name --target volume_2 container_2, and after that docker stopped working on that machine.

On closer inspection I found that subvolumes were treated as regular directories during the move.

Steps to reproduce

Initialize two btrfs pools, e.g. C1 and C2.

lxc launch -s C1 ubuntu box1
In box1: btrfs su cr /testVol. Make sure subvolume is created via btrfs su sh /testVol
lxc move box1 -s C2 box2
In box2: btrfs su sh /testVol. No subvolume will be found.

Bug

Source

cab404

All 20 comments

Hmm, that suggests that the move didn't use the btrfs migration codepath. I don't really remember how that logic works.

@brauner do you?

stgraber on 17 May 2019

@stgraber https://github.com/lxc/lxd/blob/master/lxd/storage_btrfs.go#L2593
This looks very much like the culprit. I can check (tomorrow) whether returning an error in place of the fallback to rsync triggers it.

cab404 on 18 May 2019

If it does, then there should probably be more complex decision logic, because rsync is clearly destructive to subvolumes.

cab404 on 18 May 2019

> tomorrow

I have been failing to build it for half an hour now, so probably not today.

cab404 on 18 May 2019

I'm not very familiar with this. Does it mean that if the source and target storage pools are both btrfs, we should always use the specialized btrfs migration code instead of rsync, regardless of whether the container is running in a user namespace or not?

One thing that puzzles me a bit is that the storage.MigrationType() interface method only takes into account one storage pool and not the combination of two storage pools (source and destination). But perhaps it's just me not understanding the whole mechanics.

freeekanayaka on 4 Jun 2019

@freeekanayaka So the check for user namespace should stay, as I don't believe btrfs send/receive works at all if either side is running in a user namespace.

The way the migration code normally works is that one side connects to the other and sends a migration struct describing what's to be migrated and what its ideal migration handler is (in this case btrfs). The target then considers that information and assembles its own migration struct: if the handler matches, it puts that one in; if it doesn't match, it puts rsync in as a generic fallback.
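A minimal sketch of the negotiation described above, written in Go since that is what LXD is written in. Everything here is an assumption made for illustration: the `MigrationType` constants, the `Offer` struct and the `Negotiate` function are hypothetical names and do not reflect LXD's actual types or protobuf messages. The user namespace check is folded in only to show where such a check could sit.

```go
package main

import "fmt"

// MigrationType names a transfer mechanism. These identifiers are
// illustrative only and are not LXD's real ones.
type MigrationType string

const (
	TypeRsync MigrationType = "rsync"
	TypeBtrfs MigrationType = "btrfs"
)

// Offer is a hypothetical stand-in for the migration struct that the
// source sends to the target.
type Offer struct {
	Preferred MigrationType // the source's ideal migration handler
	InUserNS  bool          // whether the source runs in a user namespace
}

// Negotiate returns the type the target would put into its reply.
// It falls back to rsync when either side runs in a user namespace
// or when the two preferred handlers do not match.
func Negotiate(src Offer, targetPreferred MigrationType, targetInUserNS bool) MigrationType {
	if src.InUserNS || targetInUserNS {
		return TypeRsync
	}
	if src.Preferred == targetPreferred {
		return src.Preferred
	}
	return TypeRsync
}

func main() {
	// Both sides prefer btrfs, neither is in a user namespace.
	fmt.Println(Negotiate(Offer{Preferred: TypeBtrfs}, TypeBtrfs, false))
	// Handlers differ, so the generic fallback is chosen.
	fmt.Println(Negotiate(Offer{Preferred: TypeBtrfs}, MigrationType("zfs"), false))
}
```

Note that in this sketch the decision depends on both sides, which is the point of the sink/source exchange: neither pool picks the transfer mechanism on its own.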

To be fair, this entire logic is rather complex, mixes some protobuf stuff and isn't the easiest to read and wrap your head around. Let me know if something is unclear and I'll see if I remember it somehow :)

@cab404 can you confirm that both LXD servers are physical servers or virtual machines and aren't themselves containers?

stgraber on 4 Jun 2019

@stgraber thanks for the explanation, it's pretty clear and doesn't strike me as particularly complex logic (at a high level). I assume that the actual implementation is somewhat convoluted though, and perhaps mixes some concerns/layers (since you mention protobuf). But I'd need to take a look.

If btrfs send/receive don't work at all within user namespaces, it's not obvious to me what we can do to solve this.

freeekanayaka on 4 Jun 2019

> can you confirm that both LXD servers are physical servers

@stgraber I am using one physical server with two btrfs pools

cab404 on 4 Jun 2019

@freeekanayaka so based on the above, neither LXD instance is running inside a userns, so send/receive should work fine.

stgraber on 5 Jun 2019

@stgraber and also it is installed via snap)

cab404 on 5 Jun 2019

@stgraber hm then I'm confused. If the issue is not that LXD is running in a userns, what's going on?

freeekanayaka on 6 Jun 2019

@freeekanayaka so what we need to do here is:

Reproduce the issue ourselves: add two btrfs storage pools to a LXD instance, create a subvolume inside a container, copy the container between the two pools, and confirm that the subvolume has been turned into a plain old directory in the copied container.
Check which migration sink/source was used, so we know whether the issue is that we're using rsync when we shouldn't, or that we're using btrfs send/receive but the nested subvols aren't being copied properly.

stgraber on 6 Jun 2019

@stgraber the issue is reproducible using the steps indicated by @cab404 and by you. The culprit seems to be that as part of the move we first copy the container, and when copying a btrfs container between two pools we use rsync. See:

https://github.com/lxc/lxd/blob/master/lxd/storage_btrfs.go#L1163

and

https://github.com/lxc/lxd/blob/master/lxd/storage_btrfs.go#L1134

What do you think would be the best way to fix this? It feels like it's mainly a matter of refactoring, so we could swap rsync for some other equivalent btrfs-based logic that we already have in place. But I'm not familiar with the code base.

freeekanayaka on 8 Jun 2019

Indeed, this looks like the source of the issue, which also means that, annoyingly, moving a container between two hosts would do the right thing but doing it locally won't.

I think what we need to do is make our cross-pool copy logic use the same migration sink/source code as network migration so we can pick the best transfer mechanism based on source and target.
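One way to picture this idea is to connect a migration source to a migration sink through an in-memory pipe instead of a network connection. The Go sketch below does that under stated assumptions: the `Source` and `Sink` interfaces, `LocalTransfer`, and the in-memory `memSource`/`memSink` types are all hypothetical and are not LXD's actual interfaces. In a real setup the two ends might wrap something like btrfs send and btrfs receive, but that is not shown here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Source streams an instance's data to w (hypothetical interface).
type Source interface {
	Send(w io.Writer) error
}

// Sink consumes an instance's data from r (hypothetical interface).
type Sink interface {
	Receive(r io.Reader) error
}

// LocalTransfer wires a source to a sink through an in-memory pipe, so
// the same source/sink code could serve both local and remote copies.
func LocalTransfer(src Source, dst Sink) error {
	pr, pw := io.Pipe()
	errCh := make(chan error, 1)

	go func() {
		err := src.Send(pw)
		// A nil error closes the pipe normally, so the reader sees io.EOF.
		pw.CloseWithError(err)
		errCh <- err
	}()

	recvErr := dst.Receive(pr)
	// Unblock the sender in case the sink stopped reading early.
	pr.CloseWithError(recvErr)

	if sendErr := <-errCh; sendErr != nil {
		return fmt.Errorf("send: %w", sendErr)
	}
	if recvErr != nil {
		return fmt.Errorf("receive: %w", recvErr)
	}
	return nil
}

// memSource and memSink are in-memory stand-ins used for the demo only.
type memSource struct{ data []byte }

func (s memSource) Send(w io.Writer) error {
	_, err := w.Write(s.data)
	return err
}

type memSink struct{ buf bytes.Buffer }

func (s *memSink) Receive(r io.Reader) error {
	_, err := io.Copy(&s.buf, r)
	return err
}

func main() {
	sink := &memSink{}
	if err := LocalTransfer(memSource{data: []byte("subvolume stream")}, sink); err != nil {
		fmt.Println("transfer failed:", err)
		return
	}
	fmt.Println(sink.buf.String())
}
```

The design choice illustrated is that the transfer function knows nothing about the storage driver; it only moves bytes between whichever source and sink were chosen for the pair of pools.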

stgraber on 10 Jun 2019

Ok thanks, I'll check that. I hope it won't require too much re-plumbing/refactoring.

freeekanayaka on 10 Jun 2019

So yeah, as @stgraber expected, this requires a bit more work than a few simple logical switches and some rewiring of code paths, since the logic to run btrfs send/receive is somewhat coupled with (hard-coded into) the migration logic that we use over the network. It doesn't look too terrible, and from what I understand the protobuf part is only relevant for metadata, not for the raw btrfs data. Still, it requires a certain amount of refactoring that we might want to defer, since we're so close to release and we plan to introduce the new storage and storage driver interfaces.

freeekanayaka on 11 Jun 2019

Moved to 3.15

stgraber on 11 Jun 2019

ooof

cab404 on 5 Nov 2019

@cab404 ?

Note that this is currently being worked on with the storage rework that @tomponline is doing. We already have the dir backend using the new migration logic there which will solve this issue.

stgraber on 5 Nov 2019

👍2

This has been done now for all drivers except CEPH (which doesn't really have that concept anyway but will be ported in 3.20).

stgraber on 14 Jan 2020

❤1 🎉1 👍1
