(Moved from https://github.com/lxc/lxc/issues/2654)
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    [OUTPUT TRUNCATED]
    -----END CERTIFICATE-----
  certificate_fingerprint: [OUTPUT TRUNCATED]
  driver: lxc
  driver_version: 3.0.2
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.15.0-34-generic
  server: lxd
  server_pid: 3818
  server_version: "3.5"
  storage: ""
  storage_version: ""
  server_clustered: false
  server_name: Notebook
There was a btrfs storage pool named local, with source /var/lib/lxd/disks/local.img. It worked perfectly fine until the file /var/lib/lxd/disks/local.img was deleted outside of LXC (it cannot be recovered). The pool is used by 5 containers, and both the storage pool and the containers still appear in the output of lxc storage list and lxc list, respectively. Now neither the storage pool local nor any of its associated containers can be deleted in LXC.
Executing lxc delete container1 (container1 is a container in the missing local storage pool) outputs Error: no such file or directory. Using the --force flag does not change the output. There doesn't seem to be any way to remove container1 from LXC. This, I believe, is a bug or is at least undesirable behavior. Perhaps the only real negative effect is that it limits the namespace for new containers and storage pools.
I think that, perhaps through a --delete_from_db flag, LXC could offer an option to remove any references to a container/storage pool from the LXC database.
The storage backend is a btrfs storage pool with an image file as its source.
Executing lxc storage delete local outputs: Error: storage pool "local" has volumes attached to it.
Executing lxc storage volume list local outputs:
+-----------+------------+-------------+---------+
| TYPE      | NAME       | DESCRIPTION | USED BY |
+-----------+------------+-------------+---------+
| container | container1 |             | 1       |
+-----------+------------+-------------+---------+
| container | container2 |             | 1       |
+-----------+------------+-------------+---------+
| container | container3 |             | 1       |
+-----------+------------+-------------+---------+
| container | container4 |             | 1       |
+-----------+------------+-------------+---------+
| container | container5 |             | 1       |
+-----------+------------+-------------+---------+
The container storage volumes cannot be deleted outside of lxc delete: executing lxc storage volume delete local container/container1 outputs Error: storage volumes of type "container" cannot be deleted with the storage api.
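The situation described above (pool still registered, backing file gone) can be confirmed before attempting any delete. The following is a minimal sketch, not part of LXD: the function name pool_source_status is hypothetical, and the path passed to it is assumed to be the value reported by lxc storage get local source.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): reports whether a loop-backed
# pool's source file is still present on disk.
pool_source_status() {
    src=$1
    if [ -f "$src" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Example call; the path would normally be read from
# `lxc storage get local source`.
pool_source_status /var/lib/lxd/disks/local.img
```

A result of "missing" while the pool still shows up in lxc storage list matches the state reported in this issue.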
Hmm, normally I'd expect lxc delete container1 to succeed: we have code that should let LXD move on if the storage volume is gone. I guess there's a bug somewhere in the btrfs driver; I'll have to reproduce the issue and fix it.
I couldn't reproduce the exact behavior above but I could reproduce some issues which I've now fixed in my branch.
Note that this will still not allow you to just do lxc delete container1 when you're missing the btrfs storage pool; at a minimum, you will need to slot in an empty pool so that LXD moves on.
I considered changing our checks to not require this, but the exact same situation can occur during boot on systems that use encrypted or external storage, and we wouldn't want a user to think they've deleted something only to have it show up on disk a minute later, causing all kinds of annoying conflicts.
I believe our current behavior of allowing an empty storage pool to be put in place of the existing one is sufficient and will generally match what would happen should a drive go bad and be replaced, or a loop file be corrupted. It lets you repair (potentially losing some data) or outright replace the partition/file with another one, then try to use whatever was on there; anything that doesn't behave can be deleted from LXD just fine.
The code from @stgraber's last post, with some minor changes, solved my problem. My full code for deleting container1 from local was:
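lxd_dir=/var/lib/lxd
pool=local
container=container1
# Recreate an empty btrfs loop file in place of the deleted one.
truncate -s 1G "$lxd_dir/disks/$pool.img"
mkfs.btrfs "$lxd_dir/disks/$pool.img"
# Mount it where LXD expects the pool.
mkdir -p "$lxd_dir/storage-pools/$pool"
mount "$lxd_dir/disks/$pool.img" "$lxd_dir/storage-pools/$pool"
# Create a placeholder directory for the container, then delete it through LXD.
mkdir "$lxd_dir/storage-pools/$pool/$container"
lxc delete "$container"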
After deleting all containers from local, the pool can be deleted with
lxc storage delete "$pool"
umount "$lxd_dir/storage-pools/$pool"
rmdir "$lxd_dir/storage-pools/$pool"
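For several containers, the same steps can be collected into one script. This is only a sketch under the assumptions of this thread (default /var/lib/lxd layout, a loop-backed btrfs pool named local); the run helper, the recover_pool function and the APPLY, LXD_DIR and POOL variables are hypothetical names introduced here. By default it only prints the commands it would run, so the sequence can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints the recovery commands by default; set APPLY=1
# (and run as root) to actually execute them.
lxd_dir=${LXD_DIR:-/var/lib/lxd}
pool=${POOL:-local}

# Hypothetical helper: echo a command, and execute it only if APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Slot in an empty pool, delete the given containers, then remove the pool.
recover_pool() {
    run truncate -s 1G "$lxd_dir/disks/$pool.img" || return 1
    run mkfs.btrfs "$lxd_dir/disks/$pool.img" || return 1
    run mkdir -p "$lxd_dir/storage-pools/$pool" || return 1
    run mount "$lxd_dir/disks/$pool.img" "$lxd_dir/storage-pools/$pool" || return 1
    for c in "$@"; do
        # Placeholder directory for the container, as in the steps above.
        run mkdir -p "$lxd_dir/storage-pools/$pool/$c" || return 1
        run lxc delete "$c" || return 1
    done
    run lxc storage delete "$pool"
    run umount "$lxd_dir/storage-pools/$pool"
    run rmdir "$lxd_dir/storage-pools/$pool"
}

recover_pool container1 container2 container3 container4 container5
```

Running it with APPLY unset shows the full command list; the last two cleanup commands are not checked for failure, since LXD may already have unmounted or removed the directory when the pool is deleted.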
I ran into another minor bug. While container1 and local were still in the database, I ran lxc launch container1 -s different_pool, which resulted in an Already exists error, yet container1 would mistakenly appear in the output of lxc storage volume list different_pool even though it wasn't created. Running lxc launch container1 -s different_pool after removing container1 from local fixes the problem: container1 no longer appears in the output of lxc storage volume list different_pool, and the name container1 is reusable again.
@mlaradji Ah, I think the extra steps above likely wouldn't have been needed if you had restarted LXD after creating the new btrfs loop file.
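That shorter route might look like the sketch below. It is untested against this exact situation and rests on assumptions: the restart command (systemctl restart lxd here; snap installs use a different unit name) and the run helper, recover_with_restart function and APPLY, LXD_DIR, POOL and RESTART_CMD variables are hypothetical names introduced for illustration. It prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: recreate the loop file, restart LXD so it sets up the pool
# itself, then delete the containers. Prints commands unless APPLY=1.
lxd_dir=${LXD_DIR:-/var/lib/lxd}
pool=${POOL:-local}
# Assumption: LXD runs as the systemd unit "lxd"; override if it does not.
restart_cmd=${RESTART_CMD:-systemctl restart lxd}

# Hypothetical helper: echo a command, and execute it only if APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

recover_with_restart() {
    run truncate -s 1G "$lxd_dir/disks/$pool.img" || return 1
    run mkfs.btrfs "$lxd_dir/disks/$pool.img" || return 1
    # Intentionally unquoted so the command string splits into words.
    run $restart_cmd || return 1
    for c in "$@"; do
        run lxc delete "$c" || return 1
    done
}

recover_with_restart container1
```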
As for the Already exists error and the leftover volume, I believe I fixed that in master a few days ago; we were basically missing a bunch of volume deletion code on error.