Required information
Distribution: Ubuntu
Distribution version: Bionic 18.04
The output of "lxc info":
```
config:
  core.https_address: REDACTED
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - REDACTED:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    REDACTED
    -----END CERTIFICATE-----
  certificate_fingerprint: REDACTED
  driver: lxc
  driver_version: 3.2.1
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.0.0-1021-gcp
  lxc_features:
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    seccomp_notify: "true"
  project: default
  server: lxd
  server_clustered: false
  server_name: REDACTED
  server_pid: 34441
  server_version: "3.18"
  storage: zfs
  storage_version: 0.8.2-2~18.04.york0
```
Issue description
I think this is probably two bugs. I have no idea how to reproduce the first one, but I'll describe it because it sets up the second:
Occasionally, it seems an lxc delete <container> can fail. The container's ZFS dataset is destroyed and the only thing left is an empty dataset under snapshots, but the container remains present in LXD's database in the "STOPPED" state. In most cases a subsequent lxc delete <container> cleans things up without issues.
Lately, however, we've hit a further problem (the one this issue is about): the second lxc delete <container> fails as well. I think this happens because the dataset is destroyed and unmounted, but LXD is dropping a backup.yaml file in the container's directory. My guess (I have not checked the code) is that LXD doesn't check whether this directory is empty. It only checks that the dataset is unmounted and then tries to unlink the directory, which fails because the directory is not empty.
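A minimal sketch of the suspected failure mode, outside LXD entirely: a plain directory removal refuses to delete a directory that still holds a file. The function name and the scratch directory are hypothetical stand-ins for the container's directory; nothing here touches LXD or ZFS, and it only illustrates the guess above rather than what the LXD code actually does.

```shell
#!/bin/sh
# Hypothetical demo: a scratch directory stands in for the container's
# directory under storage-pools/default/containers/.
demo_leftover_file() (
    dir=$(mktemp -d) || exit 1
    touch "$dir/backup.yaml"    # simulate the leftover file

    # rmdir only removes empty directories, so this is expected to fail.
    if rmdir "$dir" 2>/dev/null; then
        echo "removed"
    else
        echo "rmdir failed: directory not empty"
    fi

    # Clean up the scratch directory either way.
    rm -rf "$dir"
)

demo_leftover_file
```

This is the same class of error that lxc delete reports in the session further down.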
It'd be great if, until the former issue is tracked down (working on it), LXD handled this situation gracefully. At the moment LXD cannot recover from it on its own: someone has to shell in, verify that the container really doesn't exist any more, remove the file, and re-issue the delete command.
Any ideas on how to track down the first issue would be appreciated too, but I'll keep trying to figure it out.
Steps to reproduce
I don't really have good steps to reproduce (can't work out how to get into the first situation or I'd file a bug for that too), but here's the flow on an affected server:
root@lxd:~# zfs list | grep aaa-container
lxd/snapshots/aaa-container 96K 1.27T 96K none
root@lxd:~# lxc delete aaa-container
Error: remove /var/snap/lxd/common/lxd/storage-pools/default/containers/aaa-container: directory not empty
root@lxd:~# ls /var/snap/lxd/common/lxd/storage-pools/default/containers/aaa-container/
backup.yaml
root@lxd:~# rm /var/snap/lxd/common/lxd/storage-pools/default/containers/aaa-container/backup.yaml
root@lxd:~# lxc delete aaa-container
root@lxd:~#
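The manual cleanup in the session above could be wrapped in a small guard so that nothing is deleted if the directory holds anything unexpected. This is only a sketch: cleanup_stale_container_dir is a hypothetical helper, not part of LXD, and it assumes a find that supports -mindepth and -maxdepth (GNU find on Ubuntu does). It deliberately does not check ZFS, so confirming with zfs list that the container's dataset is really gone remains a manual step.

```shell
#!/bin/sh
# Hypothetical helper: remove a container directory only if it is empty
# or contains nothing except a stale backup.yaml.
cleanup_stale_container_dir() (
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; exit 1; }

    # Look for any top-level entry (dotfiles included) other than backup.yaml.
    extra=$(find "$dir" -mindepth 1 -maxdepth 1 ! -name backup.yaml | head -n 1)
    if [ -n "$extra" ]; then
        echo "refusing: unexpected content in $dir" >&2
        exit 1
    fi

    # Only the stale file (or nothing) is left: remove it, then the directory.
    rm -f "$dir/backup.yaml"
    rmdir "$dir"
)
```

On an affected server this would be called with the path from the error message, for example cleanup_stale_container_dir /var/snap/lxd/common/lxd/storage-pools/default/containers/aaa-container, followed by another lxc delete aaa-container.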
Information to attach
I don't think any of this information is relevant: there are no container logs or anything similar because the container is deleted. Let me know if that assumption is incorrect.
Most helpful comment
https://github.com/lxc/lxd/pull/6560/commits/7199afba981ece28b40d5230e832307f3b3e0823 in https://github.com/lxc/lxd/pull/6560 handles this type of race. So we literally wrote a fix for this by accident earlier today :)