Lxd: Memory issue with lxd proxy (possible leak)

Created on 5 Apr 2019 · 12 Comments · Source: lxc/lxd

  • Distribution: Ubuntu
  • Distribution version: 18.04
  • The output of "lxc info" or if that fails:

    • Kernel version:

    • LXC version:

    • LXD version:

    • Storage backend in use:

Issue description

After setting up LXD proxying for HTTP & HTTPS, the forkproxy processes use steadily more memory, to the point that the OOM killer kills processes to free memory.

[319347.277419] Out of memory: Kill process 2635 (lxd) score 104 or sacrifice child
[319347.277541] Killed process 2635 (lxd) total-vm:1198088kB, anon-rss:527396kB, file-rss:0kB, shmem-rss:0kB
[319347.354124] oom_reaper: reaped process 2635 (lxd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Steps to reproduce

  1. Just wait, the lxd forkproxy process will continuously eat up more memory

Information to attach

syslog.txt
lxclog.txt
lxcconfig.txt

Bug

All 12 comments

Which proxy are you seeing leaking memory? TCP only, UDP only, both?

On an HTTP and an HTTPS proxy, so that's TCP. It also uses proxy_protocol.

ok, thanks, that should help us reproduce this.

Please show the output of lxc info so we get all needed version info.

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    etcetera..
    -----END CERTIFICATE-----
  certificate_fingerprint: etcetera
  driver: lxc
  driver_version: 3.1.0
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.15.0-46-generic
  server: lxd
  server_pid: 1332
  server_version: "3.11"
  storage: zfs
  storage_version: 0.7.5-1ubuntu16.4
  server_clustered: false
  server_name: host
  project: default

Right now, one of the proxy processes is increasing its RAM usage by ~75MB per day.
The pmap output of the process:

13953:   /snap/lxd/current/bin/lxd forkproxy 1332 tcp:xxx.xxx.xxx.xxx:80 13840 tcp:localhost:80 /var/snap/lxd/common/lxd/logs/web/proxy.proxy-http.log /var/snap/lxd/common/lxd/devices/web/proxy.proxy-http      true
0000000000400000  25032K r-x-- lxd
0000000001e71000      4K r---- lxd
0000000001e72000    376K rw--- lxd
0000000001ed0000    152K rw---   [ anon ]
0000000003193000    132K rw---   [ anon ]
000000c000000000 102400K rw---   [ anon ]
000000c006400000   2048K rw---   [ anon ]
000000c006600000 137216K rw---   [ anon ]
000000c00ec00000  20480K rw---   [ anon ]
00007f68f8000000    132K rw---   [ anon ]
00007f68f8021000  65404K -----   [ anon ]
00007f68fc000000    132K rw---   [ anon ]
00007f68fc021000  65404K -----   [ anon ]
00007f69033dd000   4232K rw---   [ anon ]
00007f69037ff000      4K -----   [ anon ]
00007f6903800000   8192K rw---   [ anon ]
00007f6904000000    132K rw---   [ anon ]
00007f6904021000  65404K -----   [ anon ]
00007f6908000000    132K rw---   [ anon ]
00007f6908021000  65404K -----   [ anon ]
00007f690c000000    132K rw---   [ anon ]
00007f690c021000  65404K -----   [ anon ]
00007f6910000000    132K rw---   [ anon ]
00007f6910021000  65404K -----   [ anon ]
00007f6914000000    132K rw---   [ anon ]
00007f6914021000  65404K -----   [ anon ]
00007f6918116000   2948K rw---   [ anon ]
00007f69183f7000      4K -----   [ anon ]
00007f69183f8000   8192K rw---   [ anon ]
00007f6918bf8000      4K -----   [ anon ]
00007f6918bf9000   8192K rw---   [ anon ]
00007f69193f9000     44K r-x-- libnss_files-2.27.so
00007f6919404000   2044K ----- libnss_files-2.27.so
00007f6919603000      4K r---- libnss_files-2.27.so
00007f6919604000      4K rw--- libnss_files-2.27.so
00007f6919605000   1432K rw---   [ anon ]
00007f691976b000      4K -----   [ anon ]
00007f691976c000   8192K rw---   [ anon ]
00007f6919f6c000      4K -----   [ anon ]
00007f6919f6d000   8192K rw---   [ anon ]
00007f691a76d000      4K -----   [ anon ]
00007f691a76e000   8192K rw---   [ anon ]
00007f691af6e000      4K -----   [ anon ]
00007f691af6f000   8192K rw---   [ anon ]
00007f691b76f000      4K -----   [ anon ]
00007f691b770000  43076K rw---   [ anon ]
00007f691e181000     28K r-x-- libffi.so.6.0.4
00007f691e188000   2044K ----- libffi.so.6.0.4
00007f691e387000      4K r---- libffi.so.6.0.4
00007f691e388000      4K rw--- libffi.so.6.0.4
00007f691e389000     28K r-x-- librt-2.23.so
00007f691e390000   2044K ----- librt-2.23.so
00007f691e58f000      4K r---- librt-2.23.so
00007f691e590000      4K rw--- librt-2.23.so
00007f691e591000    440K r-x-- libpcre.so.3.13.2
00007f691e5ff000   2048K ----- libpcre.so.3.13.2
00007f691e7ff000      4K r---- libpcre.so.3.13.2
00007f691e800000      4K rw--- libpcre.so.3.13.2
00007f691e801000    508K r-x-- libgmp.so.10.3.0
00007f691e880000   2044K ----- libgmp.so.10.3.0
00007f691ea7f000      4K r---- libgmp.so.10.3.0
00007f691ea80000      4K rw--- libgmp.so.10.3.0
00007f691ea81000    200K r-x-- libhogweed.so.4.2
00007f691eab3000   2044K ----- libhogweed.so.4.2
00007f691ecb2000      4K r---- libhogweed.so.4.2
00007f691ecb3000      4K rw--- libhogweed.so.4.2
00007f691ecb4000    208K r-x-- libnettle.so.6.2
00007f691ece8000   2044K ----- libnettle.so.6.2
00007f691eee7000      8K r---- libnettle.so.6.2
00007f691eee9000      4K rw--- libnettle.so.6.2
00007f691eeea000     68K r-x-- libtasn1.so.6.5.1
00007f691eefb000   2048K ----- libtasn1.so.6.5.1
00007f691f0fb000      4K r---- libtasn1.so.6.5.1
00007f691f0fc000      4K rw--- libtasn1.so.6.5.1
00007f691f0fd000    196K r-x-- libidn.so.11.6.15
00007f691f12e000   2048K ----- libidn.so.11.6.15
00007f691f32e000      4K r---- libidn.so.11.6.15
00007f691f32f000      4K rw--- libidn.so.11.6.15
00007f691f330000    356K r-x-- libp11-kit.so.0.1.0
00007f691f389000   2044K ----- libp11-kit.so.0.1.0
00007f691f588000     40K r---- libp11-kit.so.0.1.0
00007f691f592000      8K rw--- libp11-kit.so.0.1.0
00007f691f594000    100K r-x-- libz.so.1.2.8
00007f691f5ad000   2044K ----- libz.so.1.2.8
00007f691f7ac000      4K r---- libz.so.1.2.8
00007f691f7ad000      4K rw--- libz.so.1.2.8
00007f691f7ae000    136K r-x-- libuv.so.1.0.0
00007f691f7d0000   2044K ----- libuv.so.1.0.0
00007f691f9cf000      4K r---- libuv.so.1.0.0
00007f691f9d0000      4K rw--- libuv.so.1.0.0
00007f691f9d1000     12K r-x-- libdl-2.23.so
00007f691f9d4000   2044K ----- libdl-2.23.so
00007f691fbd3000      4K r---- libdl-2.23.so
00007f691fbd4000      4K rw--- libdl-2.23.so
00007f691fbd5000     16K r-x-- libattr.so.1.1.0
00007f691fbd9000   2044K ----- libattr.so.1.1.0
00007f691fdd8000      4K r---- libattr.so.1.1.0
00007f691fdd9000      4K rw--- libattr.so.1.1.0
00007f691fdda000     16K r-x-- libcap.so.2.24
00007f691fdde000   2048K ----- libcap.so.2.24
00007f691ffde000      4K r---- libcap.so.2.24
00007f691ffdf000      4K rw--- libcap.so.2.24
00007f691ffe0000    184K r-x-- libseccomp.so.2.3.1
00007f692000e000   2048K ----- libseccomp.so.2.3.1
00007f692020e000     88K r---- libseccomp.so.2.3.1
00007f6920224000      4K rw--- libseccomp.so.2.3.1
00007f6920225000    124K r-x-- libselinux.so.1
00007f6920244000   2044K ----- libselinux.so.1
00007f6920443000      4K r---- libselinux.so.1
00007f6920444000      4K rw--- libselinux.so.1
00007f6920445000      8K rw---   [ anon ]
00007f6920447000   1164K r-x-- libgnutls.so.30.6.2
00007f692056a000   2044K ----- libgnutls.so.30.6.2
00007f6920769000     44K r---- libgnutls.so.30.6.2
00007f6920774000      8K rw--- libgnutls.so.30.6.2
00007f6920776000      4K rw---   [ anon ]
00007f6920777000   1792K r-x-- libc-2.23.so
00007f6920937000   2048K ----- libc-2.23.so
00007f6920b37000     16K r---- libc-2.23.so
00007f6920b3b000      8K rw--- libc-2.23.so
00007f6920b3d000     16K rw---   [ anon ]
00007f6920b41000    100K r-x-- libdqlite.so.0.0.1
00007f6920b5a000   2044K ----- libdqlite.so.0.0.1
00007f6920d59000      4K r---- libdqlite.so.0.0.1
00007f6920d5a000      4K rw--- libdqlite.so.0.0.1
00007f6920d5b000    740K r-x-- libsqlite3.so.0.8.6
00007f6920e14000   2044K ----- libsqlite3.so.0.8.6
00007f6921013000     12K r---- libsqlite3.so.0.8.6
00007f6921016000     12K rw--- libsqlite3.so.0.8.6
00007f6921019000     28K r-x-- libacl.so.1.1.0
00007f6921020000   2044K ----- libacl.so.1.1.0
00007f692121f000      4K r---- libacl.so.1.1.0
00007f6921220000      4K rw--- libacl.so.1.1.0
00007f6921221000     96K r-x-- libpthread-2.23.so
00007f6921239000   2044K ----- libpthread-2.23.so
00007f6921438000      4K r---- libpthread-2.23.so
00007f6921439000      4K rw--- libpthread-2.23.so
00007f692143a000     16K rw---   [ anon ]
00007f692143e000      8K r-x-- libutil-2.23.so
00007f6921440000   2044K ----- libutil-2.23.so
00007f692163f000      4K r---- libutil-2.23.so
00007f6921640000      4K rw--- libutil-2.23.so
00007f6921641000   1024K r-x-- liblxc.so.1.5.0
00007f6921741000   2048K ----- liblxc.so.1.5.0
00007f6921941000      4K r---- liblxc.so.1.5.0
00007f6921942000     16K rw--- liblxc.so.1.5.0
00007f6921946000    152K r-x-- ld-2.23.so
00007f6921972000   1952K rw---   [ anon ]
00007f6921b5a000     48K rw---   [ anon ]
00007f6921b6a000      4K rw---   [ anon ]
00007f6921b6b000      4K r---- ld-2.23.so
00007f6921b6c000      4K rw--- ld-2.23.so
00007f6921b6d000      4K rw---   [ anon ]
00007ffe21729000    132K rw---   [ stack ]
00007ffe21790000     12K r----   [ anon ]
00007ffe21793000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total           917184K

And the top memory usage:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
13953 165536    20   0  917180 245292   1776 S   0.0  6.1   6:31.81 lxd
13986 165536    20   0  658548 202264   1564 S   0.0  5.0   6:25.11 lxd

As you can see, that's an awful lot of memory for a proxy process.

Ok, spamming the proxy with HTTPS requests makes it leak quite nicely.
I've added some code to trigger a manual garbage collection run, to check whether the memory would get released eventually, and it doesn't appear to help much.
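
For anyone wanting to repeat that check in their own Go process, here is a minimal sketch of forcing a collection and comparing heap usage before and after. It is not the code that was added to forkproxy; the helper names `heapInUse` and `flushMemory` are made up for illustration, and only the standard `runtime` and `runtime/debug` packages are used.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapInUse returns the number of bytes in in-use heap spans.
// (Hypothetical helper name, not taken from LXD.)
func heapInUse() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse
}

// flushMemory forces a garbage collection and then asks the runtime to
// return as much memory as possible to the operating system.
// (Hypothetical helper name, not taken from LXD.)
func flushMemory() {
	debug.FreeOSMemory()
}

func main() {
	before := heapInUse()
	flushMemory()
	after := heapInUse()
	fmt.Printf("heap in use: %d -> %d bytes\n", before, after)
}
```

If the number barely moves after a forced collection, the memory is still reachable from something (for example, blocked goroutines) rather than just waiting for the collector.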

So looks like we're leaking goroutines like crazy :)

I've added a ton of temporary profiling code that lets me force a memory flush, run a built-in pprof web server and count goroutines at any point in time.
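
A rough sketch of that kind of temporary instrumentation, assuming nothing about how it was actually wired into forkproxy: the standard `net/http/pprof` package registers its handlers on the default mux, and `runtime.NumGoroutine` gives the live goroutine count. The helper names `startPprof` and `goroutineCount` and the loopback address are illustrative only.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
	"runtime"
)

// startPprof serves the pprof handlers on addr and returns the address
// that was actually bound. (Hypothetical helper name.)
func startPprof(addr string) (string, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", err
	}
	go http.Serve(ln, nil) // a nil handler means http.DefaultServeMux
	return ln.Addr().String(), nil
}

// goroutineCount reports how many goroutines currently exist.
// (Hypothetical helper name.)
func goroutineCount() int {
	return runtime.NumGoroutine()
}

func main() {
	// Port 0 lets the kernel pick a free port; loopback only.
	addr, err := startPprof("127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("pprof at http://%s/debug/pprof/\n", addr)
	fmt.Println("goroutines:", goroutineCount())
}
```

With the server running, `/debug/pprof/goroutine?debug=2` returns a full stack dump of every goroutine, which is the kind of output that makes a pile of identical blocked goroutines easy to spot.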

The process starts at around 20MB of RAM. I then hit it for 100s, or about 12500 requests, and it's at 82MB at that point. The garbage collector gets it down to 60MB, but the goroutine count went from 6 up to 12585, showing that every single request leaves a permanently running goroutine...

Every one of those goroutines shows:

goroutine 62912 [chan send, 1 minutes]:
main.genericRelay.func1(0x12eaec0, 0xc4201823b8, 0x12eaec0, 0xc4201823d0, 0xc423a65b60)
    /home/stgraber/data/code/go/src/github.com/lxc/lxd/lxd/main_forkproxy.go:838 +0x75
created by main.genericRelay
    /home/stgraber/data/code/go/src/github.com/lxc/lxd/lxd/main_forkproxy.go:848 +0x30b

Tracked it down to bad channel handling, doing a test run now to confirm we don't leak goroutines and that this fixes the memory leak too.
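
To illustrate how a goroutine can end up parked in "chan send" forever, here is a small self-contained sketch. It is not the forkproxy code and not the actual fix; it assumes a relay that starts two workers but only waits for the first one to report, and all names (`relay`, `leaked`) are hypothetical. With an unbuffered channel the second worker never gets a receiver; giving the channel room for both results is one way to let it exit.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// relay starts two workers that each report completion on a channel, but
// only waits for the first report. With buffer == 0 the second worker
// blocks forever in "chan send"; with buffer == 2 both sends complete and
// both goroutines exit. (Hypothetical stand-in, not LXD's genericRelay.)
func relay(buffer int) {
	done := make(chan error, buffer)
	worker := func() {
		done <- nil // stands in for reporting the result of a copy loop
	}
	go worker()
	go worker()
	<-done // only one receive: the other send needs buffer space to finish
}

// leaked runs relay n times and returns how many extra goroutines remain.
func leaked(n, buffer int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		relay(buffer)
	}
	time.Sleep(100 * time.Millisecond) // give finished goroutines time to exit
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("unbuffered channel, leaked goroutines:", leaked(100, 0))
	fmt.Println("buffered channel, leaked goroutines:", leaked(100, 2))
}
```

In the unbuffered case each call leaves one goroutine behind, which matches the "one stuck goroutine per request" pattern described above.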

New test run for 100s, did 15116 requests, memory started at 20MB, ended at 35MB, memory flush gets it back down to 20MB, goroutine count is back to the original 6.

No more goroutine leak and no more memory leak, sending branch.
