After setting up LXD proxy devices for HTTP and HTTPS, the forkproxy processes steadily use more memory, to the point where the OOM killer kills processes to free memory.
[319347.277419] Out of memory: Kill process 2635 (lxd) score 104 or sacrifice child
[319347.277541] Killed process 2635 (lxd) total-vm:1198088kB, anon-rss:527396kB, file-rss:0kB, shmem-rss:0kB
[319347.354124] oom_reaper: reaped process 2635 (lxd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Which proxy are you seeing leaking memory? TCP only, UDP only, both?
On an HTTP and an HTTPS proxy, so that's TCP. It also uses proxy_protocol.
ok, thanks, that should help us reproduce this.
Please show the output of lxc info so we get all needed version info.
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
addresses: []
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
etcetera..
-----END CERTIFICATE-----
certificate_fingerprint: etcetera
driver: lxc
driver_version: 3.1.0
kernel: Linux
kernel_architecture: x86_64
kernel_version: 4.15.0-46-generic
server: lxd
server_pid: 1332
server_version: "3.11"
storage: zfs
storage_version: 0.7.5-1ubuntu16.4
server_clustered: false
server_name: host
project: default
Right now, one of the proxy processes is increasing its RAM usage by ~75MB per day.
The pmap output of the process:
13953: /snap/lxd/current/bin/lxd forkproxy 1332 tcp:xxx.xxx.xxx.xxx:80 13840 tcp:localhost:80 /var/snap/lxd/common/lxd/logs/web/proxy.proxy-http.log /var/snap/lxd/common/lxd/devices/web/proxy.proxy-http true
0000000000400000 25032K r-x-- lxd
0000000001e71000 4K r---- lxd
0000000001e72000 376K rw--- lxd
0000000001ed0000 152K rw--- [ anon ]
0000000003193000 132K rw--- [ anon ]
000000c000000000 102400K rw--- [ anon ]
000000c006400000 2048K rw--- [ anon ]
000000c006600000 137216K rw--- [ anon ]
000000c00ec00000 20480K rw--- [ anon ]
00007f68f8000000 132K rw--- [ anon ]
00007f68f8021000 65404K ----- [ anon ]
00007f68fc000000 132K rw--- [ anon ]
00007f68fc021000 65404K ----- [ anon ]
00007f69033dd000 4232K rw--- [ anon ]
00007f69037ff000 4K ----- [ anon ]
00007f6903800000 8192K rw--- [ anon ]
00007f6904000000 132K rw--- [ anon ]
00007f6904021000 65404K ----- [ anon ]
00007f6908000000 132K rw--- [ anon ]
00007f6908021000 65404K ----- [ anon ]
00007f690c000000 132K rw--- [ anon ]
00007f690c021000 65404K ----- [ anon ]
00007f6910000000 132K rw--- [ anon ]
00007f6910021000 65404K ----- [ anon ]
00007f6914000000 132K rw--- [ anon ]
00007f6914021000 65404K ----- [ anon ]
00007f6918116000 2948K rw--- [ anon ]
00007f69183f7000 4K ----- [ anon ]
00007f69183f8000 8192K rw--- [ anon ]
00007f6918bf8000 4K ----- [ anon ]
00007f6918bf9000 8192K rw--- [ anon ]
00007f69193f9000 44K r-x-- libnss_files-2.27.so
00007f6919404000 2044K ----- libnss_files-2.27.so
00007f6919603000 4K r---- libnss_files-2.27.so
00007f6919604000 4K rw--- libnss_files-2.27.so
00007f6919605000 1432K rw--- [ anon ]
00007f691976b000 4K ----- [ anon ]
00007f691976c000 8192K rw--- [ anon ]
00007f6919f6c000 4K ----- [ anon ]
00007f6919f6d000 8192K rw--- [ anon ]
00007f691a76d000 4K ----- [ anon ]
00007f691a76e000 8192K rw--- [ anon ]
00007f691af6e000 4K ----- [ anon ]
00007f691af6f000 8192K rw--- [ anon ]
00007f691b76f000 4K ----- [ anon ]
00007f691b770000 43076K rw--- [ anon ]
00007f691e181000 28K r-x-- libffi.so.6.0.4
00007f691e188000 2044K ----- libffi.so.6.0.4
00007f691e387000 4K r---- libffi.so.6.0.4
00007f691e388000 4K rw--- libffi.so.6.0.4
00007f691e389000 28K r-x-- librt-2.23.so
00007f691e390000 2044K ----- librt-2.23.so
00007f691e58f000 4K r---- librt-2.23.so
00007f691e590000 4K rw--- librt-2.23.so
00007f691e591000 440K r-x-- libpcre.so.3.13.2
00007f691e5ff000 2048K ----- libpcre.so.3.13.2
00007f691e7ff000 4K r---- libpcre.so.3.13.2
00007f691e800000 4K rw--- libpcre.so.3.13.2
00007f691e801000 508K r-x-- libgmp.so.10.3.0
00007f691e880000 2044K ----- libgmp.so.10.3.0
00007f691ea7f000 4K r---- libgmp.so.10.3.0
00007f691ea80000 4K rw--- libgmp.so.10.3.0
00007f691ea81000 200K r-x-- libhogweed.so.4.2
00007f691eab3000 2044K ----- libhogweed.so.4.2
00007f691ecb2000 4K r---- libhogweed.so.4.2
00007f691ecb3000 4K rw--- libhogweed.so.4.2
00007f691ecb4000 208K r-x-- libnettle.so.6.2
00007f691ece8000 2044K ----- libnettle.so.6.2
00007f691eee7000 8K r---- libnettle.so.6.2
00007f691eee9000 4K rw--- libnettle.so.6.2
00007f691eeea000 68K r-x-- libtasn1.so.6.5.1
00007f691eefb000 2048K ----- libtasn1.so.6.5.1
00007f691f0fb000 4K r---- libtasn1.so.6.5.1
00007f691f0fc000 4K rw--- libtasn1.so.6.5.1
00007f691f0fd000 196K r-x-- libidn.so.11.6.15
00007f691f12e000 2048K ----- libidn.so.11.6.15
00007f691f32e000 4K r---- libidn.so.11.6.15
00007f691f32f000 4K rw--- libidn.so.11.6.15
00007f691f330000 356K r-x-- libp11-kit.so.0.1.0
00007f691f389000 2044K ----- libp11-kit.so.0.1.0
00007f691f588000 40K r---- libp11-kit.so.0.1.0
00007f691f592000 8K rw--- libp11-kit.so.0.1.0
00007f691f594000 100K r-x-- libz.so.1.2.8
00007f691f5ad000 2044K ----- libz.so.1.2.8
00007f691f7ac000 4K r---- libz.so.1.2.8
00007f691f7ad000 4K rw--- libz.so.1.2.8
00007f691f7ae000 136K r-x-- libuv.so.1.0.0
00007f691f7d0000 2044K ----- libuv.so.1.0.0
00007f691f9cf000 4K r---- libuv.so.1.0.0
00007f691f9d0000 4K rw--- libuv.so.1.0.0
00007f691f9d1000 12K r-x-- libdl-2.23.so
00007f691f9d4000 2044K ----- libdl-2.23.so
00007f691fbd3000 4K r---- libdl-2.23.so
00007f691fbd4000 4K rw--- libdl-2.23.so
00007f691fbd5000 16K r-x-- libattr.so.1.1.0
00007f691fbd9000 2044K ----- libattr.so.1.1.0
00007f691fdd8000 4K r---- libattr.so.1.1.0
00007f691fdd9000 4K rw--- libattr.so.1.1.0
00007f691fdda000 16K r-x-- libcap.so.2.24
00007f691fdde000 2048K ----- libcap.so.2.24
00007f691ffde000 4K r---- libcap.so.2.24
00007f691ffdf000 4K rw--- libcap.so.2.24
00007f691ffe0000 184K r-x-- libseccomp.so.2.3.1
00007f692000e000 2048K ----- libseccomp.so.2.3.1
00007f692020e000 88K r---- libseccomp.so.2.3.1
00007f6920224000 4K rw--- libseccomp.so.2.3.1
00007f6920225000 124K r-x-- libselinux.so.1
00007f6920244000 2044K ----- libselinux.so.1
00007f6920443000 4K r---- libselinux.so.1
00007f6920444000 4K rw--- libselinux.so.1
00007f6920445000 8K rw--- [ anon ]
00007f6920447000 1164K r-x-- libgnutls.so.30.6.2
00007f692056a000 2044K ----- libgnutls.so.30.6.2
00007f6920769000 44K r---- libgnutls.so.30.6.2
00007f6920774000 8K rw--- libgnutls.so.30.6.2
00007f6920776000 4K rw--- [ anon ]
00007f6920777000 1792K r-x-- libc-2.23.so
00007f6920937000 2048K ----- libc-2.23.so
00007f6920b37000 16K r---- libc-2.23.so
00007f6920b3b000 8K rw--- libc-2.23.so
00007f6920b3d000 16K rw--- [ anon ]
00007f6920b41000 100K r-x-- libdqlite.so.0.0.1
00007f6920b5a000 2044K ----- libdqlite.so.0.0.1
00007f6920d59000 4K r---- libdqlite.so.0.0.1
00007f6920d5a000 4K rw--- libdqlite.so.0.0.1
00007f6920d5b000 740K r-x-- libsqlite3.so.0.8.6
00007f6920e14000 2044K ----- libsqlite3.so.0.8.6
00007f6921013000 12K r---- libsqlite3.so.0.8.6
00007f6921016000 12K rw--- libsqlite3.so.0.8.6
00007f6921019000 28K r-x-- libacl.so.1.1.0
00007f6921020000 2044K ----- libacl.so.1.1.0
00007f692121f000 4K r---- libacl.so.1.1.0
00007f6921220000 4K rw--- libacl.so.1.1.0
00007f6921221000 96K r-x-- libpthread-2.23.so
00007f6921239000 2044K ----- libpthread-2.23.so
00007f6921438000 4K r---- libpthread-2.23.so
00007f6921439000 4K rw--- libpthread-2.23.so
00007f692143a000 16K rw--- [ anon ]
00007f692143e000 8K r-x-- libutil-2.23.so
00007f6921440000 2044K ----- libutil-2.23.so
00007f692163f000 4K r---- libutil-2.23.so
00007f6921640000 4K rw--- libutil-2.23.so
00007f6921641000 1024K r-x-- liblxc.so.1.5.0
00007f6921741000 2048K ----- liblxc.so.1.5.0
00007f6921941000 4K r---- liblxc.so.1.5.0
00007f6921942000 16K rw--- liblxc.so.1.5.0
00007f6921946000 152K r-x-- ld-2.23.so
00007f6921972000 1952K rw--- [ anon ]
00007f6921b5a000 48K rw--- [ anon ]
00007f6921b6a000 4K rw--- [ anon ]
00007f6921b6b000 4K r---- ld-2.23.so
00007f6921b6c000 4K rw--- ld-2.23.so
00007f6921b6d000 4K rw--- [ anon ]
00007ffe21729000 132K rw--- [ stack ]
00007ffe21790000 12K r---- [ anon ]
00007ffe21793000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 917184K
And the top memory usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13953 165536 20 0 917180 245292 1776 S 0.0 6.1 6:31.81 lxd
13986 165536 20 0 658548 202264 1564 S 0.0 5.0 6:25.11 lxd
As you can see, that's an awful lot of memory for a proxy process.
OK, spamming the proxy with HTTPS requests makes it leak quite nicely.
I've added some code to trigger a manual garbage collection run, to make sure the memory wouldn't eventually get released on its own, and it doesn't appear to help much.
So looks like we're leaking goroutines like crazy :)
I've added a ton of temporary profiling, allowing me to force a memory flush, run a built-in pprof web server and count goroutines at any point in time.
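For anyone wanting to take the same kind of measurement, here is a minimal sketch of counting goroutines and forcing a memory flush using only the Go standard library. It is not the temporary profiling code that was added to forkproxy; the `stats` type and `collectStats` helper are hypothetical names made up for this illustration, and the blocked goroutines in `main` merely stand in for leaked ones.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// stats is a hypothetical helper type for this sketch, not part of LXD.
type stats struct {
	Goroutines int     // goroutines that currently exist
	HeapMB     float64 // bytes of allocated heap objects, in MiB
}

// collectStats optionally forces a garbage collection and returns as much
// memory as possible to the OS before sampling, so that what remains is
// memory that is still referenced.
func collectStats(flush bool) stats {
	if flush {
		debug.FreeOSMemory()
	}
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return stats{
		Goroutines: runtime.NumGoroutine(),
		HeapMB:     float64(m.HeapAlloc) / (1 << 20),
	}
}

func main() {
	before := collectStats(true)

	// Stand-in for load: goroutines that stay blocked, like leaked ones would.
	hold := make(chan struct{})
	for i := 0; i < 1000; i++ {
		go func() { <-hold }()
	}

	after := collectStats(true)
	fmt.Printf("goroutines: %d -> %d, heap: %.1fMB -> %.1fMB\n",
		before.Goroutines, after.Goroutines, before.HeapMB, after.HeapMB)
	close(hold)
}
```

A goroutine count that keeps growing with the number of requests, even after a forced flush, points at leaked goroutines rather than at garbage that has not been collected yet.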
The process starts at around 20MB of RAM. I then hit it for 100s, or about 12500 requests, and it's at 82MB at that point. The garbage collector gets it down to 60MB, but the goroutine count went from 6 up to 12585, showing that every single request leaves a permanently running goroutine...
Every one of those goroutines shows:
goroutine 62912 [chan send, 1 minutes]:
main.genericRelay.func1(0x12eaec0, 0xc4201823b8, 0x12eaec0, 0xc4201823d0, 0xc423a65b60)
/home/stgraber/data/code/go/src/github.com/lxc/lxd/lxd/main_forkproxy.go:838 +0x75
created by main.genericRelay
/home/stgraber/data/code/go/src/github.com/lxc/lxd/lxd/main_forkproxy.go:848 +0x30b
Tracked it down to bad channel handling. Doing a test run now to confirm that we no longer leak goroutines and that this fixes the memory leak too.
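To illustrate how channel handling can leave goroutines stuck in `chan send`, here is a self-contained sketch of one common form of this bug. It is an assumption about the general pattern, not the actual `genericRelay` code or the fix that was sent: `relay` and `leaked` are hypothetical names, and `net.Pipe` stands in for real TCP connections. Two copy goroutines each report completion on a channel, but the caller only receives once, so with an unbuffered channel the second sender blocks forever.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"runtime"
	"time"
)

// relay copies data in both directions and returns once either direction
// finishes. With buffered == false it reproduces the leak; with
// buffered == true both senders can always complete.
func relay(a, b net.Conn, buffered bool) {
	size := 0
	if buffered {
		size = 2 // one slot per sender, so neither send can block
	}
	done := make(chan error, size)

	pump := func(dst io.Writer, src io.Reader) {
		_, err := io.Copy(dst, src)
		done <- err // unbuffered: blocks forever if nobody receives
	}
	go pump(a, b)
	go pump(b, a)

	<-done // only the first result is ever consumed
	a.Close()
	b.Close()
}

// leaked runs n short-lived connections through relay and returns how many
// goroutines are still alive afterwards.
func leaked(n int, buffered bool) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		client, a := net.Pipe()
		b, server := net.Pipe()
		client.Close() // the client hangs up right away
		relay(a, b, buffered)
		server.Close()
	}
	time.Sleep(200 * time.Millisecond) // let finished goroutines exit
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("leaked with buffered channel:  ", leaked(100, true))
	fmt.Println("leaked with unbuffered channel:", leaked(100, false))
}
```

In this sketch the unbuffered variant leaves one goroutine behind per connection, which matches the one-goroutine-per-request growth measured above. Buffering the channel is only one way out; draining the channel or selecting on a shared stop signal would work as well.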
New test run for 100s: it did 15116 requests, memory started at 20MB and ended at 35MB, a memory flush gets it back down to 20MB, and the goroutine count is back to the original 6.
No more goroutine leak and no more memory leak, sending branch.