The memory consumption of the agent has not been fully optimized yet. Optimization of the memory footprint has been a frequent user ask.
The memory consumption consists of:
cilium_metrics eBPF map (currently 64K)cilium_policy map (should be endpoint specific)cilium-cniRegarding binary size and just for the record: the Go toolchain seems to produce slightly bigger binaries with each new release. This is known upstream and tracked in golang/go#6853. There also seems to have been a significant increase with Go 1.11 (golang/go#27266)
Also, golang/go#36313 explains the big runtime blob we see in the output of the go-binsize-viz tool. It's the runtime.pclntab section which (among others) contains a data structures to map program counter to line numbers, see https://golang.org/src/debug/gosym/pclntab.go and the linked issue for details. It seems the Go team is working on reducing this section
The other big blobs are mostly from k8s.io and github.com/cilium/cilium so I'll start tackling these wrt. binary size.
Research what looks to be an invalid global cilium_policy map (should be endpoint specific)
If I read the code correctly, this is the map used for tail calls:
So it seems it's correct that this is a global map, but the name could probably be better.
Reduce the binary size of cilium-cni
Regarding the reduction of cilium-cni binary size: by far the biggest amount of additional dependencies is pulled in by type ENISpec from github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2. These additional dependcies make up for around 26M of the binary size (47M vs 21M). I don't yet understand enough about the k8s API packages but could imagine that moving all the types to a separate types package would help reducing binary size across the board.
Interesting comment here wrt. ctmap size:
Regarding the reduction of
cilium-cnibinary size: by far the biggest amount of additional dependencies is pulled in bytype ENISpecfromgithub.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2. These additional dependcies make up for around 26M of the binary size (47M vs 21M). I don't yet understand enough about the k8s API packages but could imagine that moving all the types to a separatetypespackage would help reducing binary size across the board.
@aanm What is your opinion? It seems plausible to aim for a simple type package that does not require us to pull in the entire autogenerated code.
Another simple short-term solution would be to copy type ENISpec into cilium-cni/types and add comments to make sure any future updates would be done to both structures.
Have we considered a totally different approach re CNI: all CNI ADD and DEL code is moved into the agent, and the CNI plugin is only doing an RPC call to the agent?
Have we considered a totally different approach re CNI: all CNI ADD and DEL code is moved into the agent, and the CNI plugin is only doing an RPC call to the agent?
The ENISpec is dragged into the NetConf struct to allow configuring ENI aspects via the CNI configuration file. So it's basically an absolutely trivial JSON spec we need. It must be possible to avoid dragging in massive dependency chains because of that.
Since we just use it for the JSON spec, I tried duplicating ENISpec into cilium-cni/types with #10282 as a low-effort short term solution. Currently, it's running through CI. I'm not sure whether I overlooked something?
Regarding the reduction of
cilium-cnibinary size: by far the biggest amount of additional dependencies is pulled in bytype ENISpecfromgithub.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2. These additional dependcies make up for around 26M of the binary size (47M vs 21M). I don't yet understand enough about the k8s API packages but could imagine that moving all the types to a separatetypespackage would help reducing binary size across the board.@aanm What is your opinion? It seems plausible to aim for a simple type package that does not require us to pull in the entire autogenerated code.
It seems odd that the ENISpec struct is in that package in the first place. Probably should be under pkg/eni or something like that.
Have we considered a totally different approach re CNI: all CNI ADD and DEL code is moved into the agent, and the CNI plugin is only doing an RPC call to the agent?
This approach would make much more sense to me.
It seems odd that the
ENISpecstruct is in that package in the first place. Probably should be underpkg/enior something like that.
Why? It's part of the CiliumNode CRD.
Have we considered a totally different approach re CNI: all CNI ADD and DEL code is moved into the
agent, and the CNI plugin is only doing an RPC call to the agent?
This will have a lot of implications. The cilium-cni binary currently runs on the host itself and in the same context in which it is called in. Moving this into the cilium-agent will require us to enter the namespace of a container from another container namespace instead of doing so from the init namespace.
We can have the conversation but the ENISpec should definitely not be the motivation for that discussion ;-)
It seems odd that the
ENISpecstruct is in that package in the first place. Probably should be underpkg/enior something like that.Why? It's part of the CiliumNode CRD.
The api.Rule(s) are also part of CiliumNP CRD but they don't leave under github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2 for example.
FWIW, regarding binary size: switching to Go 1.14 (#10340) will reduce size of the unstripped binaries by about 1MB.
Profile & reduce the memory consumption of secondary components
github.com/go-openapi/analysis was showing up high in memory profiles during startup of the agent:
2076.76kB 21.23% 21.23% 2076.76kB 21.23% github.com/go-openapi/analysis.(*Spec).reset
1024.06kB 10.47% 57.11% 1024.06kB 10.47% github.com/go-openapi/analysis.(*Spec).analyzeSchema
this is mainly due to copying parts of the schema around during analysis instead of passing them by pointer. Note that this memory is eventually GCed again by the Go runtime, but it will still lead to a memory peak during startup.
I sent go-openapi/analysis#54 with a fix (go-openapi/loads#26 will also be needed before we can re-vendor).
i'm seeing this error on minikube. seems related?
level=fatal msg="Error while creating daemon" error="invalid daemon configuration: specified NAT tables size 841429 must not exceed maximum CT table size 786432" subsys=daemon
@michi-covalent I think this was just because you've installed via master in the past few days; if you reinstall using the latest helm charts then I believe this should just work.
@tklauser ^ probably worth double-checking that the above issue is not going to affect any existing users who upgrade from a stable release of Cilium to v1.8 when it comes out.
@michi-covalent @joestringer The error occurs because I forgot to update quick-install.yaml in PR #10289 with the new default NAT table. Sent PR #10604 with a fix.
Also verified upgrading from 1.7 stable release and it works fine with PR #10604
Moved open tasks to https://github.com/cilium/cilium/issues/11726 for 1.9 tracking
Most helpful comment
If I read the code correctly, this is the map used for tail calls:
https://github.com/cilium/cilium/blob/da2e6526b8da79ce8948399fca23691ef0b62be2/pkg/maps/policymap/policymap.go#L32-L34
https://github.com/cilium/cilium/blob/da2e6526b8da79ce8948399fca23691ef0b62be2/pkg/maps/policymap/callmap.go#L70-L81
https://github.com/cilium/cilium/blob/da2e6526b8da79ce8948399fca23691ef0b62be2/pkg/datapath/linux/config/config.go#L154
So it seems it's correct that this is a global map, but the name could probably be better.