Is it possible to evaluate the overhead induced by cAdvisor + storage drivers, in terms of memory for example, compared to a non-container solution?
I mean, are there any tests that cover that?
I'm also wondering if there is a way to run cAdvisor so that it is more lightweight (like other Prometheus containers).
The fact that it consumes quite a lot of CPU & RAM while not being probed or looked at is concerning, and maybe my setup is wrong...
@RRAlex I think nobody has tested the impact of cAdvisor on system overhead. We would have to capture resource usage metrics from cAdvisor and compare them against Linux monitoring tools.
We have done manual profiling of cAdvisor, along with some tuning, as part of scaling Kubernetes. However, for Kubernetes we were only interested in the container manager performance, since we don't run the full standalone version. We decided we could meet our performance goals by lowering the resolution of collected stats. Since most of the CPU time is spent scraping metrics, increasing the scraping interval from 1s to 10s roughly cuts the CPU usage by 90%.
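As a back-of-the-envelope check of that claim (my own arithmetic, not from cAdvisor's code), assuming scraping CPU cost scales linearly with scrape frequency:

```shell
# Scraping CPU is roughly proportional to scrape frequency,
# so the remaining cost is old_interval / new_interval.
old_ms=1000   # 1s housekeeping interval
new_ms=10000  # 10s housekeeping interval
echo "remaining scraping CPU: $((100 * old_ms / new_ms))%"  # prints 10%, i.e. ~90% saved
```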
Just providing a little data point: we're using cAdvisor in standalone mode across about 1000 instances here. On average, it's using 0.2% of CPU and 20MB of RAM. There are a few outliers of course, but we've never really had problems with cAdvisor performance.
As @timstclair mentioned, tuning down the collection interval is helpful here. In our case, we use the following settings:
"--housekeeping_interval=30s" \
"--global_housekeeping_interval=2m" \
"--disable_metrics=disk,tcp" \
"--enable_load_reader" \
"--load_reader_interval=5s"
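For reference, a standalone launch with those flags might look like the sketch below. The volume mounts and image name are the usual ones from cAdvisor's README, not something specific to our setup; adjust paths and the image tag for your environment:

```shell
# Run standalone cAdvisor with reduced housekeeping frequency
# and the expensive disk/tcp metrics disabled.
docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest \
  --housekeeping_interval=30s \
  --global_housekeeping_interval=2m \
  --disable_metrics=disk,tcp \
  --enable_load_reader \
  --load_reader_interval=5s
```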
In our experience, the tcp and disk metrics can be very expensive (though that largely depends on what your containers are doing), but the rest (CPU, LA, Memory) is very cheap.
Perfect. From my side, I tested the memory overhead induced by plugging my containers into cAdvisor and InfluxDB. It was negligible.
I compared the following values:
These 2 values were roughly equal. Does that approach sound reasonable?
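For anyone wanting to do a similar comparison: cAdvisor's `/metrics` endpoint exposes `container_memory_usage_bytes` in the Prometheus text format, which can be compared against the kernel's own cgroup accounting. A minimal parsing sketch (the metric name is real cAdvisor output; the `id` label and value here are made up for illustration):

```shell
# One line of Prometheus-format output, as served by cAdvisor's /metrics endpoint.
# In practice you'd fetch this with: curl -s http://localhost:8080/metrics
sample='container_memory_usage_bytes{id="/docker/abc123"} 20971520'

# The value is the last whitespace-separated field of the line.
bytes=$(echo "$sample" | awk '{print $NF}')
echo "memory usage: $((bytes / 1024 / 1024)) MiB"  # prints 20 MiB
```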
Thanks all, I indeed started playing with the housekeeping settings _after_ my initial comment and realized I could tone down its requirements quite a lot...
Everything is much smoother now! :-)
edit: in reply to @mboussaa below:
--allow_dynamic_housekeeping=true --housekeeping_interval=10s
It might vary over time, but that worked for now.
I'm just starting to set up Prometheus, as I need to integrate cAdvisor et al. on prod instances. :)
@RRAlex can you share your new settings, so that I can use them in the future?
Leaving this open in case anyone has any interest in adding performance-related testing, or documenting ways to lower cAdvisor's resource consumption.