We are using cAdvisor to collect Docker container metrics with Prometheus. There are many cgroup-related metrics being exposed that are not relevant to us, and we want cAdvisor to filter them out.
The --docker_only flag does not change anything for some reason; we continue to see all the /system.slice/* metrics being exposed. After some googling, I came across the rawPrefixWhiteList = "/system.slice/kubelet.service, /system.slice/docker.service.." setting (#2048, #1926).
But I cannot figure out how to use that setting or how it relates to the --docker_only flag. Could anyone explain in more detail how to use rawPrefixWhiteList? It does not seem to be documented anywhere.
--docker_only makes cAdvisor collect only Docker container metrics, plus machine-level metrics. We don't actually expose rawPrefixWhiteList as a flag. We may want to consider consuming this in the kubelet to get that behavior.
Thanks for your quick response.
Does that mean we should NOT see /system.slice/* metrics when using the --docker_only flag? And what do you mean by machine-level metrics?
I'm just looking for a way to tell cAdvisor not to collect or expose /system.slice/* metrics, as there is a noticeable performance impact. We don't use Kubernetes.
With --docker_only, you only get cgroup metrics for cgroups that belong to Docker containers; e.g. /kubepods/pod<pod_uid>/<container_id> will be collected, but /kubepods/pod<pod_uid> will not.
Machine metrics are metrics from the / (root) cgroup, as well as, for example, disk metrics from filesystems on the node. We collect these metrics when --docker_only is specified, even though they don't come from a Docker container cgroup.
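To make the distinction concrete, here is a small sketch of how the two kinds of series typically look in cAdvisor's /metrics output. The sample lines and values below are made up for illustration; they are not taken from a real scrape:

```shell
# Abridged, hypothetical sample of a cAdvisor /metrics payload.
cat > sample_metrics.txt <<'EOF'
machine_cpu_cores 4
machine_memory_bytes 16777216000
container_memory_usage_bytes{id="/"} 9000000000
container_memory_usage_bytes{id="/docker/abc123"} 104857600
EOF

# Machine-level series carry no cgroup id label at all...
grep '^machine_' sample_metrics.txt

# ...while the root cgroup shows up as a container series with id="/"
grep 'id="/"' sample_metrics.txt
```

Both of these survive --docker_only, since they describe the node itself rather than any one cgroup below the root.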
So, is there a way to not collect the /system.slice/* metrics?
We don't use Kubernetes either, but we want to use cAdvisor to collect Docker container metrics. It collects all cgroup metrics under /, and it takes Prometheus a long time to scrape them.
I wanted to elaborate more on our use case for cAdvisor and Prometheus.
We're running fleets of spot instances that come and go many times a day, peaking at 1500-2000 instances at a time. Each spot instance hosts a dozen Docker containers, which are not static either. All in all it is a pretty dynamic environment, which in turn produces millions of distinct time series and leads to extremely high-cardinality metrics.
Basically we need just a handful of metrics to monitor the CPU and memory usage of every Docker container at a resolution of a few seconds.
cAdvisor exposes over 2700 time series on every spot instance at any given time. While trying to ingest all of those from every instance into Prometheus, I observed over 400 million(!) head time series before it crashed the biggest Prometheus server I could try.
Yes, I can drop the unnecessary metrics on the Prometheus side, and that does help to get cardinality under control. Still, the 2700 time series amount to over one megabyte of data, which Prometheus has a hard time scraping at a high pace. Given there are some 2000 endpoints to scrape, it's a lot of traffic too. There also seems to be CPU overhead for collecting the unneeded metrics.
For the above reasons, it would be helpful to have an option to collect and expose Docker-related metrics only.
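For anyone else stuck on this, the Prometheus-side workaround mentioned above amounts to a metric_relabel_configs drop rule on the id label. A minimal sketch of the same filter applied to a scraped dump; the sample lines and values are made up for illustration:

```shell
# Abridged, hypothetical sample of a cAdvisor scrape.
cat > scrape.txt <<'EOF'
container_cpu_usage_seconds_total{id="/docker/abc123"} 12.5
container_cpu_usage_seconds_total{id="/system.slice/cron.service"} 0.3
container_memory_usage_bytes{id="/docker/abc123"} 104857600
container_memory_usage_bytes{id="/system.slice/ssh.service"} 2097152
EOF

# Equivalent of dropping every series whose id starts with /system.slice:
grep -v 'id="/system.slice' scrape.txt

# How many series such a filter removes from this sample:
grep -c 'id="/system.slice' scrape.txt
# → 2
```

Note this only reduces ingested cardinality; cAdvisor still pays the CPU cost of collecting the series, and the full payload still crosses the wire on every scrape, which is exactly why filtering at the source would help.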
@dashpole thanks for taking care of this. Is pull request #72787 above going to solve the issue when cAdvisor is used outside of Kubernetes?
No, but you should be able to use the --docker_only flag if you just want metrics for Docker containers.
@dashpole at the moment, cAdvisor with the --docker_only flag collects metrics from the /system.slice cgroup, which is exactly what we are trying to avoid. Is that behavior going to change?
@viberan that sounds like a bug. The --docker_only flag should mean you only get Docker containers plus the / (root) container.
Can you paste your configuration (flags + cAdvisor version) and the Prometheus metrics you are getting here?
@dashpole sorry for the delay in responding. Below is an excerpt from our docker-compose file:
```yaml
cadvisor:
  image: google/cadvisor:v0.32.0
  network_mode: bridge
  container_name: cadvisor
  ports:
    - "9600:8080"
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
    - /dev/disk/:/dev/disk:ro
  restart: unless-stopped
  command: ["--docker_only=true"]
```
The output of the docker top command:
root@xxxx# docker top cadvisor
UID PID PPID C STIME TTY TIME CMD
root 1495 1477 8 15:45 ? 00:01:59 /usr/bin/cadvisor -logtostderr --docker_only=true
Getting ~3000 time series for Prometheus:
curl -s localhost:9600/metrics | wc -l
3010
which is about 1 MB of data:
curl -s localhost:9600/metrics | wc -c
1182807
and about 2000 of those time series are system.slice-related:
curl -s localhost:9600/metrics | grep 'system.slice' | wc -l
2014
Metric examples:
container_tasks_state{container_label_com_docker_compose_config_hash="",container_label_com_docker_compose_container_number="",container_label_com_docker_compose_oneoff="",container_label_com_docker_compose_project="",container_label_com_docker_compose_service="",container_label_com_docker_compose_version="",container_label_name="",id="/system.slice/unattended-upgrades.service",image="",name="",state="uninterruptible"} 0
container_tasks_state{container_label_com_docker_compose_config_hash="",container_label_com_docker_compose_container_number="",container_label_com_docker_compose_oneoff="",container_label_com_docker_compose_project="",container_label_com_docker_compose_service="",container_label_com_docker_compose_version="",container_label_name="",id="/system.slice/watchdog.service",image="",name="",state="iowaiting"} 0
There are tons of system.slice stuff:
curl -s localhost:9600/metrics | grep system.slice | cut -d'{' -f1 | sort | uniq -c
64 container_cpu_load_average_10s
64 container_cpu_system_seconds_total
136 container_cpu_usage_seconds_total
64 container_cpu_user_seconds_total
38 container_fs_reads_bytes_total
38 container_fs_reads_total
38 container_fs_writes_bytes_total
38 container_fs_writes_total
64 container_last_seen
64 container_memory_cache
64 container_memory_failcnt
256 container_memory_failures_total
64 container_memory_mapped_file
64 container_memory_max_usage_bytes
64 container_memory_rss
64 container_memory_swap
64 container_memory_usage_bytes
64 container_memory_working_set_bytes
64 container_spec_cpu_period
64 container_spec_cpu_shares
64 container_spec_memory_limit_bytes
64 container_spec_memory_reservation_limit_bytes
64 container_spec_memory_swap_limit_bytes
64 container_start_time_seconds
320 container_tasks_state
curl -s localhost:9600/metrics | grep -Eo 'id="/system.slice[^"]+' | sort | uniq -c
35 id="/system.slice/accounts-daemon.service
33 id="/system.slice/acpid.service
27 id="/system.slice/apparmor.service
27 id="/system.slice/apport.service
35 id="/system.slice/apt-daily-upgrade.service
34 id="/system.slice/atd.service
27 id="/system.slice/cgroupfs-mount.service
35 id="/system.slice/cloud-config.service
35 id="/system.slice/cloud-final.service
27 id="/system.slice/cloud-init-local.service
35 id="/system.slice/cloud-init.service
27 id="/system.slice/console-setup.service
35 id="/system.slice/cron.service
35 id="/system.slice/dbus.service
35 id="/system.slice/docker.service
35 id="/system.slice/exim4.service
35 id="/system.slice/filebeat.service
27 id="/system.slice/grub-common.service
35 id="/system.slice/[email protected]
35 id="/system.slice/irqbalance.service
35 id="/system.slice/iscsid.service
27 id="/system.slice/keyboard-setup.service
27 id="/system.slice/kmod-static-nodes.service
35 id="/system.slice/lvm2-lvmetad.service
27 id="/system.slice/lvm2-monitor.service
35 id="/system.slice/lxcfs.service
27 id="/system.slice/lxd-containers.service
35 id="/system.slice/mdadm.service
27 id="/system.slice/networking.service
35 id="/system.slice/node_exporter.service
35 id="/system.slice/nscd.service
35 id="/system.slice/nslcd.service
35 id="/system.slice/ntp.service
27 id="/system.slice/ondemand.service
27 id="/system.slice/open-iscsi.service
35 id="/system.slice/polkitd.service
35 id="/system.slice/protologbeat.service
27 id="/system.slice/rc-local.service
27 id="/system.slice/resolvconf.service
35 id="/system.slice/rsyslog.service
27 id="/system.slice/setvtrgb.service
27 id="/system.slice/snapd.seeded.service
39 id="/system.slice/snapd.service
35 id="/system.slice/ssh.service
35 id="/system.slice/systemd-journald.service
27 id="/system.slice/systemd-journal-flush.service
35 id="/system.slice/systemd-logind.service
27 id="/system.slice/systemd-modules-load.service
27 id="/system.slice/systemd-random-seed.service
27 id="/system.slice/systemd-remount-fs.service
27 id="/system.slice/systemd-sysctl.service
27 id="/system.slice/systemd-tmpfiles-setup-dev.service
27 id="/system.slice/systemd-tmpfiles-setup.service
39 id="/system.slice/systemd-udevd.service
27 id="/system.slice/systemd-udev-trigger.service
27 id="/system.slice/systemd-update-utmp.service
27 id="/system.slice/systemd-user-sessions.service
34 id="/system.slice/system-getty.slice
35 id="/system.slice/system-serial\\x2dgetty.slice
27 id="/system.slice/ufw.service
27 id="/system.slice/unattended-upgrades.service
35 id="/system.slice/watchdog.service
Am I missing anything in the configuration? Thanks.
@dashpole I'm not sure whether it's a bug or a feature, but from what I can tell from the source below, accept will always be true for the default []string{"/"} value of rawPrefixWhiteList.
That would allow all raw cgroups to always be collected, hence all the /system.slice/* stuff reported by cAdvisor regardless of the --docker_only flag being set.
Oh yeah, you are correct. That is a bug. I'll open something to fix it...
@dashpole thank you for #2161.
Would you mind adding the ability to whitelist certain cgroup prefixes via a command-line flag? #2164
For anyone reading this in the current year: the --docker_only flag has been removed. Use --raw_cgroup_prefix_whitelist instead.
--raw_cgroup_prefix_whitelist=/docker/ should filter out everything correctly.
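For reference, a sketch of what that invocation could look like, mirroring the compose setup earlier in the thread. The image location, tag, and port mapping are assumptions; adjust them to your environment and check the flag against your cAdvisor version:

```shell
# Hypothetical invocation; image and tag are assumptions, not from this thread.
docker run -d --name=cadvisor -p 9600:8080 \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:rw \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  gcr.io/cadvisor/cadvisor:latest \
  --raw_cgroup_prefix_whitelist=/docker/

# If the whitelist takes effect, this count should drop to 0:
curl -s localhost:9600/metrics | grep -c 'system.slice'
```

Note the prefix /docker/ assumes the cgroupfs driver; with the systemd cgroup driver, container cgroups live under a different path, so the prefix would need to match your layout.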
@lededje, are you sure? I did not check it myself, but the flag still shows up in the source code https://github.com/google/cadvisor/blob/f17af505243c9f637b49a279f64de70a98edf3a3/container/raw/factory.go#L32