cAdvisor: exclude system.slice metrics from being exposed as Prometheus metrics

Created on 17 Dec 2018 · 17 Comments · Source: google/cadvisor

We are using cAdvisor to collect Docker container metrics with Prometheus. It exposes a large number of cgroup-related metrics that are not relevant to us, and we want cAdvisor to filter them out.
The --docker_only flag does not change anything for some reason; we continue to see all the /system.slice/* metrics being exposed. After some googling, I came across the setting rawPrefixWhiteList = "/system.slice/kubelet.service, /system.slice/docker.service.." in #2048 and #1926.
But I cannot figure out how to use that setting or how it relates to the docker_only flag. Could anyone explain in more detail how to use rawPrefixWhiteList? It does not seem to be documented anywhere.

All 17 comments

docker_only makes cAdvisor collect only Docker container metrics and machine-level metrics. We don't actually expose rawPrefixWhiteList as a flag. We may want to consider consuming this in the kubelet to get that behavior.

Thanks for your quick response.
Does that mean we should NOT see /system.slice/* metrics when using the docker_only flag? What do you mean by machine-level metrics?

I'm just looking for a way to tell cAdvisor not to collect or expose /system.slice/* metrics, as there is a noticeable performance impact. We don't use Kubernetes.

with docker_only, you only get cgroup metrics for cgroups that belong to docker containers, e.g. /kubepods/pod<pod_uid>/<container_id> will be collected, but /kubepods/pod<pod_uid> will not.

Machine metrics are metrics from the / cgroup, as well as, for example, disk metrics from filesystems on the node. We collect these metrics when --docker_only is specified even though it isn't a docker container cgroup.

so, is there a way not to collect system.slice/* metrics?

We don't use Kubernetes either, but we want to use cAdvisor to collect Docker container metrics. It collects metrics for all cgroups under /, so Prometheus takes a long time to scrape them.

I wanted to elaborate more on our use case for cAdvisor and Prometheus.
We're running fleets of spot instances that come and go many times a day, with a peak of 1500-2000 instances at any moment. Each spot instance hosts a dozen Docker containers, which are not static either. All in all, it is a pretty dynamic environment, which in turn produces millions of different time series and leads to extremely high-cardinality metrics.
Basically, we need just a handful of metrics to monitor the CPU and memory usage of every Docker container at a resolution of a few seconds.
cAdvisor exposes over 2700 time series on every spot instance at any given time. While trying to ingest all of those from every spot instance into Prometheus, I observed over 400 million(!) head time series before it crashed the biggest Prometheus server I could try.
Yes, I can drop the unnecessary metrics on the Prometheus side, and that does help to get cardinality under control. Still, the 2700 time series amount to over one megabyte of data, which Prometheus has a difficult time scraping at a high rate. Given there are some 2000 endpoints to scrape, it's a lot of traffic too. There also seems to be CPU overhead for collecting the unneeded metrics.
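
To put rough numbers on the scrape load described above, here is a back-of-the-envelope calculation in Python. The fleet size, series count, and payload size are the figures quoted in this comment; the 5-second scrape interval is an assumption standing in for "every few seconds", and the function name is made up for illustration.

```python
# Back-of-the-envelope scrape load, using the figures quoted in this comment.
INSTANCES = 2000               # upper end of the quoted peak fleet size
SERIES_PER_INSTANCE = 2700     # time series exposed by cAdvisor per instance
BYTES_PER_SCRAPE = 1_000_000   # "over one megabyte" per scrape, rounded down
SCRAPE_INTERVAL_S = 5          # ASSUMPTION: "every few seconds" taken as 5 s


def scrape_load(instances, series_per_instance, bytes_per_scrape, interval_s):
    """Return (active series, bytes per scrape round, bytes per second)."""
    active_series = instances * series_per_instance
    bytes_per_round = instances * bytes_per_scrape
    bytes_per_second = bytes_per_round / interval_s
    return active_series, bytes_per_round, bytes_per_second


series, per_round, per_second = scrape_load(
    INSTANCES, SERIES_PER_INSTANCE, BYTES_PER_SCRAPE, SCRAPE_INTERVAL_S
)
print(f"active series at peak: {series:,}")               # 5,400,000
print(f"data per scrape round: {per_round / 1e9:.1f} GB")   # 2.0 GB
print(f"sustained ingest rate: {per_second / 1e6:.0f} MB/s")  # 400 MB/s
```

Under these assumptions, a single scrape round moves about 2 GB. The far larger head-series count reported above presumably reflects churn, since each new instance and container introduces fresh series.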

For the above reasons, it would be helpful to have the option to collect and expose docker related metrics only.

@dashpole thanks for taking care of this. Is the pull request #72787 above going to solve the issue when cAdvisor is used outside of kubernetes?

no, but you should be able to use the --docker_only flag if you just want metrics for docker containers.

@dashpole at the moment cadvisor with the --docker_only flag collects metrics from the /system.slice cgroup which is exactly what we are trying to avoid. Is it going to change that behavior?

@viberan that sounds like a bug. The --docker_only flag should mean you only get docker containers + the / (root) container.

can you paste your configuration (flags + cadvisor version) and the prometheus metrics you are getting here?

@dashpole sorry for the delay in responding. Below is an excerpt from the docker compose file:

  cadvisor:
    image: google/cadvisor:v0.32.0
    network_mode: bridge
    container_name: cadvisor
    ports:
      - 9600:8080
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    restart: unless-stopped
    command: ["--docker_only=true"]

the output of docker top command:

root@xxxx# docker top cadvisor
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                1495                1477                8                   15:45               ?                   00:01:59            /usr/bin/cadvisor -logtostderr --docker_only=true

getting ~ 3000 timeseries for prometheus:

curl -s localhost:9600/metrics | wc -l
3010

which is about 1 MB of data

curl -s localhost:9600/metrics | wc -c
1182807

and there are about 2000 system.slice related timeseries out of the above:

curl -s localhost:9600/metrics | grep 'system.slice' | wc -l
2014
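
Note that `wc -l` also counts the `# HELP` and `# TYPE` comment lines in the exposition output, so it slightly overstates the number of series. A minimal Python sketch of a stricter count follows. The function name `count_series` and the sample text are made up for illustration, and the filter is a plain substring match on the `id` label, comparable to the grep above.

```python
def count_series(exposition_text, id_prefix=None):
    """Count sample lines in Prometheus text-format output.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. If id_prefix
    is given, only lines containing id="<id_prefix>... are counted
    (a substring match, so it is approximate in the same way grep is).
    """
    count = 0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if id_prefix is not None and f'id="{id_prefix}' not in line:
            continue
        count += 1
    return count


# Hypothetical, shortened sample of cAdvisor output (label sets trimmed).
SAMPLE = """\
# HELP container_last_seen Last time a container was seen by the exporter
# TYPE container_last_seen gauge
container_last_seen{id="/",image="",name=""} 1545061500
container_last_seen{id="/system.slice/cron.service",image="",name=""} 1545061500
container_last_seen{id="/docker/0123abcd",image="busybox",name="demo"} 1545061500
"""

print(count_series(SAMPLE))                   # all samples: 3
print(count_series(SAMPLE, "/system.slice"))  # system.slice samples only: 1
```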

metric example:

container_tasks_state{container_label_com_docker_compose_config_hash="",container_label_com_docker_compose_container_number="",container_label_com_docker_compose_oneoff="",container_label_com_docker_compose_project="",container_label_com_docker_compose_service="",container_label_com_docker_compose_version="",container_label_name="",id="/system.slice/unattended-upgrades.service",image="",name="",state="uninterruptible"} 0
container_tasks_state{container_label_com_docker_compose_config_hash="",container_label_com_docker_compose_container_number="",container_label_com_docker_compose_oneoff="",container_label_com_docker_compose_project="",container_label_com_docker_compose_service="",container_label_com_docker_compose_version="",container_label_name="",id="/system.slice/watchdog.service",image="",name="",state="iowaiting"} 0

There are tons of system.slice stuff:

curl -s localhost:9600/metrics | grep system.slice | cut -d'{' -f1 | sort | uniq -c
     64 container_cpu_load_average_10s
     64 container_cpu_system_seconds_total
    136 container_cpu_usage_seconds_total
     64 container_cpu_user_seconds_total
     38 container_fs_reads_bytes_total
     38 container_fs_reads_total
     38 container_fs_writes_bytes_total
     38 container_fs_writes_total
     64 container_last_seen
     64 container_memory_cache
     64 container_memory_failcnt
    256 container_memory_failures_total
     64 container_memory_mapped_file
     64 container_memory_max_usage_bytes
     64 container_memory_rss
     64 container_memory_swap
     64 container_memory_usage_bytes
     64 container_memory_working_set_bytes
     64 container_spec_cpu_period
     64 container_spec_cpu_shares
     64 container_spec_memory_limit_bytes
     64 container_spec_memory_reservation_limit_bytes
     64 container_spec_memory_swap_limit_bytes
     64 container_start_time_seconds
    320 container_tasks_state

curl -s localhost:9600/metrics | grep -Eo 'id="/system.slice[^"]+' | sort  | uniq -c
     35 id="/system.slice/accounts-daemon.service
     33 id="/system.slice/acpid.service
     27 id="/system.slice/apparmor.service
     27 id="/system.slice/apport.service
     35 id="/system.slice/apt-daily-upgrade.service
     34 id="/system.slice/atd.service
     27 id="/system.slice/cgroupfs-mount.service
     35 id="/system.slice/cloud-config.service
     35 id="/system.slice/cloud-final.service
     27 id="/system.slice/cloud-init-local.service
     35 id="/system.slice/cloud-init.service
     27 id="/system.slice/console-setup.service
     35 id="/system.slice/cron.service
     35 id="/system.slice/dbus.service
     35 id="/system.slice/docker.service
     35 id="/system.slice/exim4.service
     35 id="/system.slice/filebeat.service
     27 id="/system.slice/grub-common.service
     35 id="/system.slice/[email protected]
     35 id="/system.slice/irqbalance.service
     35 id="/system.slice/iscsid.service
     27 id="/system.slice/keyboard-setup.service
     27 id="/system.slice/kmod-static-nodes.service
     35 id="/system.slice/lvm2-lvmetad.service
     27 id="/system.slice/lvm2-monitor.service
     35 id="/system.slice/lxcfs.service
     27 id="/system.slice/lxd-containers.service
     35 id="/system.slice/mdadm.service
     27 id="/system.slice/networking.service
     35 id="/system.slice/node_exporter.service
     35 id="/system.slice/nscd.service
     35 id="/system.slice/nslcd.service
     35 id="/system.slice/ntp.service
     27 id="/system.slice/ondemand.service
     27 id="/system.slice/open-iscsi.service
     35 id="/system.slice/polkitd.service
     35 id="/system.slice/protologbeat.service
     27 id="/system.slice/rc-local.service
     27 id="/system.slice/resolvconf.service
     35 id="/system.slice/rsyslog.service
     27 id="/system.slice/setvtrgb.service
     27 id="/system.slice/snapd.seeded.service
     39 id="/system.slice/snapd.service
     35 id="/system.slice/ssh.service
     35 id="/system.slice/systemd-journald.service
     27 id="/system.slice/systemd-journal-flush.service
     35 id="/system.slice/systemd-logind.service
     27 id="/system.slice/systemd-modules-load.service
     27 id="/system.slice/systemd-random-seed.service
     27 id="/system.slice/systemd-remount-fs.service
     27 id="/system.slice/systemd-sysctl.service
     27 id="/system.slice/systemd-tmpfiles-setup-dev.service
     27 id="/system.slice/systemd-tmpfiles-setup.service
     39 id="/system.slice/systemd-udevd.service
     27 id="/system.slice/systemd-udev-trigger.service
     27 id="/system.slice/systemd-update-utmp.service
     27 id="/system.slice/systemd-user-sessions.service
     34 id="/system.slice/system-getty.slice
     35 id="/system.slice/system-serial\\x2dgetty.slice
     27 id="/system.slice/ufw.service
     27 id="/system.slice/unattended-upgrades.service
     35 id="/system.slice/watchdog.service

Am I missing anything in the configuration? Thanks.

@dashpole I'm not sure whether it's a bug or a feature, but from what I can understand from the source below, accept will always be true for the default []string{"/"} value of rawPrefixWhiteList:

https://github.com/google/cadvisor/blob/150629c099b66e13223ec0601fdf9d49a3282c68/container/raw/factory.go#L70-L75

This allows all raw cgroups to always be collected, which is why cAdvisor reports all the /system.slice/* metrics regardless of whether the -docker_only flag is set.
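
The behavior described above can be illustrated with a small Python sketch. This is not the cAdvisor source, which is written in Go. It assumes the accept decision is "root cgroup, or docker_only is off, or the name starts with any whitelisted prefix", which is one reading of the linked lines. The function name `accepts_raw_cgroup` is hypothetical.

```python
def accepts_raw_cgroup(name, prefix_whitelist, docker_only):
    """Return True if the raw handler would accept the cgroup `name`.

    ASSUMPTION: this mirrors the logic as described in this comment, not the
    exact Go code in container/raw/factory.go.
    """
    accept = name == "/" or not docker_only
    for prefix in prefix_whitelist:
        if name.startswith(prefix):
            accept = True
            break
    return accept


DEFAULT_WHITELIST = ["/"]  # default value quoted in this comment

# Every cgroup path starts with "/", so the default whitelist matches them all
# and docker_only has no effect on raw cgroups.
print(accepts_raw_cgroup("/system.slice/cron.service", DEFAULT_WHITELIST, True))  # True

# With an empty whitelist, docker_only rejects non-root raw cgroups as intended,
# while the root cgroup is still accepted.
print(accepts_raw_cgroup("/system.slice/cron.service", [], True))  # False
print(accepts_raw_cgroup("/", [], True))                           # True
```

The point of the sketch is only that a whitelist containing "/" is a prefix of every absolute cgroup path, so the prefix check can never reject anything.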

Oh yeah, you are correct. That is a bug. I'll open something to fix it...

@dashpole thank you for #2161.

Would you mind adding the ability to whitelist certain cgroup prefixes via a command-line flag (#2164)?

For anyone reading this in the current year: the --docker_only flag has been removed. Use raw_cgroup_prefix_whitelist instead.

--raw_cgroup_prefix_whitelist=/docker/ should filter out everything correctly.

@lededje, are you sure? I did not check it myself, but the flag still shows up in the source code https://github.com/google/cadvisor/blob/f17af505243c9f637b49a279f64de70a98edf3a3/container/raw/factory.go#L32
