NOTE this is currently attributed to an issue with systemd: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760
Telegraf version 0.13.1
CoreOS version 1010.5.0
Many plugins running, including the disk input.
We can reproduce the issue in our environment.
Expect the Telegraf disk input to work correctly on startup.
The Telegraf disk input reports the following errors every collection cycle.
2016/06/08 18:59:20 Error in input [disk]: error getting disk usage info: too many levels of symbolic links
The input starts working correctly when Telegraf is restarted.
can you provide your configuration?
@sparrc I get it while running - https://github.com/deis/monitor/tree/master/telegraf
+1
does anyone have steps to reproduce this?
@sparrc my guess is it's related to telegraf running early in the startup process of CoreOS. As I have only reproduced it on newly booted boxes.
The things I know
My configuration
Setting Agent Hostname to: ip-10-0-20-45.ec2.internal
Setting PROMETHEUS_URLS to: "https://192.168.3.1:443/api/v1/proxy/nodes/ip-10-0-20-45.ec2.internal/metrics", "https://192.168.3.1:443/metrics"
Building config.toml!
Finished building toml...
###########################################
###########################################
# Set Tag Configuration
[tags]
# Set Agent Configuration
[agent]
interval = "10s"
round_interval = true
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
debug = false
quiet = true
flush_buffer_when_full = true
hostname = "ip-10-0-20-45.ec2.internal"
# Set output configuration
[[outputs.influxdb]]
urls = ["http://influxdb.service.consul:8086"]
database = "telegraf"
precision = "ns"
timeout = "5s"
# Set Input Configuration
[[inputs.netstat]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.mem]]
[[inputs.cpu]]
percpu = true
totalcpu = true
[[inputs.disk]]
[[inputs.diskio]]
[[inputs.net]]
# Set Service Input Configuration
###########################################
###########################################
2016/07/01 02:46:14 Starting Telegraf (version 1.0.0-beta1-1-g2211231)
2016/07/01 02:46:14 Loaded outputs: influxdb
2016/07/01 02:46:14 Loaded inputs: disk diskio net netstat swap system mem cpu
2016/07/01 02:46:14 Tags enabled: host=ip-10-0-20-45.ec2.internal
2016/07/01 02:46:14 Agent Config: Interval:10s, Debug:false, Quiet:true, Hostname:"ip-10-0-20-45.ec2.internal", Flush Interval:10s
2016/07/01 02:46:20 ERROR in input [disk]: error getting disk usage info: too many levels of symbolic links
2016/07/01 02:46:30 ERROR in input [disk]: error getting disk usage info: too many levels of symbolic links
@sstarcher the entire telegraf process hangs when this happens?
@sparrc I still get the other inputs shipped so I doubt it's hanging. I could turn debug on and quiet off to see if anything else is happening.
Disk usage and iNodes are not collected and shipped, but cpu/memory/network is being shipped.
The first time telegraf is starting I see the following in my journal logs
Jul 01 02:46:20 ip-10-0-20-45.ec2.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 3782 (telegraf)
Jul 01 02:46:20 ip-10-0-20-45.ec2.internal systemd[1]: Mounting Arbitrary Executable File Formats File System...
Jul 01 02:46:20 ip-10-0-20-45.ec2.internal systemd[1]: Mounted Arbitrary Executable File Formats File System.```
Just noticed this in my papertrail logs and google brought me here:
Aug 03 15:15:41 localhost fluentd: 2016-08-03T22:15:40Z fluentd {"log":"2016/08/03 22:15:40 ERROR in input [disk]: error getting disk usage info: too many levels of symbolic links\n","stream":"stderr","docker":{"container_id":"427f86914f90644f9d56770a379cfe5b48f9c7c30598092dade1b40bd8f3830a"},"kubernetes":{"namespace_name":"deis","pod_id":"6164e10f-530c-11e6-a5e2-068755380eff","pod_name":"deis-monitor-telegraf-x5e7f","container_name":"deis-monitor-telegraf","labels":{"app":"deis-monitor-telegraf"},"host":"ip-172-20-0-131.us-west-1.compute.internal"}}
2016/08/03 22:15:40 ERROR in input [disk]: error getting disk usage info: too many levels of symbolic links
Repeats every 10s
I'm seeing the same issue with all telegraf pods created by deis workflow 2.3.0 on kubernetes 1.3.4, 1.0.0-beta2-18-g755b2ec and CoreOS stable as the host os.
I think @sstarcher is on the right track. If I attach to a telegraf container and traverse the rootfs with find I get the following:
find / -type s
find: '/rootfs/proc/sys/fs/binfmt_misc': Too many levels of symbolic links
The error only occurs for /rootfs/proc/sys/fs/binfmt_misc while /proc/sys/fs/binfmt_misc can be listed just fine.
The output of mount shows:
systemd-1 on /rootfs/proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=31,pgrp=0,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11343)
same as @felixbuenemann
I am pretty sure that all of the affected _host_ systems (not the containers) are running systemd.
I found this related ubuntu issue: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760
Both coreos stable (what I am running) and ubuntu xenial (listed in the report) are using systemd 229.
OK, I think the issue at hand is that the /proc/sys/fs/binfmt_misc is mounted on the host at first access by the automount proxy. But the telegraf container ist started before that with a read-only mount of the host /proc to /rootfs/proc.
If I do ls /proc/sys/fs/binfmt_misc on the coreos host and then kill the telegraf pod so it restarts, the error goes away in the new pod.
I have been able to work around the issue on CoreOS by requiring the docker.service to want the proc-sys-fs-binfmt_misc.mount service, which causes /proc/sys/fs/binfmt_misc on the host to be mounted before docker starts.
This is the cloud-config userdata to define the drop-in:
coreos:
units:
- name: docker.service
drop-ins:
- name: 60-mount-binfmt-misc.conf
content: |
[Unit]
Wants=proc-sys-fs-binfmt_misc.mount
After=proc-sys-fs-binfmt_misc.mount
The same could be done on other systems by editing or extending the docker systemd service.
CentOS 7.2, docker 1.12.3, telegraf 1.1.1 (in container)
Same issue.
docker-compose.yml
version: '2'
services:
telegraf:
image: telegraf
environment:
HOST_PROC: /rootfs/proc
HOST_SYS: /rootfs/sys
HOST_ETC: /rootfs/etc
hostname: host
volumes:
- ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
- /sys:/rootfs/sys:ro
- /proc:/rootfs/proc:ro
- /etc:/rootfs/etc:ro
telegraf.conf:
...
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs"]
[[inputs.diskio]]
...
Having the same issue #1544
I personally solved this issue by creating a shell script that wraps telegraf around.
# FIXME: for some reasons, this doesn't happen to work in the entrypoint :(
if [ -n "$HOST_MOUNT_PREFIX" ]; then
while (mount | grep "$HOST_MOUNT_PREFIX/proc/sys/fs/binfmt_misc" &>/dev/null); do
# this solves #TELEGRAF-1352 (too many levels of symbolic links)
umount $HOST_MOUNT_PREFIX/proc/sys/fs/binfmt_misc
done
fi
/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS
I've changed the Dockerfile's CMD statement to use the script and the issue is gone for good.
has anyone been able to confirm that this is the same issue as https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760?
Host OS Information
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial
Docker OS Information
Docker Base Image : alpine 3.4
Description
I have experienced the same phenomenon. To explain my situation briefly, the Ubuntu host did not have a problem, and there was a picture-like issue inside the docker container. Telegraf left the following logs.
2017-02-06T15: 32: 40Z E! ERROR in input [inputs.disk]: error getting disk usage info: too many levels of symbolic links

My input configuration like this
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "overlay"]
[[inputs.diskio]]
There is something strange. Restart the container and the disk collection will be normal. The contents of the container are as follows.

@gunnaraasen Can you tell me how you resolved the issue?
I found the cause. My case is that the host binfmt_misc is basically an automount so it will mount at the point of use.
sudo systemctl status proc-sys-fs-binfmt_misc.automount
At first, you can see the following logs.
proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point
Loaded: loaded (/lib/systemd/system/proc-sys-fs-binfmt_misc.automount; static; vendor preset: enabled)
Active: active (waiting) since Mon 2017-02-06 19:24:19 KST; 22h ago
Where: /proc/sys/fs/binfmt_misc
Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
After running telegraf, you can see the following logs.
proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point
Loaded: loaded (/lib/systemd/system/proc-sys-fs-binfmt_misc.automount; static; vendor preset: enabled)
Active: active (running) since Mon 2017-02-06 19:24:20 KST; 22h ago
Where: /proc/sys/fs/binfmt_misc
Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems
Feb 07 00:58:50 ip-xxx systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 28579 (telegraf)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
I've never used binfmt_misc on my host before, which means it will not mount. However, since bingmt_misc is mounted to access mount information from telegraf first, since then volume mapping information is included as well, so telegraf can collect information based on disk information without problems.
same issue, I made a quick fix with umount /proc/sys/fs/binfmt_misc just after telegraf start
Closing, please use the workaround provided by @Dark0096