Telegraf: Telegraf unable to start without existing log file

Created on 27 Jul 2020 · 6 Comments · Source: influxdata/telegraf

Relevant telegraf.conf:

[agent]
flush_interval = "10s"
flush_jitter = "5s"
interval = "10s"
metric_buffer_limit = 20000
round_interval = true

System info:

Telegraf version 1.15.1, Amazon Linux 2018.03

Steps to reproduce:

  1. Install telegraf version 1.15.1
  2. /var/log/telegraf directory exists, but has no files
  3. Start telegraf

Expected behavior:

Telegraf starts, and writes logs to /var/log/telegraf/telegraf.log

Actual behavior:

sudo service telegraf start
Starting the process telegraf [ OK ]
sh: /var/log/telegraf/telegraf.log: Permission denied

Telegraf does not start

Additional info:

When rolling back to version 1.14.5, Telegraf starts correctly and logs to /var/log/telegraf/telegraf.log.
Alternatively, if I manually create a blank log file named /var/log/telegraf/telegraf.log and chown it to telegraf, then version 1.15.1 will start correctly.
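
The manual workaround described above can be sketched as a small shell function. This is only an illustration of the steps I took, not something shipped with Telegraf: the function name `prepare_log_file` and its two arguments are hypothetical, and it assumes it is run as root when the target owner is another user.

```shell
#!/bin/sh
# Hypothetical helper (not part of Telegraf): pre-create an empty log file
# and hand it to the service user so the init script can append to it.
#   $1 = path of the log file, $2 = owner (user or user:group)
prepare_log_file() {
    plf_file="$1"
    plf_owner="$2"
    # Make sure the parent directory exists (a no-op if it already does).
    mkdir -p "$(dirname "$plf_file")" || return 1
    # Create the file if it is missing; an existing file keeps its content.
    touch "$plf_file" || return 1
    # Give the file to the service user.
    chown "$plf_owner" "$plf_file"
}
```

On an affected host this would be called as root, for example `prepare_log_file /var/log/telegraf/telegraf.log telegraf`, before starting the service.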

This unexpectedly broke metrics reporting for us today on new hosts.

All 6 comments

This seems like a permission problem. Telegraf should have write permissions to the /var/log/telegraf folder. Make sure that the user running the telegraf process, or its group, has write access to this folder.

@ssoroka I rolled back Telegraf to 1.14.5 on the same host, with no permission changes or any other changes to the host besides the yum downgrade, and the process started just fine.
That is the only reason I filed an issue here with Telegraf.

Did Telegraf 1.15.1 add some new requirement for creating this log file? I'm working around things on my end, but would like to understand why a new Telegraf version was responsible for breaking things.

We are unable to start the telegraf service on fresh installations on EL nodes as well. This looks like an RPM packaging change.

In the 1.14.5 RPM /var/log/telegraf is owned by telegraf:telegraf

# rpm -qvl telegraf
-rw-r--r--    1 root    root                      131 Jun 30 19:20 /etc/logrotate.d/telegraf
drwxr-xr-x    2 root    root                        0 Jun 30 19:21 /etc/telegraf
-rw-r--r--    1 root    root                   235766 Jun 30 19:20 /etc/telegraf/telegraf.conf
drwxr-xr-x    2 root    root                        0 Jun 30 19:21 /etc/telegraf/telegraf.d
-rwxr-xr-x    1 root    root                 69213184 Jun 30 19:20 /usr/bin/telegraf
drwxr-xr-x    2 root    root                        0 Jun 30 19:21 /usr/lib/.build-id
drwxr-xr-x    2 root    root                        0 Jun 30 19:21 /usr/lib/.build-id/87
lrwxrwxrwx    1 root    root                       28 Jun 30 19:21 /usr/lib/.build-id/87/58b4a9009b5001278739cd097e59d24c18f23e -> ../../../../usr/bin/telegraf
-rw-r--r--    1 root    root                     5803 Jun 30 19:20 /usr/lib/telegraf/scripts/init.sh
-rw-r--r--    1 root    root                      492 Jun 30 19:20 /usr/lib/telegraf/scripts/telegraf.service
drwxr-xr-x    2 telegraf telegraf                    0 Jun 30 19:21 /var/log/telegraf

In the 1.15.1 RPM it is owned by root:root

# rpm -qvl telegraf
-rw-r--r--    1 root    root                      131 Jul 22 22:21 /etc/logrotate.d/telegraf
-rw-r--r--    1 root    root                   250761 Jul 22 22:21 /etc/telegraf/telegraf.conf
drwxr-xr-x    2 root    root                        0 Jul 22 22:21 /etc/telegraf/telegraf.d
-rwxr-xr-x    1 root    root                 69730912 Jul 22 22:21 /usr/bin/telegraf
drwxr-xr-x    2 root    root                        0 Jul 22 22:21 /usr/lib/.build-id
drwxr-xr-x    2 root    root                        0 Jul 22 22:21 /usr/lib/.build-id/3c
lrwxrwxrwx    1 root    root                       28 Jul 22 22:21 /usr/lib/.build-id/3c/1b944565dc487f5646d216f361977b5c6bb4c0 -> ../../../../usr/bin/telegraf
-rwxr-xr-x    1 root    root                     5803 Jul 22 22:21 /usr/lib/telegraf/scripts/init.sh
-rw-r--r--    1 root    root                      492 Jul 22 22:21 /usr/lib/telegraf/scripts/telegraf.service
drwxr-xr-x    2 root    root                        0 Jul 22 22:21 /var/log/telegraf
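
To compare the two packages without reading the whole listing, the owner of a single path can be pulled out of `rpm -qvl` output. The filter below is a rough sketch: `listed_owner` is a hypothetical name, and it assumes the column layout shown in the listings above (owner in the third field, group in the fourth, path last).

```shell
#!/bin/sh
# Hypothetical filter: read `rpm -qvl`-style lines on stdin and print
# "owner:group" for the entry whose last field equals the given path.
#   $1 = path to look up, e.g. /var/log/telegraf
listed_owner() {
    awk -v path="$1" '$NF == path { print $3 ":" $4 }'
}
```

Usage would be `rpm -qvl telegraf | listed_owner /var/log/telegraf`. Given the listings above, that corresponds to `telegraf:telegraf` for 1.14.5 and `root:root` for 1.15.1.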

It looks like the Debian packages are working because there is a chown in the post-install script: https://github.com/influxdata/telegraf/blob/master/scripts/deb/post-install.sh#L52. This is not present in the RPM post-install script.
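
For illustration, the kind of step that appears to be missing from the RPM post-install script could look roughly like the sketch below. This is not copied from either packaging script: the function name `fix_log_dir_ownership`, its arguments, and the `755` mode (taken from the `drwxr-xr-x` entries in the listings above) are assumptions.

```shell
#!/bin/sh
# Hypothetical post-install step (not the actual packaging script):
# make sure the log directory exists and belongs to the service user.
#   $1 = log directory, $2 = owner (user or user:group)
fix_log_dir_ownership() {
    flo_dir="$1"
    flo_owner="$2"
    # Create the directory if the package did not leave one behind.
    test -d "$flo_dir" || mkdir -p "$flo_dir" || return 1
    # Hand the directory and anything already in it to the service user.
    chown -R "$flo_owner" "$flo_dir" || return 1
    # rwxr-xr-x, matching the mode shown in the RPM listings.
    chmod 755 "$flo_dir"
}
```

Run as root on an affected host, `fix_log_dir_ownership /var/log/telegraf telegraf:telegraf` would restore the ownership that the 1.14.5 RPM shipped with.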

Thanks for the follow up! Reopening

This should be resolved. You might have to wait for the nightly RPM build to test it. I reproduced the problem locally in a VM and the change seems to resolve it, so I think this should work.
