Telegraf: Telegraf tail & logparser should handle deleted & newly created files

Created on 29 Sep 2016  路  24Comments  路  Source: influxdata/telegraf

Bug report

Relevant telegraf.conf:

[[inputs.logparser]]
## files to tail.
files = ["/u0/log/jetty/*.request.log"]
## Read file from beginning.
from_beginning = false

System info:

Telegraf 1.0.0
Ubuntu 14.04.4
Jetty

Steps to reproduce:

  1. Jetty is configured to rotate the access log to a new file in the format YYYY_MM_DD.request.log daily
  2. When a new access log file is created, Telegraf does not pick up the file and thus no further metrics is collected from the access log.

    Expected behavior:

Telegraf logparser picks up new file on the same pattern defined in the telegraf.conf

Actual behavior:

Telegraf does not pick up new rotated file automatically

Additional info:

Here's an fuser output of the log directory. The 7692 pid is Telegraf. It was started on 9/28.
/u0/log/jetty/2016_09_23.request.log: 7692
/u0/log/jetty/2016_09_24.request.log: 7692
/u0/log/jetty/2016_09_25.request.log: 7692
/u0/log/jetty/2016_09_26.request.log: 7692
/u0/log/jetty/2016_09_27.request.log: 1081 7692
/u0/log/jetty/2016_09_28.request.log: 1081 7692
/u0/log/jetty/2016_09_29.request.log: 1081 8769

The 2016_09_29.request.log was created by the 8769 process with is jetty. notice that 7692 does not hold a handle of that file.

Interestingly the pid 1081 belongs to logstash-forwarder, which is able to pick up the newly created file automatically.

enhancement

Most helpful comment

This still seems like an issue for [inputs.tail] for Telegraf 1.14.2. Once the file has been deleted and recreated with same name tail doesn't work anymore.

All 24 comments

+1 on this one.

It's working as intended. Jetty should be configured so that it always writes to the same log file and logrotate should be used to do the rotation.

@dgnorton Unfortunately this is not something that is out of the box standard to jetty.

Please refer to here:
http://wiki.eclipse.org/Jetty/Howto/Configure_Request_Logs

Either way, logparser should detect new files. There is not only 1 way to accomplish 1 task.

@akan1979 what happens if retainDays is set to 2 billion (> 5M years) and yyyy_mm_dd is removed from the file name?

This seems to be a problem when a file is (re)moved as well, e.g. given this config:

[[inputs.logparser]]
  files = ["/tmp/telegraf_input.log"]
  from_beginning=false
  [inputs.logparser.grok]
    patterns=["%{INT:measurement}"]

I see this behavior:

echo 1 > /tmp/telegraf_input.log
# start telegraf
echo 2 >> /tmp/telegraf_input.log
# telegraf picks up 2
rm /tmp/telegraf_input.log && touch /tmp/telegraf_input.log
# telegraf 1.0: nothing reported, telegraf 1.0.1 reports
# "Waiting for /tmp/telegraf_input.log to appear..."
echo 3 >> /tmp/telegraf_input.log
# Nothing happens :(

On the plus side, truncation detection seems to work, e.g.

echo 1 > /tmp/telegraf_input.log
echo 2 > /tmp/telegraf_input.log
# produce
# Re-opening truncated file /tmp/telegraf_input.log ...
# Successfully reopened truncated /tmp/telegraf_input.log

... but I suspect more than a few logging tools (e.g. logback) don't support rolling by truncation.

Internally telegraf has to spinup a thread for each file that it needs to tail. Currently it only does that once at startup, which is why deleted & created files are not handled.

I think it's worthwhile to handle these two cases in the tailer & logparser so I'm going to reopen this issue and put it into the 1.3 milestone.

Sounds good. FWIW, it looks like this lib solves addresses these problems. I haven't dug into the source yet, but it has a lot of 猸愶笍s so must be good :)

@njhartwell what lib are you talking about?

Sorry, forgot to paste it https://github.com/hpcloud/tail

we are already using that library, it doesn't solve all of this

Is this still slated to go into the next release of Telegraf? (1.3)

Other than restarting the Telegraf service after each log rotation, are there any other workarounds you can suggest for this?

We are trying to replace Logstash (which handles this case) with Telegraf, we can not proceed until this issue is resolved.

yes, still planned

nope, there is no other known workaround

@mottati I believe https://github.com/influxdata/telegraf/pull/2141 should solve this. Are you having problems with newly created files or only logrotated files?

Our particular use case is we have Telegraf pointing to a test machine where we periodically reinstall our software with newer versions. The deployment is fairly simplistic, we shutdown our servers, rename the old deployment directory to backup.directory (which includes the log file we are monitoring), install the new software, and bring our services back up.

We can add a restart of the Telegraf service to this process until 1.3 is released.

Interestingly enough when I went to look into this this morning, I noticed that Telegraf was still holding on to a deleted file, our software had been reinstalled twice since I had last looked so the file was renamed into the backup directory in the first round, and deleted in the second.

sudo lsof -p 43888
COMMAND    PID     USER   FD   TYPE     DEVICE  SIZE/OFF       NODE NAME
telegraf 43888 telegraf  cwd    DIR        8,1     16384          2 /
telegraf 43888 telegraf  rtd    DIR        8,1     16384          2 /
telegraf 43888 telegraf  txt    REG        8,1  32548479     543967 /usr/bin/telegraf
telegraf 43888 telegraf  mem    REG        8,1   1921096     442382 /lib64/libc-2.12.so
telegraf 43888 telegraf  mem    REG        8,1    142640     442406 /lib64/libpthread-2.12.so
telegraf 43888 telegraf  mem    REG        8,1    154520     442375 /lib64/ld-2.12.so
telegraf 43888 telegraf    0r   CHR        1,3       0t0       4160 /dev/null
telegraf 43888 telegraf    1w   CHR        1,3       0t0       4160 /dev/null
telegraf 43888 telegraf    2w   REG      253,0     43741   28721156 /var/log/telegraf/telegraf.log
telegraf 43888 telegraf    3r   CHR        1,9       0t0       4165 /dev/urandom
telegraf 43888 telegraf    4r   REG      253,0 183324583   63406516 /data/solo.backup/soloots/logs/tasks.log (deleted)
telegraf 43888 telegraf    5u   REG        0,9         0       4158 [eventpoll]
telegraf 43888 telegraf    7r  IPv4 2235795701       0t0        TCP localhost.localdomain:45245->localhost.localdomain:sun-as-jpda (ESTABLISHED)
telegraf 43888 telegraf    8r   DIR       0,10         0          1 inotify
telegraf 43888 telegraf    9u   REG        0,9         0       4158 [eventpoll]
telegraf 43888 telegraf   10r  FIFO        0,8       0t0 2166411407 pipe
telegraf 43888 telegraf   11w  FIFO        0,8       0t0 2166411407 pipe
telegraf 43888 telegraf   12u  IPv4 2226014191       0t0        TCP localhost.localdomain:55572->localhost.localdomain:d-s-n (ESTABLISHED)

Note the deleted tasks.log file that Telegraf has open above.

Than could be another issue you may want to investigate.

:+1:

The tail plugin Readme explicitly states that it should behave like a tail -F meaning that :

it will follow the name of the given file, so that it will be compatible with log-rotated files, and that it will retry on inaccessible files.

but this is not the case as it keeps open the renamed/deleted file.
So it looks more a bug than an enhancement.

this was closed by https://github.com/influxdata/telegraf/pull/2141, it will be available in 1.3

I tried testing this today using the same technique that was listed at the beginning of this issue. I used the latest nightly build of Telegraf. Here is what I found:

telegraf version:

telegraf --version
Telegraf vdev~n201703100819 (git: master 13f314a5076a47b208aeef5bdab12470837e26c1)

Configuration File

[agent]
   flush_interval=2

[[inputs.logparser]]
  files = ["/tmp/telegraf/input.log"]
  from_beginning=true
  [inputs.logparser.grok]
    patterns=["%{INT:measurement}"]

[[outputs.file]]
  files = ["stdout"]

Test Script

#!/bin/bash
# Start with 1 line of data in the input.log
echo 0 > input.log
rm nohup.out
nohup telegraf --config config --debug 2>/dev/null  &
pid=$!

for i in {1..5}
do
    echo "Writing line $i"
    echo $i >> input.log
    sleep 10
    rm input.log
done
kill -9 $pid
cat nohup.out

Test Execution

$ ./test
Writing line 1
Writing line 2
Writing line 3
Writing line 4
Writing line 5
logparser_grok,host=node2087.svc.devpg.pdx.wd measurement="0" 1489182278368364318
logparser_grok,host=node2087.svc.devpg.pdx.wd measurement="1" 1489182278368407112

Conclusion

Assuming that this PR is actually in the build I used, it does not appear that this fix accomplishes what was requested e.g. discovery of newly created files with the same name of the original file.

We are currently using LogStash to parse our log that has this behavior. This is the last remaining issue we have that will allow us to use a pure TICK stack.

I can duplicate your report @mottati. It seems that recreated files are not detected, but truncated files are. There is also a bug where if the file is truncated with a file of the same size it is not detected.

I merged a fix for the problem into our tail fork.

@danielnelson Thanks! Do you know when this fix will make it into a released version of Telegraf?

The fix for this was in the 1.3.0 release, though I have heard that there are remaining issues. https://github.com/influxdata/telegraf/issues/2847

This still seems like an issue for [inputs.tail] for Telegraf 1.14.2. Once the file has been deleted and recreated with same name tail doesn't work anymore.

This still seems like an issue for [inputs.tail] for Telegraf 1.14.2. Once the file has been deleted and recreated with same name tail doesn't work anymore.

Yes, got this bug at telegraf 1.17.0 with [inputs.tail] plugin on ubuntu 20.04
I run telegraf and tail -F --lines=0 /var/log/voodoo/voodoo_metric.log in separate window/
I wrote some loglines to my logfile, and both - telegraf and tail see it and telegraf parse it correctly.
Then, I remove /var/log/voodoo/voodoo_metric.log and touch it, and tail say:
tail: '/var/log/voodoo/voodoo_metric.log' has become inaccessible: No such file or directory tail: '/var/log/voodoo/voodoo_metric.log' has appeared; following new file server_iops{operation="data_read"} 976 1594660047713
but telegraf still dont see new logfile(

This still seems like an issue for [inputs.tail] for Telegraf 1.14.2. Once the file has been deleted and recreated with same name tail doesn't work anymore.

Yes, got this bug at telegraf 1.17.0 with [inputs.tail] plugin on ubuntu 20.04
I run telegraf and tail -F --lines=0 /var/log/voodoo/voodoo_metric.log in separate window/
I wrote some loglines to my logfile, and both - telegraf and tail see it and telegraf parse it correctly.
Then, I remove /var/log/voodoo/voodoo_metric.log and touch it, and tail say:
tail: '/var/log/voodoo/voodoo_metric.log' has become inaccessible: No such file or directory tail: '/var/log/voodoo/voodoo_metric.log' has appeared; following new file server_iops{operation="data_read"} 976 1594660047713
but telegraf still dont see new logfile(

I have the same issue when using Telegraf (1.17.0) [inputs.tail] and Grok Pattern with Linux (Debian Buster latest version) build-in logrotation that rotates the logfile. After the log file gets rotatet Telegraf don't log the new data and output this error:

2021-01-05T23:00:00Z E! [inputs.tail] Tailing "/var/log/mylogfile.log": Unable to open file /var/log/mylogfile.log: open /var/log/mylogfile.log: permission denied
2021-01-05T23:01:01Z E! [inputs.tail] Stopping tail on "/var/log/mylogfile.log": Unable to open file /var/log/mylogfile.log: open /var/log/mylogfile.log: permission denied

After restarting the telegraf service all works fine again until the next log rotation. As workaround I restart the telegraf service after the log rotation, but thats not a clean solution.

Was this page helpful?
0 / 5 - 0 ratings