Telegraf: Create a "logstreamer" plugin

Created on 10 Aug 2015 · 27 Comments · Source: influxdata/telegraf

Inspired by issue #48, create a plugin for aggregating and pushing data from log files, allowing user-defined regex filters.

This would behave in a similar manner to heka's logstreamer plugin: https://hekad.readthedocs.org/en/v0.9.2/pluginconfig/logstreamer.html#logstreamerplugin

/cc @steverweber

All 27 comments

:+1:

perhaps something simpler: tail a file.
some code like: https://github.com/hpcloud/tail
and add some processing options like:

  • count matches of a regex
  • send raw text that a regex matches

this could be used in many ways! let's say you want to know how many 404s nginx is returning per second. OR perhaps send raw error.log messages.. The raw log lines would be nice in grafana when the table plugin is added.

Where do we start?

Tail code looks interesting, but it may even be overkill for this situation. A telegraf plugin being able to handle a constant stream of messages is something that I've implemented in the statsd plugin that has a PR open now #237. So it's possible, but I think for this situation we might be able to just cache the position in the file, and then start reading from that position on the next call to Gather()

There is also a plugin in a PR that does exactly as @steverweber described (counting status codes of a webserver log), but I probably won't be merging it because it's very specific to that use-case and the author has not written unit tests for it, see #176.

I think that more ideally this plugin should be a general use-case where a user can input any regex that will be counted when matched (or output a string as @steverweber suggested). I'm thinking configuration would look something like this:

[logstreamer]
    [[logfile]]
    measurement = "bazbars"
    file = "/var/log/foo.log"
    regex = ".*bar.*|.*baz.*"
    # Type of output. Can be "string" or "counter"
    type = "counter"

    [[logfile]]
    measurement = "webserver_404"
    regex = ".*404.*"
    [...]
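The matching behind those two output types could look something like this sketch (function names are made up for illustration; "counter" counts matching lines, "string" passes the raw matching lines through):

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches implements the "counter" output type: how many lines
// match the user-supplied pattern.
func countMatches(lines []string, pattern string) int {
	re := regexp.MustCompile(pattern)
	n := 0
	for _, l := range lines {
		if re.MatchString(l) {
			n++
		}
	}
	return n
}

// matchingLines implements the "string" output type: the raw text of
// every line that matches the pattern.
func matchingLines(lines []string, pattern string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, l := range lines {
		if re.MatchString(l) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	lines := []string{"a bar b", "plain", "baz!"}
	fmt.Println(countMatches(lines, `.*bar.*|.*baz.*`)) // 2
}
```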

+1

:+1:

keep in mind the logstreamer should recover if a file is:

  • deleted and recreated
  • truncated
  • partway through a line write

perhaps make it so multiple logstreamers are not needed for each metric
(we only want to read the log file once)

[streamer]

    [[file]]
    name = "/var/log/nginx/accept.log"
    delimiter = '\n' # default: '\n'

        [[[measurement]]]
        name = "nginx_requests"
        type = "counter" # counter(default)

        [[[measurement]]]
        name = "nginx_404"
        regex = ".*404.*"


    [[file]]
    name = "/var/log/nginx/error.log"

        [[[measurement]]]
        name = "nginx_errors"

        [[[measurement]]]
        name = "nginx_error_msg"
        regex = "<ignore timestamp> (<msg>.*)"
        type = "string"

perhaps file could even be a network stream... this could open up support for syslog:
file = "udp://127.0.0.1:4880"

some of the code in heka might be helpful for udp input:
https://github.com/mozilla-services/heka/blob/dev/plugins/udp/udp_input.go

fyi: I feel telegraf's objectives would be further along if it forked or contributed to: https://github.com/mozilla-services/heka - http://hekad.readthedocs.org/en/v0.10.0b1/
options are good though :)

How about (sample config):

[logstreamer]
dirs = ["/tmp/logs"]
    [[logstreamer.group]]
    mask = "^.*log$"
    rules = ['\s\[(?P<date>\d{1,2}/\w*/\d+:\d+:\d+:\d+ [+-]?\d+)\]\s.*?"\s(?P<code>\d{3})\s(?P<size_value>\d+)']
    name = "nginx"
    date_format = "02/Jan/2006:15:04:05 -0700"

The plugin recursively walks the specified directories and looks for all files that match the "mask".
Then it starts tailing them.

There are rules to parse and extract data, where regex named groups are used.
The name "date" is special: it requires date_format (a golang time.Parse layout) so it can be parsed and translated into the metric's timestamp.
Names that end with _value are metrics. The rest are tags.
So, for example, after parsing nginx log with the rules above we get:

time                            code  dc         group  host      size
2015-10-10T08:22:09.169981459Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:24:19.17656864Z   200   us-east-1  nginx  c7.local  753832
2015-10-10T08:28:59.828478721Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:39:40.812079491Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:42:14.991151971Z  200   us-east-1  nginx  c7.local  753832
2015-10-10T08:46:19.562880205Z  200   us-east-1  nginx  c7.local  753832
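The named-group convention described above (the "date" group becomes the metric timestamp, groups ending in _value become fields, all other named groups become tags) could be sketched like this; parseLine is a hypothetical name:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// The rule from the sample config, using Go regexp named groups.
var rule = regexp.MustCompile(
	`\s\[(?P<date>\d{1,2}/\w*/\d+:\d+:\d+:\d+ [+-]?\d+)\]\s.*?"\s(?P<code>\d{3})\s(?P<size_value>\d+)`)

const dateFormat = "02/Jan/2006:15:04:05 -0700"

// parseLine splits the named captures of one log line into tags,
// fields, and a timestamp, following the convention above.
func parseLine(line string) (tags, fields map[string]string, ts time.Time, err error) {
	m := rule.FindStringSubmatch(line)
	if m == nil {
		return nil, nil, time.Time{}, fmt.Errorf("no match: %q", line)
	}
	tags = map[string]string{}
	fields = map[string]string{}
	ts = time.Now() // fallback when the rule has no date group
	for i, name := range rule.SubexpNames() {
		switch {
		case name == "date":
			if ts, err = time.Parse(dateFormat, m[i]); err != nil {
				return nil, nil, time.Time{}, err
			}
		case strings.HasSuffix(name, "_value"):
			fields[strings.TrimSuffix(name, "_value")] = m[i]
		case name != "":
			tags[name] = m[i]
		}
	}
	return tags, fields, ts, nil
}

func main() {
	line := `127.0.0.1 - - [10/Oct/2015:08:22:09 +0000] "GET / HTTP/1.1" 200 753832`
	tags, fields, ts, err := parseLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(tags["code"], fields["size"], ts.UTC().Format(time.RFC3339))
	// 200 753832 2015-10-10T08:22:09Z
}
```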

I like the idea of reading the datetime from the log, however I think it should be optional. Keep in mind some time offsetting should be included to maintain the order of the log messages if not using the actual timestamps in the log.

I also like the idea of including a tag or field name in the regex/rule.

@ekini I'd like an option to add a straight filename in addition to the "mask"

Also, +1 to date parsing being optional, some people are only going to care about a count within an interval, not a point for every single instance of a regex match.

So you should support that as well, as in my original example above

Of course, date is optional, as well as date_format. Timestamps will be time.Now() then.
And yes, maybe walking through directories is overkill.

There is one more concern. If you want to cache the position in a file, and parse it to the end at each Gather, what happens if the file is big? Also, what happens if telegraf gets restarted?

My test code constantly reads files and sends parsed content to a buffered channel; each call to Gather then takes as much as possible from the channel within a specified timeout interval.

what happens if the file is big?

tailing/seeking to the end of a file is usually not a problem when it's big...
perhaps you are referring to many writes within the timespan of a gather().
should have some limit... perhaps 1MB for a string buffer. the tail code I linked above uses a "leaky bucket"
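A crude sketch of that kind of limit, using a bounded channel that drops the oldest line when full ("push" is a hypothetical name, and this assumes a single producer goroutine):

```go
package main

import "fmt"

// push enqueues a line on a bounded buffer; if the buffer is full, the
// oldest buffered line is dropped to make room. Not safe with multiple
// concurrent producers, but fine for one tailing goroutine.
func push(buf chan string, line string) {
	for {
		select {
		case buf <- line:
			return
		default:
			select {
			case <-buf: // drop the oldest line
			default:
			}
		}
	}
}

func main() {
	buf := make(chan string, 2)
	push(buf, "a")
	push(buf, "b")
	push(buf, "c") // buffer full: "a" is dropped
	fmt.Println(<-buf, <-buf) // b c
}
```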

Also, what happens if telegraf gets restarted?

it gets restarted and jumps to the end of the file... we don't care if we lose some data in between. Keeping state data is kinda overkill.

There is still a question of what to do if the file is truncated. One option would be to make a ServicePlugin that runs the tail code @steverweber linked in the background.

This probably wouldn't be possible until I merge the statsd code

the https://github.com/hpcloud/tail code seems to handle this well.
https://github.com/hpcloud/tail/blob/master/cmd/gotail/gotail.go

t, err := tail.TailFile("/var/log/nginx.log", tail.Config{
    Follow: true, // keep reading as the file grows, like tail -f
    ReOpen: true, // reopen when the file is rotated, like tail -F
    Poll:   true})
if err != nil {
    log.Fatal(err)
}
for line := range t.Lines {
    fmt.Println(line.Text)
}

Config.ReOpen is analogous to tail -F (capital F):

-F      The -F option implies the -f option, but tail will also check to see if the file being followed has been
        renamed or rotated. The file is closed and reopened when tail detects that the filename being read from
        has a new inode number. The -F option is ignored if reading from standard input rather than a file.

ref: http://stackoverflow.com/questions/10135738/reading-log-files-as-theyre-updated-in-go

@ekini you mentioned you had some working code for this a couple weeks ago, do you happen to have anything I can take a look at? I'm interested in getting something working for this

@sparrc yes, I've got something working at ekini@04f4b72182eaf5533275433be6933d15932af480
It's based on the hpcloud/tail code mentioned above.
It works, but there are plenty of sharp edges.

a little trick I've been toying with. (The heredoc delimiter needs quoting so $(hostname) and $line expand at runtime rather than when the script is written, and line protocol string fields need double quotes.)

cat > /cron_mon_log <<'EOFXX'
#!/bin/bash
tail -F -n0 /var/log/syslog | while read line; do
    curl -X POST 'http://mon-dev-1.private.xxxx.ca:8086/write?db=db' --data-binary "log_mon,hostname=$(hostname) value=\"$line\""
done
EOFXX

chmod +x /cron_mon_log
echo '@reboot  root  /cron_mon_log' >> /etc/crontab

might need work, but thought it worth sharing.

Maybe simpler with rsyslog?

rsyslog.conf:
*.* @127.0.0.1:1514

And listen on 1514 port for example.

Would be great if this could make it to telegraf. :+1:

:+1:

This will most likely start as a telegraf tail plugin that will accept the currently-available data input formats.

Recently came across this log analyzer project that looks like it has a pretty solid format for creating templates and parsing arbitrary logfile formats: https://github.com/trustpath/sequence

Right now it's discontinued, but influxdata could probably fork and take over that project if it turns out to be useful.

I am very interested in this plugin, primarily to monitor the response codes of Apache httpd.
Is there already an alpha version to try?

:+1:

Would love to see this. Mostly for parsing Apache/Nginx logs (response codes, top URLs, etc).

It would be a useful feature
