Refactor our logging to use a structured logger with log levels. Structured logging gives us a consistent way to log messages and a way to filter them for support. It also lets us format messages either as the traditional log file we use (and like) today, or as JSON for consumption by automated log tools like Splunk.
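To make the difference concrete, here's the same hypothetical event as a traditional log line and as a structured JSON record (the module and field names are illustrative, not a proposal):

```
[httpd] 2016/05/02 03:44:09 POST /write 204 12ms
{"ts":"2016-05-02T03:44:09Z","level":"info","module":"httpd","method":"POST","path":"/write","status":204,"took_ms":12}
```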
Proposal: Refactor logging to a structured logger and log levels
Current behavior: We use the Go standard log package
Desired behavior: Use a structured logger with log levels
Use case: Structured logs are easier to follow, and log levels let us filter out messages that only concern developers.
/cc @beckettsean @corylanou
More useful if we also implement #7037.
Yes, #7037 along with log levels would be a killer feature for gathering diagnostic information from customers. It would also allow us to recommend WARN level for production systems and only do INFO or DEBUG when needed.
We should also add the ability to do backoff logging, so that seeing the same type of message over and over doesn't fill up the logs. Messages like "can't write to remote node" showing up 10,000 times in a second aren't useful, regardless of log level.
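A minimal sketch of what backoff logging could look like, using only the standard library (all names are hypothetical; a real version would want per-key expiry and would hook into the structured logger rather than the log package):

```go
package logutil

import (
	"log"
	"sync"
	"time"
)

// suppressingLogger prints a given kind of message at most once per
// interval; repeats within the window are counted instead of printed.
type suppressingLogger struct {
	mu       sync.Mutex
	interval time.Duration
	seen     map[string]entry
}

type entry struct {
	last       time.Time
	suppressed int
}

func NewSuppressingLogger(interval time.Duration) *suppressingLogger {
	return &suppressingLogger{interval: interval, seen: make(map[string]entry)}
}

// Log prints msg unless a message with the same key was printed within
// the interval; when the window reopens, it reports how many were dropped.
func (l *suppressingLogger) Log(key, msg string) {
	l.mu.Lock()
	defer l.mu.Unlock()

	e := l.seen[key]
	if time.Since(e.last) < l.interval {
		// Same kind of message inside the window: count it, don't print it.
		e.suppressed++
		l.seen[key] = e
		return
	}
	if e.suppressed > 0 {
		log.Printf("%s (repeated %d more times)", msg, e.suppressed)
	} else {
		log.Print(msg)
	}
	l.seen[key] = entry{last: time.Now()}
}
```

With something like this, an error that fires thousands of times a second prints once per interval plus a repeat count.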
@corylanou my favorite example is from a system with a Graphite endpoint that was getting spammed by bad connections. This error makes up over 99% of the log:
[graphite] 2016/05/02 03:44:09 unable to parse line: : received " " which doesn't have required fields
$ wc -l influxd.log
33376296 influxd.log
$ grep "unable to parse" influxd.log | wc -l
33232791
So in a 33-million-line log file (3.3GB uncompressed), only 143,505 lines are anything other than the Graphite error. That works out to about 385 of these error lines per second, assuming the file covers roughly a day.
+1 & allow configurable log handlers... as all levels ends into a single writer (/dev/stderr), to automatically catch some unexpected patterns and raise attention to humans to investigate deeper requires parsing of those logs.
Btw, I know structured logging can be expensive, but https://github.com/uber-go/zap/ is cheap.
@bclermont zap looks really good. I'll be looking into that as a potential solution for this.
A few notes about what I want out of structured logging before we settle on a step-by-step plan.
When it comes to the output and the features we want, I think it should be possible to set the log level differently for different parts of the system. I don't think that's needed in the first version, but we should also be able to dynamically change the level of these different loggers.
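For illustration, a sketch of per-subsystem loggers with runtime-adjustable levels, independent of any particular library (names and structure are hypothetical):

```go
package logutil

import (
	"log"
	"sync/atomic"
)

// Level constants, lowest to highest severity.
const (
	DEBUG int32 = iota
	INFO
	WARN
	ERROR
)

// Logger is a per-subsystem logger whose level can be changed at
// runtime (e.g. from an admin endpoint) without restarting the process.
type Logger struct {
	name  string
	level int32 // read and written atomically
}

func New(name string, level int32) *Logger { return &Logger{name: name, level: level} }

// SetLevel changes this subsystem's level while the process is running.
func (l *Logger) SetLevel(level int32) { atomic.StoreInt32(&l.level, level) }

func (l *Logger) logf(level int32, tag, format string, args ...interface{}) {
	if level < atomic.LoadInt32(&l.level) {
		return // below this subsystem's current level: drop the message
	}
	log.Printf("[%s] %s: "+format, append([]interface{}{l.name, tag}, args...)...)
}

func (l *Logger) Debugf(format string, args ...interface{}) { l.logf(DEBUG, "DEBUG", format, args...) }
func (l *Logger) Infof(format string, args ...interface{})  { l.logf(INFO, "INFO", format, args...) }
func (l *Logger) Warnf(format string, args ...interface{})  { l.logf(WARN, "WARN", format, args...) }
```

An admin endpoint could then call something like httpd.SetLevel(DEBUG) on a live process while the rest of the system stays at WARN.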
I would also like to see support for different output formats. I would personally like the default to be similar to the current output, with colors when a terminal is attached, but I know others want logfmt to be the default. Making this configurable seems easier than deciding what everyone might want. JSON is definitely something I think should be in there: even if I don't personally like JSON logging, common devops tools like Splunk can ingest JSON logs, and it's a decent enough feature as long as it doesn't take too much time to implement.
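For reference, the same hypothetical event rendered in the three formats under discussion:

```
console: 2016-05-02T03:44:09Z INFO [httpd] write ok {db=mydb, points=5000}
logfmt:  ts=2016-05-02T03:44:09Z lvl=info module=httpd msg="write ok" db=mydb points=5000
JSON:    {"ts":"2016-05-02T03:44:09Z","lvl":"info","module":"httpd","msg":"write ok","db":"mydb","points":5000}
```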
Thoughts on uber-go/zap. Overall I really like the idea, and I'm optimistic about the direction it takes as the API stabilizes. The sad part is that I really hope they rethink the current API, because the publicly exposed API is missing a few key things that make me hesitant to use it at the moment.
First, the things I like. I like the use of interfaces, and I like the options. I think the model for dealing with fields is superior because of fewer allocations, and I like the nesting. I also like zap.CheckedMessage and how the various levels are handled. I like that everything is typed and that interface{} is not used at all (a few popular structured logging libraries like log15 use interface{}, which I don't like). I much prefer log.Info("message", zap.String(key, value)) over log.Info("message", key, value). The latter is too prone to mistakes, and those mistakes won't get caught in compilation or by go vet, since they don't fall under the printf rules. I also like it better than logrus: even though logrus uses a similar system, it constructs a map to pass the fields and then has to sort them, rather than passing the values as a slice that's already ordered. apex/log uses something similar to logrus and I like it better, though it's roughly on par with how zap does it.
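Concretely, the two call styles being contrasted (the first is zap's typed style, the second the loosely-typed log15 style; the identifiers are illustrative and the two lines come from different APIs):

```go
// Typed fields: a wrong type or a stray argument is a compile error,
// and the fields are passed down as an already-ordered slice.
log.Info("opened shard", zap.String("path", path), zap.Int("id", id))

// Loosely-typed key/value pairs: a missing value or a swapped key
// compiles fine and only shows up later as a mangled log line.
log.Info("opened shard", "path", path, "id", id)
```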
The negative I've run into is a mostly inconsistent API when you try to do something more complicated. It's trivial to create a single logger, build child loggers off of it, and have them all write to the same place. But if you need different levels for those loggers, you're mostly out of luck.
If you need multiple loggers for some reason (such as different outputs or different levels), you mostly can't share the same output. To synchronize the output, you need to use zap.Output() to wrap it in a writer that's threadsafe. But if you use zap.Output() to generate an Option, it's constructed in a way where the threadsafe wrapper isn't shared between multiple loggers. So you either have to log to multiple different outputs or supply your own threadsafe writer.
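Supplying your own threadsafe writer is at least simple; a sketch with only the standard library:

```go
package logutil

import (
	"io"
	"sync"
)

// lockedWriter serializes writes from multiple loggers onto one io.Writer,
// so their output lines can't interleave mid-write.
type lockedWriter struct {
	mu sync.Mutex
	w  io.Writer
}

func NewLockedWriter(w io.Writer) io.Writer { return &lockedWriter{w: w} }

func (l *lockedWriter) Write(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.w.Write(p)
}
```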
Now, the log output is stored in a zap.Meta, and you can construct one with zap.MakeMeta, but that function appears to be useless: there's no way to construct your own zap.Meta and pass it to a zap.Logger. You have to construct the logger from a zap.Encoder and zap.Option values, which leads to the problem above.
You also can't clone the logger in any way or pass new options to child loggers, so you're stuck with whatever options you pass to the root logger.
Maybe we can use it anyway, but changing log levels or using other outputs doesn't seem possible unless they change the API a bit. Likely, allowing construction of a zap.Logger from a custom zap.Meta is all that's needed. I can ask on their issue tracker whether they're planning something like that. If I could change just that one thing, I think the library would work perfectly.