After reviewing the codebase with @stefanprodan we want to look into options for structured logging for OpenFaaS. Structured logging writes logs in a specific format that can be parsed easily via fluentd or some log aggregator.
Options mentioned:
Glog has some questionable design decisions and mixed opinion within the Go community.
Logrus used by Docker has issues with the vendoring disaster by Sirupsen and causes no end of headaches due to renaming the case.
Structured logging may allow us to minimize some of the duplication of logged events - i.e.
We could have 4x the data - and if the execution was 200 OK, it's probably pointless collecting the data in 4x log files. In the cloud the effect of 4x logs is expensive. Stefan explained that logging to stderr counts as a critical error on GKE and costs more where as logging to stdout is cheaper. Glog supports redirecting the logs to stdout - Go's log package which we use does not.
We already aggregate Prometheus "RED" metrics on function calls, so this may be an adequate replacement for writing logs for success messages.
What would this look like in the code? The advantage is that we have 4 times less log duplication, which makes sense. But instead of the gateway+provider collecting the logs, what service is collecting them? Is another container running in the deployment which does all the logging?
On Kubernetes you would send the logs to stderr and use a format that Fluentd understands like json or glog. Fluentd will forward the logs to the default could provider storage (Stackdriver, Cloudwatch, etc) or to Elasticsearch if you run on-prem. There is no need for us to ship a log collector on Kubernetes. Docker Swarm comes with several log drivers that will do the same.
I've tested several log packages and I would go with zerolog and json format.
| Library | Time | Bytes Allocated | Objects Allocated |
| :--- | :---: | :---: | :---: |
| zerolog | 767 ns/op | 552 B/op | 6 allocs/op |
| :zap: zap | 848 ns/op | 704 B/op | 2 allocs/op |
| :zap: zap (sugared) | 1363 ns/op | 1610 B/op | 20 allocs/op |
| go-kit | 3614 ns/op | 2895 B/op | 66 allocs/op |
| lion | 5392 ns/op | 5807 B/op | 63 allocs/op |
| logrus | 5661 ns/op | 6092 B/op | 78 allocs/op |
| apex/log | 15332 ns/op | 3832 B/op | 65 allocs/op |
| log15 | 20657 ns/op | 5632 B/op | 93 allocs/op |
Here is an example https://github.com/stefanprodan/k8s-podinfo/blob/master/cmd/podinfo/main.go
@ericstoekl I think there are two unrelated items I've represented in the issue.
Less cost running at high load in the cloud
Logs can be indexed/searched in a repeatable way.
Errors for instance can be picked up and used to trigger alerts.
@stefanprodan are those log numbers your own stats? If not then maybe you can link to where you got the table?
What does the podinfo output look like on the console?
Just chiming in with a datapoint:
For our use case (on Kubernetes), simply wrapping the function-emitted log message in JSON and attaching a loglevel has been sufficient; it's all that's necessary to integrate into our existing log pipelines. It isn't amazingly flexible, but it gets the job done.
For example, if someone's using the http executor and writes to STDERR, they get {"level": "ERROR", "message:" ... } and STDOUT gets {"level": "INFO", "message": ...}.
For the case of the streaming executor, where all feedback needs to come across STDERR, I'd toyed around with the idea of expanding the number of available log levels by allowing functions to drop hints to watchdog in the emitted message, but I don't know if that'd do anything other than cause confusion without some sort of function-SDK-level enforcement of message format.
In general, though, we don't really care about logging anything other than ERRORs. Ideally anything that would be logged at INFO should be captured as time-series data.
@gkuchta thanks for adding this. You said that you transport your logs over Kafka? Is that with fluentd collecting?
@alexellis I've added a link to https://github.com/rs/zerolog You can see the console output and benchmark there.
@alexellis that's correct. container -> fluentd -> kafka -> Elastic Search
We touched upon this in the contributors call today.
@alexellis , @ericstoekl, @stefanprodan re:
Reducing log verbosity
Using structured (machine readable) logs
Perhaps I misunderstand, but I think the discussion conflates collecting and presenting the log data.
For collecting the log data, we could use systemd. It's easy to overlook and take for granted, but it's likely to be around for a long time and the binary format avoids the character-encoding hilarity that comes with "text-based" content.
We want logs to be accessible from kubectl logs, docker logs, stackdriver, cloudwatch, kibana, datadog etc Using the systemd format would make OpenFaaS a non-cloud-native platform :))
I've had good success at work with user-go/zap for structured logs. I've not used zerolog but I don't imagine the differences are significant.
Hi @stefanprodan , I think I'm not explaining myself properly.
I agree with your point about accessing the data.
I was referring to recording the data using one of the established libraries like