A significant use case of docker logs with Java applications is the stack trace, which is by nature multiline. Right now each line becomes a new event, and while this can theoretically be pieced together by later stream processing, it is a much more complex problem to solve there than at the source of the serialized output of stdout or stderr (which is where the docker log driver is located).
How would you propose we handle this case?
Seems like it's either some multi-line parsing or single-line only, but not both.
I think this should be solved within the application doing the logging, not within docker.
It's nearly impossible to properly detect when multiple lines should be merged together into a single message. There's always going to be some edge case. For stuff like this it's almost always better to have the application write directly to the logging facility (syslog, logstash, splunk, whatever). Then there's no re-assembly required, as it was never split up in the first place.
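For example, a rough sketch of what that could look like in Go using the standard log/syslog package (the "myapp" tag and the priority are just placeholders; the stack trace string is made up for illustration):

```go
package main

import (
	"fmt"
	"log/syslog"
)

func main() {
	// Connect to the local syslog daemon; "myapp" is a placeholder tag.
	w, err := syslog.New(syslog.LOG_INFO|syslog.LOG_USER, "myapp")
	if err != nil {
		panic(err)
	}
	defer w.Close()

	// A multi-line message (e.g. a stack trace) is handed to the syslog
	// writer as a single message, so nothing downstream has to re-assemble it.
	stack := "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)\n\tat com.example.Main.main(Main.java:7)"
	fmt.Fprintf(w, "request failed:\n%s", stack)
}
```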
Also I think stuff like this is asking Docker to do too much. If you want advanced log collection, it would be better to use a tool designed to do just that. There's a reason tools like splunk have dozens of config params regarding their log ingestion :-)
I completely agree with @phemmer, I feel it's not Docker's responsibility to handle single vs. multiple lines in logs.
Ok, I'll go ahead and close this; I agree that it adds too much complication, so not something we should do.
Thanks for suggesting though @wjimenez5271, I hope you understand
I don't think it's that difficult to accomplish, but perhaps I am missing some piece of the equation. Multiline support would require a configurable regex that describes the beginning of a log line for a given container; the driver would then need to concatenate lines that don't start with that regex onto the previous line until it sees the regex match again. It would be important to have some limit on this string buffer to protect against memory usage.
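To make the idea concrete, here's a rough sketch of that concatenation logic (not Docker code, just an illustration; the timestamp regex and the 64 KB cap are arbitrary choices):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// mergeLines concatenates continuation lines onto the previous event until a
// line matching lineStart is seen again, flushing each assembled event via emit.
// maxBuf caps the buffer to protect against unbounded memory usage.
func mergeLines(sc *bufio.Scanner, lineStart *regexp.Regexp, maxBuf int, emit func(string)) {
	var buf strings.Builder
	flush := func() {
		if buf.Len() > 0 {
			emit(buf.String())
			buf.Reset()
		}
	}
	for sc.Scan() {
		line := sc.Text()
		// Start a new event on a regex match, or when the buffer would overflow.
		if lineStart.MatchString(line) || buf.Len()+len(line) > maxBuf {
			flush()
		}
		if buf.Len() > 0 {
			buf.WriteByte('\n')
		}
		buf.WriteString(line)
	}
	flush()
}

func main() {
	// Example: treat lines that begin with a timestamp as the start of a new event.
	start := regexp.MustCompile(`^\d{4}-\d{2}-\d{2} `)
	mergeLines(bufio.NewScanner(os.Stdin), start, 64*1024, func(event string) {
		fmt.Printf("EVENT:\n%s\n---\n", event)
	})
}
```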
Agreed it can be done downstream by other tools, but keep in mind it's much more complex to do this work downstream when you have multiplexed streams.
I can also understand the responsibility argument, but keep in mind the message you're sending to customers. To this end, my team recently suggested that maybe Docker's logging solution wasn't the right choice for the problem, and that we should instead build log shippers into our container runtime to send logs to the collection tool over the network. This means we now have to manage log tooling that can work with a variety of application stacks, and also manage the configuration of where to make network connections from inside the container, versus the elegance of a well-defined serial channel that almost every language knows how to log to (namely stdout and stderr). So by keeping Docker's logging features decidedly limited, you've weakened the promise of Docker making the complexities of running many different applications at scale easier.
Just some food for thought, I respect you have your own set of priorities to manage :)
Our goal is to add support for pluggable logging drivers, and having this as an opt-in feature (through a logging driver) is a viable approach (see https://github.com/docker/docker/issues/18604). Adding too many features to logging is often non-trivial, as processing logs can become a real bottleneck in high-volume logging setups - we want to avoid such bottlenecks.
why not just --log-delimiter="<%regex%>" ?
Closed. Don't care.
It would REALLY be nice if the container could collect multiline log output as one event.
Without it, the central logging system has to put a lot of effort into figuring out which events belong together. And when it comes to the ELK stack, it currently isn't supported at all, as the GELF input cannot use the multiline codec (https://github.com/logstash-plugins/logstash-input-gelf/issues/37).
Using the Docker GELF driver is pretty much useless if you have multi-line logs.
https://community.graylog.org/t/consuming-multiline-docker-logs-via-gelf-udp/
@replicant0wnz
Agreed. We decided to make all our services log in JSON format and it works like a charm.
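For anyone curious, a minimal sketch of what that looks like (the field names here are just an example, not a standard): the stack trace stays inside one JSON object, so each line on stdout is exactly one event.

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

func main() {
	enc := json.NewEncoder(os.Stdout)

	// The embedded newlines of the stack trace are escaped inside the JSON
	// string, so the whole event still occupies exactly one line on stdout.
	event := map[string]string{
		"time":  time.Now().UTC().Format(time.RFC3339),
		"level": "ERROR",
		"msg":   "request failed",
		"stack": "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)",
	}
	if err := enc.Encode(event); err != nil {
		os.Exit(1)
	}
}
```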
This worked for me, so I'm leaving a comment here for anyone else trying to make their Splunk logs readable: if you have control of the log content before it gets to stdout, try replacing your '\n' line endings with '\r'. Docker stops breaking the lines up, and the output is parsable.
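A minimal sketch of that replacement, assuming you control the write to stdout (the stack trace string is just an example):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	stack := "java.lang.NullPointerException\n\tat com.example.Foo.bar(Foo.java:42)"

	// Replace the interior newlines with carriage returns so line-based log
	// drivers see the whole stack trace as a single line/event; the trailing
	// '\n' added by Fprintln still terminates the event.
	fmt.Fprintln(os.Stdout, strings.ReplaceAll(stack, "\n", "\r"))
}
```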
We're currently using Logspout in production to collect logs from all the containers running in our cluster.
With this adapter (still open PR), multiline logs are aggregated easily: https://github.com/gliderlabs/logspout/pull/370
We merged it into our copy of Logspout and it's been working perfectly fine for a couple of weeks now.
We then use the Logstash adapter to feed it to ELK.
Until we are able to annotate every byte written to a pipe with the pid and thread id from which it came, isn't it the case that handling multiple log lines via a Docker log driver will always be error-prone, and only able to solve the "single process with a single thread" container case?
In the meantime, it might be worth considering the addition of a log driver which just performs a raw byte capture of stdout & stderr, removing the byte-level interpretation altogether. We could then defer the interpretation of the byte stream to the consumer, allowing the writer to operate with as little overhead as possible (avoiding bad containers that write only newlines, or small numbers of bytes between newlines). The writer would only have to record each payload read from a pipe along with the timestamp, the pipe it came from, and the bytes returned by the read() system call. The reader of the byte stream would then be tasked with reassembling the stream according to whatever interpretation they see fit to use.
If a reader wants to do multi-line reassembly, that reader would have all the information needed to know which byte came from which pipe, but would still face the multi-process, multi-thread issues.
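To illustrate, a rough sketch of what such a raw capture record could look like (the format here is made up; the point is only that the timestamp, the source pipe, and the raw bytes from each read() are recorded, with all interpretation deferred to the reader):

```go
package main

import (
	"encoding/json"
	"io"
	"os"
	"time"
)

// record captures one read() from a container pipe, with no interpretation
// of the bytes (no line splitting, no re-assembly).
type record struct {
	Time   time.Time `json:"time"`
	Stream string    `json:"stream"` // "stdout" or "stderr"
	Data   []byte    `json:"data"`   // raw payload, base64-encoded by encoding/json
}

// capture copies reads from src into timestamped records written to enc.
func capture(src io.Reader, stream string, enc *json.Encoder) error {
	buf := make([]byte, 32*1024)
	for {
		n, err := src.Read(buf)
		if n > 0 {
			rec := record{Time: time.Now().UTC(), Stream: stream, Data: buf[:n]}
			if encErr := enc.Encode(rec); encErr != nil {
				return encErr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// Example: capture this process's own stdin as if it were a container's stdout.
	if err := capture(os.Stdin, "stdout", json.NewEncoder(os.Stdout)); err != nil {
		os.Exit(1)
	}
}
```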
See: https://github.com/kubernetes-incubator/cri-o/pull/1605