Vector: Docker source: odd 501 Not Implemented errors

Created on 7 Feb 2020 · 31Comments · Source: timberio/vector

I was doing some work on a test harness, and somehow I was able to trigger this behavior:

Feb 07 07:13:41.973  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=3d8e163a3ce6431684cab14a4dbc7ca83e68ce10d58a730b2ecb9c3a94a5d5b6
Feb 07 07:13:41.977 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:13:41.977  INFO vector::sources::docker: Stoped listening logs on docker container id=3d8e163a3ce6431684cab14a4dbc7ca83e68ce10d58a730b2ecb9c3a94a5d5b6
Feb 07 07:13:41.978  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=3d8e163a3ce6431684cab14a4dbc7ca83e68ce10d58a730b2ecb9c3a94a5d5b6
Feb 07 07:13:41.981 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:13:41.981  INFO vector::sources::docker: Stoped listening logs on docker container id=3d8e163a3ce6431684cab14a4dbc7ca83e68ce10d58a730b2ecb9c3a94a5d5b6

All the configs are at https://github.com/MOZGIII/vector-merge-test-setup

Steps to reproduce:

Clone and cd to the vector-merge-test-setup repo.
Grab the tcp_test_server: go get github.com/timberio/tcp_test_server
Start the tcp_test_server: tcp_test_server --address=0.0.0.0:9000
Start vector with vector/test_harness.toml config:
../vector/target/debug/vector -c vector/test_harness.toml (I was runing it from the build dir at ../vector)
Start log producer: ./run_log_once_container.sh --build
If the bug is not triggered immediately, run the above command a couple more times.

Vector should start printing the errors as above at a high rate (100s per second).

$ ../vector/target/debug/vector --version              
vector 0.8.0 (g5b97e4c x86_64-unknown-linux-gnu 2020-02-02)
$ docker -v
Docker version 19.03.5, build 633a0ea838

UPD: improved steps to reproduce
UPD2: I can't reproduce it anymore on my end

docker docker bug

Source

MOZGIII

Most helpful comment

@MOZGIII so from what you're saying this means vector needs to run from within a container. How would we load the config? From what I can tell a lot of the logging drivers have you enumerate the entire config via log opts. Which wouldn't fit with our model. I think this is why fluentd/fluentbit went with the fluentd driver shipping their fwd protocol to a centralized fluent* app. That way the configuration for docker is very light and easy. From what I can tell the UX with complex docker logging drivers is much worse than our current solution.

Another thing that we could potentially do if performance is an issue is provide a config that works via jsonfile and its output to disk in /var/log/docker. Similar to what we do for kubernetes. Since, iirc jsonfile is the default this could be an easy way to set it up.

The other concern about refactoring a lot of this, is I have yet to see anyone really note that the performance is an issue or any other issues with docker (I don't count the vector in container as an issue as that was a misconfigured container). This makes mean concerned about investing more time into writing new plugins for vector here.

LucioFranco on 11 Feb 2020

👍2

All 31 comments

Complete log

This is my last successful attempt to reproduce the issue. Here, the output was not flooded with the error messages, but only a few were printed.

$ ../vector/target/debug/vector -c vector/test_harness.toml
Feb 07 07:24:47.024  INFO vector: Log level "info" is enabled.
Feb 07 07:24:47.024  INFO vector: Loading config. path="vector/test_harness.toml"
Feb 07 07:24:47.037  INFO vector: Vector is starting. version="0.8.0" git_version="v0.7.0-81-g5b97e4c" released="Sun, 02 Feb 2020 19:55:29 +0000" arch="x86_64"
Feb 07 07:24:47.038  INFO vector::sources::docker: Capturing logs from now on now=2020-02-07T07:24:47.038460859+03:00
Feb 07 07:24:47.039  INFO vector::sources::docker: Listening docker events
Feb 07 07:24:47.039  INFO vector::topology: Running healthchecks.
Feb 07 07:24:47.040  INFO vector::topology: Starting source "in"
Feb 07 07:24:47.040  INFO vector::topology: Starting transform "parse_json"
Feb 07 07:24:47.040  INFO vector::topology: Starting sink "out"
Feb 07 07:24:47.040 ERROR vector::topology::builder: Healthcheck: Failed Reason: Connect error: Connection refused (os error 111)
Feb 07 07:25:13.341  INFO vector::sources::docker: Started listening logs on docker container id=fb9c01cd2b32f257082e1cdfa60abed6f33d356b3b417127e9390c3c785ee05a
Feb 07 07:26:14.153  INFO vector::sources::docker: Stoped listening logs on docker container id=fb9c01cd2b32f257082e1cdfa60abed6f33d356b3b417127e9390c3c785ee05a
Feb 07 07:27:30.998  INFO vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.001 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.002  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.002  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.006 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.006  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.006  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.010 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.010  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.012  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.016 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.016  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.017  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.020 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.020  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.021  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.025 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.025  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.025  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.028 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.028  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.029  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.033 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.033  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.033  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.036 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.036  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.037  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.041 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.041  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.041  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.044 ERROR vector::sources::docker: docker API container logging error error=Docker Error: Docker Error: 501 Not Implemented
Feb 07 07:27:31.044  INFO vector::sources::docker: Stoped listening logs on docker container id=c07bf944c2deba2ec73cc7f07cc056670104b823157f46d3633f1a51349c7479
Feb 07 07:27:31.755  INFO vector::sources::docker: Started listening logs on docker container id=5a027c973aae06a106c95cebdf3193cec9f09325f737d233432f124b91ee28d6
Feb 07 07:27:42.209  INFO vector::sources::docker: Stoped listening logs on docker container id=5a027c973aae06a106c95cebdf3193cec9f09325f737d233432f124b91ee28d6
Feb 07 07:29:33.714  INFO vector::sources::docker: Started listening logs on docker container id=095aec1a1765a3295d34134b5307a7cf34ee9675879e113480429b4d8985f13e
Feb 07 07:29:34.866  INFO vector::sources::docker: Stoped listening logs on docker container id=095aec1a1765a3295d34134b5307a7cf34ee9675879e113480429b4d8985f13e
Feb 07 07:30:06.366  INFO source{name=in type=docker}: vector::sources::docker: Started listening logs on docker container id=095aec1a1765a3295d34134b5307a7cf34ee9675879e113480429b4d8985f13e
Feb 07 07:30:07.527  INFO vector::sources::docker: Stoped listening logs on docker container id=095aec1a1765a3295d34134b5307a7cf34ee9675879e113480429b4d8985f13e

MOZGIII on 7 Feb 2020

@MOZGIII do you have any more info about how you setup your docker daemon?

Looks like there is a small note I found in the logging api of docker that says

Note: This endpoint works only for containers with the json-file or journald logging drivers.

I am going to open a PR to add this note to ours docs though this should only be possible to enable if you are not using CE, since iirc these are the only supported logging drivers on CE.

But it does 501 in this case, I'd like to get a better warning in our docker code. Ill open a PR for both.

LucioFranco on 7 Feb 2020

👍1

My docker setup is pretty standard, nothing special about it. Running off of Ubuntu 18.04.

Docker CE supports all the log drivers that are listed on their website. In fact, I typically use gelf log driver for my deployments - for various reasons.

I suspect there was some kind of race condition in docker itself - this doesn't look like something we can truly fix on our end.

This may've been caused by me doing docker build. What I noticed was that we don't capture the logs when the containers are building.

Overall, I'm under the impression that ingesting the logs via the docker API is a bad idea:

it causes a significant overhead (almost everything that goes through the docker daemon does)
it is sufficient to odd errors like this one

We can implement our own, more efficient log driver for docker though. I'll do some research and create an issue.

MOZGIII on 7 Feb 2020

@MOZGIII Ok so there are a couple things to unpack here.

So first, from what I am reading it looks like if you enable anything but journald or json-file docker logs won't work and that means that the endpoint we hit also won't and will return a 501. (I found this via a nomad issue). So enabling gelf will make it so that vector can't collect logs. There might also be a bug with vector but basically its not a vector problem as you said.

So there was a reason we originally decided to go with through the docker daemon this way. One of the goals I originally had in mind was making it easy to hook up vector for simple docker deployments. Where you had access to the daemon. The idea here was to basically be similar to what docker logs does by working in the same spots that it does. Meaning, if I can get logs via docker logs I can also therefore ship logs via vector with almost zero configuration. This in theory should also work with things like docker swarm or connecting to a remote docker instance, which are both use cases I've seen before. This to me was important that the "docker" source acted similar to the cli.

The really big thing to note here is that having a docker source doesn't mean it is the only way to gather logs from docker. There are many ways to have docker in your deployment stack. In theory, you should be able to setup the syslog logging driver and route logs to our syslog source.

As for our own docker logging driver @binarylogic and I have chatted about this and I've thought about it a lot. Basically, we right now don't really have a precedent of embedding a "vector protocol" in any third party software. This has a few implications, one is that we don't even really have a vector protocol. Our vector to vector sink is quite primitive right now. So until we have an official protocol that we can embed into the docker source code or our own driver, it doesn't make much sense to make that the official docker source.

I think our end goal with docker should be to have some docker logging driver for vector that ships logs to our vector source. Then get this into the actual code base so that we can ship logs from the fargate docker daemon. This again still doesn't remove the need for a docker source imo.

LucioFranco on 7 Feb 2020

@LucioFranco thanks for the detailed response!

So first, from what I am reading it looks like if you enable anything but journald or json-file docker logs won't work and that means that the endpoint we hit also won't and will return a 501. (I found this via a nomad issue). So enabling gelf will make it so that vector can't collect logs. There might also be a bug with vector but basically its not a vector problem as you said.

So there was a reason we originally decided to go with through the docker daemon this way. One of the goals I originally had in mind was making it easy to hook up vector for simple docker deployments. Where you had access to the daemon. The idea here was to basically be similar to what docker logs does by working in the same spots that it does. Meaning, if I can get logs via docker logs I can also therefore ship logs via vector with almost zero configuration. This in theory should also work with things like docker swarm or connecting to a remote docker instance, which are both use cases I've seen before. This to me was important that the "docker" source acted similar to the cli.

The really big thing to note here is that having a docker source doesn't mean it is the only way to gather logs from docker. There are many ways to have docker in your deployment stack. In theory, you should be able to setup the syslog logging driver and route logs to our syslog source.

I'm very aware of docker logs not working with most of the log drivers, and I'm not implying I'd be using docker source if I configure docker to ship logs via gelf. Definitely, the docker source does what it does - interact with Docker API and reads the logs from it. This is great, I like that it's this option, as it indeed simplifies the configuration for the smaller deployments - it such small deployments it makes the most sense to organize log collection in a way that docker logs command still works.

That said, I probably shouldn't complain about the performance of the docker source - after all it's an opt-in.
I just wish we were more open about the downsides of using the docker source in the documentation.
One thing, in particular, that's proven to be problematic is #1506, and even after the #1513 it's not resolved. I'm pretty sure the delay is occurring internally in docker, and we can't affect it on our end. The reason I think so is that docker logs command suffers from a similar delay, and it looks like the reason for that is common - therefore must be in the docker daemon itself.
I'm pretty sure though, based on my experience with docker logs command, that under load docker daemon will be consuming a lot of CPU when we're loading the logs from it. That has to be tested though, but I'd be very hesitant with such a solution in deployments that have a log of logging going on. Maybe I'm wrong (again this has to be verified), but if not - it's better we warn our users upfront, rather than them stumbling on this after the fact.

As for our own docker logging driver @binarylogic and I have chatted about this and I've thought about it a lot. Basically, we right now don't really have a precedent of embedding a "vector protocol" in any third party software. This has a few implications, one is that we don't even really have a vector protocol. Our vector to vector sink is quite primitive right now. So until we have an official protocol that we can embed into the docker source code or our own driver, it doesn't make much sense to make that the official docker source.

I think our end goal with docker should be to have some docker logging driver for vector that ships logs to our vector source. Then get this into the actual code base so that we can ship logs from the fargate docker daemon. This again still doesn't remove the need for a docker source imo.

Unfortunately, Docker doesn't accept new log drivers into the source for a while now, but it offers a logging plugin API. With that, we won't be embedding our protocol into docker, but on the contrary - it's a vector (or a separate logging plugin, but why not the vector itself?) that'd be implementing the docker logging plugin API. It's actually a very simple thing to implement - all it takes is just an HTTP server and reading from FIFO pipes. As a bonus, it'd be possible to ship vector as docker logging driver plugin!
This should be way more efficient than reading logs from the docker API.

MOZGIII on 7 Feb 2020

Yeah, this is a spot where I think it would be good to provide a guide level explanation on how to work with docker rather than have these types of caveats within the sink docs directly. Though, I do agree we should add something to the docs, I'd like to hear what @ktff has to say about this as well.

So I would like to see a docker logging driver though I would imagine we would want it to implement our protocol rather than a generic http one? In that case, currently you can already use splunk and syslog to ship logs to vector. An issue for this would be good though!

I'd also really like to think about how we approach things like combining say the syslog source with docker's syslog driver? A guide would probably work but Im not sure if stuff like that fits into the source docs?

LucioFranco on 7 Feb 2020

So I would like to see a docker logging driver though I would imagine we would want it to implement our protocol rather than a generic http one? In that case, currently you can already use splunk and syslog to ship logs to vector. An issue for this would be good though!

Well, the HTTP server I mentioned is what docker requires as a control interface to communicate with the plugin. So, we'll have to expose three endpoints they need. From the vector point of view that would have to be a special source - so we can use any sink that we support. Having any kind of protocol other than Docker Loggin Plugin API (an HTTP server at vector end + the ability to read from FIFOs that docker tells us to) is not necessary.

I'd also really like to think about how we approach things like combining say the syslog source with docker's syslog driver? A guide would probably work but Im not sure if stuff like that fits into the source docs?

syslog is a bad idea imo. I try to avoid it as much as possible because, iirc, it only has blocking APIs, and that may be very problematic at times. Maybe that's not that big of an issue with docker-to-vector scenario though. I'd also be interested in knowing more about that, but I've used syslog for any real log aggregation scenario since I did the initial research on it. journald better, but I nonetheless prefer other docker log drivers - gelf or fluentd. In fact, I often use gelf, cause it's the most widely-adopted in the log agents, and so far yields the best performance value - both in theory and in practice.
It's a shame vector doesn't support gelf yet, as I'd hope to use it precisely with the gelf docker log driver. Actually, it might a critical feature missing for more people than just me.

MOZGIII on 7 Feb 2020

I'll definitely try splunk.

So far, it's possible to use the following docker log drivers with vector:

jsonfile
journald
syslog
splunk

I agree that it'd be very valuable adding a guide for various ways it's possible to work with docker. It would be great if we could do some research to be able to actually recommend them per use case.

MOZGIII on 7 Feb 2020

👍1

There is a useful guide somewhere deep in these details, that I'd like to figure out.

LucioFranco on 8 Feb 2020

👍1

I get it now. We will be providing guides around this. The question I have is: When would we recommend our docker source over any of the currently listed Docker drivers? Are there use cases where this ever makes sense? From what I'm reading there are negative performance implications around using the Docker API that can be avoided with a driver, correct?

binarylogic on 8 Feb 2020

@binarylogic I would say, if you're running a small say docker-compose deployment or where you are just using docker and want to ship logs via vector use the docker source. If you are in a more heavy situation where you have many containers and you can configure the docker daemon you should probably use syslog/splunk. Though I'm not sure of the actual performance numbers. Maybe you have some from your experiences @MOZGIII and @ktff.

I'd really like to see if we could collect some data on how users actually deploy docker and want to collect logs directly from docker, not through ECS/k8/nomad.

LucioFranco on 8 Feb 2020

@binarylogic that's correct. That said, sometimes the performance in not a concern, but the conform is - and in that sense using docker source still makes sense, cause it keeps docker logs command operational.

MOZGIII on 8 Feb 2020

Ok. I also think it might help to rename the source to ‘docker_daemon’ or support some of the formats mentioned here within that source, so it’s obvious to the user there are different ways to collect docker logs.

binarylogic on 8 Feb 2020

@LucioFranco I don't have concrete numbers, but I remember observing noticeable impact to the CPU usage (20-70%) on the docker daemon when I streamed logs from it at a high rate. Same with docker-proxy processes and high traffic workload, that's why I tend to use --net=host.
I can research and compare the performance effects with different drivers and vector.

In my experience, using docker for log collection makes sense mostly when you're using docker-compose or similar tools. I have a few projects that do this instead of k8s or something similar.

MOZGIII on 8 Feb 2020

@MOZGIII that sounds good, let’s defer that work though. Your current work is more important. Well open an issue and assign it to you when it’s a priority.

binarylogic on 8 Feb 2020

👍1

Ok. I also think it might help to rename the source to ‘docker_daemon’ or support some of the formats mentioned here within that source, so it’s obvious to the user there are different ways to collect docker logs.

@binarylogic I don't think it makes sense to add new formats to the docker source. The normal gelf/splunk/journald sources should work just fine with docker. I'd rather see some better ways of shipping preset configurations. This makes the most sense to provide good deaults that way - we still have an unopinionated low-level building block (sources), but we can also offer something like preset source, that'd provide just known good presets - but emit the real sources to the topology. I.e. like terraform modules. That'd be a neat way of providing the defaults.

MOZGIII on 8 Feb 2020

@binarylogic I don't think it makes sense to add new formats to the docker source. The normal gelf/splunk/journald sources should work just fine with docker. I'd rather see some better ways of shipping preset configurations. This makes the most sense to provide good deaults that way - we still have an unopinionated low-level building block (sources), but we can also offer something like preset source, that'd provide just known good presets. That'd be a neat way of providing the defaults.

Agreed, this words very well what I've been struggling with. Guides work but I feel like there is a better opportunity for preset like packages.

LucioFranco on 8 Feb 2020

Yeah, in fact I was thinking for a quite some time about the ways to manage and share configurations - I think that'd be very cool!

MOZGIII on 8 Feb 2020

I agree that 501 Not implemented error is probably caused by something in Docker. The reason being that we aren't changing our API endpoint so it should either work or not work if we are at fault, but as @MOZGIII experienced it fails sporadically. Maybe the issue is caused by the large messages that are logged.

@LucioFranco

this should only be possible to enable if you are not using CE

It shouldn't be an issue on Enterprise because of dual logging. Requirements section in docker source specification mentions this.

Guides for Docker makes sense. Cover the ways of connecting, and mention pros/cons.

Regarding docker_source performance, it will always be less performant than directly getting logs from log drivers. The reason being that Docker has to first store the logs using compatible logging driver and then read them, probably also decompress them, when we start asking for them, although after this it doesn't have to read it but just pipe it to us directly from memory, but it's a question if it's really implemented like that.

In return for the performance hit, users are getting convenience.

ktff on 9 Feb 2020

I agree that 501 Not implemented error is probably caused by something in Docker. The reason being that we aren't changing our API endpoint so it should either work or not work if we are at fault, but as @MOZGIII experienced it fails sporadically. Maybe the issue is caused by the large messages that are logged.

I have high suspicion this is cased by races in docker daemon, but it might really also be a race between our code and the Docker API. It's most likely not the large messages, but the addition/removal of the containers, paired with the container lifecycle event ordering that could render this behavior. Well, that's my best bet a least.

MOZGIII on 9 Feb 2020

This reminds me of #1525. It was also caused by a race of Vector process requesting a list of running containers, and Docker adding container, running that same Vector process, to the list of running containers.

ktff on 11 Feb 2020

@MOZGIII @ktff what do you propose as a solution, if any?

binarylogic on 11 Feb 2020

What we could do it actually read through the docker code to figure out what could be causing those issues, and change our implementation if we find a solution. But it's time-consuming and success is not guaranteed.

I think we better spend time on alternative ways of ingesting logs from docker. Like support for more log drivers supported by docker. We can also provide a separate source for reading log files that docker writes directly from disk - it might be easier and have less overhead.

Since simplicity and good defaults are major concerns for vector, maybe the most valuable thing would be to try to enable shipping vector as a docker plugin (i.e. making it installable via docker plugin install). This is closely related to implementing the docker logger plugin interface I was talking above. It might provide an even easier setup/configuration path for the users who already use docker and want to simply collect the logs. I think I saw an issue recently of someone asking about how to configure vector for gathering docker logs while vector itself is running in a container as well - so, even though we have docs, people still can get confused. It should help with that.

Oh, yeah, since I started on ways to ship - maybe publishing vector as a kubernetes addon is a good idea too.

MOZGIII on 11 Feb 2020

Yep, I'm working on the Docker guide right now since it's the first of many.

I think we better spend time on alternative ways of ingesting logs from docker. Like support for more log drivers supported by docker.

My understanding is that Docker plugins just implement a protocol. It wouldn't actually embed Vector; it would be designed to send data to a downstream Vector instance. And, if that's the case, we don't have a protocol to implement (yet). So it makes sense to leverage one of the existing plugins.

binarylogic on 11 Feb 2020

I don't think there is anything on our end we can do to fix the 501 issue besides just retrying the stream which I believe we do.

As for the plugins, I think it makes sense to hold off on a vector plugin if we can take advantage of the splunk or syslog driver.

My understanding is that Docker plugins just implement a protocol. It wouldn't actually embed Vector; it would be designed to send data to a downstream Vector instance. And, if that's the case, we don't have a protocol to implement (yet). So it makes sense to leverage one of the existing plugins.

Agreed, if we are going to market a vector docker plugin we should also be able to market some sort of log shipping protocol for vector instances.

I think I saw an issue recently of someone asking about how to configure vector for gathering docker logs while vector itself is running in a container as well - so, even though we have docs, people still can get confused. It should help with that.

This issue was interesting because it turned into the classic docker in docker situation. I am though really curious how this person deployed their code, were they just running a single node with docker-compose? At that rate, I'd imagine just running vector as a systemd process and using the docker source is much easier then trying to embed it in a docker container.

This brings me back, I am still curious how most people are deploying _docker_ directly instead of using something like nomad or kubernetes.

LucioFranco on 11 Feb 2020

My understanding is that Docker plugins just implement a protocol. It wouldn't actually embed Vector; it would be designed to send data to a downstream Vector instance. And, if that's the case, we don't have a protocol to implement (yet). So it makes sense to leverage one of the existing plugins.

Docker plugin is more like docker image / container that integrates with docker itself through one of the ways docker exposes. For instance, a plugin can register itself as a log driver, in addition to the built-in log drivers, and then users can simply enable it via --log-driver my-plugin at the docker run command. If we ship vector as a plugin, we'll have to implement the "Docker LogDriver protocol" at vector, then publish a specially crafted docker image to Docker Hub or other registry, then users will be able to just do docker plugin install vector/vector.

From a deployment perspective, I think it makes a lot of sense to just deploy vector as an agent that ships logs to something like ES this way.

MOZGIII on 11 Feb 2020

LucioFranco on 11 Feb 2020

👍2

My thinking with using vector as a docker plugin is that we'll have to package it in a way that it's preconfigured. The only source we need to enable there is that would allow it to read the docker logs (via the logging plugin interface) - and we can just add a single sink - and make it configurable via KV pairs that are available for assignment when doing docker plugin install.
It doesn't really provide the full power of vector, and I now understand why it might make sense to offload things to a dedicated vector - for additional configurability. Though there might be a way to achieve it with docker plugin too.

I share the concern that further work on docker might not even be needed - the performance (or other) issues aren't proven yet, and those would be the main justification to putting more effort into integrating with docker specifically.

Another thing that we could potentially do if performance is an issue is provide a config that works via jsonfile and its output to disk in /var/log/docker. Similar to what we do for kubernetes. Since, iirc jsonfile is the default this could be an easy way to set it up.

Yeah, that's what I meant at the second paragraph at https://github.com/timberio/vector/issues/1737#issuecomment-584738523! I just checked docker docs again, and they now have the local log driver in addition to jsonlines (which is still the default). They claim the local format is the optimized format for storing logs locally. This makes me think that they consider jsonlogs another interface to the logging system, and we can rely on as with kubernetes. The difference is, kubernetes documentation explicitly states reading logs from files as a recommended option (so it's a documented integration point), while I never could find anything similar in the docker docs. I think this would be the most efficient way actually - even with docker log driver plugin we'll be relying on docker doing some work to orchestrate the interaction with vector, while reading the log files from disk should let us to just rely on kernel, and avoid communicating to docker daemon. Well, that is unless we want to enhance the logs with additional metadata - but that might not be required, and I could easily live without it in most (if not all) of my deployments.

MOZGIII on 12 Feb 2020

what do you propose as a solution, if any?

Not really a solution, but we could decrease our pressure on Docker server by having a longer delay between receiving the error and retrying.

@MOZGIII when 501 Not Implemented started to happen, did it ever end? If not, there is a chance that our constant retrying is messing with the Docker server.

From users position, a vector docker plugin would need to be justified with a good performance benefit, otherwise it's just as any other log driver, that we can interact with, but with an extra installation step.

The difference is, kubernetes documentation explicitly states reading logs from files as a recommended option (so it's a documented integration point), while I never could find anything similar in the docker docs.

Such a source would only work with jsonfile log driver, and ,without specification/guarantee, would have higher maintenance cost.

ktff on 12 Feb 2020

@MOZGIII when 501 Not Implemented started to happen, did it ever end? If not, there is a chance that our constant retrying is messing with the Docker server.

I had two instances if it:

first one did repeat very frequently and didn’t seem to be stopping, had to terminate vector
second one was once every half a second, and it stopped after a couple times
After those two I wasn’t able to reproduce the problem.

From users position, a vector docker plugin would need to be justified with a good performance benefit, otherwise it's just as any other log driver, that we can interact with, but with an extra installation step.

True, but for some people that’s just that they need. In half my deployments, log agent doesn’t contain any logic besides parsing json and sending it over to elasticsearch. I used to use fluentd and logstash, and often deploy it as a container. What I usually have to do it to take the standard image and build one based off of it with the config file and some lightweight templating solution to substitute parameters (like ES address) from the env vars at “runtime”. If we offer vector in a form like that - it might save our users not just an installation step, but also the efforts of building their own containers with configuration. It’s not as trivial as it sounds really.

MOZGIII on 12 Feb 2020

So I think this has gotten a bit off-topic but I think the conclusion is that the 501 is not something we can fix and we handle retries correctly. So I think we can close this issue.

@MOZGIII If you're up to it, I'd love to see an issue describing potential solutions to a embeded vector/docker logging plugin! I think you have some good ideas you've presented and I agree with your ideas!

LucioFranco on 12 Feb 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Define how metrics are converted to logs

binarylogic · 3Comments

Version mismatch in binary output

jamtur01 · 3Comments

New `gcp_cloud_storage` sink

trK54Ylmz · 3Comments

macOS build does not use Security.framework for TLS CA lookup

lewisthompson · 3Comments

New `http` source

binarylogic · 4Comments