Vector: Add additional filters to `docker` source

Created on 7 Oct 2019 · 3 Comments · Source: timberio/vector

Currently, the `docker` source supports only a limited subset of filters for selecting which running containers to collect logs from. We should add support for more filters so that users can express more flexible selections.

  • [x] Include container image name
  • [ ] Exclude container id
  • [ ] Exclude container label
  • [ ] Exclude container image name
Labels: should, docker, enhancement
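The checklist above mixes an include filter (image name, already supported) with the requested exclude filters. The following is a minimal Python sketch of how include and exclude rules could be combined when selecting containers; it is not Vector's implementation (Vector is written in Rust), and the `Container` and `ContainerFilter` types and their field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Minimal stand-in for a running container as listed by Docker (hypothetical shape)."""
    id: str
    image: str
    labels: dict = field(default_factory=dict)


@dataclass
class ContainerFilter:
    """Hypothetical include/exclude rules mirroring the checklist above."""
    include_images: list = field(default_factory=list)
    exclude_ids: list = field(default_factory=list)
    exclude_labels: dict = field(default_factory=dict)
    exclude_images: list = field(default_factory=list)

    def matches(self, c: Container) -> bool:
        # An empty include list means "include every container".
        if self.include_images and c.image not in self.include_images:
            return False
        # Exclusions are checked after inclusion; any single hit drops the container.
        # IDs are matched by prefix so that short 12-character IDs also work.
        if any(c.id.startswith(prefix) for prefix in self.exclude_ids):
            return False
        # A label exclusion matches only when both key and value are equal.
        if any(c.labels.get(key) == value for key, value in self.exclude_labels.items()):
            return False
        return c.image not in self.exclude_images


containers = [
    Container("892cd26fb6a0", "timberio/vector:nightly-alpine",
              {"com.docker.compose.service": "vector"}),
    Container("aaaa11112222", "nginx:latest"),
    Container("bbbb33334444", "redis:6"),
]

# Watch everything except Vector's own image.
no_vector = ContainerFilter(exclude_images=["timberio/vector:nightly-alpine"])
print([c.id for c in containers if no_vector.matches(c)])  # ['aaaa11112222', 'bbbb33334444']
```

Applying inclusion first and exclusion second means an exclude rule always wins when a container matches both.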

All 3 comments

@LucioFranco this is a duplicate of https://github.com/timberio/vector/issues/1057. Can you update #1057 with any missing details?

I think we need to reopen this: #1057 is closed, but currently only inclusion filters are implemented, not exclusion filters.

Exclusion is particularly important if we want to run Vector itself in a container, watch all other containers, and use the console sink to debug Vector.

For example, with the following configuration

docker-compose.yaml

```yaml
version: '3'

services:
  vector:
    image: timberio/vector:nightly-alpine
    volumes:
      - ./vector.toml:/etc/vector/vector.toml
      - /var/run/docker.sock:/var/run/docker.sock
```

vector.toml

```toml
data_dir = "/var/data/vector"

[sources.in]
type = "docker"

[sinks.out]
type = "console"
inputs = ["in"]
encoding = "json"
```

an infinite feedback loop is created: Vector reads its own console output back through the `docker` source and wraps it in a new event, so `docker-compose up` prints ever more deeply nested messages like this:

```
vector_1  | {"timestamp":"2020-01-13T14:18:32.552451888Z","stream":"stdout","container_created_at":"2020-01-13T14:18:31.842410563Z","label":{"com":{"docker":{"compose":{"project":"example","config-hash":"23067fcf9e45c928b39f2b55505e977dbe3a120e934354f3c47d617adc87eec1","version":"1.21.0","service":"vector","oneoff":"False","container-number":"1"}}}},"image":"timberio/vector:nightly-alpine","container_name":"example_vector_1","message":"{\"image\":\"timberio/vector:nightly-alpine\",\"timestamp\":\"2020-01-13T14:18:32.527868912Z\",\"container_created_at\":\"2020-01-13T14:18:31.842410563Z\",\"label\":{\"com\":{\"docker\":{\"compose\":{\"container-number\":\"1\",\"project\":\"example\",\"oneoff\":\"False\",\"version\":\"1.21.0\",\"service\":\"vector\",\"config-hash\":\"23067fcf9e45c928b39f2b55505e977dbe3a120e934354f3c47d617adc87eec1\"}}}},\"stream\":\"stdout\",\"container_name\":\"example_vector_1\",\"container_id\":\"892cd26fb6a0696e4b90b962c3f48aa863b80e65ea8bed26c59b61032dcc840c\",\"message\":\"Jan 13 14:18:32.527  INFO vector: Loading config. path=\\\"/etc/vector/vector.toml\\\"\"}","container_id":"892cd26fb6a0696e4b90b962c3f48aa863b80e65ea8bed26c59b61032dcc840c"}
```
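The nesting in this output follows from each console-sink line being read back as the `message` of a new event and JSON-encoded again, which escapes all earlier quotes and backslashes once more. The small Python simulation below makes the growth visible. It assumes a simplified event carrying only three of the fields shown in the log line (`container_name`, `stream`, `message`); `docker_event`, `simulate_feedback` and `unwrap` are hypothetical helpers, not part of Vector.

```python
import json


def docker_event(message: str) -> dict:
    """Simplified event as emitted by the docker source (only three fields kept)."""
    return {"container_name": "example_vector_1", "stream": "stdout", "message": message}


def simulate_feedback(first_line: str, rounds: int) -> tuple:
    """Feed each console-sink line back in as the message of the next event."""
    line = first_line
    lengths = []
    for _ in range(rounds):
        # The console sink with encoding = "json" prints the event as one JSON line,
        # which the docker source then reads from Vector's own stdout.
        line = json.dumps(docker_event(line))
        lengths.append(len(line))
    return lengths, line


def unwrap(line: str, rounds: int) -> str:
    """Undo `rounds` levels of nesting by repeatedly decoding the message field."""
    for _ in range(rounds):
        line = json.loads(line)["message"]
    return line


original = 'Jan 13 14:18:32.527  INFO vector: Loading config. path="/etc/vector/vector.toml"'
lengths, final = simulate_feedback(original, 4)
print(lengths)  # strictly increasing: every round re-escapes all earlier quotes
```

Because nothing breaks the cycle, each line Vector prints becomes input for the next, which is why the container running Vector has to be excluded from its own `docker` source.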

@a-rodin The implementation of the docker source already has logic to exclude its own container, so this is a bug.

I have confirmed the cause: the initial list of running containers that the source fetches may not yet include the container of the Vector instance doing the fetching. Because the implementation assumes that, if its own container is absent from that list, Vector is not running in Docker at all, it never checks containers that appear later for itself.
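The race described here can be modelled in a few lines. This is a hypothetical Python sketch, not the actual fix in Vector: it assumes the source recognizes its own container by matching container IDs against its hostname, which Docker sets to the 12-character short container ID unless a hostname is configured explicitly. `BuggySelfCheck` decides once from the initial list; `FixedSelfCheck` keeps checking every newly seen container until it finds itself.

```python
class BuggySelfCheck:
    """Decides once, from the initial container list, whether we run in Docker."""

    def __init__(self, hostname: str, initial_ids: list):
        # Bug: if our own container is missing from the initial list (it may not
        # be listed yet), own_id stays None and is never looked up again.
        self.own_id = next((cid for cid in initial_ids if cid.startswith(hostname)), None)

    def is_self(self, container_id: str) -> bool:
        return self.own_id is not None and container_id == self.own_id


class FixedSelfCheck:
    """Keeps checking every newly seen container until it finds its own."""

    def __init__(self, hostname: str):
        if not hostname:
            # An empty string would prefix-match every container ID.
            raise ValueError("hostname must be non-empty")
        self.hostname = hostname
        self.own_id = None

    def is_self(self, container_id: str) -> bool:
        # Assumption: our hostname is the 12-character short ID of our container.
        if self.own_id is None and container_id.startswith(self.hostname):
            self.own_id = container_id
        return container_id == self.own_id


own = "892cd26fb6a0696e4b90b962c3f48aa863b80e65ea8bed26c59b61032dcc840c"
hostname = own[:12]  # inside a real container this would come from socket.gethostname()
initial = ["aaaa11112222"]  # Vector's own container is not listed yet

print(BuggySelfCheck(hostname, initial).is_self(own))  # False: Vector reads its own logs
print(FixedSelfCheck(hostname).is_self(own))           # True: Vector skips itself
```

The fix amounts to treating "own container not found yet" as an open question rather than a final answer.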

I'll push a PR which fixes this.

As for the exclude options, they would be nice to have, but apart from this case I don't see a real use case for them: they can't be used for optimization, and their functionality is better suited to filter transforms.
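Handling exclusion downstream, as suggested, amounts to a predicate over events rather than over containers. A minimal Python sketch follows, assuming events carry the `container_name`, `image` and `container_id` fields seen in the log output in this thread; `make_exclude_predicate` is a hypothetical helper standing in for a filter transform's condition, not Vector's configuration syntax.

```python
def make_exclude_predicate(names=(), images=(), id_prefixes=()):
    """Build a keep/drop condition over docker-source events (hypothetical helper)."""

    def keep(event: dict) -> bool:
        if event.get("container_name") in names:
            return False
        if event.get("image") in images:
            return False
        container_id = event.get("container_id", "")
        return not any(container_id.startswith(p) for p in id_prefixes)

    return keep


events = [
    {"container_name": "example_vector_1", "image": "timberio/vector:nightly-alpine",
     "container_id": "892cd26fb6a0", "message": "INFO vector: Loading config."},
    {"container_name": "example_web_1", "image": "nginx:latest",
     "container_id": "aaaa11112222", "message": "GET / 200"},
]

# Drop Vector's own events downstream instead of excluding its container at the source.
keep = make_exclude_predicate(names=("example_vector_1",))
print([e["container_name"] for e in events if keep(e)])  # ['example_web_1']
```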
