Vector: Improve `docker` source's event metadata

Created on 17 Oct 2019 · 5 Comments · Source: timberio/vector

I feel there's a missed opportunity in that the docker source does not enrich logs by adding Docker container labels (and possibly other metadata) as fields. This could be done either with an option to provide a set of labels to include as fields (if present), or just an option to include all or none of them.
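
To make the two variants concrete, here is a minimal sketch of what such an option could look like. `LabelSelection` and `select_labels` are hypothetical names used for illustration only; they are not part of Vector.

```rust
use std::collections::HashMap;

/// Hypothetical option controlling which container labels become fields.
pub enum LabelSelection {
    /// Add no labels to events.
    None,
    /// Add every label found on the container.
    All,
    /// Add only the listed label keys, skipping any that are absent.
    Only(Vec<String>),
}

/// Returns the subset of `labels` that should be attached to events.
pub fn select_labels(
    labels: &HashMap<String, String>,
    selection: &LabelSelection,
) -> HashMap<String, String> {
    match selection {
        LabelSelection::None => HashMap::new(),
        LabelSelection::All => labels.clone(),
        LabelSelection::Only(keys) => keys
            .iter()
            // Keep a key only when the container actually carries it.
            .filter_map(|key| labels.get(key).map(|value| (key.clone(), value.clone())))
            .collect(),
    }
}
```

With `Only`, a requested key that the container does not carry is simply skipped, which matches the "if present" behaviour described above.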

For context, we are running vector on Kubernetes and are collecting Kubernetes container logs.
Kubernetes passes k8s labels and annotations down to Docker containers as labels, meaning the app name, k8s namespace, pod name, and other metadata are all available from Docker and can be added as fields with no Kubernetes interaction required.

In terms of other use cases, docker-compose also provides service-name labels and the syntax has a label directive, allowing for fields to be set at the top in a docker-compose.yml file and then be passed all the way through to the log entries.
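
As a rough illustration of both cases, the sketch below maps a few well-known label keys to short field names. The label keys are the ones I understand the kubelet (with the Docker runtime) and docker-compose to set; treat them as assumptions and check them against your own containers. The field names and the `fields_from_labels` helper are hypothetical.

```rust
use std::collections::HashMap;

/// Label keys assumed to be set by the kubelet and docker-compose,
/// paired with hypothetical short field names.
const KNOWN_LABELS: &[(&str, &str)] = &[
    ("io.kubernetes.pod.name", "pod_name"),
    ("io.kubernetes.pod.namespace", "k8s_namespace"),
    ("io.kubernetes.container.name", "container_name"),
    ("com.docker.compose.service", "compose_service"),
];

/// Copies the known labels that are present into short-named fields.
pub fn fields_from_labels(labels: &HashMap<String, String>) -> HashMap<&'static str, String> {
    KNOWN_LABELS
        .iter()
        .filter_map(|(key, field)| labels.get(*key).map(|value| (*field, value.clone())))
        .collect()
}
```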

Right now we are using a file source with /var/log/containers/*.log and can already parse the pod_name and k8s_namespace as fields from the filename, which ironically gives us more context.
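
For reference, the filename parsing amounts to something like the sketch below. It assumes the usual `<pod>_<namespace>_<container>-<container id>.log` layout of `/var/log/containers`; the `container_name` and `container_id` fields and the `parse_container_log_name` helper are illustrative and are not what our setup literally runs.

```rust
/// Fields recovered from a `/var/log/containers/*.log` file name.
#[derive(Debug, PartialEq)]
pub struct ContainerLogName {
    pub pod_name: String,
    pub k8s_namespace: String,
    pub container_name: String,
    pub container_id: String,
}

/// Parses `<pod>_<namespace>_<container>-<id>.log` (assumed layout).
/// Returns `None` when the name does not match that shape.
pub fn parse_container_log_name(path: &str) -> Option<ContainerLogName> {
    // Keep only the final path component and drop the `.log` suffix.
    let file = path.rsplit('/').next()?;
    let stem = file.strip_suffix(".log")?;

    // Pod and namespace names cannot contain `_`, so split on the first two.
    let mut parts = stem.splitn(3, '_');
    let pod_name = parts.next()?;
    let k8s_namespace = parts.next()?;
    let rest = parts.next()?;

    // The container ID follows the last `-`; container names may contain `-`.
    let (container_name, container_id) = rest.rsplit_once('-')?;
    if [pod_name, k8s_namespace, container_name, container_id]
        .iter()
        .any(|part| part.is_empty())
    {
        return None;
    }

    Some(ContainerLogName {
        pod_name: pod_name.to_string(),
        k8s_namespace: k8s_namespace.to_string(),
        container_name: container_name.to_string(),
        container_id: container_id.to_string(),
    })
}
```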

docker enhancement

All 5 comments

@zcapper Hi! This is something we 100% want to have in our docker source. We are just about to close out our first pass at initial container support. Once that is done we will be providing more transforms/sources that can fetch the proper metadata and enrich events.

As for Kubernetes, we are about to merge #893, which brings Kubernetes support similar to what you have already set up. Following this, we will be providing much better metadata enrichment for events based on the k8s API.

How important is adding docker object labels to the Kubernetes source? I'm not actually sure if those come through in the kube logs or not.

If by kube logs you mean what's in /var/log/pods, the available fields are stream (stdout/stderr), log (message), and time. So enrichment is still needed for those logs to know which app logged an event.

With regards to a kubernetes source, I don't think it would be important to include docker-level object labels so long as the k8s labels and metadata are pulled from k8s itself. It's more of an either/or situation as I expect they would have similar data.

Ultimately, implementation details aside, I think the goal is to be able to retrieve container logs with enough field information to determine which app/namespace/pod a log event came from (similar to what the fluentd k8s metadata plugin does).

> I don't think it would be important to include docker-level object labels so long as the k8s labels and metadata are pulled from k8s itself.

Right, that's what I was thinking, but I wanted to make sure.

> Ultimately, implementation details aside, I think the goal is to be able to retrieve container logs with enough field information to determine which app/namespace/pod a log event came from (similar to what the fluentd k8s metadata plugin does).

Yup, this is the goal, but it will not ship with #893 right away; it will come in follow-up PRs.

Thanks for the input! 😄

This improvement should attempt to enrich all containers that the docker source is fetching logs from. To add more metadata, we should do this at the point where we first fetch the list of containers. From there, we should pass more container metadata down into the ContainerInfo struct. All of the metadata values can be extracted from the shiplift::rep::Container struct.

The metadata that should be included on each event:

  • [ ] labels, which can be stored as nested key/value pairs under the label or docker_label key
  • [ ] image, which should contain the image name
  • [ ] names, which should contain a list of container names
  • [ ] created_at, which should contain the time at which the container was created
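
A rough sketch of how the fields listed above could be assembled once per container and then copied onto each event. The `Container` struct here is a local stand-in, not `shiplift::rep::Container`; its field names and types (including `created` as Unix seconds) are assumptions, as are the `Value` type and the `container_metadata` helper. The sketch uses the `label` key; `docker_label` would work the same way.

```rust
use std::collections::BTreeMap;

/// Minimal event value type, a stand-in for whatever the event model uses.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    Seconds(u64),
    List(Vec<Value>),
    Map(BTreeMap<String, Value>),
}

/// Stand-in for a container listing entry (field names are assumptions).
pub struct Container {
    pub image: String,
    pub names: Vec<String>,
    pub labels: BTreeMap<String, String>,
    /// Assumed to be the creation time as Unix seconds.
    pub created: u64,
}

/// Builds the per-container fields once, so they can be copied onto each event.
pub fn container_metadata(container: &Container) -> BTreeMap<String, Value> {
    // Labels become a nested map under a single key.
    let labels: BTreeMap<String, Value> = container
        .labels
        .iter()
        .map(|(key, value)| (key.clone(), Value::Text(value.clone())))
        .collect();
    let names: Vec<Value> = container.names.iter().cloned().map(Value::Text).collect();

    let mut fields = BTreeMap::new();
    fields.insert("label".to_string(), Value::Map(labels));
    fields.insert("image".to_string(), Value::Text(container.image.clone()));
    fields.insert("names".to_string(), Value::List(names));
    fields.insert("created_at".to_string(), Value::Seconds(container.created));
    fields
}
```

Computing the map when the container list is first fetched keeps the per-event cost down to a copy of already-built values.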

cc @binarylogic @lukesteensen thoughts?

@LucioFranco those sound good to me!
