nightly-alpine
[transforms.k8s_metadata]
I am using the file source to include /var/lib/docker/containers/*/*.log
Now I am expecting to configure Vector in such a way that I can add fields for pod/namespace/container name,
which will tell which k8s pod & namespace etc. each log event belongs to.
How do I achieve this in Vector? I am using the nightly build, which I believe supports k8s.
It looks like you're deploying Vector with custom YAMLs; consider using our deployment examples: https://github.com/timberio/vector-k8s-examples
Our Helm Chart and recommended YAML configs come with preconfigured kubernetes_logs source. If you want a custom config, the kubernetes_logs source should be used instead of the file source.
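To illustrate that recommendation, here is a minimal config sketch that swaps the file source for the kubernetes_logs source. The source/sink names and the console sink are illustrative stand-ins, not part of the recommended configs:

```toml
# Sketch: use the kubernetes_logs source instead of tailing
# /var/lib/docker/containers/*/*.log with the file source.
[sources.k8s_logs]
type = "kubernetes_logs"

# Illustrative sink so the config is complete; swap in your real sink.
[sinks.out]
type = "console"
inputs = ["k8s_logs"]
encoding.codec = "json"
```

With this in place, each event carries the kubernetes metadata shown in the sample output below, without any extra transform.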
We currently don't have a dedicated transform to handle metadata - this functionality is built into the kubernetes_logs source instead. We're planning to re-add the transform in the future to better support the sidecar deployment model.
What options does the kubernetes_logs source have? For what purposes can we use it? Can I read info about container/pod/namespace name using this option?
The Helm example values.yaml only has the sink part; what would the sources part and its options be?
It will collect the logs from all Pods across all the k8s cluster Nodes. Vector has to be deployed as a DaemonSet, though.
As a result, you'll get events that can conceptually be represented in JSON format as below:
{"file":"/var/log/pods/kube-system_coredns-66bff467f8-pb4vh_f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4/coredns/0.log","kubernetes":{"pod_labels":{"k8s-app":"kube-dns","pod-template-hash":"66bff467f8"},"pod_name":"coredns-66bff467f8-pb4vh","pod_namespace":"kube-system","pod_uid":"f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4"},"message":".:53","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2020-08-14T12:58:55.598884439Z"}
{"file":"/var/log/pods/kube-system_coredns-66bff467f8-pb4vh_f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4/coredns/0.log","kubernetes":{"pod_labels":{"k8s-app":"kube-dns","pod-template-hash":"66bff467f8"},"pod_name":"coredns-66bff467f8-pb4vh","pod_namespace":"kube-system","pod_uid":"f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4"},"message":"[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2020-08-14T12:58:55.599356223Z"}
In this example, the events correlate to the output you see when you run kubectl logs -n kube-system coredns (just a sample pod for the purposes of demonstration).
As you can see, there's the log line, the name of the file from which this line was collected, and the metadata - pod name, namespace name, pod labels, etc.
kubernetes_logs doesn't require any options, but you have to deploy Vector with the corresponding env vars. We'll release the documentation soon, but for now please use the examples I linked above.
Our Helm chart has the kubernetes_logs source built in under the name kubernetes_logs; you don't have to specify it yourself and can just assume it's there.
OK. One more thing: I assume kubernetes_logs will work with all kinds of sink types available?
Is kubernetes_logs available only with nightly-alpine, or also in 0.10.0-alpine?
kubernetes_logs should work with all log-compatible sinks.
We're planning to release kubernetes_logs with the upcoming v0.11. Until then it's only available on nightly-* builds.
Thanks for this update. Last question: does the DaemonSet need to run with a dedicated ServiceAccount & RBAC, or is the default account & role enough?
Well, technically we only require RBAC privileges; you have freedom in how to configure them. But if you're referring to our premade distribution configs (YAML or Helm) - RBAC and a ServiceAccount are required.
We're using the permissions to talk to the k8s API to obtain metadata from it.
[sources.kubernetes_logs]
type = "kubernetes_logs"
After using this, here is what I observed:
The code is supposed to read the logs for all the containers in the Pod.
Could you elaborate, please: how is the log missing in the example? I don't get it.
Yes, we don't currently put the information on which node the Pod is running into the log event. It is something we should probably add though; I'll add it to our backlog.
From your above output on the coredns pod, which field will tell the container name?
kubectl logs -n kube-system coredns
Until the backlog item is resolved, do you have any suggestion to achieve this using transforms?
When to expect 0.11.0 officially any timeline?
We're working on it :smile: CC @binarylogic
From your above output on the coredns pod, which field will tell the container name?
kubectl logs -n kube-system coredns
Sorry, I think that example I gave above is incorrect. Here's what I get when I run a fixed version:
$ kubectl logs -n kube-system deployment/coredns
Found 2 pods, using pod/coredns-66bff467f8-tmzrz
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[ERROR] plugin/errors: 2 stats.grafana.org. A: read udp 10.32.0.11:60572->192.168.0.1:53: i/o timeout
So, it doesn't show the container name.
I previously estimated adding a container name to the log event, and it's doable. In the meantime, it is possible to parse the file field to get the container name:
For example, /var/log/pods/kube-system_coredns-66bff467f8-pb4vh_f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4/coredns/0.log can be interpreted as /var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<n>.log, where <container name> is coredns. You could use a regex_parser transform to manually parse the file field into those pieces.
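A sketch of that regex_parser approach, assuming the transform's field/regex/drop_field options and that named capture groups become event fields (the transform name and group names are illustrative):

```toml
# Sketch: derive k8s metadata from the `file` field by parsing the
# /var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<n>.log layout.
[transforms.parse_file_path]
type = "regex_parser"
inputs = ["kubernetes_logs"]
field = "file"
drop_field = false  # keep the original file path on the event
regex = '^/var/log/pods/(?P<pod_namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<pod_uid>[^/]+)/(?P<container_name>[^/]+)/\d+\.log$'
```

Against the sample path above, this would yield container_name = "coredns" alongside the namespace, pod name, and pod UID.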
This is also something we should add, I'll create an issue for it.
Until the backlog item is resolved, do you have any suggestion to achieve this using transforms?
Regarding the container name - I've already mentioned it above.
Adding the Node name is also possible, since it's available as an env var in the Vector DaemonSet!
You'd need to use the add_field transform like this:
[transforms.add_k8s_node_name]
type = "add_fields"
inputs = ["kubernetes_logs"]
fields.kubernetes_node_name = "${VECTOR_SELF_NODE_NAME}"
For sources with type = "kubernetes_logs", what other options does it have? Can we also exclude logs?
Since docs are under construction, you can check out the code for now:
- self_node_name - should be set to the env var value by default; you should probably skip it;
- auto_partial_merge - setting this to false will disable automatic merging of partial events; it is true by default;
- annotation_fields - this allows changing the keys under which the metadata is stored.
It's possible to exclude Pods from collection by annotating them with a certain label: vector.dev/exclude: "true"; see our own config for an example: https://github.com/timberio/vector/blob/9c87e8a172265068dbeb815a3331abda39a242e7/distribution/kubernetes/vector-namespaced.yaml#L27-L31
The configuration above makes Vector ignore its own logs. Without it, we'd be processing our own logs, which might cause a feedback loop, and we don't want that.
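As a sketch of the source options listed above, here is how they might look in a config. The overridden key names are made up for illustration, and the annotation_fields sub-keys are assumed from the metadata fields in the sample events earlier (pod_name, pod_namespace), so treat this as a sketch rather than authoritative documentation:

```toml
# Sketch: kubernetes_logs source with the options discussed above.
[sources.k8s_logs]
type = "kubernetes_logs"
auto_partial_merge = true                      # the default; false disables merging of partial events
annotation_fields.pod_name = "k8s.pod"         # store pod name under a custom key
annotation_fields.pod_namespace = "k8s.namespace"
# self_node_name is normally taken from the env var; usually no need to set it.
```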
Since we've created more specific issues for the particular tasks, I'm going to close this issue now. Please reopen if you have further questions.
Also, feel free to talk to us in chat at http://chat.vector.dev.