nightly-alpine
[transforms.k8s_metadata]
I am using the file source to include /var/lib/docker/containers/*/*.log
Now I am expecting to configure Vector in such a way that I can add fields for pod/namespace/container name,
which will tell which k8s pod & namespace etc. each log event belongs to.
How do I achieve this in Vector? I am using the nightly build, which I believe supports k8s.
It looks like you're deploying Vector with custom YAMLs; consider using our deployment examples: https://github.com/timberio/vector-k8s-examples
Our Helm Chart and recommended YAML configs come with preconfigured kubernetes_logs source. If you want a custom config, the kubernetes_logs source should be used instead of the file source.
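To illustrate that recommendation, here is a minimal config sketch that swaps the file source for the kubernetes_logs source. The source/sink names and the console sink are illustrative stand-ins, not part of the recommended configs:

```toml
# Sketch: use the kubernetes_logs source instead of tailing
# /var/lib/docker/containers/*/*.log with the file source.
[sources.k8s_logs]
type = "kubernetes_logs"

# Illustrative sink so the config is complete; swap in your real sink.
[sinks.out]
type = "console"
inputs = ["k8s_logs"]
encoding.codec = "json"
```

With this in place, each event carries the kubernetes metadata shown in the sample output below, without any extra transform.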
We currently don't have a dedicated transform to handle metadata - this functionality is built into the kubernetes_logs source instead. We're planning to re-add the transform in the future to better support the sidecar deployment model.
What options does the kubernetes_logs source have? For what purposes can we use it? Can I read info about container/pod/namespace name using this option?
The Helm example values.yaml only has the sink part; what would the sources part and its options be?
It will collect the logs from all Pods across all the k8s cluster Nodes. Vector has to be deployed as a DaemonSet, though.
As a result, you'll get events that can conceptually be represented in JSON format as below:
{"file":"/var/log/pods/kube-system_coredns-66bff467f8-pb4vh_f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4/coredns/0.log","kubernetes":{"pod_labels":{"k8s-app":"kube-dns","pod-template-hash":"66bff467f8"},"pod_name":"coredns-66bff467f8-pb4vh","pod_namespace":"kube-system","pod_uid":"f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4"},"message":".:53","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2020-08-14T12:58:55.598884439Z"}
{"file":"/var/log/pods/kube-system_coredns-66bff467f8-pb4vh_f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4/coredns/0.log","kubernetes":{"pod_labels":{"k8s-app":"kube-dns","pod-template-hash":"66bff467f8"},"pod_name":"coredns-66bff467f8-pb4vh","pod_namespace":"kube-system","pod_uid":"f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4"},"message":"[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7","source_type":"kubernetes_logs","stream":"stdout","timestamp":"2020-08-14T12:58:55.599356223Z"}
In this example, the events correlate to the output you see when you run kubectl logs -n kube-system coredns (just a sample pod for the purposes of demonstration).
As you can see, there's the log line, the name of the file from which this line was collected, and the metadata - pod name, namespace name, pod labels, etc.
kubernetes_logs doesn't require any options, but you have to deploy Vector with the corresponding env vars. We'll release the documentation soon, but for now please use the examples I linked above.
Our Helm chart has the kubernetes_logs source built in under the name kubernetes_logs; you don't have to specify it yourself and can just assume it's there.
OK. One more thing: I assume kubernetes_logs will work with all kinds of sink types available?
Is kubernetes_logs available only with nightly-alpine, or also in 0.10.0-alpine?
kubernetes_logs should work with all log-compatible sinks.
We're planning to release kubernetes_logs with the upcoming v0.11. Until then it's only available on nightly-* builds.
Thanks for this update. Last question: does the DaemonSet need to run with a dedicated ServiceAccount & RBAC, or is the default account & role enough?
Well, technically we only require RBAC privileges; you have freedom in how to configure them. But if you're referring to our premade distribution configs (YAML or Helm) - RBAC and a ServiceAccount are required.
We're using the permissions to talk to the k8s API to obtain metadata from it.
[sources.kubernetes_logs]
type = "kubernetes_logs"
After using this, here is what I observed:
The code is supposed to read the logs for all the containers in the Pod.
Could you elaborate, please: how is the log missing in the example? I don't get it.
Yes, we don't currently put the information on which node the Pod is running into the log event. It is something we should probably add though; I'll add it to our backlog.
From your above output on the coredns pod, which field will tell the container name?
kubectl logs -n kube-system coredns
Until the backlog item is resolved, do you have any suggestion to achieve this using transforms?
When to expect 0.11.0 officially any timeline?
We're working on it :smile: CC @binarylogic
From your above output on the coredns pod, which field will tell the container name?
kubectl logs -n kube-system coredns
Sorry, I think that example I gave above is incorrect. Here's what I get when I run a fixed version:
$ kubectl logs -n kube-system deployment/coredns
Found 2 pods, using pod/coredns-66bff467f8-tmzrz
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[ERROR] plugin/errors: 2 stats.grafana.org. A: read udp 10.32.0.11:60572->192.168.0.1:53: i/o timeout
So, it doesn't show the container name.
I previously estimated adding a container name to the log event, and it's doable. In the meantime, it is possible to parse the file field to get the container name:
For example, /var/log/pods/kube-system_coredns-66bff467f8-pb4vh_f0b1e0d0-6ea8-4ae6-8d72-17b9048d15f4/coredns/0.log can be interpreted as /var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<n>.log, where <container name> is coredns. You could use a regex_parser transform to manually parse the file field into those pieces.
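A sketch of that regex_parser approach, assuming the transform's field/regex/drop_field options and that named capture groups become event fields (the transform name and group names are illustrative):

```toml
# Sketch: derive k8s metadata from the `file` field by parsing the
# /var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<n>.log layout.
[transforms.parse_file_path]
type = "regex_parser"
inputs = ["kubernetes_logs"]
field = "file"
drop_field = false  # keep the original file path on the event
regex = '^/var/log/pods/(?P<pod_namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<pod_uid>[^/]+)/(?P<container_name>[^/]+)/\d+\.log$'
```

Against the sample path above, this would yield container_name = "coredns" alongside the namespace, pod name, and pod UID.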
This is also something we should add, I'll create an issue for it.
Until the backlog item is resolved, do you have any suggestion to achieve this using transforms?
Regarding the container name - I've already mentioned it above.
Adding the Node name is also possible, since it's available as an env var in the Vector DaemonSet!
You'd need to use the add_field transform like this:
[transforms.add_k8s_node_name]
type = "add_fields"
inputs = ["kubernetes_logs"]
fields.kubernetes_node_name = "${VECTOR_SELF_NODE_NAME}"
For sources with type = "kubernetes_logs", what other options does it have? Can we also exclude logs?
Since docs are under construction, you can check out the code for now:
- self_node_name - should be set to the env var value by default; you should probably skip it;
- auto_partial_merge - setting this to false will disable automatic merging of partial events; it is true by default;
- annotation_fields - this allows changing the keys under which the metadata is stored.
It's possible to exclude Pods from collection by annotating them with a certain label: vector.dev/exclude: "true"; see our own config for an example: https://github.com/timberio/vector/blob/9c87e8a172265068dbeb815a3331abda39a242e7/distribution/kubernetes/vector-namespaced.yaml#L27-L31
The configuration above makes Vector ignore its own logs. Without it, we'd be processing our own logs, which might cause a feedback loop, and we don't want that.
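As a sketch of the source options listed above, here is how they might look in a config. The overridden key names are made up for illustration, and the annotation_fields sub-keys are assumed from the metadata fields in the sample events earlier (pod_name, pod_namespace), so treat this as a sketch rather than authoritative documentation:

```toml
# Sketch: kubernetes_logs source with the options discussed above.
[sources.k8s_logs]
type = "kubernetes_logs"
auto_partial_merge = true                      # the default; false disables merging of partial events
annotation_fields.pod_name = "k8s.pod"         # store pod name under a custom key
annotation_fields.pod_namespace = "k8s.namespace"
# self_node_name is normally taken from the env var; usually no need to set it.
```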
Since we've created more specific issues for the particular tasks, I'm going to close this issue now. Please reopen if you have further questions.
Also, feel free to talk to us in chat at http://chat.vector.dev.