Vector: New `kubernetes_pod_metadata` transform

Created on 22 Oct 2019 · 26Comments · Source: timberio/vector

We should provide a kubernetes_pod_metadata transform that is able to fetch a list of pods and enrich events. This should work based off of the kube openapi spec using the [k8-openapi] crate.

Configuration

[transforms.my_transform_id]
  type = "kubernetes_pod_metadata"
  fields = ["instance-id", "local-hostname", "local-ipv4", "public-hostname", "public-ipv4"] # default
  namespace = ["your-namespace"] # optional, default is to list ALL pods in all namespaces

Options

fields contains the list of fields to include in each event.
~~namespace contains a list of namespaces that we should watch for a list of pods and their metadata.~~ See global schema for this field.

Fields

We should provide this set of fields by default and they are overridable via the fields option.

pod_name
pod_namespace
pod_uid
labels (not sure if we should provide a way to include the label name in the key or have them as a tuple?)
annotations
node_name

Most of these data points should come from a combination of [PodSpec] and [ObjectMeta] using the [Pod::list_namespaced_pod].

Global Schema

log_schema should add a kubernetes_namespace field to correlate where nested fields should be added to events. This should replace the namespace transform option.

Implementation

This should use the [k8-openapi] crate, plus the [kube] crate, though I would prefer we stick with just hyper and vendor the config loading. This implementation should follow the proposed method for implementing a stream table join transforms as seen in #1069.

Questions

How should we nest these? Should they live under a kubernetes key?
How can we correlate pod events coming from the kubernetes source with the ones on the list? @ktff

Todo

[x] WatchClient
[ ] kubernetes_pod_metadata

feature

Source

LucioFranco

👍3

Most helpful comment

@lukesteensen yeah, I'm not convinced, either way, the reason I like it as a transform is that this is an additional thing. What I mean by that this transform would require access to the kube api which we do not require with the kube source. Because of this, I prefer the more composable way to set things up instead of providing one very thick source that can do everything.

LucioFranco on 24 Oct 2019

👍3

All 26 comments

How can we correlate pod events coming from the kubernetes source with the ones on the list? @ktff

So to give background on this issue. Different kube implementations like EKS and DKS provide different ways they format the pod_uid. Because of this we can't reliably and portably parser the pod_uid since it might change. This leaves us in a tough spot when it comes to correlating pod metadata from the kube api which does not return the full pod_uid but just returns pieces. Because of this, we need to find a strategy to correlate the full pod_uid with the specific pod it comes from in the list that is returned from the kube api.

@ktff and I talked today about this offline. We came to the conclusion the best idea here would be to attempt to fetch the ObjectMeta::uid field and attempt to find an occurrence of this in incoming kubernetes source events. We can then store a HashMap that goes from pod_uid -> uid/metadata to quickly enrich events. This should be the most portable solution. We should continue to explore how different kube setups create their pod_uid to ensure that this will stay forwards compatible.

@ktff please correct me if any of this sounds wrong

LucioFranco on 23 Oct 2019

👍1

cc @lukesteensen as well, I'd like to get your quick +1 on this.

LucioFranco on 23 Oct 2019

Different kube implementations like EKS and DKS provide different ways they format the pod_uid

Can you provide more specifics and/or examples?

Because of this we can't reliably and portably parser the pod_uid since it might change.

What might change? The pod_uid or the implementation?

This leaves us in a tough spot when it comes to correlating pod metadata from the kube api which does not return the full pod_uid but just returns pieces.

What do you mean by "pieces"? Can you provide an example?

Because of this, we need to find a strategy to correlate the full pod_uid with the specific pod it comes from in the list that is returned from the kube api.

I don't fully follow this sentence.

It'd be nice if we could see actual examples. If you've got it, feel free to ignore this, but I think it'll be hard for anyone to help without knowing these.

binarylogic on 23 Oct 2019

Yeah, I need to collect samples but that requires us to spin up EKS, DKS, GKE, etc which takes some time.

So pod_uid which we collect from the log files on each node via the kubernetes source gives us something like <pod namespace>_<pod name>_<object uid>. The issue is that this format is not defined anywhere in the kube spec as far as I know. What makes this worse is that @ktff noticed that EKS is different from minikube, etc. This means that there is a possibility that this pod_uid we collect from the logs might change over time and is not consistent. And the pod_uid is not stored in the pod data returned from the kube api but components of it are stored there. So there is no way to guarantee that say, the object uid is always going to be at the end of the pod_uid.

This is important because we need some way to correlate events with the metadata fetched from the kube api to allow us to enrich it with more information. So before we can start on implementing this issue we need to ensure that we find a proper strategy for detecting which pod object an event belongs too in a portable and forward compat way.

I think we found a good solution above but we will need to make sure it will work across all the platforms.

LucioFranco on 23 Oct 2019

Got it. Is there any prior art we can look at as well? I have to assume other people have solved this also. And if they didn't address the above concerns, it would be a good opportunity for a correctness test in our test harness.

binarylogic on 23 Oct 2019

@binarylogic Yeah, agreed I will report back with what I find.

LucioFranco on 23 Oct 2019

Ok, I think I have a better approach to this and I'd like some feedback.

So let's rephrase this question, what we need is some way to consistently across different implementations identify a pod and container pair. The issue arises via the kubernetes source not having a clear way to extract this. We can extract them from the pod_uid but the format may not be the same across different kube implementations.

My proposal is to use a regex to extract this information and do this on a best effort basis. On top of this, we should provide ready-made regexes for the different platforms so that vector can extract this data out of the box. If for some reason these pod_uid formats change we can then allow the user to change the default one with a custom one. This would support all future versions as well.

As this is an improvement for the kubernetes source I'd like to open an issue for that then update this spec to describe how we can use the combination of pod name, pod object uid, and container name to enrich the metadata.

@ktff I think this is a much more flexible solution, what do you think?

LucioFranco on 23 Oct 2019

Some prior art:

https://github.com/fluent/fluent-bit/blob/master/conf/parsers.conf#L107

LucioFranco on 23 Oct 2019

I think the first solution would give better UX in cases where our best effort fails, and users best effort fails. Also the second solution will have greater maintenance cost in the form of detecting when we should update regexes, but it is much simpler to implement.

ktff on 24 Oct 2019

I think the second solution being much simpler is a better start. if we start seeing that the maintenance is too much we can switch.

LucioFranco on 24 Oct 2019

👍1

Yeah, I need to collect samples but that requires us to spin up EKS, DKS, GKE, etc which takes some time.

I'd be curious to see some examples of how they differ. We might be able to do something clever if there are enough similarities to exploit.

A higher-level question: does this more make sense as a separate transform or as part of the k8s source? I'm not sure, it just feels like there will be some duplicated config and I don't know what the use case would look like for this without the k8s source. Sources are also currently less limited than transforms for the stream-join stuff.

lukesteensen on 24 Oct 2019

LucioFranco on 24 Oct 2019

👍3

This is blocked by https://github.com/timberio/vector/issues/1060, once that is done, we can start work on this.

LucioFranco on 5 Nov 2019

👍1

Kubernetes (at least mine) sets docker labels like this:

io.kubernetes.pod.namespace
io.kubernetes.pod.name
io.kubernetes.container.name

If you hack the docker source a little, to inject docker container labels into each log event, then the kubernetes transform can use these fields to pull up necessary information.

ikatson on 8 Nov 2019

@ikatson thank you for the idea, but unfortunately, as you said, you have such version of it. Generally it isn't portable in ways:

It works only for Kubernetes backed by Docker
It depends on implementation of Kubernetes, but this wouldn't be a problem if it is in some Kubernetes specification.

And even if Kubernetes does this for every container runtime, and has specified that it will do so, it's still a problem to get the labels without the information that we want to extract from the labels.

ktff on 8 Nov 2019

@LucioFranco we should also add image field by default, as source won't be providing it.

ktff on 4 Jan 2020

@LucioFranco for which use cases do you see namespace option being used?

I ask because, it seems as it's only for optimization, and we will already only watch the pods from the local Node, so it seems to me as unnecessary.

ktff on 14 Jan 2020

@ktff I think one use case might be to limit the namespaces we need to fetch, since iirc it is a separate api call per namespace. So its kinda like a filter on which namespaces we will enrich.

LucioFranco on 14 Jan 2020

@LucioFranco so I don't think it's necessary. While it's true that with it, metadata for a few Pods per Node won't need to be fetched, watched for, it's increasing the number of times we need to communicate with the server, without it we only need 2 requests (using GET /api/v1/pods), while with it we need 2*|namespace| requests (using GET /api/v1/namespaces/{namespace}/pods). It's also questionable if users will bother with it.

ktff on 14 Jan 2020

@ktff I was imagining it being more like if we get an event with a namespace that is not included in that list then we don't enrich it. So that wouldn't require any extra, but I guess if we just fetch all the pods at once that can work too! I think I was basing off the original design off of the second endpoint. Happy to go with what you think is correct just make a note of it in the pr description.

LucioFranco on 14 Jan 2020

👍1

@ktff do you think there would be benefit in changing the labels field to be called something like kubernetes_labels? or k8_labels?

LucioFranco on 13 Feb 2020

@LucioFranco yes, some prefix should be there.

I think we should use prefixes which make it clear what object are fields describing. Main reason being that various Kubernetes objects can have id, labels, annotations, etc. If not, there would be ambiguity once we add #1293 and #1424.

In the case of this transform, pod prefix could be used for fields describing Pods, node prefix could be used for fields describing Nodes, and etc.

We could also either just append prefix like pod_labels, or put it, like you mentioned, under a key like pod.labels.

ktff on 14 Feb 2020

@ktff noting that I added a global schema item to the spec that should change how we handle nesting fields.

LucioFranco on 17 Feb 2020

👍1

Would really appreciate this transform ... looking to move off FluentBit but the https://fluentbit.io/documentation/0.13/filter/kubernetes.html filter gives alot of useful data.