Vector: kubernetes_logs corrupts K8s labels with dots in their names

Created on 27 Aug 2020 · 8 Comments · Source: timberio/vector

Vector Version

Nightly build from 2020-08-27 as you haven't fully released the new K8s source yet...

vector 0.11.0 (gb2f9a09 x86_64-unknown-linux-gnu 2020-08-27)

Vector Configuration File

data_dir = "/var/tmp/vector"

[sources.kubernetes_logs]
  type = "kubernetes_logs"
  auto_partial_merge = true
  self_node_name = "${VECTOR_SELF_NODE_NAME}"
  [sources.kubernetes_logs.annotation_fields]
    pod_labels = "labels"
    pod_name = "pod"
    pod_namespace = "namespace"
    pod_uid = "pod_uid"

[sinks.debug]
  inputs = ["kubernetes_logs"]
  type = "console"
  target = "stdout"
  encoding = "json"

Debug Output

Skipped for now as it isn't needed. Relevant snippet, prettified:

{
    "file": "/var/log/pods/kube-system_ebs-csi-node-wtxnw_105dcb2d-b14f-4869-80da-ecae53d4f4cf/liveness-probe/0.log",
    "labels": {
        "app": {
            "kubernetes": {
                "io/instance": "aws-ebs-csi",
                "io/name": "aws-ebs-csi-driver"
            }
        },
        "controller-revision-hash": "6dc48fdb7f",
        "pod-template-generation": "2"
    },
    "message": "I0827 06:37:10.003083       1 main.go:53] Sending probe request to CSI driver \"ebs.csi.aws.com\"",
    "namespace": "kube-system",
    "pod": "ebs-csi-node-wtxnw",
    "pod_uid": "105dcb2d-b14f-4869-80da-ecae53d4f4cf",
    "source_type": "kubernetes_logs",
    "stream": "stderr",
    "timestamp": "2020-08-24T06:37:10.004636713Z"
}

Expected Behavior

The labels should have been:

{
    "file": "/var/log/pods/kube-system_ebs-csi-node-wtxnw_105dcb2d-b14f-4869-80da-ecae53d4f4cf/liveness-probe/0.log",
    "labels": {
        "app": "ebs-csi-node",
        "app.kubernetes.io/instance": "aws-ebs-csi",
        "app.kubernetes.io/name": "aws-ebs-csi-driver",
        "controller-revision-hash": "6dc48fdb7f",
        "pod-template-generation": "2"
    },
    ...
}

Actual Behavior

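Vector treats the dots in label names as path separators. As a result the plain `app` label is lost, and the two `app.kubernetes.io/...` labels end up as nested sub-objects under `app` instead of flat string keys.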
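To illustrate how this output can arise, here is a minimal Python sketch. It is not Vector's actual code; `nest_dotted_keys` is a hypothetical helper that assumes each label key is inserted as a dot-separated path, which is enough to reproduce the shape seen in the debug output.

```python
def nest_dotted_keys(flat):
    """Insert each key as a dot-separated path (hypothetical model of the bug)."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(".")
        node = nested
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                # A scalar already stored here (e.g. "app") is silently replaced.
                child = {}
                node[part] = child
            node = child
        node[parts[-1]] = value
    return nested


# Labels copied from the pod YAML in this report.
labels = {
    "app": "ebs-csi-node",
    "app.kubernetes.io/instance": "aws-ebs-csi",
    "app.kubernetes.io/name": "aws-ebs-csi-driver",
    "controller-revision-hash": "6dc48fdb7f",
    "pod-template-generation": "2",
}

result = nest_dotted_keys(labels)
```

Under that assumption, `result["app"]` is an object holding `kubernetes`, and the original `app: ebs-csi-node` value is gone, which matches the observed event.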

Example Data

Original K8s yaml:

# kubectl get pod ebs-csi-node-wtxnw -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-07-09T09:23:18Z"
  generateName: ebs-csi-node-
  labels:
    app: ebs-csi-node
    app.kubernetes.io/instance: aws-ebs-csi
    app.kubernetes.io/name: aws-ebs-csi-driver
    controller-revision-hash: 6dc48fdb7f
    pod-template-generation: "2"
  name: ebs-csi-node-wtxnw
  namespace: kube-system
...

Additional Context

Note that I run vector with suitable env-vars/tokens-in-files to enable K8s access outside a pod to allow me to debug this at a console.

References

This looks similar to my earlier report of the same behaviour in the JSON parser, see #2814

data model must kubernetes bug

All 8 comments

Indeed, thanks for reporting this! I'll address this shortly.

The fix just merged! Should be included in tomorrow's nightly.

We've added the annotation_fields.flat_labels config option to the kubernetes_logs source. Set it to true and the labels will come flat instead of nested.

Please reopen this issue if you still have trouble with this!

Thanks for the very quick fix - but I'd disagree with making the flat_labels option default to the current behaviour. TBH I don't think you need an option at all - the current behaviour is just wrong.

K8s uses dots frequently (best practice is DNS-scoped naming of labels), and those dots do not denote an "object" hierarchy as they do in TOML files. They are just text strings.

... though I admit dots will then probably get completely screwed up by the usual destination for logs (ElasticSearch), which also has a convention of dots=hierarchy, which it then represents back as dots in the Kibana GUI for maximum confusion.

I currently handle this by converting dots to underscores in Lua. Other log processors have options to do this built in (e.g. Replace_Dots in Fluent Bit's Elasticsearch output).
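The dots-to-underscores workaround can be sketched as follows. This is a Python illustration of the idea only, not the Lua transform used above; `replace_dots` is a hypothetical helper name.

```python
def replace_dots(labels, replacement="_"):
    """Return a copy of `labels` with every dot in a key replaced."""
    return {key.replace(".", replacement): value for key, value in labels.items()}


# Labels copied from the pod YAML in this report.
labels = {
    "app": "ebs-csi-node",
    "app.kubernetes.io/instance": "aws-ebs-csi",
    "app.kubernetes.io/name": "aws-ebs-csi-driver",
}

safe_labels = replace_dots(labels)
```

One caveat of this approach: two distinct keys such as `a.b` and `a_b` would collapse into the same key, so it trades exact label names for compatibility with dot-sensitive destinations.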

That's a good point, we should really swap the defaults. This behavior is indeed incorrect.

I'm weighing the pros and cons of keeping the option. I made this configurable in the first place because I guessed there are situations where one would want this behavior, so the decision was to let the user choose what fits better. That is, even if the behavior is not correct per se, it might still be practically useful.

That said, ideally, users should be able to convert flat keys to nested somehow in a separate transform if there's a need to do so, so there's no loss in covered scenarios if we just remove this field.

Yeah, ultimately it doesn't make sense to keep the switch.

@binarylogic, @Hoverbear, do you agree?

Agree. Let's default to flat. I don't think we need an option just yet.

It sounds like that will be least surprising to k8s users!

Thanks. I haven't managed to test current master yet, because your nightlies haven't been building for a few days now, but the first fix you made was included in the last nightly that did build, and it works for me.
