I consistently get mapping collisions like the following:
[2018/10/19 20:36:51] [ warn] [out_es] Elasticsearch error
{"took":1,"errors":true,"items":[{"index":{"_index":"kubernetes_cluster-2018.10.18","_type":"flb_type","_id":"iOALjmYBkpImoLKjgi7j","status":400,"error":{"type":"mapper_parsing_exception","reason":"Could not dynamically add mapping for field [app.kubernetes.io/managed-by]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text]."}}},{"index":{"_index":"kubernetes_cluster-2018.10.18","_type":"flb_type","_id":"ieALjmYBkpImoLKjgi7j","status":400,"error":{"type":"mapper_parsing_exception","reason":"Could not dynamically add mapping for field [app.kubernetes.io/managed-by]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text]."}}},{"index":{"_index":"kubernetes_cluster-2018.10.18","_type":"flb_type","_id":"iuALjmYBkpImoLKjgi7j","status":400,"error":{"type":"mapper_parsing_exception","reason":"Could not dynamically add mapping for field [app.kubernetes.io/managed-by]. Existing mapping f
I would like to be able to namespace the parsed JSON logs by a Kubernetes annotation.
Currently, the JSON parser flattens the keys from the log like this:
log {"name":"my-app","hostname":"app-cddcf4d88-hrffh","pid":26,"level":30,"msg":"Unable to log in user","time":"2018-10-21T10:44:19.971Z","v":0}
msg Unable to log in user
name my-app
...
But if we had the ability to specify a namespace as an annotation, we could avoid indexing type collisions between apps.
E.g., with the following annotations:
annotations:
  fluentbit.io/parser: json
  fluentbit.io/namespace: 'my-namespace'
I would then get an output like this:
log {"name":"my-app","hostname":"app-cddcf4d88-hrffh","pid":26,"level":30,"msg":"Unable to log in user","time":"2018-10-21T10:44:19.971Z","v":0}
my-namespace.msg Unable to log in user
my-namespace.name my-app
...
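The proposed transformation can be sketched in a few lines of Python. This is only an illustration of the idea, not an existing Fluent Bit feature; `flatten` and `namespace_keys` are hypothetical names:

```python
import json

def flatten(record, prefix=""):
    """Flatten nested keys with dot separators, as the JSON parser does."""
    out = {}
    for key, value in record.items():
        if isinstance(value, dict):
            out.update(flatten(value, prefix + key + "."))
        else:
            out[prefix + key] = value
    return out

def namespace_keys(flat, namespace):
    """Prefix every flattened key with the app's declared namespace."""
    return {namespace + "." + k: v for k, v in flat.items()}

log = json.loads('{"name":"my-app","msg":"Unable to log in user"}')
print(namespace_keys(flatten(log), "my-namespace"))
# {'my-namespace.name': 'my-app', 'my-namespace.msg': 'Unable to log in user'}
```

Because every app's keys live under their own prefix, two apps that both emit a key like `name` or `level` can no longer collide in the same index mapping.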
This would prevent mapping collisions and also allow me to create logical groupings based on, e.g., the type of JSON logging library I'm using.
The most common case would be some pods with:
kubernetes.labels.app = [old format]
and some pods with:
kubernetes.labels.app.kubernetes.io/foo = bar [new format]
Attached is a test case. Unpack it and run:
fluent-bit -v -v -c tc/fluent-bit.conf
and you will see the error. In this case, my Elasticsearch server has previously observed kubernetes.labels.app as a plain value and made that mapping 'text', and it now wants to insert an object.
The solution, I believe, is to create an Elasticsearch index template: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
But then I am not sure what will happen to the first insert (which is text, not an object).
This error also comes up with Logstash, Filebeat, and Fluentd, so it's not unique to Fluent Bit.
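For reference, a legacy (ES 6.x era, matching the `flb_type` type in the error above) index template pinning that field to an object might look something like the following. This is an untested sketch; the index pattern is taken from the error output:

```
PUT _template/kubernetes_cluster
{
  "index_patterns": ["kubernetes_cluster-*"],
  "mappings": {
    "flb_type": {
      "properties": {
        "kubernetes": {
          "properties": {
            "labels": {
              "properties": {
                "app": { "type": "object" }
              }
            }
          }
        }
      }
    }
  }
}
```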
I created a new Elasticsearch index template with kubernetes.labels.app having a "type": "object".
Sub labels like kubernetes.labels.app.kubernetes.io/foo are indexed properly.
Old labels like kubernetes.labels.app = "my-app-name" are not indexed, and ES complains of a collision.
So by creating the index template, the situation is reversed.
I can correct a number of manifests that are deployed to our clusters but it's not ideal as some are upstream (like prometheus, nginx, etc).
Perhaps there is some way to regex-rename the field kubernetes.labels.app to kubernetes.labels.app.app (two apps) with Fluent Bit, so at least the data would be in ES?
That, or kubernetes.labels_app (underscore separation).
Dots in field names are discouraged, and it doesn't seem possible to overload a key (app) with multiple value types (text/object).
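To illustrate what the underscore-separation rename would do to a record (again, just a sketch of the transformation, not an existing Fluent Bit option; `rename_conflicting_label` is a hypothetical name):

```python
def rename_conflicting_label(record):
    """If kubernetes.labels.app is a plain string (old format), move it to
    kubernetes.labels_app so it cannot collide with the object form
    produced by app.kubernetes.io/... labels (new format)."""
    labels = record.get("kubernetes", {}).get("labels", {})
    if isinstance(labels.get("app"), str):
        record["kubernetes"]["labels_app"] = labels.pop("app")
    return record

old = {"kubernetes": {"labels": {"app": "my-app-name"}}}
print(rename_conflicting_label(old))
# {'kubernetes': {'labels': {}, 'labels_app': 'my-app-name'}}
```

With both shapes moved to distinct fields, the mapping for kubernetes.labels.app only ever sees objects, and the old string values remain searchable under kubernetes.labels_app.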
I should add that another possibility would be to mutate or pre-process the field on ingestion with an Elasticsearch pipeline, but I imagine that would be very resource-heavy for the ES cluster.
The Elasticsearch template may solve the mapping collision, but it doesn't necessarily solve
the problem that fields with the same name sometimes share a meaning and sometimes don't.
This is especially true with JSON logs, where the app developers have total control. E.g., the key level in one app could refer to a log level, but in another it may refer to a floor level. So in this case, namespacing makes sense.
But as @donbowman pointed out in the Slack channel, sometimes keys have different names but actually do refer to the same thing, e.g. status-code, code, statusCode, etc., and you would want to normalize these keys so that you can create effective search queries/filters. Namespacing does not solve this issue.
For this to work we need to be able to rename keys and for it to work together with namespacing I think we also need the ability to mark specific keys as global.
So maybe we need to decentralize some parts of the configuration and bring it closer to the apps.
I don't know if it makes sense to decentralize an entire parser but maybe key renaming is enough.
CRDs could help here if annotations are too limiting.
Let's say we have somehow defined that all log entries for my-app should be namespaced.
Then we could do something like:
apiVersion: fluentbit.io/v1alpha1
kind: KeyMap
mappings:
  statusCode:
    rename: code
    global: true
  ...
selector:
  matchLabels:
    app: my-app
Also to note: when this error occurs, the log message is not inserted into Elasticsearch. Worse, it's added to a 'retry' queue in Fluent Bit, so it's quite expensive and achieving nothing. It retries many times.
It should probably not retry fatal errors, only queue overflow or no-connect.
If we set 'Replace_Dots On' in the Elasticsearch output, the problem is 'resolved'.
This changes the '.' to a '_'.
Does anyone have an issue with this? It's not the best (the label in Elasticsearch doesn't match the label in Kubernetes), but I'm not sure what else one could do.
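For reference, the relevant output section would look something like this (Host, Port, and Match values here are placeholders for your own setup); the record that follows shows the resulting underscored labels:

```
[OUTPUT]
    Name            es
    Match           *
    Host            elasticsearch
    Port            9200
    Logstash_Format On
    Replace_Dots    On
```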
{
  "@timestamp": "2018-10-22T14:31:59.979Z",
  "kubernetes": {
    "annotations": {
      "artifact_spinnaker_io/location": "dev-test"
    },
    "container_name": "test-api-server",
    "labels": {
      "app_kubernetes_io/managed-by": "spinnaker"
    },
    "namespace_name": "dev-test",
    "pod_id": "7d413442-d562-11e8-a780-b637f5d65632",
    "pod_name": "test-api-server-7596b6dc8d-fwkkq"
  },
  "log": "message"
}
Thanks Don. Looks like a reasonable workaround.
I think that, combined with the ability to split logs by Kubernetes namespace per your recent PR, the effects of replacing dots are at least limited in scope.
Wouldn't really want to apply 'Replace_Dots On' for the whole cluster if only a few pods conflict.
I was being too picky and optimistic wrt label clashes.
Have switched on cluster wide and it is simple. Thanks Don.
This problem still exists.
Is there any workaround?