There seems to be a problem with the mapping of kubernetes.container.start_time. I see the following error in my Elasticsearch cluster logs every minute:
Elasticsearch cluster log error:
```
[2018-07-25T16:06:44,953][DEBUG][o.e.a.b.TransportShardBulkAction] [metricbeat-6.2.4-2018.07.25][0] failed to execute bulk item (index) BulkShardRequest [[metricbeat-6.2.4-2018.07.25][0]] containing [17] requests
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [kubernetes.container.start_time]
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:302) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:485) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:607) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:407) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:384) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:482) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:500) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:394) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:384) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:482) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:500) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:394) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:384) ~[elasticsearch-6.2.4.jar:6.2.4]
    ...
    at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.2.4.jar:6.2.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.IllegalArgumentException: Invalid format: ""
    at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187) ~[joda-time-2.9.9.jar:2.9.9]
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826) ~[joda-time-2.9.9.jar:2.9.9]
    at org.elasticsearch.index.mapper.DateFieldMapper$DateFieldType.parse(DateFieldMapper.java:248) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.DateFieldMapper.parseCreateField(DateFieldMapper.java:456) ~[elasticsearch-6.2.4.jar:6.2.4]
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297) ~[elasticsearch-6.2.4.jar:6.2.4]
    ... 65 more
```
This is related to Kubernetes containers that are in an error state or haven't started yet, so their start_time is still empty.
I'm not sure what the solution to this problem is. Maybe Metricbeat should check whether kubernetes.container.start_time is "" and, in that case, drop the field or substitute a valid date?
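To illustrate why each bulk item fails: a date-mapped field is parsed strictly, and an empty string is not a parseable timestamp. A minimal Python sketch of that behavior (illustrative only, not Metricbeat or Elasticsearch code; the parse_start_time helper and its format string are my own assumptions):

```python
from datetime import datetime

def parse_start_time(value):
    # Strict parse, analogous to the date mapping's behavior: an empty
    # string is not a valid timestamp and raises, which mirrors the
    # Joda-Time 'Invalid format: ""' failure in the stack trace above.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

print(parse_start_time("2018-07-25T16:06:44Z"))  # a normal start_time parses fine
try:
    parse_start_time("")  # a container that never started reports ""
except ValueError as err:
    print("rejected:", err)
```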
Thank you for opening this report. Could you please share your Kubernetes version too? I understand this is Metricbeat with the Kubernetes module; are you using the default reference manifest?
@exekias We are using OpenShift v3.7.46 with Kubernetes v1.7.6+a08f5eeb62.
We are not running Metricbeat as a DaemonSet, because kdump crashed on nodes the moment we started it. The DaemonSet also triggers other strange behavior on OpenShift nodes; for example, it becomes impossible to use "du" and "df". Red Hat support advised against running Beats as a DaemonSet.
I'm running this Metricbeat config on one master node:
```yaml
- module: kubernetes
  metricsets:
    - node
    - system
    - pod
    - container
    - volume
  period: 10s
  hosts: ["https://osmaster01:10250","https://osmaster02:10250","https://osmaster03:10250","..."]
  ssl.certificate_authorities: ["path-to-ca"]
  ssl.certificate: "path-to-cert"
  ssl.key: "path-to-key"
  processors:
    - add_kubernetes_metadata:
        in_cluster: false
        host: osmaster01
        kube_config: /home/user/.kube/config

- module: kubernetes
  enabled: true
  metricsets:
    - state_node
    - state_deployment
    - state_replicaset
    - state_pod
    - state_container
  period: 10s
  hosts: ["kube-state-metrics.monitoringpr.svc:8080"]
```
Is there anything I can manually change in the mapping to get rid of these errors? I could maybe update the Metricbeat template and change
```json
"start_time": {
  "type": "date"
}
```
to
```json
"start_time": {
  "type": "keyword"
}
```
Or is there a better workaround?
As a possible workaround, I believe you can drop start_time if it is empty:
```yaml
processors:
  - drop_fields:
      fields: ["kubernetes.container.start_time"]
      when.equals:
        kubernetes.container.start_time: ""
```
@exekias Same issue for [kubernetes.pod.start_time]
Can you confirm the workaround fixes the issue? That would help with the bugfix.
@exekias The workaround fixes the problem.
I needed to use drop_fields twice, like this:
```yaml
processors:
  - add_kubernetes_metadata:
      in_cluster: false
      host: os-host
      kube_config: /home/svcaccount/.kube/config
  - drop_fields:
      fields: ["kubernetes.container.start_time"]
      when.equals:
        kubernetes.container.start_time: ""
  - drop_fields:
      fields: ["kubernetes.pod.start_time"]
      when.equals:
        kubernetes.pod.start_time: ""
```