Beats: Allow to overwrite @timestamp with different format

Created on 15 Mar 2019 · 14 Comments · Source: elastic/beats

For confirmed bugs, please report:

Using Filebeat to parse log lines like this one:

{"host":"s3-ssl-conn-0.localdomain","service":"sfused","instance":"unconfigured","pid":31737,"trace_type":"op","trace_id":7257574788016540,"span_id":1918366419434151,"parent_span_id":4632228147107467,"@timestamp":"2019-03-15T19:41:07.282853+0000","start_time":1552678867282.853,"end_time":1552678867283.062,"duration_ms":0.208984,"op":"service","layer":"workers_arc_sub","error":false,"cancelled":false,"tid":32534}

returns an error, as you can see in the following Filebeat log:

2019-03-15T19:41:11.564Z        ERROR   jsontransform/jsonhelper.go:53  JSON: Won't overwrite @timestamp because of parsing error: parsing time "2019-03-15T19:41:07.175876+0000" as "2006-01-02T15:04:05Z07:00": cannot parse "+0000" as "Z07:00"

I use a template file where I define that the @timestamp field is a date:

{
    "mappings": {
        "doc": {
            "properties": {
                "layer": {
                    "type": "keyword"
                }, 
                "ip_addr": {
                    "type": "ip"
                }, 
                "string": {
                    "type": "text"
                }, 
                "service": {
                    "type": "keyword"
                }, 
                "@timestamp": {
                    "type": "date"
                }, 
                "parent_span_id": {
                    "index": "false", 
                    "type": "long"
                }, 
                "trace_type": {
                    "type": "keyword"
                }, 
                "trace_id": {
                    "type": "long"
                }, 
                "label": {
                    "type": "keyword"
                }, 
                "ip_port": {
                    "type": "long"
                }, 
                "instance": {
                    "type": "keyword"
                }, 
                "host": {
                    "type": "keyword"
                }, 
                "num": {
                    "type": "keyword"
                }, 
                "end_time": {
                    "type": "double"
                }, 
                "key": {
                    "type": "keyword"
                }, 
                "error": {
                    "type": "boolean"
                }, 
                "cancelled": {
                    "type": "boolean"
                }, 
                "path": {
                    "type": "text"
                }, 
                "span_id": {
                    "index": "false", 
                    "type": "long"
                }, 
                "start_time": {
                    "type": "double"
                }, 
                "op": {
                    "type": "keyword"
                }
            }
        }
    }, 
    "template": "app-traces-*", 
    "settings": {
        "index.refresh_interval": "30s"
    }
}

All 14 comments

I would think using a format for the date field should solve this? https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html
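
For example, the template's date mapping could declare an explicit format for the sample timestamp above (a sketch only; the exact pattern syntax depends on the Elasticsearch version):

                "@timestamp": {
                    "type": "date",
                    "format": "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ||strict_date_optional_time"
                }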

Closing this for now as I don't think it's a bug in Beats.

It does not work, as it seems it is not possible to overwrite the date format.
See https://discuss.elastic.co/t/cannot-change-date-format-on-timestamp/172638

I now see that you are trying to overwrite the existing timestamp. We should probably rename this issue to "Allow to overwrite @timestamp with different format" or something similar.

As a workaround, is it possible to name it differently in your JSON log file and then use an ingest pipeline to remove the original timestamp (we often call it event.created) and move your timestamp to @timestamp? That is what we do in quite a few modules.
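
For illustration, such a pipeline might look roughly like this (a sketch only; the pipeline name and the log_timestamp field are hypothetical and assume the application's timestamp field was renamed before shipping):

PUT _ingest/pipeline/app-traces-timestamp
{
    "description": "Keep the original @timestamp as event.created and parse the application timestamp",
    "processors": [
        { "rename": { "field": "@timestamp", "target_field": "event.created" } },
        {
            "date": {
                "field": "log_timestamp",
                "target_field": "@timestamp",
                "formats": ["yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ"]
            }
        },
        { "remove": { "field": "log_timestamp" } }
    ]
}

Filebeat would then be pointed at it via the pipeline setting of its Elasticsearch output.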

Hello,

Unfortunately no, it is not possible to change the code of the distributed system which populates the log files,
and it is not even possible to change the tools which consume the Elasticsearch data, as I do not control them (so renaming is not possible).
The log harvester has to grab the log lines and send them to Elasticsearch in the desired format.
(I have the same problem with a "host" field in the log lines.)
It is a regression, as it worked very well in Filebeat 5.x, but I understand that the issue comes from Elasticsearch and the mapping types.

Right now I am looking at writing my own log parser and sending the data directly to Elasticsearch (I don't want to use Logstash for numerous reasons), so I have one request:
could you document somewhere the reserved field names we cannot overwrite (like the @timestamp format, the host field, etc.)?
It could save a lot of time for people trying to do something that is not possible.

Additionally, ingest pipelines are too resource consuming:
I have too much data, and the processing time introduces too much latency for the millions of log lines the application produces.

With 7.0 we are switching to ECS, which should mostly solve the problem around conflicts: https://github.com/elastic/ecs. Unfortunately, there will always be a chance of conflicts. If you use foo today and we start using foo.bar in the future, there will be a conflict for you.

What I don't fully understand is: if you can deploy your own log shipper to a machine, why can't you change the Filebeat config there to use rename?

I'm curious to hear more on why using simple pipelines is too resource consuming. Did you run some comparisons here?

I have the same problem.
I feel the Elastic developers are being a little arrogant about this problem.

We have added a timestamp processor that could help with this issue. You can tell it what field to parse as a date and it will set the @timestamp value.

It doesn't directly help when you're parsing JSON containing @timestamp with Filebeat and trying to write the resulting fields into the root of the document. But you could work around that by not writing into the root of the document, applying the timestamp processor, and then moving some fields around, as in the sketch below.
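
A rough sketch of how that could look in filebeat.yml (untested, and it assumes a Beats version that ships the timestamp processor; the path and the handful of renamed fields are only illustrative):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/app-traces/*.log        # illustrative path
    json.keys_under_root: false          # decoded fields land under "json", not the document root
    json.add_error_key: true

processors:
  - timestamp:
      field: "json.@timestamp"
      layouts:
        - "2006-01-02T15:04:05.999999-0700"
  - drop_fields:
      fields: ["json.@timestamp"]
  - rename:
      fields:
        - from: "json.service"
          to: "service"
        - from: "json.trace_id"
          to: "trace_id"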

This is caused by the fact that the "time" package that Beats is using [1] to parse @timestamp from JSON doesn't honor the RFC3339 spec [2] (specifically the part that says that both "+dd:dd" and "+dddd" are valid timezones).
So some timestamps that follow RFC3339 (like the one above) will cause a parse failure when parsed with:
ts, err := time.Parse(time.RFC3339, vstr)

[1] https://github.com/elastic/beats/blob/0ea1def3c688da2cfbba8571617ed3460f6cba9c/libbeat/common/jsontransform/jsonhelper.go#L53
[2] https://github.com/golang/go/issues/31113
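
For illustration, a small standalone Go program (not Beats code) that reproduces the failure and shows a layout which does accept the colon-less offset:

package main

import (
    "fmt"
    "time"
)

func main() {
    ts := "2019-03-15T19:41:07.282853+0000"

    // time.RFC3339 ("2006-01-02T15:04:05Z07:00") only accepts "Z" or "+00:00"
    // style offsets, so parsing "+0000" fails with the error seen in the Filebeat log.
    if _, err := time.Parse(time.RFC3339, ts); err != nil {
        fmt.Println("RFC3339 layout:", err)
    }

    // A layout using the "-0700" reference offset accepts the colon-less form.
    parsed, err := time.Parse("2006-01-02T15:04:05.999999-0700", ts)
    if err != nil {
        fmt.Println("custom layout:", err)
        return
    }
    fmt.Println("custom layout parsed:", parsed.UTC())
}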

> This is caused by the fact that the "time" package that Beats is using [1] to parse @timestamp from JSON doesn't honor the RFC3339 spec [2] (specifically the part that says that both "+dd:dd" and "+dddd" are valid timezones).
> So some timestamps that follow RFC3339 (like the one above) will cause a parse failure when parsed with:
> ts, err := time.Parse(time.RFC3339, vstr)
>
> [1] https://github.com/elastic/beats/blob/0ea1def3c688da2cfbba8571617ed3460f6cba9c/libbeat/common/jsontransform/jsonhelper.go#L53
> [2] https://github.com/golang/go/issues/31113

Seems like I read the RFC3339 spec too hastily; the part where ":" is optional is from the appendix that describes ISO 8601.

> We have added a timestamp processor that could help with this issue. You can tell it what field to parse as a date and it will set the @timestamp value.
>
> It doesn't directly help when you're parsing JSON containing @timestamp with Filebeat and trying to write the resulting fields into the root of the document. But you could work around that by not writing into the root of the document, applying the timestamp processor, and then moving some fields around.

Would it be possible to have a hint about how to do that? It seems Filebeat prevents renaming the "@timestamp" field when used with json.keys_under_root: true.

In my company we would like to switch from Logstash to Filebeat, and we already have tons of logs with a custom timestamp that Logstash handles without complaining, the same format that causes trouble in Filebeat.

You can disable JSON decoding in filebeat and do it in the next stage (logstash or elasticsearch ingest processors).
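
On the Elasticsearch side, that could be an ingest pipeline along these lines (a sketch only; the pipeline name, the "trace" target field, and the date pattern are illustrative):

PUT _ingest/pipeline/decode-app-traces
{
    "description": "Decode the raw JSON line shipped by Filebeat and parse its timestamp",
    "processors": [
        { "json": { "field": "message", "target_field": "trace" } },
        {
            "date": {
                "field": "trace.@timestamp",
                "target_field": "@timestamp",
                "formats": ["yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ"]
            }
        },
        { "remove": { "field": ["message", "trace.@timestamp"] } }
    ]
}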

> You can disable JSON decoding in filebeat and do it in the next stage (logstash or elasticsearch ingest processors).

It seems a bit odd to have a powerful tool like Filebeat and then discover it cannot replace the timestamp. I mean: storing the timestamp itself in the log row is the simplest way to ensure the event keeps its consistency even if my Filebeat suddenly stops or Elastic is unreachable; plus, using a JSON string as the log row is one of the most common patterns today.
I wonder why no one at Elastic took care of it.

Using an ingest pipeline forces me to learn and add another layer to my Elastic stack, and IMHO is a ridiculous tradeoff just to accomplish a simple task.

For now, I have just forked the Beats source code to parse my custom format.
