Hi,
I'm trying to remove the "@timestamp" attribute before indexing the data in Elasticsearch, but when I do so I get this error:
NoMethodError: undefined method `to_iso8601' for nil:NilClass
This is my configuration:
input {
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_image"
    table_name => "User"
    perform_scan => "true"
    log_format => "json_drop_binary"
    number_of_scan_threads => "8"
    number_of_write_threads => "8"
    read_ops => "8"
  }
}
filter {
  json {
    source => "message"
  }
  mutate {
    remove_field => ["message", "host", "@version", "@timestamp"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "idm"
    document_id => "%{Username}"
    document_type => "users"
  }
  stdout { }
}
Ideally, I would like to remove all unnecessary fields like "@version" and prevent Logstash from adding any of its generated/internal attributes in the first place.
Just put the following in the output:
codec => line {
  format => "%{message}"
}
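For context, a minimal sketch of where that codec could go, assuming it is applied to the stdout output from the config above (the elasticsearch settings are simply carried over from the original config):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "idm"
    document_id => "%{Username}"
    document_type => "users"
  }
  # Print only the original message line, without @timestamp or other generated fields
  stdout {
    codec => line {
      format => "%{message}"
    }
  }
}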
Certain @-prefixed fields are reserved and should not be removed. This would be an enhancement to the remove_field code: check whether a field is reserved, ignore it if so, and log a warning.
@guyboertje This is true for _some_ plugins. I think there are some users who want to remove the @timestamp field and the @version field from certain outputs. Perhaps that should be a function of those specific outputs, though.
+1
I am also interested in omitting the @timestamp field, and perhaps other Logstash-generated fields, from events that Logstash outputs to Elasticsearch.
I am a member of the development team for a product that extracts data from proprietary binary-format logs, and then forwards that data to Logstash; for example, as JSON Lines over TCP.
Each log event that this product extracts—each line of JSON Lines that it forwards to Logstash—contains a field named time that is the event time stamp. If the original binary-format log contains multiple candidate fields for an event time stamp, the product chooses one to use as the value of the time field.
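For illustration, a rough sketch of how such an input could be wired up, assuming JSON Lines over TCP; the port number and the sample event below are made-up placeholders, not taken from our actual product:
input {
  tcp {
    port => 5000
    codec => json_lines
  }
}
# Each forwarded line becomes one event, e.g.:
# {"time": "2016-05-17T14:23:45.123456+10:00", "message": "user signed in"}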
Currently, I use a Logstash config with a date filter to set the Logstash-generated @timestamp field from the value of the time field.
But then I end up with events (documents) in Elasticsearch that have both time and @timestamp fields, with effectively* identical values.
To avoid this duplication, I can use remove_field to remove the time field, but this starts to grate. My input events already contain a time stamp field named time. I’m happy with that field name. I don’t want to have to specify a date filter to “map” that field to the Logstash-specific @timestamp field. I don’t want to have to remove “my” time field to avoid duplication.
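To make that concrete, my current approach looks roughly like the following; the ISO8601 pattern is an assumption here, substitute whatever matches your actual time format:
filter {
  date {
    # Set @timestamp from the incoming time field
    match => [ "time", "ISO8601" ]
  }
  mutate {
    # Drop the original field to avoid duplicating the timestamp
    remove_field => [ "time" ]
  }
}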
I could omit the date filter and let Logstash set @timestamp to the default value: the time that Logstash first sees the event. I can imagine that this might be useful to assist with debugging, in the case of problems with forwarding. Given the choice, though, I think I’d prefer to save the bytes and simply omit @timestamp, and have a “lean” Logstash config with only input and output sections; no filter section.
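In other words, something like this hypothetical lean config (the endpoint details are placeholders); note that, as things stand, Logstash would still add @timestamp to every event by default:
input {
  tcp {
    port => 5000
    codec => json_lines
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}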
* The value of the @timestamp field generated by the date filter does not exactly match the original time field value. The time field value contains a zone designator expressed as a difference from UTC (+hh:mm or -hh:mm) and may carry fractions of a second to greater precision, whereas @timestamp is in UTC, with a Z zone designator, and contains fractions of a second to only 3 decimal places. (I understand that Elasticsearch currently represents date fields as Epoch time values with millisecond precision.)
For various reasons (that I'm happy to discuss), we (the product development team) would prefer to preserve, in the ingested Elasticsearch document, the “difference-component” zone designator and precision of the original time field, even if these are only preserved in the original string field value of the source.
Unfortunately, the precedent has been set for Logstash (and Beats, for that matter) to use @timestamp as the canonical field. You have to hack an output plugin to remove the @timestamp field.
As an FYI, you don't need an extra mutate filter to remove the field after successful conversion in the date filter:
filter {
  date {
    match => [ 'time', '... ' ]
    remove_field => 'time'
  }
}
This works in most other filters as well.
the precedent has been set for Logstash ... to use @timestamp as the canonical field
Yes, that’s true, and that’s one reason why I’m grappling with this question. Because, in the context of Logstash, it leads me to set @timestamp to the value of “my” time field, and then remove time. Whereas, ideally, I’d prefer the time field from my product to pass through with its original name and value to the analytics platform—Elasticsearch is just one such platform—without being “forced” into using a different field name.
In practice, though, that might prove to be impractical, because there is no “cross-platform canon” in this regard.
Other platforms aside, even within the Elastic Stack, if I bypass Logstash and use the Elasticsearch bulk API, I don’t need to introduce @timestamp. That is, unless I want documents ingested via the bulk API to match the structure of documents ingested via Logstash.