Logstash: How to figure out what charset to use?

Created on 23 Mar 2015 · 14 Comments · Source: elastic/logstash

Received an event that has a different character encoding than you configured. {:text=>"\u0000\u0005\u0000/\u00005xC0\u0012\u0000", :expected_charset=>"UTF-8", :level=>:warn}

If I receive this warning and I'm unable to get the text, how do I figure out what to set the charset to?

codec => plain {
  charset => "ISO-8859-1"
}

}

All my machines are running Ubuntu.

enhancement

All 14 comments

Hi, any updates on this? I tried using different ports for logstash-forwarder, but I always get this error:

Received an event that has a different character encoding than you configured. {:text=>"\u0016\u0003\u0001\u0000xA0\u0001\u0000\u0000x9C\u0003\u0003\u001CxD7UDB\u001CC\fxAEnxEBxE3xB0xF3xF0)xF5\u000FxE5ah,x9CNQ\sxD2xECx9BxCAxF6\u0000\u0000\u001AxC0/xC0+xC0\u0011xC0\axC0\u0013xC0\txC0\u0014xC0", :expected_charset=>"UTF-8", :level=>:warn}
Received an event that has a different character encoding than you configured. {:text=>"\u0000\u0005\u0000/\u00005xC0\u0012\u0000", :expected_charset=>"UTF-8", :level=>:warn}
2015-03-24T20:05:46.053+0000 54.162.225.244:35336 \u0016\u0003\u0001\u0000xA0\u0001\u0000\u0000x9C\u0003\u0003\u001CxD7UDB\u001CC\fxAEnxEBxE3xB0xF3xF0)xF5\u000FxE5ah,x9CNQ\sxD2xECx9BxCAxF6\u0000\u0000\u001AxC0/xC0+xC0\u0011xC0\axC0\u0013xC0\txC0\u0014xC0

And I'm not sure what charset to set the codec to.

It seems like your data isn't actually readable text. I don't recognize any of the byte sequences, and it's not valid ISO-8859-1 (Latin-1) nor UTF-8, for sure.

I see this very same behavior. Most of my logs make it through just fine, but occasionally I see this error in my Logstash logs and in Kibana. Interestingly, these messages come _only_ from Elasticsearch master nodes.

My pipeline is: Elasticsearch logs, monitored by rsyslog using imfile, sent over syslog/TCP to a remote Logstash instance. Ubuntu 14.04 on AWS.

I don't see anything relevant in the ES logs on disk, nor in the syslog. Those messages certainly don't appear. I don't recognize the content either.

I'm not sure whether this applies in your case, but for anyone still dealing with this, see this Stack Overflow post:

http://stackoverflow.com/questions/24490395/rsyslog-sending-badly-encoded-corrupted-data-via-tcp-receiving-using-logstas

The last answer (as of this posting) suggests the garbled data could be the SSL handshake, in which case you need to set ssl_enabled to true.
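
One way to check the SSL handshake theory before touching any charset setting is to look at the first bytes of the rejected payload. The sketch below is a heuristic only, not part of Logstash: the function name is made up for illustration, and it assumes that the "xA0" and "x9C" fragments in the warning quoted earlier in this thread stand for the raw bytes 0xA0 and 0x9C. A TLS record starts with a content-type byte (0x16 for a handshake) followed by a two-byte version whose first byte is 0x03.

```python
def looks_like_tls_handshake(payload: bytes) -> bool:
    """Heuristic: does the payload start like a TLS handshake record?

    A TLS record begins with a 1-byte content type (0x16 = handshake)
    followed by a 2-byte protocol version whose major byte is 0x03.
    """
    return len(payload) >= 3 and payload[0] == 0x16 and payload[1] == 0x03


# First bytes of the warning text "\u0016\u0003\u0001\u0000xA0\u0001...",
# written out as raw bytes (assumption: "xA0" and "x9C" are bytes 0xA0, 0x9C).
sample = bytes([0x16, 0x03, 0x01, 0x00, 0xA0, 0x01, 0x00, 0x00, 0x9C, 0x03, 0x03])

print(looks_like_tls_handshake(sample))                              # True
print(looks_like_tls_handshake(b"Mar 24 20:05:46 host app: hello"))  # False
```

If the check matches, the sender is most likely speaking TLS to a plaintext listener, and no charset value will turn those bytes into readable text.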

I'm closing this. I've looked at the Stack Overflow post, and it seems to have fixed that specific issue.

I can provide a case that triggers this problem.

I use the log4j input plugin (logstash-input-log4j) to receive logs from a log4j SocketAppender, but I hadn't configured it correctly in Logstash. When I changed the input from tcp to log4j, everything worked.

tcp {
  port => xxx
  ....
}

should change to

log4j {
  port => xxx
  ....
}

I ran into a similar issue: I got the "Received an event that has a different character encoding than you configured" message from Logstash when I sent logs from Filebeat to Logstash. The Logstash config looked like this:
input {
  tcp {
    port => 5000
    type => syslog
  }
}
....other config....

I fixed the issue by configuring Logstash like this:
input {
  beats {
    port => 5000
    type => syslog
  }
}
....other config....

Just replace "tcp" or "udp" with "beats".

thanks perry0105 i had the same problem and it worked :+1:

Can I chime in here?

I'm trying to collect Bro data by sending the logs to Logstash using Filebeat.

The Bro logs are in plain/us-ascii format.

I've told Filebeat that the logs are plain text.

Originally Logstash complained that it was expecting UTF-8, so I added this stanza to its inputs file:

codec => plain {
   charset => "US-ASCII"
}

I am no longer getting character set errors reported to me by logstash, but when I query the data in Elasticsearch, it's still all garbled, in strings like:

��N��y?�����}:�����������/��kR�� ����������Fc�

or:

KHE�?�������\�

I've been pulling my hair out for longer than I'd care to admit, and I don't seem to be making much progress, so any tips or pointers would be appreciated! Thanks

thanks perry0105 I also had the same issue and it worked

@sincoew is the db retaining the files? TCP does handshaking, so if you switch to UDP, you are going to throw data at Elastic which might not stick. If it doesn't stick, it will just be ignored. So I'm thinking that you are creating a lossy system. Is that correct?

I have many files, and they are in many different encodings: utf-8, us-ascii, unknown-8bit, and so on.
How should I handle this?

This issue from 2015 is still a problem. To this day it is quite hard to set up Logstash to read a simple log with charset=us-ascii, and there is not a single answer here or anywhere else that points in the right direction. It would be great if someone who actually works on the project could shed some light on this. Thank you.

@thomasmodeneis the trouble here is that without knowing the encoding of a document, it's just an ordered sequence of zeroes and ones; it is the encoding, and _only_ the encoding, that gives the sequence semantic meaning.

The internal representation of an event's field in Logstash is UTF-8, a _superset_ of US-ASCII (this means that all valid US-ASCII -- which uses only the lower 7 bits of each byte -- is also by definition valid UTF-8); to process files of other encodings, the input and/or codec needs to be explicitly configured with the name of the encoding for the bytestream it receives.
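
To make the superset point concrete, and to offer a rough way of guessing an encoding for files of unknown origin, here is a minimal Python sketch. It is a diagnostic aid run outside Logstash, not anything Logstash does internally; the helper name and the candidate list are assumptions chosen for illustration. It tries each candidate in order and reports the first one that decodes the bytes without error.

```python
from typing import Optional, Sequence


def first_valid_encoding(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


ascii_line = b"plain 7-bit log line"
latin1_line = b"caf\xe9"  # "café" encoded as ISO-8859-1

# Valid US-ASCII is also valid UTF-8: both decodes yield the same text.
assert ascii_line.decode("us-ascii") == ascii_line.decode("utf-8")

# Order matters: strict encodings first, permissive single-byte ones last.
candidates = ["us-ascii", "utf-8", "iso-8859-1"]
print(first_valid_encoding(ascii_line, candidates))   # us-ascii
print(first_valid_encoding(latin1_line, candidates))  # iso-8859-1
```

Python's iso-8859-1 codec accepts every possible byte sequence, so a match on it is only a guess, which is why it goes last in the list. Binary payloads such as a TLS handshake will also "decode" that way, which fits the earlier reports of charset warnings disappearing while the stored text stays garbled.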
