I use Elasticsearch as the output module.
Elasticsearch reports a large number of errors, such as "Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 middle byte 0x5c", and these make ES responses very slow.
I think this is a character encoding problem. How can I make rsyslog convert its input or output to UTF-8?
Thanks!
Have you tried using the mmutf8fix module already? This should fix the problems you are experiencing. You can read about it in the documentation.
@jgerhards Thanks for your help!
Yes, I have tried the mmutf8fix module, and it resolved my problem temporarily. However, it replaces the invalid UTF-8 content with whitespace.
I want to keep the original content. Is there a way to do that?
Rsyslog is agnostic to the character encoding of the messages that it processes;
it treats them as a string of bytes and passes them through as-is.
If you have a log source that is generating invalid UTF-8, rsyslog doesn't have
enough information to figure out how to fix it, because rsyslog has no way of
knowing what the actual encoding of the log message is in order to translate it
into UTF-8. The only thing we can do is detect that the string of bytes is not
valid UTF-8, and at that point we can only blank the invalid bytes out.
You could run the message through an external script (mmexternal) to try to fix
it up yourself.
David Lang
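As a rough illustration of the mmexternal suggestion, here is a minimal Python sketch of such a script. It assumes the mmexternal interface in which rsyslog writes one message per line to the script's stdin and reads one JSON object per line from its stdout, with the `msg` key carrying the replacement text. The fallback encoding (`latin-1`) and the helper names `repair` and `reply_for` are assumptions made for this example; the script can only guess at the source encoding, so substitute whatever your log source actually emits.

```python
#!/usr/bin/env python3
"""Sketch of an mmexternal helper that re-encodes non-UTF-8 messages."""
import json
import sys

# Assumption: the misbehaving source emits Latin-1. Change this to the
# encoding your log source really uses; rsyslog cannot tell you which it is.
FALLBACK_ENCODING = "latin-1"


def repair(raw: bytes, fallback: str = FALLBACK_ENCODING) -> str:
    """Return the message as text, keeping valid UTF-8 untouched."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: reinterpret the bytes using the assumed encoding.
        return raw.decode(fallback, errors="replace")


def reply_for(line: bytes) -> str:
    """Build the one-line JSON reply for a single input line."""
    text = repair(line.rstrip(b"\r\n"))
    # json.dumps escapes non-ASCII characters, so the reply is plain ASCII.
    return json.dumps({"msg": text})


def main() -> None:
    # Read raw bytes so that invalid UTF-8 does not raise while reading.
    for line in sys.stdin.buffer:
        sys.stdout.write(reply_for(line) + "\n")
        sys.stdout.flush()  # rsyslog waits for a reply to each message


if __name__ == "__main__":
    main()
```

The script would be attached with an action along the lines of `action(type="mmexternal" binary="/path/to/script.py" interface.input="msg")`, where the path is a placeholder. Unlike mmutf8fix, this keeps the original characters, but only if the assumed fallback encoding matches what the source really sends.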