I have run fluent-bit for k8s, but after run logrotate, in_tail is not watch log file, which has been rotated.
My fluentbit config:
[SERVICE]
Flush 5
Daemon Off
Log_Level debug
Parsers_File parsers.conf
[INPUT]
Name tail
Tag kube.*
Path /log/*_default_*.log
DB /log/kube.db
Mem_Buf_Limit 5MB
[OUTPUT]
Name stdout
Match *
My logrotate config
/var/lib/docker/containers/*/*.log {
daily
rotate 1
copytruncate
compress
delaycompress
missingok
notifempty
}
I thinks something was wrong after logs file has changed outside container, how I reproduce: I run a fluent-bit containers in docker, mount volume [current_folder]:/log
# Run this command inside fluent-bit container
$ echo '{"log":"hello world","stream":"stdout","time":"2017-08-14T07:16:44.331313977Z"}' >> message_default_inside.log
# This work!
# Run this command outside of fluent-bit container
$ echo '{"log":"hello world","stream":"stdout","time":"2017-08-14T07:16:44.331313977Z"}' >> message_default_outside.log
# This not work!
Trying start, restart sometime.
note that when a third-party tool rotate a file Fluent Bit catch this event (which is a file rename), and what it does is to keep monitoring the rotated file for the next 5 seconds (Rotate_Wait option), after that is not longer monitored.
Now when a file is rotated, likely the original application that create the logs will re-create the file (same name), but in order to let Fluent Bit catch that file creation it needs to re-scan the path, this operation is handled by the Refresh_Interval option, by default it re-scan every 60 seconds, I suggest to keep this value low as 5 seconds.
If the issue mentioned do not address the problem explained above, please provide detailed steps to try to reproduce the problem.
@duythinht is there any pending question/issue on your side ?
I'm still troubleshoot this issue. Sometime tail keep working, sometime it's not working (after logrotate running)
I'm also with same issue.
When my app rotates the file fluent-bit container provides this error plugins/in_tail/tail_file.c:688 errno=2
and the log stop being monitored and fluent-bit container gets frozen.
Trying today to change the refresh-interval as @edsiper mentioned and then i will provide feedback.
Changed the refresh-interval didn't helped.. when file rotated fluent-bit didn't monitored it anymore, needed to restart the fluent container.
@hdiass 0.12.7 has been released, please upgrade to that version and let us know if the issue persists.
Hello @edsiper, i upgraded fluent-bit but even though same issue, when file rotates its read anymore by fluent-bit and stays in loop trying to read the file. i've turned on the debug log level to post here the behaviour, if it helps.
[2017/11/06 22:03:07] [debug] [task] destroy task=0x7fca0023c0e0 (task_id=0)
[2017/11/06 22:03:07] [debug] [dyntag tail.0] 0x7fca0028b120 destroy (tag=tail.0)
[2017/11/06 22:03:34] [debug] [in_tail] rotated: /some/directory/file.log -> /some/directory/file.log
[2017/11/06 22:03:34] [debug] [in_tail] removed /some/directory/file.log
[2017/11/06 22:03:36] [debug] [in_tail] append new file: /some/directory/file.log
[2017/11/06 22:03:36] [debug] [in_tail] add to scan queue /some/directory/file.log, offset=10487070
[2017/11/06 22:03:36] [debug] [in_tail] file=/some/directory/file.log promote to TAIL_EVENT
[2017/11/06 22:03:36] [debug] [in_tail] file=/some/directory/file.log cannot promote, unregistering
[2017/11/06 22:03:41] [debug] [in_tail] append new file: /some/directory/file.log
[2017/11/06 22:03:41] [debug] [in_tail] add to scan queue /some/directory/file.log, offset=10487070
[2017/11/06 22:03:41] [debug] [in_tail] file=/some/directory/file.log promote to TAIL_EVENT
[2017/11/06 22:03:41] [debug] [in_tail] file=/some/directory/file.log cannot promote, unregistering
[2017/11/06 22:03:46] [debug] [in_tail] append new file: /some/directory/file.log
[2017/11/06 22:03:46] [debug] [in_tail] add to scan queue /some/directory/file.log, offset=10487070
[2017/11/06 22:03:46] [debug] [in_tail] file=/some/directory/file.log promote to TAIL_EVENT
[2017/11/06 22:03:46] [debug] [in_tail] file=/some/directory/file.log cannot promote, unregistering
@hdiass what kind of rotation mode are you using, copytruncate ?
[2017/11/06 22:03:34] [debug] [in_tail] rotated: /some/directory/file.log -> /some/directory/file.log
why the rotated file have the same name ?
[2017/11/06 22:03:36] [debug] [in_tail] file=/some/directory/file.log promote to TAIL_EVENT
[2017/11/06 22:03:36] [debug] [in_tail] file=/some/directory/file.log cannot promote, unregistering
that means that a file was promoted for inotify but then it failed, mostly because it was deleted.
Looks like your file are being rotated faster than the refresh_interval, please set a refresh_interval of 5 seconds
@edsiper, the application that i want to monitor handles the log file itself, not using logrotate from the system. What the app does for what i can see is create a "backup" file with the old log file and recreates a new log file with the same name. Ok i'll set the refresh interval for that value and test again,
Thanks a lot,
@edsiper I was checking and i already had refresh interval option set on 5, so that will not help
you can find the the config file i'm using below.
[SERVICE]
Flush 1
Daemon off
Log_Level debug
#Log_File /var/log/fluentbit.log
Parsers_File parsers.conf
[INPUT]
Name tail
Path /some/directory/file.log
Parser json-test
DB /var/log/flb_kube.db
Mem_Buf_Limit 10MB
Buffer_Chunk_Size 8k
Buffer_Max_Size 32k
Refresh_Interval 5
[OUTPUT]
Name es
Match *
Host some.elastic.search.com
Port 1111
Logstash_Format On
Retry_Limit False
Logstash_Prefix some-index-logs
Index some-index-logs
Time_Key _@timestamp
also maybe good for you to know, the timestamp between old file last log is really like miliseconds difference from the first timestamp on the new log file.
old log file last line time stamp : "@timestamp":"2017-11-06T22:03:06.198+00:00"
newly created log file first line: "@timestamp":"2017-11-06T22:03:34.274+00:00"
If you can somehow tell me what is the best config here to fluent-bit correcty follow the log after the rotation
I challenge the similar behaviour. We discovered it's related to logrotate "copytruncate" option.
copytruncate
Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program cannot be told to
close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it,
so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
Fluent bit should recognize number of lines in file, and if that is < then the previous value, it should re-read the file from scratch + reset it's position (whatever to get un-blocked).
Can confirm the issue using Fluent-Bit v0.12.13. sqlite3 db keeps the counter even when the log file itself was logrotated ans reset to 0 bytes.
Only workaround I was able to come up with is not to use the DB option. 馃槖
Copytruncate mode is dangerous and should be avoided in this scenario, in general it leads to data loss.
Counting the number of lines is not a solution since that will mean: for every read(2) go to the beginning of the file and count the number of line breaks (\n). The other solution would be to check for the file size on every read using stat(2), again ..it will be performance killer and a constant pain. It's times better to use a different log rotation mode than copytruncate.
Note that also copytruncate is done by a third party tool, so there is high chances that truncation is done when the application is writing data to the file, there is no "sync". A workaround would be to let Docker handle rotation. There is relevant discussion on this topic on Kubernetes repo:
We're using fluent-bit outside of kubernetes/docker. Usually "logrotate" is responsible for logrotation (Debian/Ubuntu). In some cases we're still using "remote_syslog2" which claims to handle this scenario https://github.com/papertrail/remote_syslog2#log-rotation-and-the-behavior-of-remote_syslog - maybe an inspiration?
@rmoriz thanks for the hints.
Actually the papertrail client does specifically the workaround mentioned above: "stat(2) the file when some 'write' operation was done":
Note that the workaround will __only__ work if the tool that generated the original log file did not open the file using O_APPEND mode.
I will try an enhancement
@rmoriz @epcim
I pushed some improvements on GIT master to handle file truncation. Would you please re-build and test ?
Confirm 0.13 Dev, tested for a while and seems it really works with logrotate and the above options. No freezes yet.
Deployed + tested one week. looks good so far.
thanks everyone for helping on this issue.
Closing as Fixed.
Awesome @edsiper 馃弳
Will this be released in the 0.12.x line? It would be very helpful!
Released :)
@edsiper Awesome, thanks! 馃挴
ping @edsiper
The official documentation here https://fluentbit.io/documentation/0.13/input/tail.html states:
Rotation with truncation (e.g. logrotate's copytruncate mode) is not supported.
Is the documentation outdated or is there still an issue with logrotate and copytruncate?
Documentation needs to be updated, in the other side the note the following requirement:
Note that the workaround will only work if the tool that generated the original log file did not open the file using O_APPEND mode.
@edsiper FYI the documentation (even for 1.0: https://docs.fluentbit.io/manual/input/tail) still mentions "Rotation with truncation (e.g. logrotate's copytruncate mode) is not supported."
Also, regarding your remark that it "will only work if the tool that generated the original log file did not open the file using O_APPEND mode": does that mean we can expect logs rotated through logrotate's copytruncate to work or not?
Updating the docs now, thanks for catching that.
Most helpful comment
I challenge the similar behaviour. We discovered it's related to logrotate "copytruncate" option.
copytruncate
Fluent bit should recognize number of lines in file, and if that is < then the previous value, it should re-read the file from scratch + reset it's position (whatever to get un-blocked).