I caught the error below:
```
[2019/01/18 11:28:15] [error] [storage] [cio chunk] error initializing backend file
[2019/01/18 11:28:15] [error] [input chunk] could not create chunk file
[2019/01/18 11:28:15] [error] [input chunk] no available chunk
[lib/chunkio/src/cio_file.c:254 errno=12] Cannot allocate memory
[2019/01/18 11:28:15] [error] [storage] cannot mmap file /var/log/fluent-bit//tail.1/145563-1547782095.129903982.flb
```
To Reproduce
Your Environment
```
@INCLUDE input_*.conf
@SET DBFILE=/etc/fluent-bit/tailfile.db

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    HTTP_Server  Off
    storage.path /var/log/fluent-bit/
    storage.sync normal
    Parsers_File parser.conf

[OUTPUT]
    Name             http
    Match            *
    Host             127.0.0.1
    Port             56924
    URI              /api/v1/logs
    Format           json
    json_date_key    time
    json_date_format iso8601
    Retry_Limit      False
```
```
[FILTER]
    Name  grep
    Match *
    Regex name .+

# input files
[INPUT]
    Name         tail
    storage.type filesystem
    DB           ${DBFILE}
    Path         /var/log/xx/xxx.log
    Parser       parse1

[INPUT]
    Name         tail
    storage.type filesystem
    DB           ${DBFILE}
    Path         /var/log/yyy/yyy.log
    Parser       parser2
```
Thanks a lot.
Are you able to reproduce the problem with v1.0.2?
I tested v1.0.2 and reproduced the problem.
Error code: https://github.com/fluent/fluent-bit/blob/v1.0.1/lib/chunkio/src/cio_file.c#L254
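For context, here is a minimal sketch (not the actual chunkio code) of how an open at that spot can fail with errno=12: on Linux, mmap(2) returns ENOMEM not only when memory is exhausted, but also when the process has reached the kernel's vm.max_map_count limit on distinct memory mappings.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Illustrative helper, not fluent-bit's API: map a chunk file's
 * backing storage into memory. */
static void *map_chunk_file(int fd, size_t size)
{
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        /* With vm.max_map_count exhausted this prints
         * "errno=12: Cannot allocate memory", matching the log above. */
        fprintf(stderr, "errno=%d: %s\n", errno, strerror(errno));
        return NULL;
    }
    return buf;
}
```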
In the log files most lines are irrelevant, so I use the grep filter to drop them. Below is the /api/v1/metrics stat data; maybe it can help:
```
{
    "input": {
        "tail.0": {
            "records": 0,
            "bytes": 0
        },
        "tail.1": {
            "records": 10200,
            "bytes": 867850
        },
        "tail.2": {
            "records": 38790,
            "bytes": 5883359
        },
        "tcp.0": {
            "records": 0,
            "bytes": 0
        },
        "tail.3": {
            "records": 0,
            "bytes": 0
        },
        "storage_backlog.0": {
            "records": 0,
            "bytes": 0
        }
    },
    "filter": {
        "grep.0": {
            "drop_records": 48990,
            "add_records": 0
        }
    },
    "output": {
        "http.0": {
            "proc_records": 0,
            "proc_bytes": 0,
            "errors": 0,
            "retries": 0,
            "retries_failed": 0
        }
    }
}
```
A few questions come to my mind:
Thanks bluebike. I run fluent-bit on a VM, and:

```
$ uname -m
x86_64
$ lsof | grep fluent | wc -l
129863
```

Is it because my configuration is wrong?
With lsof -p PID (PID = the process id) you can get to the open files and memory mappings of the process.
Hmm, the number of mmap(2)ed files can be the reason; there is no control over that now.
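If you want to watch the mapping count from inside a process, a small illustrative helper (not part of fluent-bit) is to count the lines of /proc/self/maps, since each live mapping occupies exactly one line there:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic: returns the current number of memory
 * mappings of the calling process, or -1 on error. */
static long count_own_maps(void)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[1024];
    long n = 0;

    if (!fp) {
        return -1;
    }
    while (fgets(line, sizeof(line), fp)) {
        if (strchr(line, '\n')) {
            n++;  /* count only complete lines */
        }
    }
    fclose(fp);
    return n;
}
```

Once this number approaches vm.max_map_count, every further mmap(2) fails with ENOMEM.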
> With lsof -p PID (PID = the process id) you can get to the open files and memory mappings of the process.

@bluebike, thanks!
```
$ ls -al /proc/$(pgrep fluent-bit)/fd | wc -l
68
$ cat /proc/sys/vm/max_map_count
65530
```
When storage.type is filesystem, fluent-bit constantly creates and deletes chunk files, and the line count of /proc/$(pgrep fluent-bit)/maps keeps increasing to about 65530, and then the error occurs. When the error occurred:

```
$ wc -l /proc/$(pgrep fluent-bit)/maps
65532
```
Most lines in /proc/$(pgrep fluent-bit)/maps look like the examples below:
```
7f2e519bf000-7f2e519fe000 rw-s 00001000 09:7f 2530610 /var/log/fluent-bit/tail.2/129997-1548061275.75669219.flb (deleted)
7f2e519fe000-7f2e51a3d000 rw-s 00001000 09:7f 2530609 /var/log/fluent-bit/tail.2/129997-1548061275.29716865.flb (deleted)
7f2e51a3d000-7f2e51a7c000 rw-s 00001000 09:7f 2530608 /var/log/fluent-bit/tail.2/129997-1548061275.14106181.flb (deleted)
```
It seems to be caused by reaching the max_map_count limit. Is there any advice to avoid this problem? It reproduces every time when storage.type is filesystem.
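For readers hitting the same wall: the invariant those "(deleted)" entries point at is that every successful mmap() must eventually be paired with a munmap(). A minimal sketch of the rule (illustrative only, not fluent-bit's actual teardown path):

```c
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical teardown for one mmap-backed chunk file. */
static void drop_chunk(void *buf, size_t size, int fd, const char *path)
{
    munmap(buf, size);  /* this is what frees the /proc/PID/maps slot */
    close(fd);          /* closing the fd alone does NOT remove the mapping */
    unlink(path);       /* nor does deleting the file: the mapping keeps the
                         * inode alive, hence the "(deleted)" suffix above */
}
```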
I'm also hit by this issue. It seems the program doesn't free its maps, and when it hits the kernel limit it doesn't crash; it just sits there, unable to mmap new files. However, it segfaults on shutdown.
I am troubleshooting this issue. Are you facing many retries when flushing the data?
None. This bug is extremely reproducible on my setup: the 1.0.2 deb package on Ubuntu 16.04 (no containers):
```
[SERVICE]
    Flush                     1
    storage.path              /root/flb-storage/
    storage.sync              full
    storage.checksum          on
    storage.backlog.mem_limit 64M

[INPUT]
    Name         cpu
    Interval_Sec 1
    storage.type filesystem

[FILTER]
    Name      modify
    Match     *
    Condition Key_Does_Not_Exist fqdn
    Add       fqdn ${HOSTNAME}

[OUTPUT]
    Name  null
    Match *

[OUTPUT]
    Name        forward
    Match       *
    Retry_Limit off
    Host        1.2.3.4
    Port        24224
```
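Independent of fluent-bit, the failure mode itself is easy to demonstrate with a few lines of C: keep creating page-sized mappings without ever releasing them, and mmap(2) starts failing with ENOMEM as soon as vm.max_map_count is reached, no matter how much RAM is free. (Alternating the protection flags prevents the kernel from merging adjacent mappings into a single VMA, which would otherwise hide the leak from the counter.)

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    long n = 0;

    for (;;) {
        /* Alternate protections so neighbouring mappings cannot merge. */
        int prot = (n % 2 == 0) ? PROT_READ : (PROT_READ | PROT_WRITE);
        void *p = mmap(NULL, 4096, prot,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            /* On a default Linux box this fires near
             * vm.max_map_count (65530) with errno=12. */
            printf("mmap failed after %ld mappings: %s\n",
                   n, strerror(errno));
            return 0;
        }
        n++;
    }
}
```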
Hi,
Same here: the maps count seems to keep increasing. However, the limit at which it blocks has a strange value: 262146.
Let me guess, you run the program on the elastic host? Elastic sets this parameter for itself.
> Let me guess, you run the program on the elastic host? Elastic sets this parameter for itself.
That was my first bet but no: this value is the same on all our nodes. ulimit -n, however, returns 65536 on all the nodes.
> Let me guess, you run the program on the elastic host? Elastic sets this parameter for itself.
>
> That was my first bet but no: this value is the same on all our nodes. ulimit -n, however, returns 65536 on all the nodes.
OK, it seems I confused max open files (the result of ulimit -n) with max map count (the result of sysctl vm.max_map_count). Max map count is set to 262144 on all our nodes (I don't know why, though).
Thanks everyone for helping troubleshoot this issue.
The fix (https://github.com/fluent/fluent-bit/commit/19c24380855266c739b623e60eb756b872356331) is already in place in Git master and has been backported for the v1.0.4 release.