Fluent-bit: error initializing backend file (Cannot allocate memory)

Created on 18 Jan 2019 · 15 Comments · Source: fluent/fluent-bit

Bug Report

I'm hitting the errors below:

```
[2019/01/18 11:28:15] [error] [storage] [cio chunk] error initializing backend file
[2019/01/18 11:28:15] [error] [input chunk] could not create chunk file
[2019/01/18 11:28:15] [error] [input chunk] no available chunk
[lib/chunkio/src/cio_file.c:254 errno=12] Cannot allocate memory
[2019/01/18 11:28:15] [error] [storage] cannot mmap file /var/log/fluent-bit//tail.1/145563-1547782095.129903982.flb
```

To Reproduce

  1. storage.type=filesystem
  2. run fluent-bit for a long time
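The leak is easy to watch from outside the process. A minimal sketch (assuming a Linux host; `map_count` is a helper name of my own, not part of fluent-bit) that reads the kernel's per-process mapping table:

```shell
# Count the memory mappings of a process via /proc (Linux only).
# Demonstrated on the current shell ($$); point PID at fluent-bit
# instead, e.g. PID=$(pgrep -o fluent-bit), to watch the count creep
# toward the vm.max_map_count limit.
map_count() {
    wc -l < "/proc/$1/maps"
}

PID=$$
echo "maps for pid $PID: $(map_count "$PID")"
```

Polling this in a loop while fluent-bit runs with storage.type filesystem should show the number climbing steadily instead of holding roughly constant.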

Your Environment

  • Version used: v1.0.1
  • Configuration:

```
@INCLUDE input_*.conf
@SET DBFILE=/etc/fluent-bit/tailfile.db

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    HTTP_Server  Off
    storage.path /var/log/fluent-bit/
    storage.sync normal
    Parsers_File parser.conf

[OUTPUT]
    Name http
    Match *
    Host 127.0.0.1
    Port 56924
    URI /api/v1/logs
    Format json
    json_date_key time
    json_date_format iso8601
    Retry_Limit False
```

  • Server type and version:
  • Operating System and version: CentOS Linux release 7.4.1708 (Core)
  • Filters and plugins:

```
[FILTER]
    Name grep
    Match *
    Regex name .+

# input files

[INPUT]
    Name tail
    storage.type filesystem
    DB ${DBFILE}
    Path /var/log/xx/xxx.log
    Parser parse1

[INPUT]
    Name tail
    storage.type filesystem
    DB ${DBFILE}
    Path /var/log/yyy/yyy.log
    Parser parser2
```

Thanks a lot.

bug fixed

All 15 comments

Are you able to reproduce the problem with v1.0.2?

I tested v1.0.2 and the problem still reproduces.
The failing line is here: https://github.com/fluent/fluent-bit/blob/v1.0.1/lib/chunkio/src/cio_file.c#L254

Most lines in the log file are irrelevant, so I use the grep filter to drop them. Below is the /api/v1/metrics output; maybe it helps:

```
{
  "input": {
    "tail.0": { "records": 0, "bytes": 0 },
    "tail.1": { "records": 10200, "bytes": 867850 },
    "tail.2": { "records": 38790, "bytes": 5883359 },
    "tcp.0": { "records": 0, "bytes": 0 },
    "tail.3": { "records": 0, "bytes": 0 },
    "storage_backlog.0": { "records": 0, "bytes": 0 }
  },
  "filter": {
    "grep.0": { "drop_records": 48990, "add_records": 0 }
  },
  "output": {
    "http.0": {
      "proc_records": 0,
      "proc_bytes": 0,
      "errors": 0,
      "retries": 0,
      "retries_failed": 0
    }
  }
}
```

A few questions come to mind:

  • Are you running a 32-bit or 64-bit system? On a 32-bit system the size and count of memory-mapped files is (more) limited.
  • You could check with lsof(1) which files are mmapped. If many log files are
    concurrently mmapped, that can be a problem.
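Following that suggestion, the counts can be pulled like this (shown against the current shell, since the fluent-bit PID differs per host):

```shell
# Replace $$ with fluent-bit's PID, e.g. PID=$(pgrep -o fluent-bit).
PID=$$

# Open files plus memory mappings, if lsof is installed.
if command -v lsof >/dev/null 2>&1; then
    lsof -p "$PID" | wc -l
fi

# Mappings alone, straight from the kernel's view.
wc -l < "/proc/$PID/maps"
```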

Thanks @bluebike!

I run fluent-bit on a VM:

```
$ uname -m
x86_64

$ lsof | grep fluent | wc -l
129863
```

Is my configuration wrong?

With lsof -p PID (PID = process id) you can see the open files and memory mappings of the process.

Hmm, the number of mmap(2)ed files could be the reason; there is no control over that right now.

> With lsof -p PID (PID = process id) you can see the open files and memory mappings of the process.

@bluebike, thanks!

```
$ ls -al /proc/$(pgrep fluent-bit)/fd | wc -l
68

$ cat /proc/sys/vm/max_map_count
65530
```

With storage.type filesystem, fluent-bit constantly creates and deletes chunk files, and `cat /proc/$(pgrep fluent-bit)/maps | wc -l` keeps increasing until it reaches about 65530; then the error occurs.

When the error occurred:

```
$ wc -l /proc/$(pgrep fluent-bit)/maps
65532
```

Most lines in /proc/$(pgrep fluent-bit)/maps look like the example below:

```
7f2e519bf000-7f2e519fe000 rw-s 00001000 09:7f 2530610 /var/log/fluent-bit/tail.2/129997-1548061275.75669219.flb (deleted)
7f2e519fe000-7f2e51a3d000 rw-s 00001000 09:7f 2530609 /var/log/fluent-bit/tail.2/129997-1548061275.29716865.flb (deleted)
7f2e51a3d000-7f2e51a7c000 rw-s 00001000 09:7f 2530608 /var/log/fluent-bit/tail.2/129997-1548061275.14106181.flb (deleted)
```

It seems to be caused by reaching the vm.max_map_count limit.

Any advice on avoiding this problem? It reproduces every time storage.type filesystem is set.
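A stop-gap until the leak itself is fixed (my own workaround, not project guidance, and 262144 is only an example value) would be raising the kernel limit the process keeps hitting:

```shell
# Current limit; mmap(2) fails with errno 12 (ENOMEM) once the
# process's mapping count reaches it. The default is 65530 on most
# distributions.
cat /proc/sys/vm/max_map_count

# Temporary, runtime-only raise (root required):
#   sysctl -w vm.max_map_count=262144
# Persistent across reboots:
#   echo 'vm.max_map_count=262144' > /etc/sysctl.d/99-fluent-bit.conf
```

Note this only postpones the failure: the mapping count still grows without bound.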

I'm also hit by this issue. It seems the program doesn't free its maps, and when it hits the kernel limit it doesn't crash; it just sits there, unable to mmap new files. However, it segfaults on shutdown.

I am troubleshooting this issue. Are you facing many retries when flushing the data?

None. This bug is extremely reproducible on my setup: the 1.0.2 deb package on Ubuntu 16.04 (no containers):

```
[SERVICE]
    Flush        1
    storage.path              /root/flb-storage/
    storage.sync              full
    storage.checksum          on
    storage.backlog.mem_limit 64M

[INPUT]
    Name cpu
    Interval_Sec 1
    storage.type  filesystem

[FILTER]
    Name modify
    Match *
    Condition Key_Does_Not_Exist fqdn
    Add fqdn ${HOSTNAME}

[OUTPUT]
    Name null
    Match *

[OUTPUT]
    Name forward
    Match *
    Retry_Limit off
    Host 1.2.3.4
    Port 24224
```

Hi,

Same here: the maps count keeps increasing. However, the limit at which it blocks has a strange value: 262146.

Let me guess: you run the program on an Elasticsearch host? Elasticsearch sets this parameter for itself.


> Let me guess: you run the program on an Elasticsearch host? Elasticsearch sets this parameter for itself.

That was my first bet, but no: this value is the same on all our nodes. `ulimit -n`, however, returns 65536 on all the nodes.

> > Let me guess: you run the program on an Elasticsearch host? Elasticsearch sets this parameter for itself.
>
> That was my first bet, but no: this value is the same on all our nodes. `ulimit -n`, however, returns 65536 on all the nodes.

OK, it seems I confused the max open files limit (the result of `ulimit -n`) with the max map count (the result of `sysctl vm.max_map_count`). Max map count is set to 262144 on all our nodes (I don't know why, though).

Thanks everyone for helping troubleshoot this issue.

The fix (https://github.com/fluent/fluent-bit/commit/19c24380855266c739b623e60eb756b872356331) is already in place in Git master and has been backported for the v1.0.4 release.
