Describe the bug
Fluent-bit randomly crashes with SIGSEGV, and from the stack trace it looks like polling the Prometheus metrics endpoint is what triggers it:
[engine] caught signal (SIGSEGV)
#0 0x7f719373053e in ???() at ???:0
#1 0x55a0f2c6eb29 in string_cmp() at src/http_server/api/v1/metrics.c:142
#2 0x7f71936d5d20 in ???() at ???:0
#3 0x7f71936d5a97 in ???() at ???:0
#4 0x7f71936d5a81 in ???() at ???:0
#5 0x7f71936d5a97 in ???() at ???:0
#6 0x7f71936d5a97 in ???() at ???:0
#7 0x7f71936d5fde in ???() at ???:0
#8 0x55a0f2c6f248 in cb_metrics_prometheus() at src/http_server/api/v1/metrics.c:287
#9 0x55a0f2f50da5 in thread_cb_init_vars() at lib/monkey/mk_server/mk_http_thread.c:68
#10 0x55a0f2f5bd86 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#11 0xffffffffffffffff in ???() at ???:0
2019-11-19 09:45:47,071 INFO exited: fluentbit (terminated by SIGABRT (core dumped); not expected)
Additional context
Fluent-bit version: 1.3.2
Running in Kubernetes
Added valgrind to the fluent-bit container and found this output related to cb_metrics_prometheus:
==15== Conditional jump or move depends on uninitialised value(s)
==15== at 0x27785D: metrics_get_latest (metrics.c:46)
==15== by 0x277C51: cb_metrics_prometheus (metrics.c:196)
==15== by 0x559DA5: thread_cb_init_vars (mk_http_thread.c:68)
==15== by 0x564D86: co_init (amd64.c:117)
==15== Uninitialised value was created by a stack allocation
==15== at 0x563D83: co_swap_function (in /opt/td-agent-bit/bin/td-agent-bit)
==15==
==15== Use of uninitialised value of size 8
==15== at 0x277881: metrics_get_latest (metrics.c:54)
==15== by 0x277C51: cb_metrics_prometheus (metrics.c:196)
==15== by 0x559DA5: thread_cb_init_vars (mk_http_thread.c:68)
==15== by 0x564D86: co_init (amd64.c:117)
==15== Uninitialised value was created by a stack allocation
==15== at 0x563D83: co_swap_function (in /opt/td-agent-bit/bin/td-agent-bit)
@maglun2 I experimented with /api/v1/metrics/prometheus today and could
not reproduce your issue, nor could I find the root cause by reading the related
part of the code.
@maglun2 I also took a deep dive into the metrics module. I found a potential memory leak by inspection, but no obvious race conditions. It seems to me that metrics uses both TLS (thread-local storage), as in metrics_list = pthread_getspecific(hs_metrics_key), and reference counting, as in buf->users++, which seemed curious on the face of it. My impression is that batches of metrics arrive from a queue, and the queue is flushed taking care to check that the user count is zero. But it's not clear to me how multiple consumers all get copies of the data unless there is pub/sub going on on the queuing side. And if so, why also reference counting? Roughly, the shape I'm seeing is sketched below.
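A minimal, single-threaded sketch of that shape as I read it (hypothetical, not fluent-bit's actual code; only hs_metrics_key, metrics_get_latest, and buf->users are names taken from the source):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct metrics_buf {
    char *data;   /* serialized metrics payload */
    int   users;  /* reference count: consumers currently holding this buffer */
};

static pthread_key_t hs_metrics_key;  /* TLS slot, name from the source */

/* Consumer: look up this thread's latest buffer and take a reference. */
static struct metrics_buf *metrics_get_latest(void)
{
    struct metrics_buf *buf = pthread_getspecific(hs_metrics_key);
    if (buf) {
        buf->users++;
    }
    return buf;
}

/* Consumer: drop the reference once the HTTP response is written. */
static void metrics_release(struct metrics_buf *buf)
{
    if (buf) {
        buf->users--;
    }
}

/* Flush side: a buffer may only be reclaimed once users == 0. The part
 * I can't see is how a producer gets fresh data into every consumer
 * thread's TLS slot in the first place, hence the pub/sub question. */
static void metrics_try_flush(struct metrics_buf *buf)
{
    if (buf && buf->users == 0) {
        free(buf->data);
        free(buf);
    }
}

int main(void)
{
    pthread_key_create(&hs_metrics_key, NULL);

    struct metrics_buf *buf = calloc(1, sizeof(*buf));
    buf->data = strdup("fluentbit_input_records_total 42");
    pthread_setspecific(hs_metrics_key, buf);

    struct metrics_buf *latest = metrics_get_latest();
    printf("%s (users=%d)\n", latest->data, latest->users);
    metrics_release(latest);
    metrics_try_flush(latest);
    return 0;
}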
@fujimotos yes, it looks like it's always the same stack trace. Here is our config:
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
Merge_JSON_Log Off
@INCLUDE input-kubernetes.conf
@INCLUDE input-tcp.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-forward.conf
filter-kubernetes.conf: |
# Merge_Log must be off, otherwise messages logged as JSON will bloat Elasticsearch with lots of fields
[FILTER]
Name kubernetes
Match kube_file.*
Kube_Tag_Prefix kube_file.var.log.containers.
Kube_URL https://kubernetes.default.svc.cluster.local:443
Merge_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
Labels On
Annotations Off
# we need merge log on for nginx ingress controller access logs to work
[FILTER]
Name kubernetes
Match kube_file.*nginx-ingress-controller*
Kube_Tag_Prefix kube_file.var.log.containers.
Kube_URL https://kubernetes.default.svc.cluster.local:443
Merge_Log On
Merge_Log_Trim On
K8S-Logging.Parser On
K8S-Logging.Exclude On
Labels On
Annotations Off
# add cluster name
[FILTER]
Name record_modifier
Match *
Record cluster ${K8S_CLUSTER_NAME}
# we don't know what records from the tcp forward source will be tagged with, since the tag is set by the sender
# assume all records come from tcp forward source
[FILTER]
Name record_modifier
Match *
Record log_source fluentbit_tcp
# and then fix those that we know come from other source(s)
[FILTER]
Name record_modifier
Match kube_file.*
Record log_source fluentbit_tail
input-kubernetes.conf: |
[INPUT]
Name tail
Flush 2
Path /var/log/containers/*.log
#Exclude_Path /var/log/containers/fluent*.log
Tag kube_file.*
Parser docker
DB /var/log/flb_kube.db
Buffer_Chunk_Size 64k
Buffer_Max_Size 256k
Mem_Buf_Limit 100m
Skip_Long_Lines Off
Refresh_Interval 10
Merge_JSON_Log Off
Multiline Off
input-tcp.conf: |
[INPUT]
Name forward
Flush 2
# pretty sure tag isn't supported for forward input
Tag kube_tcp.*
Listen 0.0.0.0
Log_Level info
Port 5500
Buffer_Chunk_Size 64k
Buffer_Max_Size 256k
Mem_Buf_Limit 100m
output-forward.conf: |
[OUTPUT]
Name forward
Match *
Host ${FLUENT_HOST}
Port ${FLUENT_PORT}
Retry_Limit False
parsers.conf: |
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
# Command | Decoder | Field | Optional Action
# =============|==================|=================
Decode_Field_As escaped log
@nigels-com I appreciate you digging into the problem, though I hope the question isn't directed at me 😄
@maglun2 Sometimes the process of putting things into words can help narrow down or pinpoint a problem. But not this time.
Hm. I'm not entirely sure why this is happening.
#1 0x55a0f2c6eb29 in string_cmp() at src/http_server/api/v1/metrics.c:142
...
#8 0x55a0f2c6f248 in cb_metrics_prometheus() at src/http_server/api/v1/metrics.c:287
The stack trace suggests that metrics_list somehow contains an invalid
string which makes strcmp() run past the end of the buffer ... but that's
unlikely, since we use flb_sds_cat(), which guarantees null termination.
The only possible route I can think of is flb_sds_cat() returning NULL
at metrics.c#L279, and that causing the SIGSEGV in qsort().
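To make that failure mode concrete, here is a hypothetical, self-contained illustration (not fluent-bit's actual code): if one element of the array handed to qsort() ends up NULL, the comparator dereferences it and the crash surfaces inside libc's strcmp(), one frame above string_cmp(), which would match frames #0 and #1 in the backtrace.

#include <stdlib.h>
#include <string.h>

/* A comparator of the same shape as the one in metrics.c. */
static int string_cmp(const void *a, const void *b)
{
    return strcmp(*(const char **) a, *(const char **) b);
}

int main(void)
{
    /* Pretend the middle concatenation failed: flb_sds_cat() returns
     * NULL when its internal allocation fails. */
    const char *names[] = { "input_records", NULL, "output_errors" };

    /* strcmp() dereferences the NULL pointer somewhere inside libc's
     * qsort() machinery: SIGSEGV, with string_cmp() on the stack just
     * like the backtrace above. */
    qsort(names, 3, sizeof(char *), string_cmp);
    return 0;
}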
I posted #1762 that attempts to fix the issue based on this hypothesis.
@maglun2 Can you check out that version and see if it solves your problem?
Thanks @fujimotos, will try it now.
@fujimotos initial tests look very promising. No SIGSEGV since I deployed your version. Will deploy to more test clusters and let it run during the day.
I realize now that your branch is based off master so there might be other changes affecting the behaviour I guess?
> I realize now that your branch is based off master so there might be other changes affecting the behaviour I guess?
For metrics.c, we have no particular diff between v1.3.2 and master.
You can confirm it by running
$ git log v1.3.2..master -- src/http_server/api/v1/metrics.c
So if the patch fixes the issue, that's it.
Looks really good, no SIGSEGV at all since I deployed your patched version! The previous version got multiple SIGSEGVs a day. I would say your patch fixed the problem. Thank you so much for looking into and fixing the problem @fujimotos! Should I make a comment in the PR?
> Looks really good, no SIGSEGV at all since I deployed your patched version! The previous version got multiple SIGSEGVs a day. I would say your patch fixed the problem.
Nice! I'd appreciate it if you made a comment in the PR for reference.
Now we can just wait for @edsiper to merge that patch into the main line.
Fix merged into the v1.3 branch as well, for the next v1.3.4 release (this week).