Fluent-bit: incorrect datetime to unix timestamp conversion

Created on 27 Jan 2018  路  10Comments  路  Source: fluent/fluent-bit

Hello,
_make the long story short_: fluent-bit returns incorrect timestamp from format %H:%M:%S datetime. The timestamp's time is correct. But for timestamp's date I got 28 days difference.
.........
I use iotop to monitor I/O status of processes. Here is a log sample:
16:53:54 674 be/4 nginx 508.84 K/s 35125.77 K/s 0.00 % 0.00 % nginx: worker process
I also wrote a regex to parse this type of log. Here is my template from parsers.conf:

[PARSER]
    Name    iotop
    Format  regex
    Regex   ^(?<time>[^\ ]*)\ *(?<PID>[\d]*)\ *(?<PRIO>[^\ ]*)\ (?<USER>[^\ ]*)[^\d]*(?<DISKREAD>[\d]*\.[\d]*)[^\d]*(?<DISKWRITE>[\d]*\.[\d]*)[^\d]*(?<SWAPIN>[\d]*\.[\d]*)[^\d]*(?<IO>[\d]*\.[\d]*)[\ \%]*(?<COMMAND>.*)$
    Time_Format %H:%M:%S
    Time_Key time
    Time_Keep On
    types PID:integer DISKREAD:float DISKWRITE:float SWAPIN:float IO:float

Everything seems OK, as you see, the parsed log would be shown like this:

iotop: [1514739234.000000000, {"time"=>"16:53:54", "PID"=>674, "PRIO"=>"be/4", "USER"=>"nginx", "DISKREAD"=>508.840000, "DISKWRITE"=>35125.770000, "SWAPIN"=>0.000000, "IO"=>0.000000, "COMMAND"=>"nginx: worker process"}]
But let's check the Unix timestamp on the parsed log: 1514739234. It equals to 12/31/2017 16:53:54.
I checked the system date. It is 01/27/2018. There is a 4 week bias!
Hence If we forward such log to fluentd node, we will get the wrong date, too.
.........
I checked this on both CentOS 6 (compiled fluent-bit) and Debian 8 (td-agent-bit from repository) and I got the same result.

bug fixed

Most helpful comment

Note: #1960 is the minimal fix that solves the original issue. It should be sufficient
as long as this issue is concerned.

Some comments on the previous posts:

  • As to @tantra35's observation, I think it's basically correct. We don't need the
    current complication, and we can drastically simplify the time handling without
    losing anything. I'll do this after v1.4.0 released
  • As to @nigels-com's comment, I concur with him. It would be better if we can
    handle the midnight robustly. My rough idea is to add a logic to rollback the
    timestamp to yesterday if the timestamp is a future date; I think this issue would
    deserve an another separate ticket.

All 10 comments

UPDATE
fluentbit converts any datetime format that are without date to 31 Dec 2017.
i.e. by using Time_Format %H:%M:%S it converts 16:07:21 to 2017-12-31 16:07:21.
It doesn't matter which date we are in.

thanks for reporting this problem. You are right, actually the parser only populate internally the Year if is missing but not the day and month.

which application generate this log type ?

I have the same issue with Fluent Bit 1.0.4.
My log has this format:
12:14:03 (lmgrd) Server's System Date and Time: Tue Feb 05 2019 12:14:03 UTC
I use this time format in parser: time_format %k:%M:%S
And I got this timestamp in parsed log: 1546258443.000000 (Which is Monday, 31 December 2018 12:14:03)

@edsiper why parsing of date/time string made so hard why not possible simple make gmtime_r(&time_now, tm); to fill incomplete data? This little demo demonstrate approach:

#define _XOPEN_SOURCE
#include <time.h>
#include <stdio.h>

int main(int argc, char* arvg[])
{
    struct tm tmy;

    time_t  time_now = time(NULL);
    gmtime_r(&time_now, &tmy);

    strptime("2019-02-20 23:55:43", "%H:%M:%S", &tmy);

    printf("int tm_sec: %d\n", tmy.tm_sec);
    printf("int tm_min: %d\n", tmy.tm_min);
    printf("int tm_hour: %d\n", tmy.tm_hour);
    printf("int tm_mday: %d\n", tmy.tm_mday);
    printf("int tm_mon: %d\n", tmy.tm_mon);
    printf("int tm_year: %d\n", tmy.tm_year + 1900);
    printf("int tm_wday: %d\n", tmy.tm_wday);
    printf("int tm_yday: %d\n", tmy.tm_yday);
    printf("int tm_isdst: %d\n", tmy.tm_isdst);
}

so there will be not any problems as described in this issue, and code can be much simpler:

 src/flb_parser.c | 74 +++++++-------------------------------------------------
 1 file changed, 9 insertions(+), 65 deletions(-)

diff --git a/src/flb_parser.c b/src/flb_parser.c
index 31e53d50..a95afe2f 100644
--- a/src/flb_parser.c
+++ b/src/flb_parser.c
@@ -182,29 +182,12 @@ struct flb_parser *flb_parser_create(char *name, char *format,
         p->time_fmt = flb_strdup(time_fmt);

         /* Check if the format is considering the year */
-        if (strstr(p->time_fmt, "%Y") || strstr(p->time_fmt, "%y")) {
+        if ((strstr(p->time_fmt, "%Y") || strstr(p->time_fmt, "%y")) &&
+           strstr(p->time_fmt, "%m") &&
+           strstr(p->time_fmt, "%d"))
+        {
             p->time_with_year = FLB_TRUE;
         }
-        else {
-            size = strlen(p->time_fmt);
-            p->time_with_year = FLB_FALSE;
-            p->time_fmt_year = flb_malloc(size + 4);
-            if (!p->time_fmt_year) {
-                flb_errno();
-                flb_parser_destroy(p);
-                return NULL;
-            }
-
-            /* Append the year at the beginning */
-            tmp = p->time_fmt_year;
-            *tmp++ = '%';
-            *tmp++ = 'Y';
-            *tmp++ = ' ';
-
-            memcpy(tmp, p->time_fmt, size);
-            tmp += size;
-            *tmp++ = '\0';
-        }

         /* Check if the format contains a timezone (%z) */
         if (strstr(p->time_fmt, "%z") || strstr(p->time_fmt, "%Z") ||
@@ -231,32 +214,14 @@ struct flb_parser *flb_parser_create(char *name, char *format,
          * - http://stackoverflow.com/questions/7114690/how-to-parse-syslog-timestamp
          * - http://code.activestate.com/lists/python-list/521885/
          */
-        if (p->time_with_year == FLB_TRUE) {
-            tmp = strstr(p->time_fmt, "%S.%L");
-        }
-        else {
-            tmp = strstr(p->time_fmt_year, "%s.%L");
-
-            if (tmp == NULL) {
-                tmp = strstr(p->time_fmt_year, "%S.%L");
-            }
-        }
+        tmp = strstr(p->time_fmt, "%S.%L");
         if (tmp) {
             tmp[2] = '\0';
             p->time_frac_secs = (tmp + 5);
         }
         else {
             /* same as above but with comma seperator */
-            if (p->time_with_year == FLB_TRUE) {
-                tmp = strstr(p->time_fmt, "%S,%L");
-            }
-            else {
-                tmp = strstr(p->time_fmt_year, "%s,%L");
-
-                if (tmp == NULL) {
-                    tmp = strstr(p->time_fmt_year, "%S,%L");
-                }
-            }
+            tmp = strstr(p->time_fmt, "%S,%L");
             if (tmp) {
                 tmp[2] = '\0';
                 p->time_frac_secs = (tmp + 5);
@@ -673,7 +638,6 @@ int flb_parser_time_lookup(char *time_str, size_t tsize,
     char *time_ptr = time_str;
     char tmp[64];
     char fs_tmp[32];
-    struct tm tmy;

     *ns = 0;

@@ -688,11 +652,6 @@ int flb_parser_time_lookup(char *time_str, size_t tsize,
      * to be parsed.
      */
     if (parser->time_with_year == FLB_FALSE) {
-        /* Given time string is too long */
-        if (time_len + 6 >= sizeof(tmp)) {
-            return -1;
-        }
-
         /*
          * This is not the most elegant way but for now it let
          * get the work done.
@@ -704,26 +663,11 @@ int flb_parser_time_lookup(char *time_str, size_t tsize,
             time_now = now;
         }

-        gmtime_r(&time_now, &tmy);
-        uint64_t t = tmy.tm_year + 1900;
-
-        fmt = tmp;
-        u64_to_str(t, fmt);
-        fmt += 4;
-        *fmt++ = ' ';
-
-        memcpy(fmt, time_ptr, time_len);
-        fmt += time_len;
-        *fmt++ = '\0';
-
-        time_ptr = tmp;
-        time_len = strlen(tmp);
-        p = strptime(time_ptr, parser->time_fmt_year, tm);
-    }
-    else {
-        p = strptime(time_ptr, parser->time_fmt, tm);
+        gmtime_r(&time_now, tm);
     }

+    p = strptime(time_ptr, parser->time_fmt, tm);
+
     if (p != NULL) {
         /* Check if we have fractional seconds */
         if (parser->time_frac_secs && (*p == '.' || *p == ',')) {

@tantra35 sorry for the very long reply.

@fujimotos can you take a look at this please?

@edsiper Sure, I will look after this issue.

Perhaps worth pointing out here an important note about the strptime function, from the manpage:

In principle, this function does not initialize tm but stores only the values specified. This means that tm should be initialized before the call. Details differ a bit between different UNIX systems. The glibc implementation does not touch those fields which are not explicitly specified, except that it recomputes the tm_wday and tm_yday field if any of the year, month, or day elements changed.

On the other hand it doesn't seem robust to infer current year, month, day or hour as _now_ considering that the year, month, day or hour may have rolled-over since the time of the partial timestamp.

For example 11:59pm will possibly be interpreted as the next day, depending on how long it takes to reach fluent-bit.

@edsiper I have posted #1960 that fixes the issue reported by @eXtrem0us.

With that merged, Fluent Bit will parses a time string '13:45:30' extrapolating
the date fileld from the system date. So the parser would produce (assuming
today is Feb 20 2020):

{"time":"2020-02-20 13:45:30","log":"..."}

... instead of:

{"time":"2019-12-31 13:45:30","log":"..."}

Note: #1960 is the minimal fix that solves the original issue. It should be sufficient
as long as this issue is concerned.

Some comments on the previous posts:

  • As to @tantra35's observation, I think it's basically correct. We don't need the
    current complication, and we can drastically simplify the time handling without
    losing anything. I'll do this after v1.4.0 released
  • As to @nigels-com's comment, I concur with him. It would be better if we can
    handle the midnight robustly. My rough idea is to add a logic to rollback the
    timestamp to yesterday if the timestamp is a future date; I think this issue would
    deserve an another separate ticket.

Fixed on #1960 , thanks, everyone.

@fujimotos pls open a new ticket for remaining issues

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Barbazoo picture Barbazoo  路  3Comments

arienchen picture arienchen  路  3Comments

iamshreeram picture iamshreeram  路  3Comments

edsiper picture edsiper  路  4Comments

UladzimirSemiankou picture UladzimirSemiankou  路  3Comments