Gunicorn: Gunicorn default error logging date format is not standard ISO8601

Created on 3 May 2018  路  34Comments  路  Source: benoitc/gunicorn

https://github.com/benoitc/gunicorn/blob/e73ca252f7e1d0286998a0ae4254164291020a0c/gunicorn/glogging.py#L88

The motivation for this issue is the following grok statement in logstash:

grok { match => { "message" => "\[(?<gunicorn.time>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})? %{ISO8601_TIMEZONE})\] \[%{NUMBER:[gunicorn][worker_id]}\] \[%{LOGLEVEL:[gunicorn][level]}\] %{GREEDYDATA:[gunicorn][message]}" } remove_field => "message" }

Had the date format included no space between the time and the time zone the following statement would have cut it:

grok { match => { "message" => "\[%{TIMESTAMP_ISO8601:[gunicorn][time]}\] \[%{NUMBER:[gunicorn][worker_id]}\] \[%{LOGLEVEL:[gunicorn][level]}\] %{GREEDYDATA:[gunicorn][message]}" } remove_field => "message" }

I would argue that a standard date format would be better suited for a default configuration. I think adding an extra space there adds no value but creates problems instead. Either having to replace the whole default logconfig, which is excessive, or having to write custom expressions in log parsers to cater for it.

Improvement Discussion FeaturLogging

Most helpful comment

Ok so that does mean that grok in logstash is preloaded with a strict pattern rather than the lenient one. I have worked with standards people before at the BBC/EBU and I remember this kind of terrible wording. For readability "(say) a space character" This is not implementable.... what is that character? Of course you can allow any character at this point and that is not OK. So the guys at Logstash implemented this the following way to be either T or space

TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?

Furthermore THIS IS NOT WHAT THE ISSUE WAS ABOUT. What the issue was about is the timezone suffix having a space that the standard DOES NOT ALLOW.

So no, unfortunately this format is not common. This format is unorthodox specific to gunicorn. I know how stupid this sounds that we are arguing about a space but when it comes to languages and automotons syntax is everything.

All 34 comments

The date format follows the Common Log Format: https://en.wikipedia.org/wiki/Common_Log_Format

Oh, my apologies. My response is about the access log format. This report is about the error log format.

Looks like you could use %{DATESTAMP} %{ISO8601_TIMEZONE}

for 19.x it's too late to do any change. It would break a lot of logging usage around. Is this something we want to change in next major version?

@tilgovi so is this something wanted?

A lot of deployed system rely on the Current format. I am worried it's a too major change that doesn't bring much value since a rule can always be written about it in log stash like systems. Thoughts?

A possible compromise could be a switch and deprecation. You can start phasing out the odd(however so slightly) date format that is not following standards and provide the user with a switch to a fixed log line format. I know that I can override the logger via python but since gunicorn comes with its competing settings baked in it would not be my favourite choice :D I am not an OCD person but cooking up a grok query is a modern type of torture especially when the difference you are accounting for is one space :D And then it is up to the developer to maintain that longer than necessary match should anything change. It is not the end of the world if one has to but we keep fixing bugs even if people rely on the buggy software... That is what I think... not sure if the question was pointed at me...

We could look at changing the default for R20.

This format is pretty common actually. The RFC 3339 has a note about it:

NOTE: ISO 8601 defines date and time separated by "T". Applications using this syntax may choose, for the sake of readability, to specify a full-date and full-time separated by (say) a space character.

The error log is also printed on the command line and should be read by a human and I would like to keep it that way. Is there any issue with the timezone format ?

Ok so that does mean that grok in logstash is preloaded with a strict pattern rather than the lenient one. I have worked with standards people before at the BBC/EBU and I remember this kind of terrible wording. For readability "(say) a space character" This is not implementable.... what is that character? Of course you can allow any character at this point and that is not OK. So the guys at Logstash implemented this the following way to be either T or space

TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?

Furthermore THIS IS NOT WHAT THE ISSUE WAS ABOUT. What the issue was about is the timezone suffix having a space that the standard DOES NOT ALLOW.

So no, unfortunately this format is not common. This format is unorthodox specific to gunicorn. I know how stupid this sounds that we are arguing about a space but when it comes to languages and automotons syntax is everything.

That's helpful, @kozmaz87. Thanks for digging into the spec.

But since we are here I would love to also point to the access log format, which is even worse. I did not even manage to find in the code any place where it is coming from... It is constructed from some pretty obscure configuration mechanism. assigned from somewhere else. I did not checkout the code to investigate but it does produce this output:

127.0.0.1 - - [13/Aug/2018:15:03:26 +0000] "GET /debug/sms HTTP/1.1" 400 74 "-" "python-requests/2.18.4"

Looking at the gunicorn docs we learn that the 2nd part of this log is a '-' for whatever reason... the 3rd is supposed to be the user, which is also evaluating to '-' not sure why and then comes this lovely date format where the hour is married to the year by a ':' and of course the favourite space separated timezone suffix again. But after digging around I found that this is how nginx logs it so I assume this is where this space separated timezone thing is originating from by trying to mimic the nginx access logs. HAProxy does use this as well except it does not put a timezone suffix on...

Logging is crazy... Someone bring me a bucket of icy water :)

The access log format is definitely "common": https://en.wikipedia.org/wiki/Common_Log_Format

We add referrer and user agent to the end, though. Refer to the --access-logformat flag: http://docs.gunicorn.org/en/latest/settings.html#access-log-format

I realise that. I just find it funny that the 2nd item is a '-' :D It feels like whoever implemented it had had it with this and just put in a '-'

i would keep the current log format. imo common format is good and I am not aware about any change in servers upstream. thoughts? cc @tilgovi

bump @tilgovi also @berkerpeksag

closing the issue as won't fix. As said @tilgovi we are using the [common log format](
https://en.wikipedia.org/wiki/Common_Log_Format.

I think we could keep this open. We are not using the common log format for the error log. The common log format is an access log format and we use it there.

Actually, never mind. I double checked just now and we use the same time format string for both. That seems better to me than any alternative. I would not want the access log and the error log to have different date formats.

I understand the issue better now and think we should re-open.

Here's example output from Gunicorn with default settings:

[2019-01-25 11:44:34 -0800] [22794] [INFO] Starting gunicorn 19.9.0
[2019-01-25 11:44:34 -0800] [22794] [INFO] Listening at: http://127.0.0.1:8000 (22794)
[2019-01-25 11:44:34 -0800] [22794] [INFO] Using worker: sync
[2019-01-25 11:44:34 -0800] [22797] [INFO] Booting worker with pid: 22797
[2019-01-25 11:44:36 -0800] [22797] [INFO] 127.0.0.1 - - [25/Jan/2019:11:44:36 -0800] "GET / HTTP/1.1" 200 14 "-" "curl/7.54.0"

The issue is not about parsing the common log format of the access log, it's about parsing the _entire log line_.

Gunicorn outputs a timestamp, pid, and level at the start of the log line. The access log lines _also_ have a message in common log format with its own timestamp.

Notice how the timestamps are not in the same format. The original request for this issue was to have the timestamp at the start of the log line have no space _just like the common log format_.

That would look like this:

[25/Jan/2019:11:44:34 -0800] [22794] [INFO] Starting gunicorn 19.9.0
[25/Jan/2019:11:44:34 -0800] [22794] [INFO] Listening at: http://127.0.0.1:8000 (22794)
[25/Jan/2019:11:44:34 -0800] [22794] [INFO] Using worker: sync
[25/Jan/2019:11:44:34 -0800] [22797] [INFO] Booting worker with pid: 22797
[25/Jan/2019:11:44:36 -0800] [22797] [INFO] 127.0.0.1 - - [25/Jan/2019:11:44:36 -0800] "GET / HTTP/1.1" 200 14 "-" "curl/7.54.0"

I think the answer is probably not, because the common log format is not as international (it has a short month name).

However, we could change the timestamp at the start of each log line to be an ISO8601 timestamp.

https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations

The space between date and time is actually allowed in place of a T character, but a space between the time and the zone offset is not.

Here are valid ISO8601 formats we could have:

  • [2019-01-25T11:44:34-0800]
  • [2019-01-25 11:44:34-0800]

Compare to what we have now:

[2019-01-25 11:44:34 -0800]
                    ^ there is a space here

I would very much worry about breaking deployed systems, though.

well parsing/reformatting logs can easily be handled tools like logstash, so I'm not sure it's an issue. I would keep it that way for now.

If people want a different format for access logs maybe we can add a specific identifier for it? That way we wouldn't break the format. Error logs are however an issue since we don't provide a way to format them. In such case maybe an environment variable would do it?

I don't think the access log format should change. It is common log format right now and we have the --access-log-format setting.

The access log format only formats the _message_ of the access log that gets passed to the handler. The handler then has its own formatter.

Our default formatter for the stream handlers on stdout and stderr put a timestamp on the beginning. That means that with the default configuration the access log has _two_ timestamps: one at the beginning and one in the message.

Changing the formatter for the handler requires using one of the --logconfig options (file or dict).

We should consider having an ISO8601 timestamp in the default formatter.

parsing/reformatting logs can easily be handled tools like logstash

Yes, but it's very convenient when those tools can parse the timestamp with built-in patterns so the user doesn't have to write a regexp. The original issue was opened because grok has a built-in pattern for ISO8601 timestamps.

@tilgovi I don't want to break the compatibility. Also NGINX does offer the possibility to set the time either using ISO8601 format or Common Log Format :

$time_iso8601
local time in the ISO 8601 standard format
$time_local
local time in the Common Log Format

https://nginx.org/en/docs/http/ngx_http_log_module.html

I would do the same since it doesn't break the legacy. Btw shouldn't we just display the access log line to the output? It seems we shouldn't not have the first header with the PID. Thoughts?

about making it part of the default formatter i'm worried it breaks some tool around. What about having a custom environment variable TIME_ISO8601=true to force it?

I don't want to break the compatibility.

I don't either. I only wanted to re-open the ticket because I think we closed it for the wrong reason. We both responded as though the issue was to change from common log format. The issue is the timestamp in the default formatter, not the format of the access log message. I'm glad we can discuss it more, but the answer maybe is to still do nothing.

Btw shouldn't we just display the access log line to the output? It seems we shouldn't not have the first header with the PID. Thoughts?

Possible. I'm not sure.

What about having a custom environment variable

Maybe okay. The user can always control the log completely by using the advanced logging configuration. We try to make some settings simple for the CLI, like --log-level, so users do not have to use configuration files. Maybe we could add --log-date-format? It could even recognize symbolic strings like iso8601. This setting would be for users that don't want to use --log-config or log_config_dict.

@tilgovi in the mean time the 20.0 release is a good time for a change in the format since we are breaking the compatibility with python 2.

I think what worries me more about any legacy is that ISO8601 is hard to parse for human eyes and lot of people, including me, are using the console as an opportunity to observe what is happening.

I would like to suggest the following:

  • [ ] add an option -iso8601 that force stdout & stderr logs to use this format (like you're suggesting)
  • [ ] in access log add an option in the format to display the time under this format

While we are here maybe we could also have an -utc option to use UTC for the time? Thoughts?

Just to be sure, this would be the proposed diff if we just changed the default:

diff --git a/gunicorn/glogging.py b/gunicorn/glogging.py
index 56cc5bd..0735e58 100644
--- a/gunicorn/glogging.py
+++ b/gunicorn/glogging.py
@@ -80,7 +80,7 @@ CONFIG_DEFAULTS = dict(
         formatters={
             "generic": {
                 "format": "%(asctime)s [%(process)d] [%(levelname)s] %(message)s",
-                "datefmt": "[%Y-%m-%d %H:%M:%S %z]",
+                "datefmt": "[%Y-%m-%d %H:%M:%S%z]",
                 "class": "logging.Formatter"
             }
         }
@@ -175,7 +175,7 @@ class Logger(object):
     loglevel = logging.INFO

     error_fmt = r"%(asctime)s [%(process)d] [%(levelname)s] %(message)s"
-    datefmt = r"[%Y-%m-%d %H:%M:%S %z]"
+    datefmt = r"[%Y-%m-%d %H:%M:%S%z]"

     access_fmt = "%(message)s"
     syslog_fmt = "[%(process)d] %(message)s"

I think an option to use an ISO8601 datetime in the access log tokens is interesting, but it's separate from what motivated this issue.

I don't feel strongly about this, by the way. 馃槃 I just want to represent the issue accurately.

we probably need to think a little bit more about it. I am postponing it to 20.1 to let us some times.

Dear devels,
I'm facing a request loss issue (has nothing to do with Gunicorn). I would need to be able to have the precise timestamp including the microseconds, as in this example from one of our Apache servers : 2019-10-30 14:27:16.960421. It would be a cool enhancement, thx for considering it.

Any hope if the flag, log-date-format iso8601, will be available in the next vers of gunicorn?

Was this page helpful?
0 / 5 - 0 ratings