We will be refactoring the ECS agent's logging to make it more flexible and powerful.
This is a master logging ticket to cover issues #1738, #881, and #287
Design details are being finalized but will be posted here when ready for public consumption and comment.
Thanks for starting on this!
Can you clarify what this Issue is addressing exactly if ECS_AVAILABLE_LOGGING_DRIVERS allows json-file already?
See: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html
@nehalrp This issue is for addressing problems with the logfiles that the ecs agent writes to the /var/log/ecs/ directory.
The env var you are referencing is used for configuring the instance's docker daemon's available docker logging drivers, which I believe is more related to the logs generated by the tasks running on the instance.
Hello everyone, please review and comment on the following plan for our logs going forward.
This project aims to address the following problems:
We plan to make the following overall changes:
Currently we use a date-based rollover behavior that rotates the logfile every hour, resulting in logfiles that look like /var/log/ecs/ecs-agent.log.2019-12-09-21
This behavior will remain the same for date-based rollovers, with one exception, that the ācurrentā logfile will always be named /var/log/ecs/ecs-agent.log. only rotated files receive a suffix. The primary reasoning behind this is because many implementations of syslog do not support wildcard matches, which makes integrating agent logs with syslog impossible for many of our customers.
Size-based rollovers will use the standard linux convention of appending a number to the rotated files. ie, the ācurrentā logfile will be named /var/log/ecs/ecs-agent.log, with rotated files getting a numerical suffix, ie,
/var/log/ecs/ecs-agent.log.1, /var/log/ecs/ecs-agent.log.2, etc.
The default output format will be a structured log line in the style of logfmt, we will also be adding the āmoduleā context which will simply be the filename from where the log msg came.
level=info time=2019-12-12T23:43:29Z msg="Initializing stats engine" module=engine.go
The reason for this format is that it is fairly easily to machine-parse in addition to being human-readable. It has become something of a defacto standard (also being used in uber/zap and logrus, for example), and it conforms to Splunkās logging best practices (see https://dev.splunk.com/enterprise/docs/developapps/logging/loggingbestpractices/)
The json output format will output the same information, but in a one-line json format:
{"time": "2019-11-07T22:52:02Z", "level": "info", "msg": "Starting Amazon Elastic Container Service Agent", "module": "engine.go"}
First off we can add a āmoduleā context that states the filename that the log was generated in.
Currently, we create a single logger for the entire agent project. In order to guarantee that a standard set of context gets added, we will add custom loggers with context added for a few select structs:
|struct |custom context |
|--- |--- |
|Task |taskARN |
|Container |containerName |
|ENI |ENIID |
|ManagedTask |taskARN |
|VolumeResource |volumeName |
As an example, one current logline in managedTask would go from this:
2019-12-13T00:16:51Z [INFO] Managed task [arn:aws:ecs:us-west-2:039193556298:task/auto-robc/33d755ba06e94b3594ba9c2da95a6f4a]: waiting for event for task
to this:
level=info time=2019-12-13T00:16:51Z msg="waiting for event for task" module=task_manager.go taskARN=arn:aws:ecs:us-west-2:039193556298:task/auto-robc/33d755ba06e94b3594ba9c2da95a6f4a
The benefit here is that agent developers do not need to add context on a case-by-case basis, which often leads to inconsistent or missing context when multiple developers are working on a project.
As an example of developer experience, we will be changing logging calls that look like this:
seelog.Infof("initialize volume: Task [%s]: non-autoprovisioned volume not found", task.Arn)
to this:
task.log.Infof("initialize volume: non-autoprovisioned volume not found")
The following configuration environment variables will need to be added to the agent:
|Config Var |Options |Description |Default value |
|--- |--- |--- |--- |
|ECS_LOG_ROLLOVER_TYPE |"date" or "size" |Controls whether the logfile will be rotated based on size or hour |date |
|ECS_LOG_OUTPUT_FORMAT |"logfmt" or "json" |Controls the log output format. json would have each logline being a structured json map containing timestamp, message, and log level. |logfmt |
|ECS_LOG_MAX_FILE_SIZE_MB |any float |If rollover behavior is set to "size", this controls the maximum size (in megabytes) of each logfile |10 |
|ECS_LOG_MAX_ROLL_COUNT |any integer |Controls the number of rotated logfiles to keep. Applies to both size and date rollover options. |24 |
| | | | |
level=info time=2019-12-12T23:43:29Z msg="Loading configuration" module=agent.go
level=info time=2019-12-12T23:43:29Z msg="Image excluded from cleanup: amazon/amazon-ecs-agent:latest" module=parse.go
level=info time=2019-12-12T23:43:29Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=parse.go
level=info time=2019-12-12T23:43:29Z msg="Amazon ECS agent Version: 1.34.0, Commit: ca640387" module=agent.go
level=info time=2019-12-12T23:43:29Z msg="Creating root ecs cgroup: /ecs" module=init_linux.go
level=info time=2019-12-12T23:43:29Z msg="Creating cgroup /ecs" module=cgroup_controller_linux.go
level=info time=2019-12-12T23:43:29Z msg="Loading state!" module=statemanager.go
level=info time=2019-12-12T23:43:29Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2019-12-12T23:43:29Z msg="Restored cluster 'auto-robc'" module=agent.go
level=info time=2019-12-12T23:43:29Z msg="Restored from checkpoint file. I am running as 'arn:aws:ecs:us-west-2:039193556298:container-instance/auto-robc/3330a8a91d15464ea30662d5840164cd' in cluster 'auto-robc'" module=agent.go
Most helpful comment
Hello everyone, please review and comment on the following plan for our logs going forward.
Problem statement
This project aims to address the following problems:
Solution
overview
We plan to make the following overall changes:
Rollover behavior
Currently we use a date-based rollover behavior that rotates the logfile every hour, resulting in logfiles that look like
/var/log/ecs/ecs-agent.log.2019-12-09-21This behavior will remain the same for date-based rollovers, with one exception, that the ācurrentā logfile will always be named
/var/log/ecs/ecs-agent.log. only rotated files receive a suffix. The primary reasoning behind this is because many implementations of syslog do not support wildcard matches, which makes integrating agent logs with syslog impossible for many of our customers.Size-based rollovers will use the standard linux convention of appending a number to the rotated files. ie, the ācurrentā logfile will be named
/var/log/ecs/ecs-agent.log, with rotated files getting a numerical suffix, ie,/var/log/ecs/ecs-agent.log.1, /var/log/ecs/ecs-agent.log.2, etc.Output format
The default output format will be a structured log line in the style of logfmt, we will also be adding the āmoduleā context which will simply be the filename from where the log msg came.
level=info time=2019-12-12T23:43:29Z msg="Initializing stats engine" module=engine.goThe reason for this format is that it is fairly easily to machine-parse in addition to being human-readable. It has become something of a defacto standard (also being used in uber/zap and logrus, for example), and it conforms to Splunkās logging best practices (see https://dev.splunk.com/enterprise/docs/developapps/logging/loggingbestpractices/)
The json output format will output the same information, but in a one-line json format:
{"time": "2019-11-07T22:52:02Z", "level": "info", "msg": "Starting Amazon Elastic Container Service Agent", "module": "engine.go"}Custom context and loggers
First off we can add a āmoduleā context that states the filename that the log was generated in.
Currently, we create a single logger for the entire agent project. In order to guarantee that a standard set of context gets added, we will add custom loggers with context added for a few select structs:
|struct |custom context |
|--- |--- |
|Task |taskARN |
|Container |containerName |
|ENI |ENIID |
|ManagedTask |taskARN |
|VolumeResource |volumeName |
As an example, one current logline in managedTask would go from this:
2019-12-13T00:16:51Z [INFO] Managed task [arn:aws:ecs:us-west-2:039193556298:task/auto-robc/33d755ba06e94b3594ba9c2da95a6f4a]: waiting for event for taskto this:
level=info time=2019-12-13T00:16:51Z msg="waiting for event for task" module=task_manager.go taskARN=arn:aws:ecs:us-west-2:039193556298:task/auto-robc/33d755ba06e94b3594ba9c2da95a6f4aThe benefit here is that agent developers do not need to add context on a case-by-case basis, which often leads to inconsistent or missing context when multiple developers are working on a project.
As an example of developer experience, we will be changing logging calls that look like this:
seelog.Infof("initialize volume: Task [%s]: non-autoprovisioned volume not found", task.Arn)to this:
task.log.Infof("initialize volume: non-autoprovisioned volume not found")configuration
The following configuration environment variables will need to be added to the agent:
|Config Var |Options |Description |Default value |
|--- |--- |--- |--- |
|
ECS_LOG_ROLLOVER_TYPE|"date" or "size" |Controls whether the logfile will be rotated based on size or hour |date ||
ECS_LOG_OUTPUT_FORMAT|"logfmt" or "json" |Controls the log output format. json would have each logline being a structured json map containing timestamp, message, and log level. |logfmt ||
ECS_LOG_MAX_FILE_SIZE_MB|any float |If rollover behavior is set to "size", this controls the maximum size (in megabytes) of each logfile |10 ||
ECS_LOG_MAX_ROLL_COUNT|any integer |Controls the number of rotated logfiles to keep. Applies to both size and date rollover options. |24 || | | | |
Appendix
Agent startup logfile example (logfmt)