Zephyr: Logging Subsystem Enhancements

Created on 13 Feb 2018 · 14Comments · Source: zephyrproject-rtos/zephyr

[x] Modify namespace (SYS_LOG -> LOG)
[ ] Support multiple backends
- [ ] FCB
- [x] memory
- [ ] filesystem
- [x] networking
- [ ] mcumgr(!)
[x] Add timestamps
[x] Provide meta-data (such as log level) separately to the backend (right now it's embedded into the pre-formatted string)
[ ] multi-core support: master - slave configuration where one core acts as slave and the master collects logging from the slave and displays the logs of the system as a whole

Feature Logging high

Source

nashif

Most helpful comment

Would you consider trying to shorten the API names a bit? E.g. LOG_WARN, LOG_INFO, LOG_ERR and LOG_DBG? Part of the parameter list is often a fairly long string, and having to split it up something to be avoided.

jhedberg on 20 Apr 2018

😄2

All 14 comments

@ccollins476ad FYI, for 1.12

carlescufi on 13 Feb 2018

nashif on 27 Mar 2018

CC @pizi-nordic

carlescufi on 27 Mar 2018

As part of this cleanup can we look at supporting an API that allows the user to do something like:

LOG(level, log_format, ...)

and have features like COLOR work consistently.

galak on 10 Apr 2018

I would like to present a logger design which I think might be suitable for the Zephyr. I will post my proposal in parts, starting from the high-level overview. It will be later followed by some notes about the implementation and advanced features like multi-core support and Trust Zone integration.

Introduction

The complexity of modern embedded systems enforces changes in our working methodology. Challenges we face every day include:

Large amount of data.
The continuously increasing speed of peripherals is a dominating factor that influences rate of debugging data generation, while complexity of the protocols, peripherals and firmware determines size of generated data as more information is needed to describe state of the monitored component and the system. The amount of data generated during testing and debug sessions negatively impact on the data analysis time. As result, we can observe creation of dedicated tools, which are verifying system behaviour basing on the received data, extract required information from data stream or convert the information into a form which is easier to analyse by human.
Existence of multiple separated domains in the system.
This includes multi core and multi CPU systems as well as new technologies like Trust Zone. Usually in such case firmware for each domain is being developed separately, however system behaviour and performance depends on successful cooperation of all components. As result the tools we need must be able to present system-level view of the firmware internals in order to easily trace interaction between domains. The detail view is also needed as complex systems are built from large number of components. Developer must be able to focus on currently developed or tested module and silence all noise coming from other parts of the system.

These challenges faced every day by a developer, tester or system integrator could be addressed by dedicated logger subsystem, presented in this document.

Requirements

The core requirements of the logger are defined by its usability:

Performance.
The performance is the key parameter of the logger subsystem. As the logger acts as observer of the system internals it has to have minimal impact on the observed entities. Minimizing time needed to log data is especially important in real-time subsystems, which just fails if deadlines are not met. The log processing and transmission speed is also important, as there could be a lot of log data which have to be gathered in order trace down complex problem.
Context-free operation and lock-less log gathering.
The data logging function must neither take any locks nor wait for any other kind of resources. Moreover, it should be prepared to be interrupted at any time by higher priority code, which could also use the same API for logging.
Support for multiple log levels and component/instance separation.
Depending on the situation, the system-level or detailed view is needed. The good logger to support different log levels for different components on the system in order to minimize amount of information not relevant to the currently performed task. This is especially important for modules which are re-used in multiple places of the system. For example, the pool-based memory allocator must have different log level setting for each pool, as enabling full debug component-wise might generate overwhelming amount of information.
Ease of use.
The logger usability is also defined by the amount and quality of the presented information. The easier it will be to use, the more likely developer will put it in the newly created code and the more information will be available. Moreover, access to the data and integration with both internal (for example memory management subsystem or building infrastructure) and external infrastructure (like custom testing frameworks) should be as simple and straightforward as possible. The human factor cannot be underestimated while thinking about logger usage - simple features like colours might be used to catch attention to the most important messages significantly reducing time need for log analysis.

Operation Principles

Logger Frontend

The Logger Frontend collects logs form the system. From the developer perspective it mimics printf() interface. Dedicated macro (one for each log level) accepts format string and optional arguments:

LOG_ERROR("This is error");
LOG_WARNING("This is warning with %u argument", 1)
LOG_INFO("This is info message with %u %s", 2, "arguments");
LOG_DEBUG("This is debug message");

The most important difference between the printf()-based solutions and the proposed logger is the processing of data. The macros presented above do not perform any processing. Instead they just store the pointer to the format string and optional arguments in the dedicated buffer. As result the cost of message logging is equal to few memory writes. The stored data in converted to to the human-readable string later, in the idle loop of the system in order to avoid negative impact of increased CPU load on the system operation. The proposed behaviour has one limitation, which must be taken under consideration by every developer using the logger: as processing is performed in the idle loop, some time after macro execution, thus all string pointers provided to the log system have to be valid all the time.

In addition to the data provided by developer, the macros automatically add some metadata, like timestamps, identifier of module generating logs as well as log level. All the information is stored into a buffer (the circular buffer is an obvious choice, however it has some drawbacks which will be discussed later) where it waits for the further processing.

Logger Core

The Logger Core is responsible for management of log processing. As the log processing must have minimal impact on the system, the Logger Core is called directly from the idle thread and can be pre-empted any time. Moreover, to minimize latency in systems when no preemption is used, it processes single log message on each call, allowing system to check if there is a more important task to handle as often as possible. On each call, single log entry is fetched from the buffer in which it was stored by the frontend and is passed to connected backends.

Logger Backend

The backend is responsible for log messages presentation, transmission or storage. Usually, at the first stage of processing, backend converts the data received from logger core into a human-readable from: the printf()-like expression is evaluated and converted to string, which is prepended by metadata and terminated by end of line string. Additional presentation tasks, like for example applying colours to warning and error messages, are also performed at this stage. Then, the generated strings are sent to the host:

[00:01:24.548072] <error> example_module: This is error
[00:01:24.548073] <warning> example_module: This is warning with 1 argument
[00:01:24.548074] <info> example_module: This is message with 2 arguments
[00:01:24.548075] <debug> example_module: This is debug message

The backend could also perform any other action on the received logs. This include storing them in flash or forwarding them to host without rendering.

Basic Features

Log Filtering

The log filtering is done separately for each backend basing on log level and source of the message (module/instance id). Such approach allows for debugging one backend using another one as well as forwarding selected messages (like errors and warnings) to dedicated backend (for example a flash-based storage).

The log filtering is done in 2 places in order to minimize amount of CPU time needed to collect and process logs. The coarse check is done in the Logger Frontend: the LOG_* macros does not perform any action if there is no backend waiting for a messages of given level from given module. If the message pass this check, the Logger Core do the final filtering by routing the messages only to backends requesting the such information.

Object Dumping

The delayed processing of the log messages might be a problem when a developer would like to log some strings (in general: any data) which is only available at the time of LOG_* macro execution. To address this problem, an object dumping interface is provided. The developer might use dedicated macro (LOG_HEXDUMP()) to insert blob of data into log buffer. Such blob is then presented to the user as a hexdump of stored data. Unfortunately this feature has an significant cost, as the log buffer must be large enough to hold whole dumped object.

Similar mechanism might be used as a compatibility layer in order to provide equivalent of the existing printk() interface.

Panic Mode

As proposed logger buffers the messages, an implicit flush is required when unrecoverable errors is detected. Also, some backends (for example a network based ones) usually are not able to work properly after critical error . In order to allow logging during faults, a panic mode is introduced. When Logger is informed about system fault it:

Informs backends that system is in panic mode.
The complex backeds relaying on large subsystems (like network stacks), should immediately stop to avoid further errors. Simpler backends should minimize their dependence on the system components and switch to blocking mode. For example, the UART backend should use polling instead of interrupts and send every message as soon as possible without any additional buffering. Moreover, the backends could change its filtering settings in order to output as much information as possible.
Flushes log buffer.
This step forces rendering of all stored logs and its transmission in blocking mode.
Become blocking.
When log buffer is empty, the logger system become blocking. The LOG_* macros directly calls backends as there is no hope that the system will return to the idle thread to process logs there.

pizi-nordic on 20 Apr 2018

👍2

jhedberg on 20 Apr 2018

😄2

@jhedberg: Good idea!

pizi-nordic on 23 Apr 2018

I think it would be nice if Zephyr could also provide macros that mimic the vprintf() functionality (by taking a va_list parameter). It would make it easier to use Zephyr as logging backend for other projects, such as libmetal, where we need to forward the formatting arguments from another function (see OpenAMP/libmetal#38).

skordal on 24 Apr 2018

@skordal: Of course the *printf() interface could be provided, but in my opinion we should allow that only during a transition period, as there are essential differences between logger and the behaviour expected from *printf() calls.

The proposed logger is lossy: If there is no time to process logs, they will be dropped in order to maintain system operation. Such loss is not expected in printf(). Moreover direct usage of printf() is more costly, as you are loosing benefit of the delayed log processing.

pizi-nordic on 24 Apr 2018

That's fine, what I mean is, it could be useful to have the possibility to pass a va_list to the logging functions, to make it possible to forward logging messages from other libraries/frameworks (such as libmetal) which implement their own logging frameworks.

The problem here is that I want to forward logging messages from a function, defined as metal_log_handler(enum metal_log_level level, const char *format, ...). If I want to use Zepyr's logging macros I would have to preformat the logging messages in metal_log_handler() (by calling vsprintf() or something similar) before calling LOG_ERROR(), since I cannot forward the argument list from the logging function to Zephyr's logging API for formatting.

If possible, it would be nice to have some additional logging macros which allows us to call the Zephyr logging API in such situations, to remove the overhead of having to format the log messages in the framework before passing the strings on to the Zephyr logging API.

skordal on 27 Apr 2018

@skordal: I see your problem now. The vsprintf() is a simple, but ineffective solution. What is needed to interface such function to the proposed logging subsystem is a code which extract arguments from the va_list basing on format string and pass them to the logger. Such code might be available through dedicated macros like VLOG_VINFO(), VLOG_ERR() and so on. The only drawback of such approach be a reduced performance of the VLOG_* macros, as format string has to be scanned at the logging time.

pizi-nordic on 27 Apr 2018

As principles of the logger seems to be accepted by everyone, it is time to discuss some implementation details:

Implementation considerations

The data structures used in the Logger Core are the most critical part of the design as they have direct impact on the key logger parameters (especially performance). At the first look, a plain circular buffer satisfies all needs: saving data in the buffer is fast, the data blocks could have different size and memory usage is very efficient. However there are some corner cases which require complex handling when this type of log storage is used.

logger-buffer-problem-1

The first, obvious limitation of the plain circular buffer is a requirement of memory copying shown on the picture above. The data must be copied out from the circular buffer and placed into dedicated backed queues. If the same message is directed to several backends, the same data is copied multiple times, occupying significant amount of the memory. The other problem, presented on the picture below is visible when logging is performed from code running on different priority levels:

logger-buffer-problem-2

In the example presented above a thread producing log entries is preempted by an interrupt which also logs some messages. The preemption occurred when the thread was writing data to the Log buffer, making Log Entry #3 incomplete. As result the Log Entry #3 is stuck in the buffer until execution of interrupted thread is not resumed, which have several implications:

In case of any error triggering a log readout (for example a crash in the interrupt which triggers Panic Mode), this entry must be skipped by code reading log buffer.
If the interrupt will produce enough logs to fill the buffer, the oldest entries will be replaced by the new ones. The stuck entry also have to be skipped during this process. Moreover, in order to keep consistent log output, the skipped entry should not produce any message on the output when system will finally reach log processing in the idle thread as it would be overwritten in normal situation.
- The stuck Log Entry #3 also creates fragmentation problem in the log buffer: If the size of the IRQ Log Entry #6 exceed the size of space available between beginning of the log buffer and the stuck entry, a new log entry must be either placed after the Log Entry #3 (but this will create a hole in the buffer, which have to be handled) or simply dropped.

All the issues mentioned above could be resolved by using a slight modified data structures presented below:

logger-buffer-proposal

In the proposed approach a fixed size log entries are allocated from the dedicated pool, filled in and then placed into circular buffer, eliminating stuck entry issue described earlier. As the entries are fixed size, the buffer pool might use efficient block allocator (like mem_slab) to handle most common cases. In order to satisfy larger requests, like data dumping, the entries could be chained together, as seen on the figure above. Further data handling also requires minimal overhead because the log entries could be shared by multiple backends due to reference counter present in the log entry header. After processing the backends are decrementing the reference counter and the log entry returns to the pool when it is no longer used.

pizi-nordic on 30 Apr 2018

I think it's worth mentioning the limitations of this approach:

using %s with strings which are not permanent. Logger is storing argument value so it will store address of the string which may be on stack. By the time message is processed string content is not valid anymore.
arguments are 32 bit. Logging of doubles is not supported.