Description of Problem:
It is unclear which log level should be used for which purpose when building a bot. More often than not, one will notice that either there is to much information in an output log or to little while developing a bot.
The python guidelines on log levels (https://docs.python.org/3/howto/logging.html#when-to-use-logging) can serve as an orientation. We do have two common diagnostic cases needed by different users: debugging a bot (e.g. action failed to execute) VS debugging underlying machine learning machinery (e.g. wrong featurization).
Overview of the Solution:
All log levels should have a definition of when we expect a user to use them, e.g. during local development we'd expect a user to use log level debug. Of course the user can use a different log level, but this should guide us which information should be logged at which severity. This is mainly focused on rasa run.
Situations we should clarify the log level for:
This would require us to add an additional log level (TRACE) with lower severity than DEBUG. E.g. we would log something like
2020-11-15 14:20:30 DEBUG rasa.core.policies.rule_policy - Current tracker state:
[state 1] user intent: affirm | previous action name: action_listen
using trace.
Great push! 👍
I think we should also considering structlog when running in production. That makes it easier to monitor Rasa (X) deployments via ELK etc. In local mode plain text logs might be more user friendly. However, structlog could also be interesting in this case as it would allow for quick filtering for certain things (e.g. output of ML components)
oh interesting, I do see the appeal for automated monitoring but as you said not quite practical for local debugging - at least not everywhere.
would love to hear @tczekajlo opinion on the structured logging + elk part
I'd love to see structured logging for rasa running in a container. Current logging basically is useless without additional configuration/parser for ELK.
Also, I think it would be great if rasa has auto mode for logging - if tty is available display logging that is more user-friendly.
what information would we log if we'd use something like the mentioned structlog to work with ELK? e.g. are there default attributes we should log or is it completely up to us?
what information would we log if we'd use something like the mentioned
structlogto work with ELK? e.g. are there default attributes we should log or is it completely up to us?
It's up to us. If we would use JSON format as well there are ts and msg fields. Here is a short blogpost about structured logs in the Kubernetes https://kubernetes.io/blog/2020/09/04/kubernetes-1-19-introducing-structured-logs/ (might brings some ideas)
what information would we log if we'd use something like the mentioned structlog
An idea would be to log similar things as @rgstephens is logging in his OpenTelemetry changes 🤔
What information would we log
Wouldn't we replace all existing std logger module calls with structlog?
This would require us to add an additional log level (
TRACE) with lower severity thanDEBUG.
The trace level would also be useful for troubleshooting lock store, event broker and any other areas that could generate a large number of log messages. I currently hesitate to add logging to these areas but a trace level would be useful for this.
Logging & OpenTracing/Telemetry
Switching to structured logging would fit well with tracing. The structured logs would be attached to spans. There's an example of that with OpenTracing here. Elastic and Datadog both provide examples of integrating stuctlog with their proprietary tracing systems.
Most helpful comment
I'd love to see structured logging for rasa running in a container. Current logging basically is useless without additional configuration/parser for ELK.
Also, I think it would be great if rasa has
automode for logging - ifttyis available display logging that is more user-friendly.