Is there a clear distinction between what belongs in a structured Body vs. in Attributes? If there's not, it seems to make it less predictable where data is expected, and how to map between other models.
A top-level "message" string seems to be pretty common among other log data models. In this model, if you want to have a top-level message string and structured data describing the event, are you supposed to put the message in Body and other data that would otherwise be in Body in Attributes? I don't see anything about special-casing a property of Body as a top-level message.
From the mapping perspective, considering the Elastic Common Schema (ECS) example, it doesn't include all ECS fields, but message is the only one shown as mapping to Body. Body seems like a logical (perhaps the most logical) place for fields like error.message or event.id. So the fact that fields like error.message are shown as mapping into Attributes makes it seem that there's not a clear distinction, and that the mapping could be ambiguous depending on whether message is populated.
(BTW, unlike most of the rest of the document, the ECS example refers to "body" and "attributes" rather than "Body" and "Attributes", which I assume is a typo or holdover from a previous version.)
Attributes is documented as:
SHOULD follow OpenTelemetry semantic conventions for Attributes.
If I understand correctly, that means that a property of Attributes that has the name of a well-known attribute should have the meaning and data type defined for that attribute, but meanwhile Attributes can also include arbitrary custom attributes.
That also seems to have implications for placement of data within Body vs. Attributes, because whereas no semantic conventions apply to Body, if a property gets bumped from Body into Attributes, then it may be that conventions are supposed to apply that wouldn't otherwise.
Does it actually help to have another container for key/values associated with Log Record? If something is not covered by semantic conventions, I think it simply means that its definition is out of scope of OT - it might be related to a given environment, come in MDC, etc. As long as there's no conflict in the key names, why can't we reuse Attributes?
As long as there's no conflict in the key names, why can't we reuse Attributes?
Attributes require a key to record a value. Body does not. Body is better suited for the most common legacy use case of logs: an unstructured text log line. To record it in the Attributes we would need a semantic convention for what key to use which is different from everything else.
Separate Body and Attributes appear to better fit existing logging data models (e.g. MSG vs STRUCTURED-DATA in Syslog, or log message vs log fields in Zap logger).
Thank you both for commenting.
What I was really wondering is if it would make the most sense to have a top-level Message that's always a string when defined, and structured data that's part of the event would always go into a map in a top-level property such as Attributes (either directly or perhaps under a key reserved for a nested map of custom data with no semantic conventions?), or perhaps a top-level map reserved for that data (though maybe that clashes with reserving top-level properties for things that are almost always present).
Maybe I misunderstood, but I got the impression @pmm-sumo was suggesting that any structured data that could currently be placed in Body could instead go in Attributes, instead of Body sometimes containing structured data, which is part of what I was getting at. By "another container" did you mean Body or Context from #1660?
I think I messed up and my comment was largely referring to Context of #1660 (somehow I mixed up the two issues). Apologies for that @jmm!
I think we may also want to change perspective when looking at that. Let's consider that both Body and Attributes contain key-values. Does it make any practical difference from processor, exporter or vendor perspective if a given key-value is present in Body or in Attributes?
One case I can think of is if someone wants to put a boundary between metadata (present in Attributes) and record content (which might be a structured Body), so there would be a clear distinction between those.
As an example, consider someone who has a temperature sensor and logs its output. The sensor has some metadata assigned (e.g. id, connection type) that is not part of the record. In practice, this might look like the following:
Body: {"temperature": 21.4, "unit": "degrees_celsius"}
Attributes: {"sensor_id": "1a90c", "manufacturer": "Acme", "connection": "usb 3.2"}
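To make the boundary concrete, here is a minimal sketch of a record with the two containers. The `LogRecord` class and the `is_structured` helper are hypothetical names used only for illustration; they are not part of the data model or of any OpenTelemetry SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LogRecord:
    """Hypothetical stand-in for a log record with only the two fields discussed here."""
    body: Any = None                                           # plain string, map, or array
    attributes: Dict[str, Any] = field(default_factory=dict)   # always key/value pairs


def is_structured(record: LogRecord) -> bool:
    """True when the record content itself is a map or an array."""
    return isinstance(record.body, (dict, list))


# Record content goes in Body; metadata about the sensor goes in Attributes.
sensor_record = LogRecord(
    body={"temperature": 21.4, "unit": "degrees_celsius"},
    attributes={"sensor_id": "1a90c", "manufacturer": "Acme", "connection": "usb 3.2"},
)

# The legacy case: an unstructured text line needs no key at all.
text_record = LogRecord(body="sensor 1a90c reported 21.4")
```

With this shape, a processor that only cares about the reading can look at `body` and never has to filter sensor metadata out of it.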
@pmm-sumo no worries!
The only case I can think of is if someone wants to put a boundary between metadata (present in Attributes) and record content (which might be a structured Body), so there would be a clear distinction between those.
Yeah that's what I was getting at. But the way it's designed currently, any time you want to have structured record content _and_ a top-level message, the structured content would have to be bumped into Attributes anyway. Unless the idea is that message and structured record content are mutually exclusive. And the example mapping of ECS to this model shows data that I see as part of the event, and therefore probably most at home in Body, being mapped into Attributes.
Unless the idea is that message and structured record content are mutually exclusive.
My understanding of the Body field description is exactly that: either a raw message or structured content (a map or an array). This is further reinforced by the field's type being defined as any.
Right, it's mutually exclusive in terms of Body. But if you populate Body with a message string, is that supposed to mean you can't also populate arbitrary structured data that doesn't have standardized semantics? If that's not the intent then the structured data would have to get bumped into Attributes, right? Let's say a message and tags, for example.
So what I'm really wondering is if things would be more straightforward by having a top-level Message that's always a string when populated, and additional structured data without standardized semantics would always go within a certain top-level property (whether it's under Attributes or Body or something else).
But if you populate Body with a message string, is that supposed to mean you can't also populate arbitrary structured data that doesn't have standardized semantics?
I believe they can hold any sort of data. They SHOULD (not MUST) follow Semantic Conventions according to the data model.
Let's consider several options:
1) Attributes only
My understanding is that this is much like the ECS schema. There can be a special key, say Message, that denotes an attribute containing the raw string message. If the message is structured, the related fields are mixed with metadata.
2) Attributes and Message
This is the same as above, except that Message is now a field of the record. Everything else holds true.
3) Attributes, Message and MessageAttributes (for lack of a better name)
Let's say we want to separate record-level attributes and message-level attributes (if the original message was structured). That's one way to do it.
4) Attributes and Body
This is the current approach. Body is flexible enough to cover either a structured original message or a plain-text original message but not really both (unless some standard field name would be introduced for plain-text message).
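The four options can be compared side by side with a small sketch. The function names are hypothetical, the field names Message and MessageAttributes are the placeholder names used above, and the behaviour of `option4` when both a message and structured fields are present (the message wins Body, the fields move to Attributes) is only one possible resolution, assumed here for illustration.

```python
from typing import Any, Dict, Optional

Fields = Dict[str, Any]


def option1(message: Optional[str], message_fields: Fields, metadata: Fields) -> dict:
    # 1) Attributes only: the message sits under a special key and the
    #    message fields are mixed with the metadata.
    attrs = {**message_fields, **metadata}
    if message is not None:
        attrs["Message"] = message
    return {"Attributes": attrs}


def option2(message: Optional[str], message_fields: Fields, metadata: Fields) -> dict:
    # 2) Attributes and Message: the message becomes a field of the record.
    return {"Message": message, "Attributes": {**message_fields, **metadata}}


def option3(message: Optional[str], message_fields: Fields, metadata: Fields) -> dict:
    # 3) Message-level and record-level attributes are kept apart.
    return {
        "Message": message,
        "MessageAttributes": dict(message_fields),
        "Attributes": dict(metadata),
    }


def option4(message: Optional[str], message_fields: Fields, metadata: Fields) -> dict:
    # 4) Attributes and Body: Body holds either the text or the structure.
    #    Assumption: when both exist, the structure is bumped into Attributes.
    if message is not None:
        return {"Body": message, "Attributes": {**message_fields, **metadata}}
    return {"Body": dict(message_fields), "Attributes": dict(metadata)}


example = ("disk almost full", {"used_pct": 93}, {"host": "db-1"})
records = [build(*example) for build in (option1, option2, option3, option4)]
```

Only option 3 keeps the message, the message-level fields and the record-level metadata in three separate places; under the other options, at least two of them end up in the same container.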
Things get a bit more complex when there's a mix of structured and unstructured data. For a practical example, here's a random output from OpenTelemetry Collector:
2021-05-06T21:43:23.740+0200 info service/application.go:261 Starting OpenTelemetry Collector... {"Version": "v0.24.0-27-gfa73baf8", "GitHash": "fa73baf8", "NumCPU": 16}
Taking the timestamp, log level and caller aside, we end up with essentially a message: Starting OpenTelemetry Collector... and some attributes: {"Version": "v0.24.0-27-gfa73baf8", "GitHash": "fa73baf8", "NumCPU": 16}. With the current approach, should they land in Attributes or as a part of a structured Body? If it's the latter, what about the message?
Actually, the log data model already has an answer for that case. The attributes go to Attributes and the message goes to Body:
| Field | Type | Description | Maps to Unified Model Field |
|---|---|---|---|
| ts | Timestamp | Time when an event occurred measured by the origin clock. | Timestamp |
| level | enum | Logging level. | Severity |
| caller | string | Calling function's filename and line number. | Attributes, key=TBD |
| msg | string | Human readable message. | Body |
| All other fields | any | Structured data. | Attributes |
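
Applied to the Collector line above, the mapping could be sketched as follows. The function names are hypothetical, the parser assumes the space-separated layout shown in the sample line, and `CALLER_KEY` is a placeholder because the table leaves the attribute key for caller as TBD.

```python
import json

# Placeholder only: the mapping table lists the key for caller as TBD.
CALLER_KEY = "caller"


def parse_console_line(line: str) -> dict:
    """Split a console line into ts, level, caller, msg and trailing JSON fields."""
    ts, level, caller, rest = line.split(" ", 3)
    # Assumption: the structured part, if any, starts at the first " {".
    msg, brace, fields = rest.partition(" {")
    entry = {"ts": ts, "level": level, "caller": caller, "msg": msg}
    if brace:
        entry.update(json.loads("{" + fields))
    return entry


def to_log_record(entry: dict) -> dict:
    """Apply the table: ts, level and msg get top-level fields, the rest are attributes."""
    remaining = dict(entry)
    attributes = {}
    if "caller" in remaining:
        attributes[CALLER_KEY] = remaining.pop("caller")
    record = {
        "Timestamp": remaining.pop("ts", None),
        "Severity": remaining.pop("level", None),
        "Body": remaining.pop("msg", None),
    }
    attributes.update(remaining)  # all other fields
    record["Attributes"] = attributes
    return record


LINE = (
    "2021-05-06T21:43:23.740+0200 info service/application.go:261 "
    "Starting OpenTelemetry Collector... "
    '{"Version": "v0.24.0-27-gfa73baf8", "GitHash": "fa73baf8", "NumCPU": 16}'
)
record = to_log_record(parse_console_line(LINE))
```

The timestamp and severity are left as raw strings here; converting them to the model's Timestamp and Severity types is outside the scope of this sketch.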
@jmm @pmm-sumo let's discuss this today in the Log SIG meeting if you plan to attend.
What do you think about the following?
If the log record contains one or more "pieces" of data that may fit either in Body or Attributes but do not fit the description of the other top-level fields (Timestamp, Severity, etc) then follow these guidelines to decide how to record these pieces of data in the Body and Attribute fields:
Note: if there is more than one piece of data that matches rules 1-4, then we cannot record them in the Body; we have to come up with some keys and record each piece of data as a key/value pair in the Attributes.
This is a non-exhaustive set of heuristics, but it should probably be a good starting point. What do you think?
@pmm-sumo @djaglowski will one of you be able to submit a PR to make corresponding changes in the spec?
@tigrannajaryan @djaglowski sure, preparing a proposal. Since the guidelines are quite clear, I am going to literally put those into a dedicated section.