Opentelemetry-specification: Event is one of the major verticals as well as Metrics, Tracing and Logging in observability from my perspective

Created on 8 Jul 2019 · 13 Comments · Source: open-telemetry/opentelemetry-specification

As we stated:

In software, observability typically refers to telemetry produced by services and is divided into three major verticals:

  • Tracing, aka distributed tracing, provides insight into the full lifecycles, aka traces, of requests to the system, allowing you to pinpoint failures and performance issues.
  • Metrics provide quantitative information about processes running inside the system, including counters, gauges, and histograms.
  • Logging provides insight into application-specific messages emitted by processes.

These verticals are tightly interconnected. Metrics can be used to pinpoint, for example, a subset of misbehaving traces. Logs associated with those traces could help to find the root cause of this behavior. And then new metrics can be configured, based on this discovery, to catch this issue earlier next time.

I think we should consider Events one of the major verticals in OpenTelemetry too: for example, a machine reboot, a process core dump, a deployment, a system kernel error, or even a Java exception.

Existing products such as Sentry and CloudTrail are built around events.

https://en.wikipedia.org/wiki/Event_monitoring

api semantic-conventions needs discussion after-ga metrics trace

All 13 comments

I agree that Events are important too. SaaS products like Datadog also provide event support.

Also, I'd prefer to add events to the span object as a property.

Here is a real-world example. I'm measuring container startup time in K8s. One component is the kubelet, the worker node agent in K8s. Container startup on a worker node includes pulling the image, creating the container, and running the container, but we don't need to make each of these steps a separate span; we can make it one span that includes some events:

{
    "service": "kubelet",
    "operation": "start_pod",
    "startTimestamp": "2019-07-05T19:17:55.961571091+08:00",
    "duration": "3000",
    "events": [
        {
            "level": "info",
            "summary": "image nginx pulled",
            "message": "Container image \"nginx:1.17.0-alpine\" already present on machine",
            "timestamp": "2019-07-05T19:17:56.961571091+08:00"
        },
        {
            "level": "info",
            "summary": "container created",
            "message": "Created container",
            "timestamp": "2019-07-05T19:17:57.961571091+08:00"
        },
        ... ...
    ],
    ... ...
}
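
The span-with-events shape above can be sketched in code. This is only an illustration: the `Span` and `SpanEvent` classes and the `add_event` helper are hypothetical names invented here, not an OpenTelemetry API, and the timestamps are simplified to whole seconds.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class SpanEvent:
    """A timestamped annotation attached to a span (hypothetical type)."""
    timestamp: datetime
    summary: str
    message: str = ""
    level: str = "info"


@dataclass
class Span:
    """One timed operation that carries its events as a property (hypothetical type)."""
    service: str
    operation: str
    start: datetime
    events: List[SpanEvent] = field(default_factory=list)

    def add_event(self, offset_ms: int, summary: str, message: str = "") -> SpanEvent:
        # Record the step as an event on this span instead of opening a child span.
        event = SpanEvent(self.start + timedelta(milliseconds=offset_ms), summary, message)
        self.events.append(event)
        return event


start = datetime(2019, 7, 5, 19, 17, 55, tzinfo=timezone(timedelta(hours=8)))
span = Span(service="kubelet", operation="start_pod", start=start)
span.add_event(1000, "image nginx pulled",
               'Container image "nginx:1.17.0-alpine" already present on machine')
span.add_event(2000, "container created", "Created container")

for e in span.events:
    print(e.timestamp.isoformat(), e.level, e.summary)
```

The point of the design is that the three startup steps share one span and one duration, while each step stays individually timestamped.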

To me, an Event is just a special case of a Trace, with 1 span.

To me, an Event is just a special case of a Trace, with 1 span.

Yeah. It is one way to look at it.

In my opinion, there are a few differences between a Span (trace) and an Event.

  1. The end/finish time of an event can change (it may be determined later).
  2. Events are related across the whole IaaS, and it is hard to relate (instrument) them using a Span ID.
  3. The query scenarios are more diverse than for Spans. Statistical analysis is one case, such as: how many deployments happened last week, or how many network breakdowns happened last year?
  4. Events may be merged later on. For example, if lots of alert events occur because the Cache service crashed, we might want to merge all these related alert events into one incident. It would be hard to do this with the Span model.
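
Points 3 and 4 above can be illustrated with a small sketch. The `Event` and `Incident` types, the `cause` field, and `merge_alerts` are all hypothetical names chosen for this example; a real system would need some correlation logic to decide which alerts share a cause.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    kind: str        # e.g. "deployment" or "alert"
    source: str      # component that emitted the event
    cause: str = ""  # suspected root cause, often filled in after the fact


@dataclass
class Incident:
    cause: str
    events: List[Event] = field(default_factory=list)


def merge_alerts(events: List[Event]) -> List[Incident]:
    """Group alert events that share a cause into one incident per cause."""
    incidents: Dict[str, Incident] = {}
    for event in events:
        if event.kind != "alert" or not event.cause:
            continue  # only alerts with a known cause are merged
        if event.cause not in incidents:
            incidents[event.cause] = Incident(event.cause)
        incidents[event.cause].events.append(event)
    return list(incidents.values())


events = [
    Event("deployment", "api"),
    Event("alert", "api", cause="cache crash"),
    Event("alert", "web", cause="cache crash"),
    Event("deployment", "web"),
]

counts = Counter(e.kind for e in events)  # statistical query: events per kind
incidents = merge_alerts(events)          # later merge: alerts -> incident
print(counts["deployment"], len(incidents), len(incidents[0].events))
```

Neither operation needs a parent/child relationship between the events, which is what makes them awkward to express with Span IDs.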

Alternatively, could we treat events as special logs with a known structure?

Alternatively, could we treat events as special logs with a known structure?

Yes, we could.

At the same time, we can treat Logs as Spans (traces) too.
Why haven't we done that?

Bumping this up since it was referenced from #67.

What are Events if not just Logs?

Alternatively, could we treat events as special logs with a known structure?

Structured logs have been a thing for a long while. There is an RFC that defines how structured logs should be represented in text files [1] and AFAIK many modern logging backends support this format, in addition to other ways to ingest structured logs (e.g. as JSON).

On Windows the system and application logs are even called just that: Event Logs [2] and are structured.

So, the question is how are Events different from Logs? In what way?

[1] https://tools.ietf.org/html/rfc5424#page-15
[2] https://docs.microsoft.com/en-us/windows/win32/wes/windows-event-log
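
To make the structured-log idea concrete, here is a minimal sketch that renders an event as an RFC 5424 line with one STRUCTURED-DATA element. The `format_event` and `escape_param` helpers, the `event@32473` SD-ID (32473 is the enterprise number RFC 5424 uses in its own examples), and the sample field values are assumptions made for illustration, not part of any proposal in this thread.

```python
from datetime import datetime, timezone


def escape_param(value: str) -> str:
    # RFC 5424 section 6.3.3: '"', '\' and ']' must be escaped in a PARAM-VALUE.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def format_event(app, msgid, sd_id, params, msg, when,
                 host="-", procid="-", facility=1, severity=6):
    """Render one event as a single RFC 5424 syslog line (illustrative helper)."""
    pri = facility * 8 + severity  # PRI value as defined by the RFC
    pairs = "".join(f' {key}="{escape_param(val)}"' for key, val in params.items())
    structured_data = f"[{sd_id}{pairs}]"
    return f"<{pri}>1 {when.isoformat()} {host} {app} {procid} {msgid} {structured_data} {msg}"


line = format_event(
    app="kubelet",
    msgid="deployment",
    sd_id="event@32473",
    params={"kind": "deployment", "version": "1.17.0"},
    msg="Deployed nginx",
    when=datetime(2019, 7, 8, 12, 0, 0, tzinfo=timezone.utc),
    host="node1",
)
print(line)
```

The known structure of the event lives in the bracketed element, while the trailing free-text message stays a normal log message.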

Yes, Events are Logs.

I feel that Logs are a special class of events, where the payload is just {time: "yyyy-mm-dd hh:mm:ss +Z", data: "entire log line goes here"}

I feel that Logs are a special class of events, where the payload is just {time: "yyyy-mm-dd hh:mm:ss +Z", data: "entire log line goes here"}

Call it "structured logging" then, where structured attributes can be associated.

Whether you want to call it events or logs, there isn't a reason to think of them separately.

Such as machine reboot, process core dump, deployment, a system kernel error, and even Java Exceptions.

All in logs.

The way I see it, metrics track events (and other things), while traces and logs describe events. It's events all the way down. I think it would be beneficial to make that relationship clear in the documentation.
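
A rough sketch of that relationship, using only the standard library: one call records an event, a counter tracks it, and a structured log line describes it. The `record` function, `event_counter`, and `log_lines` are hypothetical stand-ins for a real metrics counter and log sink.

```python
import json
from collections import Counter

event_counter = Counter()  # stand-in for a metrics counter, keyed by event kind
log_lines = []             # stand-in for a structured log sink


def record(kind: str, time: str, **attributes: str) -> None:
    """Emit one event: the metric counts it, the log line describes it."""
    event_counter[kind] += 1
    log_lines.append(json.dumps({"time": time, "kind": kind, **attributes}, sort_keys=True))


record("machine_reboot", "2019-07-08T10:00:00Z", host="node1")
record("deployment", "2019-07-08T10:05:00Z", service="kubelet")
record("deployment", "2019-07-08T11:00:00Z", service="nginx")

print(event_counter["deployment"], len(log_lines))
```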

I think this issue is not actionable as-is. Can it be closed?

I think this issue is not actionable as-is. Can it be closed?

I vote for closing for reasons I outlined above.

Closing this issue as it is not actionable and because reasons to close it have already been given.

In any case, feel free to re-open (or open a new issue) if you think this still needs to be addressed in some form.
