OpenTracing and OpenCensus provide observability instrumentation. Athens currently uses OpenTracing through bketelsen/buffet (@bketelsen). See related issues.
We should evaluate which library we want to use in Athens. There are pros and cons to each.
Note: I am biased because I work at Google.
Of course, Athens is already instrumented with OpenTracing, so staying with it requires the least up-front work.
OpenCensus is supported by Google (announcement) and Microsoft (announcement). GCP and Azure libraries are (or will be) instrumented with OpenCensus. This means that if you call the GCP storage API, you get a detailed view into the GCP library and which part of it is contributing to latency. We could probably switch by adding OpenCensus support to buffet and changing some of the setup done in Athens (and how traces are recorded).
Note: In my opinion, this is the biggest reason to use OpenCensus over OpenTracing -- better support in the libraries Athens uses.
Here is how you create a span with OpenCensus:
ctx, span := trace.StartSpan(ctx, "spanName")
defer span.End()
// Child spans are created using the newly created ctx.
See https://opencensus.io/quickstart/go/tracing/.
And in OpenTracing:
parent := opentracing.GlobalTracer().StartSpan("hello")
defer parent.Finish()
child := opentracing.GlobalTracer().StartSpan("world", opentracing.ChildOf(parent.Context()))
defer child.Finish()
Alternatively, using context.Context:
span, ctx := opentracing.StartSpanFromContext(ctx, "operation_name")
defer span.Finish()
See https://github.com/opentracing/opentracing-go.
Application and request metrics are important indicators of availability. Custom metrics can provide insights into how availability indicators impact user experience or the business. Collected data can help automatically generate alerts at an outage or trigger better scheduling decisions to scale up a deployment automatically upon high demand.
Source: https://opencensus.io/core-concepts/metrics/
Metrics will be important for Athens to keep track of latency, errors, package size, and so on. Eventually, we'll need alerts set up based on these metrics to make sure everything keeps working. In GCP terms, metrics and traces could be exported to Stackdriver, which can then send alerts. I am not familiar with other exporter/alerting options, and I am not sure whether OpenTracing can export to Stackdriver.
OpenCensus metrics are independent of tracing. There isn't a tidy snippet showing how to define, record, and view metrics, but here is a walkthrough: https://opencensus.io/quickstart/go/metrics/. The gist:
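// MErrors counts the non-EOF (end-of-file) errors encountered.
var MErrors = stats.Int64("repl/errors", "The number of errors encountered", "1")
// ...
out, err := processLine(ctx, line)
if err != nil {
	// Record one error; registered views aggregate the recorded values.
	stats.Record(ctx, MErrors.M(1))
	return err
}
// ...
// ErrorCountView counts how many times MErrors was recorded.
var ErrorCountView = &view.View{
	Name:        "demo/errors",
	Measure:     MErrors,
	Description: "The number of errors encountered",
	Aggregation: view.Count(),
}
// The view must be registered before recorded values are aggregated.
if err := view.Register(ErrorCountView); err != nil {
	log.Fatal(err)
}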
I'm not sure how to do metrics with OpenTracing (I didn't find any docs on http://opentracing.io). As I understand it, the API is tied to tracing through and through. Maybe metrics are provided through third-party libraries (e.g. https://github.com/jaegertracing/jaeger-client-go#metrics--monitoring)?
The setup for both libraries is quite different. See the quickstarts above for explanations.
Two questions:
❤️
I'm looking forward to seeing where this goes.
OpenCensus can export to AWS X-Ray. See https://github.com/census-ecosystem/opencensus-go-exporter-aws, https://opencensus.io/faq/#what-exporters-does-opencensus-support, and https://opencensus.io/supported-exporters/go/xray/.
I don't see an exporter from OpenTracing to X-Ray (could be missing it).
I don't see any integrations by the AWS SDK directly with OpenCensus or OpenTracing.
@marwan-at-work, at this point AWS recommends only their own proprietary tracing libraries.
How does associating logs with spans work in OpenCensus?
You can get the span from the current context (trace.FromContext), call Span.SpanContext (https://godoc.org/go.opencensus.io/trace#Span.SpanContext), and use the result when logging.
@tbpg for 2, that sounds logical to me. In that sense, we should trace the following operations:
- Calls from Proxy -> Registry
- Calls to backend.Storage
- Calls to download.Protocol (go get and go list)
@rakyll thanks for the link. It should be straightforward, then, to have our Error struct include a trace.Span field so that when we call logger.SystemErr(err), the logged JSON output will include the span and request IDs.
I guess my question is, are there other scenarios when we'd want to log a span object? Or is having traces good enough for cloud monitoring/alerting tools to keep us in check?
P.S. Big fan of your work, happy to see you here!
If you are already collecting traces at a tracing backend, you generally don't want to also log them, to avoid duplication, but it is up to you. A span contains more than just the identifiers, such as annotations and attributes. If the Error struct adds a trace.SpanContext field, that will be good enough to cover the correlation case.
Thanks much for the kind words!
That's great. Keeping tracing and logging separated makes sense to me.
For errors of the "context timed out" kind, including a trace ID could be helpful, although once our cloud tracing tools are set up correctly we may already be able to find such errors there.
You can also annotate interesting events as part of the trace if you think it will help debugging when reading traces. Annotations are small log lines attached to a span, and when debugging they fit traces better than regular logs do. See the following:
https://godoc.org/go.opencensus.io/trace#Span.Annotate
https://godoc.org/go.opencensus.io/trace#Span.Annotatef
I've got a vote for OpenCensus. OpenTracing leaves a lot of ambiguity to the implementation.
In terms of tracing, I do not see a lot of difference between OpenTracing and OpenCensus. However, I like OpenCensus better because it includes metrics. To add a new metric, all we really have to do is push a commit and start tracking it within the code repository rather than relying on an external system.
For the backend exporters, my vote would be Prometheus for metrics and Jaeger or Zipkin for tracing, simply because they are open source, as opposed to Stackdriver or X-Ray.
@rakyll you're definitely a subject matter expert here. Would you be willing to advise and help us with this?
I would definitely be more than happy to help and contribute.
@rakyll happy to hear it. If you have any questions, Athens maintainers are usually available on the #athens channel on the Gophers Slack. We also have a weekly Zoom meeting every Thursday at 11AM PST for more face-to-face time to discuss any topic. GitHub Issues is also a valid place for discussion. Pick whichever medium you prefer.
Reading this issue, it seems that no one has any objections to switching to OpenCensus.
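I think we should initially stick to tracing just the following calls, as I mentioned above:
- Calls from Proxy -> Registry (we don't have those calls in place yet, so skip)
- Calls to backend.Storage (ready to trace)
- Calls to download.Protocol (go get and go list) (ready to trace)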
This should be enough info for the initial implementation unless anyone has further suggestions.
That sounds like an awesome plan to start @marwan-at-work
@rakyll thanks!! Shout when you want reviews too.
@marwan-at-work, I will try to dial in on Thursday! I am in Germany this week and cannot guarantee it won't conflict with other stuff, but fingers crossed :)
@arschles I am open to reviews anytime!!! Haha