We use Zipkin today, but we have not looked into Jaeger deeply enough to make a fully informed decision. Look into both of these backends and replace Zipkin with Jaeger if appropriate.
FYI: There was a memory leak that prevented us from looking at Jaeger earlier in Knative's life.
See: https://github.com/knative/serving/issues/415#issuecomment-391778851
Following up - it looks like the issues are resolved.
https://github.com/istio/istio/issues/5782
https://github.com/jaegertracing/jaeger/issues/842
One requirement that will be important for various scenarios is to provide the option to choose between jaeger/zipkin or some other tracing solution.
Additionally, another requirement or use case would be to secure the Jaeger UI. Jaeger expects a solution for secure access to be built outside of Jaeger, using an approach similar to https://medium.com/jaegertracing/protecting-jaeger-ui-with-an-oauth-sidecar-proxy-34205cca4bb1. The same approach can be integrated with Istio JWT and a pluggable identity provider. However, this would require deploying Jaeger in a namespace other than istio-system, since it will need the Envoy sidecar to be injected. That would imply we should move away from hardcoding the Zipkin address and make it configurable.
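To illustrate what "configurable instead of hardcoded" could look like, here is a minimal sketch. It is not how Knative does it: the `TRACING_BACKEND` and `TRACING_ENDPOINT` variable names, the `load_tracing_config` helper, and the service hostnames and namespaces are all assumptions made up for this example. Only the ports and paths (9411 `/api/v2/spans` for Zipkin, 14268 `/api/traces` for the Jaeger collector) are the usual upstream defaults.

```python
import os
from dataclasses import dataclass

# Hypothetical defaults for illustration only; real service names and
# namespaces depend on how the backend is deployed in the cluster.
DEFAULT_ENDPOINTS = {
    "zipkin": "http://zipkin.istio-system.svc.cluster.local:9411/api/v2/spans",
    "jaeger": "http://jaeger-collector.tracing.svc.cluster.local:14268/api/traces",
}


@dataclass(frozen=True)
class TracingConfig:
    backend: str   # "zipkin", "jaeger", or "none"
    endpoint: str  # URL spans are sent to; empty when tracing is disabled


def load_tracing_config(env=None):
    """Resolve the tracing backend and endpoint from configuration.

    `env` is any mapping of settings; it defaults to the process environment.
    """
    env = os.environ if env is None else env
    backend = env.get("TRACING_BACKEND", "zipkin").strip().lower()
    if backend == "none":
        return TracingConfig(backend="none", endpoint="")

    # An explicit endpoint always wins, so a backend in another namespace
    # (or a different tracing solution entirely) needs no code change.
    endpoint = env.get("TRACING_ENDPOINT", "").strip()
    if not endpoint:
        if backend not in DEFAULT_ENDPOINTS:
            raise ValueError(
                f"no default endpoint for backend {backend!r}; set TRACING_ENDPOINT"
            )
        endpoint = DEFAULT_ENDPOINTS[backend]
    return TracingConfig(backend=backend, endpoint=endpoint)
```

With something like this, moving Jaeger out of istio-system only means changing the endpoint setting, and an unknown backend still works as long as an explicit endpoint is supplied.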
heh what a balanced assessment! carry on!
zipkin on the other hand already being used by many old solutions.
I am curious: where does this statement come from?
Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.
Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/lifecycle stale
/remove-lifecycle stale
(Is this issue still relevant? Looking around, it seems like Jaeger integration is supported now)
Stale issues rot after 30 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle rotten.
Rotten issues close after an additional 30 days of inactivity.
If this issue is safe to close now please do so by adding the comment /close.
Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.
/lifecycle rotten
@mdemirhan things in the tracing space have moved A LOT. First OpenTracing became available as an interface that allows plugging in different tracers (Zipkin, Jaeger, ...). Now there is the OpenTelemetry project (https://opentelemetry.io/), which is OpenTracing and OpenCensus joining forces and merging all their efforts into a universal tracing framework - including collectors and tracers. Jaeger and Zipkin could then focus on the analytics and user frontends.
In either case - OpenTelemetry is already in BETA state and plans on making OpenTracing obsolete very soon. Also, the Jaeger project is already working heavily on adding OTEL support.
Please kindly re-evaluate the current approach for Knative - it would be awesome to have OpenTelemetry metrics there as well.
/remove-lifecycle rotten
as mentioned otel is (barely) beta, and also the only things still mentioned here support the same ingest format emitted by this tool. Why not wait until things stabilize then make a proposal? There's no reason to rush into instability.
OT is in beta; it is also supposed to ingest data from Zipkin or Jaeger. I don't think there is a need to introduce yet another unstable API.
@adriancole @jcchavezs to be quite blunt, you are missing my point. This should not be a "vs." discussion.
Certainly keeping Jaeger or Zipkin as an instrumentation of choice is cool. But the point of OpenTelemetry is exactly NOT to have yet another API (see: https://opentelemetry.io/about/) - but one that is stable and universal, to save devs from messing up their code with multiple flavors of tracing. From reading your comment (https://github.com/openzipkin/zipkin/issues/2598#issuecomment-492930226) I take it that you, @adriancole, do not seem to believe in this mission.
Jaeger (on the instrumentation as well as on the collection side of things) seems to be quite open to adopting OpenTelemetry: https://medium.com/jaegertracing/jaeger-and-opentelemetry-1846f701d9f2 (I refrain from linking to all the recent otel related PRs).
OpenTelemetry has joined forces with a lot of players in the tracing ecosystem, and they aim to make tracing an XKCD 927 success story - https://fosdem.org/2020/schedule/event/beam_opentelemetry_xkcd_927_success_story/attachments/slides/3728/export/events/attachments/beam_opentelemetry_xkcd_927_success_story/slides/3728/OpenTelemetry_FOSDEM_2020.pdf
But in the end it boils down to people following a standardization movement. And isn't the non-uniform ecosystem of instrumentation APIs as well as context propagation the reason an issue like this is even open and discussed?
I'm not going to get into a paste link war. I don't believe in the execution, as evidenced by the way the previous two one apis played out, and how the current next one api is going. I don't believe that you can pick something that calls itself a standard and trust its execution blindly. This is probably a difference we have. My opinions are stronger because I've also had to clean up after the confusion and calamity of empty promises in this space for the last 5 years.
frankly speaking this is not "my" project anyway. People can ignore me like they did into the last two cul-de-sacs, and learn this lesson on their own. If the maintainers here desire to run into the fire of a <1.0 library and data format that routinely changes.. intentionally taking away the stability they have now.. in the name of standardization! they can already do that.
To save pain and suffering my advice remains.. let this group prove execution and stability, give it till the end of the year. This is just advice, this is not a project I maintain.
In ToC & Observability meetings, we decided to offload the responsibility of exporting data to monitoring & logging backends via OpenTelemetry. Going forward, Knative will no longer support Zipkin or Jaeger or any other monitoring solution as part of its release bundle. Instead, we will send all the data to an OpenTelemetry agent, and users can then pick their backend of choice as part of their OpenTelemetry agent configuration. @evankanderson is working on this.
I think more specifically, we won't be shipping any recommended or example configuration for collecting traces or metrics, and instead we'll rely on cluster configuration for that.
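The decoupling described above can be pictured with a small toy model. This is only a sketch of the idea, not OpenTelemetry's or Knative's actual API: the `Agent` class, `handle_request`, and the exporter names are hypothetical stand-ins. The instrumented code hands spans to one agent, and which backends receive them is decided purely by the agent's configuration.

```python
from typing import Callable, Dict, List

Span = Dict[str, str]
Exporter = Callable[[Span], None]


class Agent:
    """Toy stand-in for a telemetry agent (hypothetical, for illustration).

    It receives spans and forwards them to whichever exporters the
    cluster operator enabled in its configuration.
    """

    def __init__(self, exporters: Dict[str, Exporter], enabled: List[str]):
        unknown = [name for name in enabled if name not in exporters]
        if unknown:
            raise ValueError(f"unknown exporters in configuration: {unknown}")
        self._pipeline = [exporters[name] for name in enabled]

    def receive(self, span: Span) -> None:
        # Fan the span out to every enabled backend.
        for export in self._pipeline:
            export(span)


def handle_request(agent: Agent, name: str) -> None:
    """Instrumented code: it only knows the agent, never the backend."""
    agent.receive({"name": name})
```

Switching from Zipkin to Jaeger in this model means changing the `enabled` list passed to the agent; `handle_request` stays untouched, which is the property the decision above relies on.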