Envoy: Add support for sending data in Zipkin v2 format

Created on 24 Oct 2018 · 27 comments · Source: envoyproxy/envoy

Title: Add support for sending data in Zipkin format

Description:
It would be really helpful to support sending data in Zipkin v2 JSON, for reasons of efficiency and reduced tech debt in trace data pipelines. This could go alongside the existing functionality. While proto3 is even more efficient, it could be optional, as most don't use it.


Zipkin's data format has been historically criticized for its heft. Through community effort, in August 2017 we formalized a compact v2 JSON encoding, accepted on all transports including HTTP, Kafka, RabbitMQ, etc. Later, we introduced a proto3 encoding of the same, also accepted on all transports.

Between then and now, this format has become the primary and preferred format, especially for those trying to work with the data. For example, at a recent meeting we found that envoy is a key piece of the network that still emits the v1 format, requiring "rosetta stone" style proxies.

This is fine in pure zipkin installs, as the server reads all historical formats, but it is limiting for those using different pipelines. A switch to v2 would also work with zipkin clones such as Jaeger, which also supports that format.
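For reference, a span in the v2 JSON encoding looks roughly like the sketch below. This is hand-written for illustration; the ids, timestamps, and service name are placeholders, not taken from a real trace.

```python
import json

# A minimal Zipkin v2 JSON span, as accepted by POST /api/v2/spans.
# All values here are made-up placeholders.
span = {
    "traceId": "5af7183fb1d4cf5f",   # 16 or 32 lowercase hex characters
    "id": "352bff9a74ca9ad2",        # 16 lowercase hex characters
    "name": "get /api",
    "kind": "SERVER",
    "timestamp": 1540368777000000,   # epoch microseconds
    "duration": 168000,              # microseconds
    "localEndpoint": {"serviceName": "myservice", "ipv4": "127.0.0.1"},
    "tags": {"http.method": "GET", "http.path": "/api"},
}

# The v2 write endpoint takes a JSON array of spans.
payload = json.dumps([span])
print(payload)
```

Compared with v1, there are no nested binary annotations: tags are a flat string map and timing is plain fields on the span, which is what makes the encoding compact and easy to consume downstream.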

original spec for zipkin and envoy
json v2 encoding
proto3 encoding
trace data pipeline meeting
jaeger's v2 support

Labels: enhancement, help wanted

Most helpful comment

OK, I am happy to help the community.

All 27 comments

@zyfjeff wondering if you have some time to help with this. It would be really appreciated by the community, especially those who write their own proxies.

sorry to nag you @zyfjeff, but you've done the only major work on the zipkin side recently. there are few people with c++ experience in the ecosystem.. would others be able to help you with something you need, in order to clear time for you to help them? For example, I recently helped with Alibaba Dubbo. I think others may be able to help you somehow for mutual benefit. I fear no one will work on this unless someone like you does.

@adriancole
Sorry, I have been doing Envoy's secondary development for the company recently. I will take some time this month to familiarize myself with the Zipkin v2 format. I am willing to help the community to complete this issue, but I can't guarantee that it will be completed soon.

@mattklein123 please assign to me

@zyfjeff thanks for even considering this.. starting is the first step in finishing (or so my span says)

if any questions ask on https://gitter.im/openzipkin/zipkin I've alerted the team to watch for you!

OK, I am happy to help the community.

@zyfjeff If you can help get me started on where I would want to change I might be able to start the ball rolling on this. I've got a working dev environment from my small contribution about a year ago, so I only have to update that.

@devinsba I can help you.

@devinsba Are you still working on this? I'm ready to start solving this problem.

@adriancole
If the v2 format is supported, how is the format of v1 handled? Or is it made configurable?

If you have the bandwidth go for it. I was going to have to squeeze it into downtime during the holidays.

Any progress on this issue? @zyfjeff

If the v2 format is supported, how is the format of v1 handled? Or is it made configurable?

@zyfjeff if you are asking about Zipkin Server: v1 write endpoints are still available on the server, in addition to v2 read/write endpoints. If you mean tracers, they typically include a configuration option for the endpoint and encoding to send spans.
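To make the tracer side concrete: reporting in the v2 format is just a JSON POST to the server's v2 write endpoint. A minimal Python sketch, assuming a Zipkin server on its default port 9411 (the span values are placeholders):

```python
import json
import urllib.request

# One made-up span; see the v2 spec for the full field list.
spans = [{
    "traceId": "5af7183fb1d4cf5f",
    "id": "352bff9a74ca9ad2",
    "name": "get /api",
    "timestamp": 1540368777000000,
    "duration": 168000,
    "localEndpoint": {"serviceName": "myservice"},
}]

# Build the request against the v2 write endpoint. The legacy v1
# endpoint (/api/v1/spans) remains available for older tracers.
req = urllib.request.Request(
    "http://localhost:9411/api/v2/spans",
    data=json.dumps(spans).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # not executed here; needs a running server
print(req.full_url)
```

A tracer's configuration option for "endpoint and encoding" amounts to choosing this URL and the serialization used for the request body.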

@Dudi119 any chance you can help with this? everyone in the ecosystem, including zipkin clones, is affected by the lack of v2 support, and it seems the project maintainers are still not doing this on their own

ps the next version of zipkin (2.13) will also have a grpc endpoint with exact same proto3 message as http post /api/v2/spans

https://github.com/apache/incubator-zipkin-api/blob/master/zipkin.proto#L224

that said I think most clones likely will want the json endpoint (not that it is mutually exclusive)

Another option may be https://github.com/envoyproxy/envoy/pull/5387 - as I believe there is a zipkin v2 exporter. Looks pretty close to being merged.

If I were a maintainer of envoy, I would not merge that request due to the upcoming merger of census and OT, I would let that settle and then bring in the merged c++ client. But that is just my 2p

@adriancole sure, will be happy to help.

We've found issues with the OpenCensus and OT tracers, as they do not add the B3 propagation headers to the request sent upstream. I think that supporting the Zipkin v2 format with the existing Zipkin tracer should not be that difficult? If someone can point me at the files I would need to change, maybe I can make a PR for it.

my guess is someone could pick up where things last stalled due to people's lives getting too busy. I know @basvanbeek was interested in this, too https://github.com/envoy-zipkin/envoy/pull/3

Yeah. Sorry for this. @cetanu let me prepare some playground for it and let you know. I’ll sync with @basvanbeek as well.

So, this is effectively overcome if you use the opencensus driver, which will export via zipkin v2, like this:

EDIT: "effectively overcome" is an overly-ambitious qualifier. This example is only relevant if you want to implement tracing via opencensus, and at the time of writing, this is the only way to ship off _any_ spans from envoy in zipkin v2 format.

tracing:
  http:
    name: envoy.tracers.opencensus
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v2.OpenCensusConfig
      zipkin_exporter_enabled: true
      zipkin_url: http://127.0.0.1:19000/trace
      zipkin_service_name: myservice
      outgoing_trace_context: [ "TRACE_CONTEXT", "GRPC_TRACE_BIN" ]

stdout_exporter_enabled: true is also useful here for debugging.

My configuration is such that I also require adding an API key to the trace server, so I feed it through envoy as a listener (I got this from another gh issue, but thought it'd be handy here too):

static_resources:
  listeners:
  - name: trace
    address:
      socket_address: { address: 0.0.0.0, port_value: 19000 }
    filter_chains:
      filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: zipkin_http
          route_config:
            name: local_route
            request_headers_to_add:
            - header: { key: "Api-Key", value: "MYAPIKEY" }
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match: { path: "/trace" }
                route: { auth_host_rewrite: true, cluster: "zipkin_outbound" }
          http_filters:
          - name: envoy.router
  clusters:
    - name: zipkin_inbound
      connect_timeout: 1s
      type: STATIC
      http_protocol_options: {}
      load_assignment:
        cluster_name: zipkin_inbound
        endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: 127.0.0.1
                  port_value: 19000
    - name: zipkin_outbound
      connect_timeout: 1s
      type: LOGICAL_DNS
      lb_policy: ROUND_ROBIN
      dns_lookup_family: V4_ONLY
      respect_dns_ttl: true
      http_protocol_options: {}
      tls_context: {}
      load_assignment:
        cluster_name: zipkin_outbound
        endpoints:
        - lb_endpoints:
          - endpoint:
              address:
                socket_address:
                  address: ACTUAL_ZIPKIN_SERVER.com
                  port_value: 443

The added benefit here is that if this is acting as a sidecar, the actual upstream server can also use this exposed listener and not have to be concerned with the api key or other configuration.

This is sort of an advertisement for something else. Notice the example also suggests using TRACE_CONTEXT and not B3, which would definitely break almost everyone's propagation setup.

While it is a shame folks have been unable to muster writing json, it seems a bit overkill to suggest replacing the entire component as a solution.


Uh, I'm not sure what you mean. I'm not "advertising" anything. I have no existing trace setup, but a constraint on reporting to a zipkin v2 endpoint. I thought it might be useful to someone in a similar situation.

I called the "use this thing instead" approach advertising, but I can see how that's kind of a distracting term for the technique. It is good that there are parts of the codebase people are willing to maintain, and it is helpful to know they exist.

The "use this thing instead" approach is quite a huge hammer considering the small amount of effort this needs. If it weren't cpp, I think this would have been done ages ago.

One other problem I had with the OpenCensus tracer is that it doesn't support 64-bit trace ids, whereas I believe the current zipkin tracer does allow this to be set via a boolean field.

I am hopeful that this will remain an option if changes are made to support the Zipkin v2 format via the original, or new, tracer.

Also, I don't want to just be a burden and push for this change. If I can help in any way, I can be set on a small task. I am very inexperienced with cpp.

The above pattern of processing via another listener opens some doors for me which I may use in future, but it wouldn't allow me to modify the trace id, especially since envoy would have already sent propagation headers to the upstream prior to emitting a trace to the tracing cluster.

@adriancole you're right, I should have qualified my example with more specific caveats about what it's appropriate for. I dropped it in this issue mostly because this is the issue I found when I came across the problem.

@cetanu if you want to contribute to the current PR, I'll add you to the https://github.com/envoy-zipkin/envoy repo.
