opentelemetry-specification 🚀 - Trace/Span ids base64 encoded when used in proto-JSON representation

In theory, I would love for W3C to use base64 instead of hex. After all it's shorter and at some cloud vendor setups, I've heard, you are paying per byte you send/receive (e.g. if the HTTP request is ultimately serialized to a messaging system).

Oberon00 on 16 Mar 2020

I would love for W3C to use base64 instead of hex.

@Oberon00 I don't think changing W3C is feasible at this stage. For history - base64 vs hex was heavily debated. Unfortunately this debate hasn't made it to the rationale, but as overview - we strived for human-readability and consistency with prior art.

@bogdandrutu what's the pros/cons of each option?

Change trace/span id to string hex representation;

This is 4x size and CPU on serialization/deserialization, correct? Also potential abuse with other characters. Any other issues?

Define a different json schema for otlp HTTP;

Have you looked at the whole schema? Any other issues with JSON to justify the custom schema?

Try to push to protobuf support custom encoding for bytes fields;

I have no context on this. Can you comment on pros/cons?

SergeyKanzhelev on 17 Mar 2020

Change trace/span id to string hex representation;

I don't think we should do this and slow-down / increase wire size for all other languages.

tigrannajaryan on 18 Mar 2020

@tigrannajaryan to assess the second option - can you generate an example of a JSON from proto? So we can review for other potential issues?

Can you please also comment on option 3 viability?

SergeyKanzhelev on 18 Mar 2020

@SergeyKanzhelev unfortunately I don't know how 2 and 3 done, I have no experience with that. It is best that someone who knows what they are doing works on this.

tigrannajaryan on 19 Mar 2020

@SergeyKanzhelev @tigrannajaryan This is an example of a JSON from proto.

{
    "resource_spans": [
        {
            "instrumentation_library_spans": [
                {
                    "spans": [
                        {
                            "trace_id": "kDMI7LTxLxTj220awNARJw==",
                            "span_id": "9ir6veJ4Hdw=",
                            "parent_span_id": "bgnsqqPvjYQ=",
                            "name": "Sample-8",
                            "kind": 1,
                            "start_time_unix_nano": 1588334156464409000,
                            "end_time_unix_nano": 1588334156470454639,
                            "attributes": [
                                {
                                    "key": "attr",
                                    "string_value": "value3"
                                },
                                {
                                    "key": "attr3",
                                    "string_value": "value4"
                                }
                            ],
                            "events": [
                                {
                                    "time_unix_nano": 464430000,
                                    "name": "event1",
                                    "attributes": [
                                        {
                                            "key": "key1",
                                            "type": 3,
                                            "bool_value": true
                                        }
                                    ]
                                },
                                {
                                    "time_unix_nano": 464438000,
                                    "name": "event2",
                                    "attributes": [
                                        {
                                            "key": "key2",
                                            "string_value": "value2"
                                        }
                                    ]
                                }
                            ],
                            "status": {
                                "message": "ok"
                            }
                        }
                    ]
                }
            ]
        }
    ]
}

In spec https://github.com/open-telemetry/oteps/blob/master/text/0099-otlp-http.md

Does it only support application/x-protobuf, not application/json?

tensorchen on 1 May 2020

@tensorchen Correct, OTLP/HTTP currently defines only "application/x-protobuf". If there is a desire to send JSON payload that needs to be explored and added as an extension to the protocol. I am not sure what exactly are the implications of that so that will require some investigation.

tigrannajaryan on 1 May 2020

@bogdandrutu @tigrannajaryan What's the current state of HTTP/JSON support in OTLP?
Can we start working on supporting it on the client (SDK) side, or should we wait until this issue is resolved?

nilebox on 14 May 2020

Would not start to support yet HTTP/JSON as another protocol, we currently have gRPC/Protobuf, gRPC over HTTP (with the limitation of the trace/span id), and HTTP/Protobuf.

bogdandrutu on 18 May 2020

Raised https://github.com/open-telemetry/opentelemetry-collector/issues/1177 to disable support for JSON encoding in the Collector until we decide to officially support it.

Shall we also remove the Swagger code generation in https://github.com/open-telemetry/opentelemetry-proto/blob/e43e1abc40428a6ee98e3bfd79bec1dfa2ed18cd/makefile#L50-L55 ?

nilebox on 24 Jun 2020

Since Collector is now using gogo/protobuf, we may configure protos to generate trace/span IDs with a custom type via https://github.com/gogo/protobuf/blob/master/custom_types.md

It shouldn't lead to any issues with binary wire compatibility, but the solution is specific to Go, so SDKs for other languages won't benefit from this.

nilebox on 25 Jun 2020

I am going to assign this to myself to have a look. My preliminary feeling is that since W3C Trace context uses hex then we should aim to do the same (as @SergeyKanzhelev rightfully points out it is too late to change W3C Trace context). I do not know how feasible it is to change Protobuf's to use hex encoding instead of base64, I will look into this.

tigrannajaryan on 11 Aug 2020

Traceid/spanid are also part of the Trace protocol which we declared stable. I think it is OK that we make this change (should we decide that it is desirable and doable) and we are not actually breaking our promise for Trace protocol stability which is only about binary Protobuf encoding and not JSON encoding which was introduced very recently (https://github.com/open-telemetry/oteps/pull/122) and where this issue was not debated at all.

tigrannajaryan on 11 Aug 2020

So a few findings:

It is technically possible to change the encoding of traceid/spanid to hex instead of base64 if we use Gogoproto Prototobuf library. It means it is doable in the Collector (I borrowed some code from Jaeger to do quick proof-of-concept).
The official Go Prototobuf library does not seem to have any capability to customize JSON encoding, so everyone who uses official Protobuf libraries (and likely not just Go, but other languages as well) will not be able to use OTLP/ HTTP JSON.
OpenTelemetry Javascript library appears to use manual encoding for traceid/spanid and supposedly it should be possible to modify it to use hex. @open-telemetry/javascript-approvers can you please confirm my understanding?

I believe Javascript SDK and the Collector are the 2 most important places where we want OTLP/HTTP JSON to be supported (in Javascript SDK because it is very natural to use JSON there and in the Collector because we have to since Javascript SDK can send to the Collector).

Now, the question is, do we care about other language SDK's? Our recommended encoding should be binary Protobufs for all languages. I do not see why any OpenTelemetry SDK exporter would want to use OTLP/HTTP JSON.

I am on the fence on this one, so would like more opinions. Do people think that OTLP/HTTP JSON needs to be universally supported across all OpenTelemetry or we can keep it as a niche implementation for the web / JavaScript (which grpc-gateway really is) and not even offer any implementations for the rest of the languages?

If the answer is that we want to support OTLP/HTTP JSON for all languages then I think we are stuck with standard base64 encoding because it is too much burden for every language SDK to figure out how to customize the encoding (and there may not always be a reasonable way to do it).

Thoughts?

tigrannajaryan on 12 Aug 2020

A text encoding of any protocol is very useful I think. One common use case is integration tests - storing payloads as text is much easier to version control than binary. And a test program could be any language. I have heard of other use cases where even in Java-based data pipelines they prefer to stick to JSON, I think since the pipelines have built in features that operate on it. So I definitely wouldn't recommend delegating text encodings to only the web.

Base64 in text protos would be a little strange but it's not the end of the world. But the best user experience would be a custom schema like opentracing, with the huge maintenance work that comes with it indeed, but to optimize for ux, I think that still needs to be on the table since I think ux always needs to be prioritized over maintenances cost.

anuraaga on 12 Aug 2020

@anuraaga good point regarding test data.

I would like to know what the maintainers of various OpenTelemetry SDKs think regarding how complicated it will be for them to support custom (hex) encoding.

tigrannajaryan on 12 Aug 2020

JS supports all 3 different cases for node: grpc, proto over http and json over http, and for web we have json over http. When using json we send base64 for trace id and span id, all works fine.
For generating grpc and proto over http we are using libraries that converts that for us based on proto files - only json converts hex to base64. So in theory I would assume that those libraries will take care of that, if not then we will have to figure out if that is possible and how to.
Mentioned libraries are: for grcp @grpc/proto-loader and for proto/json protobufjs

obecny on 12 Aug 2020

I believe this would be quite easy for Java.

jkwatson on 12 Aug 2020

I would like to know what the maintainers of various OpenTelemetry SDKs think regarding how complicated it will be for them to support custom (hex) encoding.

One way to phrase this is that we wouldn't use protobuf+json encoding anymore, it would be a custom OTel encoding protobuf+oteljson. This seems like a slippery slope, since at that point we may as well fully go into defining a custom JSON schema not tied to protobuf?

I believe this would be quite easy for Java.

We wouldn't be able to use protobuf anymore, I think, the base64 encoding is hard-coded here

https://github.com/protocolbuffers/protobuf/blob/master/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L1237

We can write a custom marshaller, but at that point, I'd suggest not looking at proto at all at that point, for example attributes can be encoded in a more idiomatic way using a JSON object instead of the list of attributes.

anuraaga on 13 Aug 2020

Fair enough. Although, if we were to change the types to String, then it would be quite easy, yes?

jkwatson on 13 Aug 2020

Although, if we were to change the types to String, then it would be quite easy, yes?

Change the types to string in the proto? You know I'm always happy with that :P

anuraaga on 13 Aug 2020

😄1

Although, if we were to change the types to String, then it would be quite easy, yes?

Change the types to string in the proto? You know I'm always happy with that :P

I think that will be a breaking change, so we can't do it, Traces are already declared stable.

tigrannajaryan on 13 Aug 2020

One way to phrase this is that we wouldn't use protobuf+json encoding anymore, it would be a custom OTel encoding protobuf+oteljson. This seems like a slippery slope, since at that point we may as well fully go into defining a custom JSON schema not tied to protobuf?

Correct, that's exactly what it would be. Do note that gRPC Gateway does _not_ use "protobuf+json" Content-Type, it uses "application/json" so I do not think there is a strict expectation that the payload will be exactly following the Proto3 JSON mapping.

tigrannajaryan on 13 Aug 2020

I would like to poll all other language SIGs to weight in regarding the feasibility of implementation. If any of the SIGs says this is not doable or very hard to do that I am inclined to close this as "wont do".

We have the opinions from JS and Java SIGs.

Other language SDK approvers, please tell if you are able to change encoding of TraceID and SpanID fields in the JSON encoding from base64 to hex. Pinging approvers: @open-telemetry/dotnet-approvers @open-telemetry/php-approvers @open-telemetry/rust-approvers @open-telemetry/go-approvers @open-telemetry/cpp-approvers @open-telemetry/ruby-approvers

Dear approvers, I apologize for the noise, but I'd hate to make a decision here which is then an unnecessary burden for you or creates an impossible requirement for you. One voice from each language SIG is highly appreciated. (I hope I didn't miss a language).

tigrannajaryan on 14 Aug 2020

from base64 to hex ? it was opposite ~you can't have hex in json~

obecny on 14 Aug 2020

@obecny let me try to clarify.

By default Protobuf's JSON encoding marshals byte arrays in base64 encoding. Trace id and span id fields are declared as byte arrays in OTLP. Here is for example a Span in Protobuf-JSON encoding:

{
    "trace_id": "W47/95gDgQPSabYzgT/GDA==",
    "span_id": "7uGbfsPBsXM=",
    "name": "testSpan",
    "start_time_unix_nano": 1544712660000000000,
    "end_time_unix_nano": 1544712661000000000,
    "attributes": [
    {
        "key": "attr1",
        "value": { "intValue": 55 }
    }
    ]
}

The trace id is base64 encoded "W47/95gDgQPSabYzgT/GDA==" and represents a byte array of 0x5B, 0x8E, 0xFF, 0xF7, 0x98, 0x3, 0x81, 0x3, 0xD2, 0x69, 0xB6, 0x33, 0x81, 0x3F, 0xC6, 0xC.

This is the current encoding used by OTLP and it is defined by an OTEP: https://github.com/open-telemetry/oteps/blob/master/text/0122-otlp-http-json.md#json-mapping

We are discussing now the possibility to change this behavior and instead use hex encoding for trace id and span id. For example the same span, but with trace id encoded in hex would look like this (span id is unchanged):

{
    "trace_id": "5B8EFFF798038103D269B633813FC60C",
    "span_id": "7uGbfsPBsXM=",
    "name": "testSpan",
    "start_time_unix_nano": 1544712660000000000,
    "end_time_unix_nano": 1544712661000000000,
    "attributes": [
    {
        "key": "attr1",
        "value": { "intValue": 55 }
    }
    ]
}

My question to language SDK authors is: are you able to do this if we standardize this behavior in the protocol specification?

I verified that it is possible to do in the Collector because it is possible to use custom marshaling with Gogoproto (the Protobuf library we use), but I don't know if the same customization is possible for other languages.

tigrannajaryan on 15 Aug 2020

@tigrannajaryan thx for clarifying, so in js it would be very easy as currently we are converting hex to base64 in collector exporter - we can drop it easily.

obecny on 15 Aug 2020

👍2

In JS we also store our trace and span ids as hex encoded strings internally. This means that the exporter has to convert it to a byte array, then re-encode it as base64, which is unnecessarily expensive. Dropping the base64 requirement would actually be a performance benefit to us, and I expect we are likely not the only ones. That said, there are other ways to mitigate this like storing byte arrays internally, which we had already discussed in the past.

FWIW, having the OTLP protocol use a different encoding than the W3C Trace Context HTTP header spec doesn't strike me as necessarily a problem. The only advantage of having them be the same is to make it easier for humans to recognize ids, but is the OTLP JSON representation ever likely to be logged directly? It seems to me that the first thing the collector would do is parse the trace ids into its own internal representation so any logging of ids could easily be in hex regardless of the wire format.
Additionally, the W3C spec is specifically for http headers, but there is no restriction that the ids will always be hex. For instance, the W3C is working on a binary protocol which may represent the ids in some other form.

dyladan on 17 Aug 2020

We don't have any implementation for protobuf in the PHP library and we already store Trace and Span Ids as hex-encoded strings, so I can't see any issue on our side.

beniamin on 17 Aug 2020

👍2

So I poked around a bit trying to see how people take an instance of a protobuf C# type and turn it into json. There is a built-in, unoptimized reflection-based JsonFormatter class supplied with the Google Protofbuf library that does it, but there is no way to extend/intercept certain types and change the json format. I think that changing the default encoding to hex would require some amount of manual custom serialization in the .net sdk if it were to use json at all. That said, I don't see anything in the codebase that is actually serializing to json in the first place. Perhaps somebody might write a test that used a json serialized instance for validation but it hasn't been done yet. I can't really see the need for a json encoding at all for .net, but I could be wrong.

If this is purely for human readability I'm in agreement with @dyladan I would say don't bother.

pcwiese on 17 Aug 2020

The only advantage of having them be the same is to make it easier for humans to recognize ids, but is the OTLP JSON representation ever likely to be logged directly?

Yes, we have at least 2 exporters in the Collector where spans are logged in a text form (logexporter and fileexporter). Both are intended to be used for troubleshooting. logexporter uses a completely custom output (so we can output traceid/spanid in any encoding we choose to), fileexporter uses OTLP JSON format (so this issue directly affects it).

I believe it is important to show traceid/spanid using similar formatting/encoding in different places (as much as possible) to simplify debugging and troubleshooting. Hex is likely the easiest to display everywhere. Even in languages which store traceid/spanid in-memory as byte arrays it is very likely to have a debugger/IDE that shows these byte arrays in hex form than to have one that shows them in base64 (in fact I know several debuggers which can easily show hex representation of in-memory binary data and I don't know any that can show base64 out-of-the box).

So this is for human readability and I think it matters. It can make developers' life easier.

traceid/spanid will also be directly used in log files in the future when logs are supported in Otel and the spec already says that they need to be hex-encoded: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/logs/overview.md#trace-context-in-legacy-formats
So, we will either have to modify this logging spec to use base64 or will have to be inconsistent.

I think we have sufficient evidence that hex encoding is desirable. It is not an absolute requirement, so if it creates extreme burden then we can give it up, but otherwise I think we should pursue it.

tigrannajaryan on 17 Aug 2020

So I poked around a bit trying to see how people take an instance of a protobuf C# type and turn it into json. There is a built-in, unoptimized reflection-based JsonFormatter class supplied with the Google Protofbuf library that does it, but there is no way to extend/intercept certain types and change the json format. I think that changing the default encoding to hex would require some amount of manual custom serialization in the .net sdk if it were to use json at all.

@pcwiese we can also implement the JSON output fully manually without relying on Protobuf's built-in formatter. This is certainly doable, it is just significantly more work (but likely not a ton of work: probably a few days, at most a couple weeks of work with tests and everything).

That said, I don't see anything in the codebase that is actually serializing to json in the first place. Perhaps somebody might write a test that used a json serialized instance for validation but it hasn't been done yet. I can't really see the need for a json encoding at all for .net, but I could be wrong.

I agree, perhaps not all language SDKs even need to output OTLP JSON.

tigrannajaryan on 17 Aug 2020

Since I left such a long comment, I'll bottom line it here. I support this because JS already stores these as hex strings anyways. Seems like PHP is the same.

dyladan on 18 Aug 2020

👍2

My preference would be that we keep the JSON representation compatible with the proto representation, ideally without the need for custom marshalling. It looks like this would also create a compatibility issue between current implementations of OTLP JSON and newer versions of the collector (or vice versa). For example, an id encoded as a hex string is techically valid base64, but when decoding it will mangle the id in the process. This would happen silently and be frustrating to deal with during the transition.

mwear on 19 Aug 2020

👍1

It looks like this would also create a compatibility issue between current implementations of OTLP JSON and newer versions of the collector (or vice versa).

This is true, but at this stage we do not care about the compatibility yet. OTLP JSON was added very recently and is not part of our compatibility guarantees. I believe we are still free to change it.

I am not aware of anyone who uses OTLP JSON in production or any production-like setting.

tigrannajaryan on 19 Aug 2020

The Javascript project has had OTLP/JSON support since April for Browser (see: https://github.com/open-telemetry/opentelemetry-js/pull/901) and late June for Node (see: https://github.com/open-telemetry/opentelemetry-js/pull/1247). There's a possibility that people could be using this in a production-like setting given the amount of time it’s been available. Would there be a way to version this change, or somehow communicate the id format of payload?

mwear on 19 Aug 2020

@mwear Sorry, I was not clear. I think the existence of implementation is not a sufficient reason to assume stability guarantees (it never was in OpenTelemetry, we always explicitly specify the guarantees). OTLP/JSON was proposed as an OTEP122 and merged very recently (July 7, 2020). OTEP122 which standardizes OTLP/JSON was merged _after_ we updated protocol maturity matrix the last time in this repo (arguably our omission here is that we could explicitly add a line item to the maturity matrix to make it clear OTLP/JSON is Alpha). Either way I do not think we can assume OTLP/JSON is a stable protocol, we are imposing unnecessary restriction on ourselves by that.

We can probably differentiate on the receiving side by looking at the length (assuming 16 byte traceid it is easy to tell if it is hex or base64 if there is a big desire to maintain backward compatibility, but it will only be one-way compatibility: old client -> new server, but not new client -> old server).

tigrannajaryan on 19 Aug 2020

I submitted a PR to make sure there is clear visibility about maturity of OTLP/JSON: https://github.com/open-telemetry/opentelemetry-proto/pull/204

tigrannajaryan on 19 Aug 2020

@mwear just to be clear, if there is evidence OTLP/JSON is already used by some people in production I would be willing to spend some reasonable effort not to break their production (despite not previously giving any promises of compatibility about OTLP/JSON). If we make Collector's receiver able to handle both hex and base64 that should likely help (they will just need to make sure they don't update to newer versions of SDK before updating to newer version of Collector).

tigrannajaryan on 19 Aug 2020

From rust perspective, JSON serialization is still work in progress for gRPC in general so we won't be supporting this initially. Once support is finalized we will know more about customizing serialization, but I would assume that it will be possible.

jtescher on 20 Aug 2020

👍1

@tigrannajaryan Thanks for hearing out my concerns from the JS side. From a Ruby perspective, we don't currently support OTLP / JSON. If and when we add it, we would likely do so by creating a package that does not have a protobuf dependency, and would have handle json encoding ourselves. We'd be able handle translations of ids in either format.

mwear on 20 Aug 2020

Let me try to summarize where we are.

Based on comments from language SIGs it looks _feasible_ to make this change and use hex encoded traceid and spanid in OTLP JSON.
It also appears to be _desirable_ for various reasons mentioned above in this thread.
There are concerns that this is a breaking change and there are existing implementations already. However, we are willing to make this breaking change since we have not yet promised any stability guarantees for OTLP JSON.

Given the above I suggest that we go ahead and accept this change in the spec and then make corresponding changes in the implementations.

I understand that we probably have not reached full consensus on this, but I believe we need to make a call and this is the best we can do with the information we have so far. If there are objections to this decision I would like a TC member to tie-break it.

tigrannajaryan on 25 Aug 2020

One point missing from the summary is comparing it to the option of defining a JSON protocol independent of protobuf. My reading of the comments is, for SDKs where there is no protobuf dependency, the change is trivial since they already encode the JSON themselves. For SDKs with a protobuf dependency, most likely it'll have to be replaced with manual encoding of the JSON. In that case, can't we make the UX even nicer by making the protocol take advantage of JSON idioms, such as using a standard JSON map for attributes? It also makes more obvious that it's a different protocol to prevent issues with accidentally base64-decoding the hex field.

anuraaga on 28 Aug 2020

can't we make the UX even nicer by making the protocol take advantage of JSON idioms ...

That's significantly bigger work both from specification perspective and possibly also more work for implementations which use Protobufs today and which have to do more than just changing base64 to hex encoding.

I agree that you have a good point but given our GA timelines I am very reluctant about anything that creates more work at this stage.

such as using a standard JSON map for attributes

I am not quite sure we would want to do that. I would not want to introduce significantly different potential semantics in the protocol. The protocol today allows recording multiple attributes of the same name (although it does not specify what it means). Multiple attributes with the same name are not allowed in the API today but in the future it may become a valid option. If that happens then JSON encoding will no longer be able to record everything that binary Protobuf encoding can record. I would not want to corner ourselves into incompatible encodings that are not capable of expressing precisely equivalent data.

If we want to explore this further then I would suggest that we remove the JSON representation from the specification for now and do not make it part of the GA so that we have more time to figure out the right answers. I do not know if is acceptable to GA without JSON representation. Particularly the JavaScript implementation may need this to be part of GA (I am not sure).

tigrannajaryan on 28 Aug 2020

Agree that if OTLP json is targeting GA, it could be too big of a change - I interpreted the current alpha status of it as not for GA but probably misinterpreted :)

Not a strong statement, but if JSON support is needed for opentelemetry-js in browser, perhaps browser GA can come later? We're lacking strength in testing Android support and I believe language support for iOS, so those could probably be rolled into a nice milestone. Just food for thought since worried about locking in the protocol too soon.

anuraaga on 28 Aug 2020

This issue is labeled release:required-for-ga as of now. I am open to moving this post GA if this is not causing problems for others. In that case we will have more time to explore the options carefuly.

Not a strong statement, but if JSON support is needed for opentelemetry-js in browser, perhaps browser GA can come later?

@open-telemetry/javascript-approvers @open-telemetry/javascript-maintainers can you please chime in as to when do you think you will GA and how important is JSON encoding support for your GA?

tigrannajaryan on 28 Aug 2020

JSON encoding is very important to us because grpc/protobuf is a royal pain in the browser. Currently, JSON collector export is the _only_ supported export mechanism in the browser, although there is an open PR (https://github.com/open-telemetry/opentelemetry-js/pull/1399) to support zipkin in the browser as well. I would say we cannot make GA claims for the browser without it.

In terms of timeline, I would support making node and browser GA at the same time, because they share most of their code and I believe it would be confusing to users to have one GA and the other not. The JS tracing implementation has always tracked fairly close to the spec head so I think we would want to GA tracing fairly shortly after spec is finalized. Our major shortcomings actually have more to do with testing/docs than with the actual implementation.

dyladan on 28 Aug 2020

Thanks @dyladan that sounds like we should stick to the current plan of just replacing the IDs with hex. Out of curiosity, given JSON came from the browser maybe you have some extra insight, do you think the current JSON model is idiomatic enough for users to be comfortable with it?

anuraaga on 29 Aug 2020

Given JavaScript requirements and since there is no new evidence I suggest we move forward as I outlined above. One more time, the suggestion is that we go ahead and accept this change in the spec and then make corresponding changes in the implementations. I will mention this one more time in Maintainers meeting today and absent objections I will submit a PR that amends the spec accordingly.

tigrannajaryan on 31 Aug 2020

Opentelemetry-specification: Trace/Span ids base64 encoded when used in proto-JSON representation

Most helpful comment

All 49 comments

Related issues