Opentelemetry-specification: process.command_line accepts multiple value types

Created on 19 Aug 2020  路  17Comments  路  Source: open-telemetry/opentelemetry-specification

I noticed process.command_line accepts different types of values. I don't think we generally do this but was this intended? I would just use a string for it in all cases.

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/resource/semantic_conventions/process.md#process

semantic-conventions p3 question allowed-for-ga resource

Most helpful comment

@open-telemetry/technical-committee I think this issue should be allowed-for-ga. EDIT: I made a PR #1137 that would fix this.

All 17 comments

This was intentional and was discussed at https://github.com/open-telemetry/opentelemetry-specification/pull/635#discussion_r437573287. In some environments (Windows) a single string (space separated or however it was typed in the shell) is the raw form of this. In others (Linux) a NUL-separated string that is easily translated to a list is the raw form. Additionally, some runtimes support only obtaining the commandline as list of arguments, which would need to be escaped (according to which shell's syntax?) if you want them as single string.
If we only support a single string, we would additionally need an attribute that tells us the format the string is in.

Actually, a much nicer summary of the pros & cons is at https://github.com/open-telemetry/opentelemetry-specification/pull/635#discussion_r440062779 by @james-bebbington :

There is also concern around using two different data types to represent the same attribute value: https://github.com/open-telemetry/opentelemetry-specification/pull/635#discussion_r437573287. It might create additional complexity on back end systems to have to manage potentially different data types for the same attribute, though I'm not sure how much of a concern this would be? I guess parsing null-characters on the back end would likely be equivalently complex.

Note this isn't strictly ambiguous btw. If that was a command with no arguments, and you were looking at the Unix representation, it would have a null-character as the last character in the string. On Windows it would not. Admittedly that is not a great representational difference, and could be difficult to distinguish in some programming languages.

After thinking about this some more, I think I do prefer the list of strings approach, but I'm not sure I'm aware of all the pros/cons. For now I have changed it back to that, and updated the example for each OS accordingly. But for the record here's the three options considered and pros/cons:

  1. Join arguments with a string:
    Advantages: Most consistent representation across OSs.
    Disadvantages: Potentially lossy/non-trivial conversion on Unix - do we need to enquote parameters with spaces in them (and escape internal quotes), etc?
  2. Pass through string returned from OS as is (null-delimited on Unix):
    Advantages: No processing needed by code producing telemetry, can just provide what OS returns.
    Disadvantages: Back end needs to parse null-characters.
  3. Convert to list of strings (for Unix only):
    Advantages: Clean ("properly typed") representation.
    Disadvantages: Inconsistent data type for the same attribute between OSs, and back end needs to handle this differently.

Yeah I can definitely see the motivation for this. At the same time, instrumentation and backend does get quite a bit more complex because of the dual-type so I don't know if this will be easy to implement - generating code from the semantic spec yaml seems especially problematic.

@thisthat Can you check if your semantic spans work be able to cover this case? https://github.com/open-telemetry/opentelemetry-specification/pull/571

At the same time, instrumentation and backend does get quite a bit more complex because of the dual-type

Instrumentation gets simpler because it has a choice, and can do whatever is easier. Or am I missing something?

@thisthat Can you check if your semantic spans work be able to cover this case? #571

Currently the tool doesn't support this, but it can be extended. In fact we had the possibility of such union types in mind from the beginning.

I see - currently in Java we use an abstraction like StringAttributeSetter for operating on attributes. We don't yet use it for resource attributes but I think we should and probably will eventually. So we would need to introduce a clunky StringOrArrayAttributeSetter - it's not the end of the world, but this is what I meant by instrumentation getting more complex (probably only applies to typesafe language).

You can also have two constants for this semantic attribute, e.g. a StringAttributeSetter PROCESS_COMMANDLINE_STR and PROCESS_COMMANDLINE_LIST. Though I begin to see your point. We could instead have two alternative semantic attributes with two different names in the semantic conventions. It would be a bit clunky in the spec then though.

Anyways it's a very small point - I was surprised to see it but since it was already discussed it's fine with me. Thanks for clarifying it.

Actually let me keep this open. The spec text says either primitive or array

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/common/common.md#attributes

So I think we need to tweak that to more formalize the possibility of union types.

The actual Attribute will not have an union type, it will have either one or the other type. Only the semantic convention has the union type.

It took me a minute to parse that but I understand what you mean :) It's true, but I don't think it hurts to clarify this example explicitly. I think dual type convention is going to cause confusion when a newcomer reads and tries to understand so we can probably help. That newcomer this time was me, hence the issue filed ;)

I might miss something but, what is the reason for not using array as the only available type and in windows put it as a single element array?

@thisthat I think the opposite argument could also be made. Why not convert the array representation into a string in the backend, if it's so important to have a consistent type?

@Oberon00 wrote:

Instrumentation gets simpler because it has a choice, and can do whatever is easier. Or am I missing something?

and I agree. I don't think OTel should be in the business of saying what is and is not a valid data representation for these semantic conventions. The conventions are describing semantics. There is no loss of semantics by allowing multiple representations, but there are well-documented benefits to the instrumentation modules that can simply observe their inputs as opposed to transforming their inputs to match a specified data representation.

I don't think OTel should be in the business of saying what is and is not a valid data representation for these semantic conventions.

I'm not against having conventions with multiple types like this if we clearly lay it out. I'm not sure I'd go so far as saying we aren't in the business of describing the data representation though. If we are ok with numeric values being reported as strings, for example, all processors of data whether collector or backend would have to check both types or risk missing the data, which is unnecessary complexity.

I might miss something but, what is the reason for not using array as the only available type and in windows put it as a single element array?

Because the meaning of a single array element currently is: After splitting the argument list, this was the only argument.

E.g. when you have a value of ["myapp myarg"] that would actually mean that the original commandline invocation was something like $ myapp\ myarg (note the escaped space).

@open-telemetry/technical-committee I think this issue should be allowed-for-ga. EDIT: I made a PR #1137 that would fix this.

from the spec sig mtg today, re-triaged for allowed-for-ga

Was this page helpful?
0 / 5 - 0 ratings

Related issues

carlosalberto picture carlosalberto  路  4Comments

naseemkullah picture naseemkullah  路  5Comments

mtwo picture mtwo  路  5Comments

iNikem picture iNikem  路  4Comments

pavolloffay picture pavolloffay  路  4Comments