Jaeger: Allow de-duplicating Span.Process, agent, and collector tags

Created on 6 Sep 2019 · 17 Comments · Source: jaegertracing/jaeger

Problem - what in Jaeger blocks you from solving the requirement?

When using --jaeger-tags in the Agent and setting the same tags via JAEGER_TAGS on a client, such as the Java client, the tags are duplicated:

image

This is probably the relevant code:
https://github.com/jaegertracing/jaeger/blob/fb5505005a21f007792dedbcd2ad49484d1d587e/cmd/agent/app/reporter/grpc/reporter.go#L87-L91
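The linked lines are not reproduced here, but the symptom is consistent with the agent appending its own tags to the span's process tags without checking for existing keys. A minimal sketch of that behavior follows. The types `KeyValue` and `Process` and the function `appendAgentTags` are hypothetical stand-ins for illustration, not Jaeger's actual model package or reporter code.

```go
package main

import "fmt"

// KeyValue is a hypothetical stand-in for a single process tag.
type KeyValue struct {
	Key   string
	Value string
}

// Process is a hypothetical stand-in for Span.Process.
type Process struct {
	ServiceName string
	Tags        []KeyValue
}

// appendAgentTags adds the agent's tags to the process tags without
// looking at the keys that are already present, so a key set by both
// the client and the agent ends up in the list twice.
func appendAgentTags(p *Process, agentTags []KeyValue) {
	if p == nil {
		return
	}
	p.Tags = append(p.Tags, agentTags...)
}

func main() {
	// The client already reports cluster=eu-1 (for example via JAEGER_TAGS).
	p := &Process{
		ServiceName: "demo",
		Tags:        []KeyValue{{Key: "cluster", Value: "eu-1"}},
	}
	// The agent is configured with the same tag.
	appendAgentTags(p, []KeyValue{{Key: "cluster", Value: "eu-1"}})

	for _, t := range p.Tags {
		fmt.Printf("%s=%s\n", t.Key, t.Value)
	}
}
```

Under these assumptions, the process ends up with two `cluster` entries, which matches the duplication described above.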

Proposal - what do you suggest to solve the problem or improve the existing situation?

Either:

  • Allow agents to override the client's process tag, or
  • Prevent the agent from adding a tag to a span's process tags if a tag with the same name already exists

Any open questions to address

  • Decide which of the proposals to implement

Labels: good first issue, hacktoberfest

All 17 comments

Aha, good catch!

I'd vote for the second option (prevent agent from adding tag if it already exists), because I can't think of a scenario where agent would have a more accurate value for a tag than a client.

P.S: This can also be marked "good first issue"!

The counter argument is that you want the agent to enforce an official/authoritative value and prevent the client from being able to override it.

Possibly we need a mechanism to enable the agent to know which action to take when a duplicate is found.

So:

  1. provide a flag --duplicate-tags, with the possible values client (keep client's value), agent (keep agent's value), and duplicate (the current behavior).
  2. by default, client is used (or duplicate, if we want to be extra careful in case someone depends on the current behavior)
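The proposal above could be sketched roughly as follows. This is only an illustration: `--duplicate-tags` is a flag suggested in this thread, not an existing one, and `KeyValue`, `DuplicatePolicy`, and `mergeTags` are hypothetical names that do not come from the Jaeger code base.

```go
package main

import "fmt"

// KeyValue is a hypothetical stand-in for a single process tag.
type KeyValue struct {
	Key   string
	Value string
}

// DuplicatePolicy mirrors the values suggested for --duplicate-tags.
type DuplicatePolicy string

const (
	KeepClient    DuplicatePolicy = "client"    // keep the client's value
	KeepAgent     DuplicatePolicy = "agent"     // keep the agent's value
	KeepDuplicate DuplicatePolicy = "duplicate" // keep both (current behavior)
)

// keySet collects the keys of a tag list for quick lookups.
func keySet(tags []KeyValue) map[string]struct{} {
	keys := make(map[string]struct{}, len(tags))
	for _, t := range tags {
		keys[t.Key] = struct{}{}
	}
	return keys
}

// mergeTags combines client and agent tags according to the policy.
// Any value other than KeepClient or KeepAgent keeps both copies,
// which is what KeepDuplicate asks for.
func mergeTags(clientTags, agentTags []KeyValue, policy DuplicatePolicy) []KeyValue {
	clientKeys := keySet(clientTags)
	agentKeys := keySet(agentTags)

	out := make([]KeyValue, 0, len(clientTags)+len(agentTags))
	for _, t := range clientTags {
		// Under KeepAgent, drop client tags that the agent also sets.
		if _, dup := agentKeys[t.Key]; dup && policy == KeepAgent {
			continue
		}
		out = append(out, t)
	}
	for _, t := range agentTags {
		// Under KeepClient, drop agent tags that the client already set.
		if _, dup := clientKeys[t.Key]; dup && policy == KeepClient {
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	client := []KeyValue{{Key: "cluster", Value: "from-client"}, {Key: "version", Value: "1.0"}}
	agent := []KeyValue{{Key: "cluster", Value: "from-agent"}, {Key: "zone", Value: "a"}}

	for _, p := range []DuplicatePolicy{KeepClient, KeepAgent, KeepDuplicate} {
		fmt.Println(p, mergeTags(client, agent, p))
	}
}
```

Keeping `duplicate` as an explicit value makes it possible to preserve the current behavior for anyone who depends on it, while `client` or `agent` resolves conflicts one key at a time. The sketch treats all tags the same, which is the assumption questioned in the next comment.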

Sounds good as long as we believe all tags are to be treated the same.

Marking this as "good first issue". I'll implement this in a couple of weeks if we don't get volunteers :)

I can take this issue :D

If something comes up @Pothulapati, I'd be happy to take it too :)

@dm03514 we have plenty of good first issues. How about this one? https://github.com/jaegertracing/jaeger-operator/issues/654

What is the problem with having duplicate tags? What is the business scenario where agent must override tags provided by the client?

The original intention of agent tags was to provide additional dimensions that may not be known to the client, like which zone/cluster the code is running in.

> The original intention of agent tags was to provide additional dimensions that may not be known to the client, like which zone/cluster the code is running in.

The use case is really to ensure the client cannot override or spoof the values associated with those tags, in cases where they may be relied upon for security or later post-processing.

Besides @objectiser's case, agents might be automatically added to pods (like, via the operator) and they have no way to know whether the tracer is already being configured with the same information on its own (pod name and namespace, for instance). If the agent and the client report the same information, we end up with the same information being duplicated.

Having duplicate information is unnecessary noise and storage, and it's probably not what the user expects in most of the cases.

I could swear we had this fixed already :-) Any volunteers for this one?

I tried the --collector.tags configuration, but the tag-duplicates are multiplying fast, even if you only test it with jaeger-query.
Without a way to prevent duplicates by enforcing which key wins (client, agent, or collector), those tags produce too much noise to be useful.

> but the tag-duplicates are multiplying fast

Do you have an example of that?

Sure:
My test was on Kubernetes, with just jaeger-collector and jaeger-query (v1.18.1, without the operator).
Only the jaeger-query-agent-sidecar produces traces, and the collector writes them to a separate Elasticsearch index with these collector tags:

spec:
  containers:
  - args:
    - --collector.tags=testtag=testvalue

The jaeger-query traces showed

  • 2 testtag-labels for the first trace on path /api/services
  • after a page refresh it showed 4 testtag-labels on path /api/services

After some clicking in the UI, I also found:

  • for path /api/services/{service}/operations traces with 2 and traces with 4 testtag-labels
  • for path /api/traces traces with 5 and traces with 9 testtag-labels

And all spans of a trace have the same number of testtag-labels.

image

This is really, really odd. We should indeed work on de-duplicating the tags, but there has to be something else going on...

@jpkrohling I can take this issue.
