For the first version of this feature, we shall assume the following
duration is monotonically increasingFor e.g., a client can do the following
time = 1s
Report Span[traceID=1, spanID=2, duration=10s, operationName="someOperation"]
time = 2s
Report Span[traceID=1, spanID=2, duration=20s, operationName="someOperation"]
time = 3s
Report Span[traceID=1, spanID=2, duration=40s, operationName="someOperation", tags=...]
To support this, the backend would need the ability to resolve merge conflicts on spans. For V1, this means simply selecting the longest span.
On Cassandra, the jaeger-collector uses the model spanhash, guaranteeing that all partial spans are stored:
https://github.com/jaegertracing/jaeger/blob/412baf60db217d6f41d7fdca04358069362b29ca/plugin/storage/cassandra/spanstore/dbmodel/converter.go#L45
On ElasticSearch, we use the index api, which performs upserts.
https://github.com/jaegertracing/jaeger/blob/e52ecffbf69a572593504ea9acc4ff65854a3e9a/plugin/storage/es/spanstore/writer.go#L179-L182
To make matters more interesting, note that Jaeger supports storing Zipkin spans that share the same spanID for client and server spans. Jaeger adjusts these spanIDs during query time as seen here:
https://github.com/jaegertracing/jaeger/blob/412baf60db217d6f41d7fdca04358069362b29ca/model/adjuster/span_id_deduper.go#L23-L31
Another point to note is that jaeger-collector makes no guarantee that spans are stored in the order that they are received.
For more context on use cases and prior discussion see https://github.com/jaegertracing/jaeger-client-java/issues/231
I am not sure why we need phase 1-2 approach and restrictions. Could we not just merge all spans into one? Rules would be:
endTime(i)=startTime(i)+duration(i)Enhance Span model to store the lineage of a span (whether it was generated from a Zipkin or Jaeger span), this allows for more robust merging behavior
I don't think lineage is really relevant. Zipkin style spans should be deduped using span.kind = client/server distinction.
I am not sure why we need phase 1-2 approach and restrictions.
We don't really, if we agree on the merge rules, it may be simpler to implement the rules in one go.
We might still need the restriction that the final span has the longest duration, (or some other method of denoting the last span sent by a client for a specific span id)
By the proposal made in https://github.com/jaegertracing/jaeger-client-java/issues/231 where you've already referenced this discussion, the last span would be either denoted by a specific flag such as beeing complete or by having a duration at all. I don't know if that's possible but all in between spans would probably still report 0 duration because while still being running the duration information is redundant after all or isn't it? At least that's what i've understood form that discussion.
I was discussing with @black-adder about the approach described in https://github.com/jaegertracing/jaeger-client-java/issues/231 and want to summarize our findings here so far:
One important requirement is not to change the cassandra schema in order to keep updates of jaeger simple. Therefore I want to split the enhancements necessary in "before persistence" and "after persistence".
To allow for easier distinction of partial and complete spans after persistence, I would suggest to change the assumptions to
The type attribute for the domain model is necessary to enable cassandra and es writers to distinguish between replacing partial jaeger spans logic and zipkin server and client have same spanId logic. The type will not be persisted.
The writer needs to distinguish between both aforementioned logics, cause UPSERTs will only work with a stable identifier (cassandra primary key and es document id). Current writer logic would not allow for this, because
In my approach we enhanced both writers as follows:
if jaegerSpanType == model.JaegerSpanType {
_, err = s.client.Index().Index(indexName).Type(spanType).Id(string(jsonSpan.TraceID) + string(jsonSpan.SpanID)).BodyJson(&elasticSpan).Do(s.ctx)
} else {
_, err = s.client.Index().Index(indexName).Type(spanType).BodyJson(&elasticSpan).Do(s.ctx)
}
The Id(stable identifier) call is necessary to allow for UPSERT, but we don't want to replace zipkin spans with the same spanId, because of client/server spans having the same spanId. Therefore the distinction.
//with supporting incomplete spans the hash needs to be unique for a combination
//of spanId and traceId, so the final representation of the same span
//will overwrite the incomplete version in cassandra
if s.Type == JaegerSpanType {
return enc.Encode(JaegerSpanHash{s.TraceID, s.SpanID})
} else {
return enc.Encode(s)
}
The primary key comprising of spanId, traceId and a hashed representation of the spans remains, we just switch to only hashing traceId and spanId for spans received from jaeger clients, thus having a stable identifier regardless of additonal logs/tags or whatsoever.
Since we don't want to change the schema, we won't have any additional information if a span is partial or not. We could stick to the assumption, that a partial span always has a duration of 0 and a completed span always has a duration > 0.
-Jaeger-UI could distinguish and mark spans as partial, when duration is 0
So we probably need to incorporate changes for 2. and 3.
-> for 2 we should just check that childStartTime is adjusted to be >= parentStartTime, checking for endTimes makes no sense, since they are not known at this point
-> for 3 we should check that childStartTime is between parentStartTime and parentEndTime, but what would we want to adjust it too? Maybe parentDuration/2 is already a good approach?
I can provide a Pull Request with the changes described here (except the ClockAdjuster), so you guys can get a more detailled impression of the changes, I just need to fix one last failing test and can reference it here then :)
Why is it necessary to ensure upsert? I think a simpler solution is to merge spans on read.
I took this approach, because only the last span is relevant (at least this is my assumption) and it is much friendlier for storage not to keep all intermediate representations. Do you think storage size is not an issue?
Hasn't been in the cases we encountered. But in some cases it's not even an option, it's a requirement to save all partial spans. E.g. we have a system that executes long running (on the order of days) workflows, and it persists the intermediate state in the database. The start and finish events of a span may not even be in the same process.
Ok, that makes sense. Would have been good to know that before :)
So with this approach, the question is, how to model a final span, right? In https://github.com/jaegertracing/jaeger/pull/728 @black-adder already suggested to use a flag. In my understanding adding a new finalFlag constant like (https://github.com/jaegertracing/jaeger/blob/master/model/span.go#L27)
would allow to detect final spans without altering the db schema of cassandra, because it's a uint32
https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/cassandra/spanstore/dbmodel/model.go#L28
That removes the necessity for any other incomplete attribute and aligns quite well with the current data model (at least if i understood the purpose of flags correctly). The clients could easily implement setting the flag in their span.finish() methods.
With a isFinal flag my suggested approach with a duration of 0 to distinguish between final and partial spans wouldn't be necessary aswell.
My concern with using the flags field is in changing its semantic meaning and scope. Right now we're using it merely to store the value of spanContext.flags, i.e. the thing that is propagated over the wire. @phal0r 's proposal would change it to also carry information that is only meaningful for the backend, and meaningless in the propagation format.
I don't have strong objections to using the duration field. It does have a few drawbacks:
The downsides of introducing a dedicated field on the Span is mostly the implementation churn where we need to add it to the data model, update DB schemas everywhere, etc, I think internally we went with the approach based on duration - @vprithvi can you confirm?
@yurishkuro @vprithvi
I agree with you. As mentioned before, using 0 as indicator is not necessary and all suggestions I made are more or less a workaround for not changing the data model for cassandra.
Actually I do not favor any implementation or implementation detail. I want to help with my suggestions to push the feature of persisting incomplete/running spans. I care for semantical correctness and do understand, that the flags are not meant for that. So how can we go on from here?
The issue I see in the current concept is, that I don't see any way to detect if a span is complete or still running or am I missing something?
My preference is to add a new field to the model, semantically it's the cleanest way.
Yeah, we now have traversed every property, that is flexible enough for enhancements, but came to the point, that there is no semantically acceptable place for this. So did you guys already put some work internally to this or should we go on here with the discussion?
What we did internally was not aimed at capturing partial spans, so I'd ignore that.
Faced an issue when long span that is not finished is not writing logs at all until its finished.
Is this the issue relevant for this behaviour which is planned to take care of?
The issue is still relevant for our use case for long running jobs, is there any plan for this?
@mustafaakin
Yes, this is relevant for use, too. I had a first running implementation with Jaeger + ES and the node-client, but we upserted the traces, what is in conflict with this approach. So my next simpler iteration will be to introduce a incomplete flag for spans and a changed cassandra schema based on merger logic described in this issue. My rough plan is to finish this until the end of june. I don't know if anybody else is actively working on this.
I would like to help if needed @phal0r
@mustafaakin thx, I will send you a message, once I engage this issue.
I have implemented the changes and checked it with cassandra and elasticsearch storage.
Being able to inspect long running requests before they are complete is a really important functionality for us as well! We have tasks that take anywhere from sub-second up to many hours.
Jaeger/opentracing would be the perfect tool if we could trace progress during the execution of the tasks tasks.
I agree. I am working on finishing this implementation. There is one more change for the current pull request required and the discussion in https://github.com/jaegertracing/jaeger/issues/1119 needs to be resolved aswell, since this feature changes the cassandra schema.
My approach to the long-running process traces issue right now is that the span the starts a new trace should indeed be finished shortly after created. New spans for tasks spawned by the long-running process can be a child of the main span.
Maybe the UI rendering is not that clean this way, but at least the span is persisted.
Storing pending spans would be, of course, better.
Do you guys need a helping hand with closing this issue?
Szia @istvanszoboszlai ,
any help is appreciated. We are working in #979 to integrate this feature. There is some discussion going on about the approach and the impacts. @de-robat and me will look for the next time window to move this forward. So, if you want to help, we could discuss this together.
Hallo @phal0r,
Sorry for the slow response. Yes, I am happy to help, I hope I can. I am not that familiar with cassandra, I am more into elastic. Let me know when we can discuss it further.
MfG :)
Most helpful comment
Ok, that makes sense. Would have been good to know that before :)
So with this approach, the question is, how to model a final span, right? In https://github.com/jaegertracing/jaeger/pull/728 @black-adder already suggested to use a flag. In my understanding adding a new finalFlag constant like (https://github.com/jaegertracing/jaeger/blob/master/model/span.go#L27)
would allow to detect final spans without altering the db schema of cassandra, because it's a uint32
https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/cassandra/spanstore/dbmodel/model.go#L28
That removes the necessity for any other incomplete attribute and aligns quite well with the current data model (at least if i understood the purpose of flags correctly). The clients could easily implement setting the flag in their span.finish() methods.
With a isFinal flag my suggested approach with a duration of 0 to distinguish between final and partial spans wouldn't be necessary aswell.