In Java SDK implementation, there is a limit and behavior is to discard in FIFO.
The limitations are not part of the spec and behavior when there are more than max is not defined.
Seems like there鈥檚 a couple of options:
The third option is the most similar to OpenTracing imo. I think the second option might be the best because there鈥檚 always going to be some sort of maximum, might as well put it in a well-known place.
I'd have a slight preference for the third option:
If the limits are user-adjustable via API, aren't we mixing different personas? An application developer (who would set the limit) might not know the exporter and ultimatively the backend that uses the data. The backend might have difficulties with 10 or with 10k attributes. It should be a concern of the exporter/backend only.
That's a good point.
3 also seems like the most reasonable choice to me, though I'm not extremely opinionated about this topic
3 is not a good option because the intent for this is to not grow memory unbounded, this is not to protect the exporter or the backend, is to help Application owners to control how much memory their instrumentation uses.
I think the best compromise is 2, the limit is adjustable via the SDK not via the API (so an instrumentation dev, as you pointed does not know the backend), but the Application owner knows the backend in almost all the situations.
2 and 3 are almost identical in a sense that if exporter can configure this SDK setting, than it is the same. Multiple exporters will just need to set the max of options and then do additional trimming on their side. There is no way to prevent exporters to configure SDK as long as this setting is available for end user.
Removed this from the 0.3 milestone as it's not a blocker. We can add this in the 0.4 milestone.
Do we also want to impose/allow to configure some limits on the _size_ of attributes values?
For attribute and label keys, there is discussion about a maximum length (among other things) on #504.
2 and 3 are almost identical in a sense that if exporter can configure this SDK setting, than it is the same. Multiple exporters will just need to set the max of options and then do additional trimming on their side. There is no way to prevent exporters to configure SDK as long as this setting is available for end user.
I would argue that 2 provides a significant advantage by also enabling limiting of events enforceable by the SDK itself, which ensures that one cannot accidentally flood memory with hundreds of thousands of attributes before the trace reaches the exporter.
For that reason alone I would argue for #2. Even if exporters end up configuring it, it also covers the don't-cause-memory-issues use case.
As another idea too, you could make the maximum count work similar to ulimits in Linux: the value starts at infinite, you can set it to be lower, but can't set it higher. This ensures that the most attributes that will ever be passed to an exporter is a count that all exporters support.
But reading it through, it sounds like it's just overcomplicating things significantly. Most likely someone can configure their SDK appropriately with complete knowledge of their backends.
I suggest to remove release:required-for-ga label since this is an additive, non-breaking change.
+1 on making this release:after-ga .
From the issue triage mtg today, i'm changing the label to release:after-ga since it looks like from the comments this can be punted.
@tigrannajaryan
It depends on the outcome of this discussion if this is an additive, non-breaking change.
If some users are used to recording 10k events on a span and after updating the SDK suddenly 9k of them are dropped because a "sanity limit" of 1k is introduced to the SDK (to limit memory usage in case of usage errors as @bogdandrutu pointed out, for example), this would be a breaking change in my opinion.
If the spec would clearly state, that at least n number of attributes, links and events (each) are supported to be recorded and exported, this would already give users some kind of guarantee and consumers know what they will have to expect at least.
Having a user-configurable limit to go beyond (or above) the sanity limit imposed by the SDK would be additive and non-breaking.
Would setting some relatively arbitrary limit be acceptable, in that case? I do agree that having a sane limit will save some real production headaches, and loss of telemetry (e.g. span attributes) is more desirable than an app crashing due to OOM.
Based on what I've seen 128 would be a good threshold (i've never seen someone go beyond that), but it's very anecdotal.
@toumorokoshi Exactly. I have no idea on what number can be deemed safe for "all" use cases since I fear we know our users not well enough, however. I've seen requests on our Gitter that people intend to use spans for representing long-running actions that span over multiple hours (days?) and I could expect a higher number of events on these. We should probably aim high to not prevent "valid" (who's to decide what's valid anyway) edge cases and only have it as a fallback for error conditions to prevent production crashes. If the limit we decide on turns out to be too low for certain use cases we haven't thought of, we can always go higher or allow overriding by users in a truly non-breaking manner.
@tigrannajaryan @carlosalberto @SergeyKanzhelev since you voted for moving this out of GA - what do you think?
since you voted for moving this out of GA - what do you think?
@arminru I am certainly not against having this capability and believe it is useful but we need to make some sacrifices to cut the scope and deliver the GA.
If some users are used to recording 10k events on a span and after updating the SDK suddenly 9k of them are dropped because a "sanity limit" of 1k is introduced to the SDK (to limit memory usage in case of usage errors as @bogdandrutu pointed out, for example), this would be a breaking change in my opinion.
Some behavioral changes are likely inevitable when we release new versions. Formally, one can call this a breaking change but it is not the sort of the breaking change I think we should worry the most. If we can avoid hard breaking changes that make it impossible to recompile your app with a new version of OpenTelemetry because function signatures are different I think that will likely be good enough.
If we want to absolutely guarantee no changes in the behavior we can set the default to infinity when we introduce the Span data limiting and trimming capability.
[UPDATE] Just to make it clear: I don't advocate for weak compatibility guarantees, but I think we need to strike the right balance between being careful and being able to postpone functionality to after GA. It is very difficult to be completely certain that we will not need to change the behavior after the GA. If we set the absolute certainty of non-breaking behavioral changes after GA as the criteria for the GA then we will slow down a lot and I don't think we can meet our GA timelines.
I would say if we can't come to a fast consensus, then postponing makes sense.
Although I think this could be summed up into two changes to the spec:
And to @arminru's point,we just choose a number that's well beyond 99.99% of use cases. Don't want to throw numbers out without data, but 1000 seems well above the needed amount.
If that would be acceptable, I'm happy to file a PR.
If that would be acceptable, I'm happy to file a PR.
Sounds good, please do it.
Most helpful comment
I'd have a slight preference for the third option:
If the limits are user-adjustable via API, aren't we mixing different personas? An application developer (who would set the limit) might not know the exporter and ultimatively the backend that uses the data. The backend might have difficulties with 10 or with 10k attributes. It should be a concern of the exporter/backend only.