Related: https://github.com/elastic/kibana/issues/26544 and https://github.com/elastic/apm-agent-rum-js/issues/56
Transaction names are defined in the APM agent. Often these are picked up from frameworks, but sometimes users must define a pattern for them themselves. If they do this incorrectly, every URL will be sent up as a unique transaction group, which can cause an explosion in the number of transaction groups displayed by the UI.
Calculate number of transaction groups (cardinality of `transaction.name`)
The following terms agg will return the number of transaction groups per service. For some customers we've seen this go above 10 million. This makes the UI very inaccurate, since we only show the top 200 transaction groups:
GET apm-*-transaction*/_search?terminate_after=10000
```json
{
  "size": 0,
  "aggs": {
    "services": {
      "terms": {
        "field": "service.name",
        "size": 10
      },
      "aggs": {
        "distinct_names": {
          "cardinality": {
            "field": "transaction.name"
          }
        }
      }
    }
  }
}
```
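To make the result of the query above easier to act on, here is a minimal sketch that picks out the services whose number of distinct `transaction.name` values exceeds a given limit. The function name `findHighCardinalityServices`, the interfaces, and the `limit` parameter are hypothetical and not part of Kibana; the response shape is only what the aggregation above would return.

```typescript
// Minimal shape of the response to the aggregation above (only the fields used here).
interface ServiceBucket {
  key: string; // service.name
  doc_count: number;
  distinct_names: { value: number }; // cardinality of transaction.name
}

interface CardinalityResponse {
  aggregations: { services: { buckets: ServiceBucket[] } };
}

// Hypothetical helper: returns the services whose number of distinct
// transaction names exceeds `limit`, worst offender first.
function findHighCardinalityServices(
  response: CardinalityResponse,
  limit: number
): Array<{ service: string; groups: number }> {
  return response.aggregations.services.buckets
    .filter((bucket) => bucket.distinct_names.value > limit)
    .map((bucket) => ({ service: bucket.key, groups: bucket.distinct_names.value }))
    .sort((a, b) => b.groups - a.groups);
}
```

The limit is passed in rather than hardcoded, since it should match whatever number of transaction groups the UI is configured to show.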
**Suggested solution**
Show a warning in the UI if `sum_other_doc_count` or `doc_count_error_upper_bound` is above 0 (or perhaps above some other threshold).
*Example:*
Given the following terms agg:
GET apm-*-transaction*/_search
```json
{
  "size": 0,
  "query": {
    "term": {
      "service.name": "some-service-with-many-transactions"
    }
  },
  "aggs": {
    "transactionsGroups": {
      "terms": {
        "field": "transaction.name",
        "size": 200
      }
    }
  }
}
```
The response from ES could look something like:
```json
{
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "transactionsGroups" : {
      "doc_count_error_upper_bound" : 20433,
      "sum_other_doc_count" : 20816053,
      "buckets" : [
        // ...
      ]
    }
  }
}
```
In the above case, the number of unaccounted-for transactions is above 20 million.
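The suggested check could be sketched roughly as follows. This is only an illustration of the idea, not the actual Kibana implementation: the function name `checkTermsAggTruncation`, the interfaces, and the `threshold` parameter are hypothetical, and only the two response fields named above are assumed to exist.

```typescript
// Subset of a terms aggregation result, matching the example response above.
interface TermsAggResult {
  doc_count_error_upper_bound: number;
  sum_other_doc_count: number;
  buckets: unknown[];
}

interface TruncationCheck {
  isTruncated: boolean; // true if the UI should show a warning
  unaccountedDocs: number; // documents that fell outside the returned buckets
  errorUpperBound: number; // worst-case error on the returned doc counts
}

// Hypothetical helper: flags a terms agg result as truncated when either
// counter is above `threshold` (0 by default, as proposed above).
function checkTermsAggTruncation(agg: TermsAggResult, threshold = 0): TruncationCheck {
  return {
    isTruncated:
      agg.sum_other_doc_count > threshold ||
      agg.doc_count_error_upper_bound > threshold,
    unaccountedDocs: agg.sum_other_doc_count,
    errorUpperBound: agg.doc_count_error_upper_bound,
  };
}
```

With the example response above (`sum_other_doc_count` of 20816053), `isTruncated` would be `true` and the warning would be shown.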
Pinging @elastic/apm-ui (Team:apm)
I have a couple of questions:
Do we expect this to be an issue for all agents or only a few?
This is something we've seen a lot with the RUM agent. It could also happen with other agents but I think it's much more rare.
Is there already specific documentation on configuration options that will resolve this issue for the user and where can I find it (in case we want to link to it)?
That's a good question. Not that I know of. @jahtalab @vigneshshanmugam or @bmorelli25 might know.
The RUM agent has an option to set `pageLoadTransactionName`, which helps users configure this for page load transactions only.
However, there are also other ways to fix the transaction name for both soft and hard navigations, which I have described here: https://stackoverflow.com/a/60703633/3588136
@vigneshshanmugam Thanks! Do you think it would be useful to have documentation that is more language-agnostic and more focused on the purpose of `transaction.name` (grouping of similar URLs by patterns) and how it should be limited to 200 (default max number of transaction groups displayed by the UI)? And how this limit can be increased (configurable in Kibana).
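To illustrate what "grouping of similar URLs by patterns" could mean in practice, here is a minimal sketch that collapses variable path segments into placeholders so that similar URLs share one transaction name. The helper `toTransactionName` and the `:id` / `:uuid` placeholders are hypothetical; real applications would likely need rules that match their own routes, and how the result is handed to the agent (for example via `pageLoadTransactionName` in the RUM agent) is not shown here.

```typescript
// Hypothetical helper: turns a URL into a low-cardinality transaction name
// by replacing variable path segments with placeholders.
function toTransactionName(url: string): string {
  // Drop scheme and host (if present), then the query string and fragment.
  const path = url
    .replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?#]+/i, "")
    .split(/[?#]/)[0];

  const uuidPattern =
    /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

  const normalized = path
    .split("/")
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ":id"; // purely numeric segment
      if (uuidPattern.test(segment)) return ":uuid"; // UUID-shaped segment
      return segment; // static segment, kept as-is
    })
    .join("/");

  // An empty path (bare origin) is reported as the root route.
  return normalized || "/";
}
```

With a rule like this, `/products/1` and `/products/2` would both be reported as `/products/:id` instead of creating one transaction group each.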
Thanks for the feedback.
I've created a quick draft PR with a proposed design implementation for how to show the callout. https://github.com/elastic/kibana/pull/67610
I too think it's worth investigating whether we can create a single agent-agnostic documentation article that will help users debug their issues, which we can easily link to from the app.
> Do you think it would be useful to have documentation that is more language-agnostic and more focused on the purpose of `transaction.name`
Totally, language-agnostic documentation would definitely be useful in this context and help users understand the issue. We could probably in the UI detect the agent name and link to the relevant language docs and potential solutions to fix it.
> And how this limit can be increased (configurable in Kibana).
Huge +1. Do we currently have a limit, and would it cause any perf issues? Maybe we can have an upper bound here in the UI?
> We could probably in the UI detect the agent name and link to the relevant language docs and potential solutions to fix it.
This is typically where we've found that it's better to have a single documentation article to link to and then allow the user to find the solution that fits their application. Perhaps @bmorelli25 can weigh in here on the best approach?
This sounds good to me. I'll add a language-agnostic section to the Troubleshoot common problems documentation that we can link to from the APM app. From there, we can provide links to any relevant Agent docs.
I've opened https://github.com/elastic/kibana/issues/67691 to track the docs.
@sqren
> This makes the UI very inaccurate since we only show the top 200 transaction groups:

> and how it should be limited to 200 (default max number of transaction groups displayed by the UI).
In the docs, we list the default as 100. Just want to make sure 200 is correct before I update it.
EDIT: I think it's 100?
https://github.com/elastic/kibana/blob/402018856eea66a88785256d1a603a011795613a/x-pack/plugins/apm/server/index.ts#L30
Sorry, I wrote that from memory. You are right that it is 100... sort of. As usual, the reality is a bit more complex than one could hope: I went digging in the code and found that the configurable limit of 100 had been changed to a hardcoded limit of 10,000. This was a mistake, and will be fixed together with the warning improvement.
The limit will be made configurable again but perhaps increased to 500. I'll let you know when that happens so we can fix the docs.