Kibana: [APM] Implement nest level expand/collapse toggle for each span row

Created on 11 Jul 2018 · 16 comments · Source: elastic/kibana

APM UI – Distributed tracing – Timeline visualization meta issue https://github.com/elastic/kibana/issues/20553


Summary

With distributed tracing we want to show the relations between each parent and child span in the visualization, so that users can get a clear overview of how the spans are related.

Designs

[Image: 00 Timeline enhancements.png]

The overall design adds a toggle to each span row and indents the row one level for every parent/child relationship, which is found by matching parent.id to id across the documents in the trace.
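The indentation rule above can be sketched as a depth computation over the trace documents. This is a minimal illustration, not the Kibana implementation: the TraceDoc shape and the computeDepths name are hypothetical, only id and parent.id come from the description, and the sketch assumes the parent links contain no cycles. A document whose parent.id is missing from the trace is treated as a root.

```typescript
// Hypothetical minimal trace document: only id and parent.id are modelled.
interface TraceDoc {
  id: string;
  parent?: { id: string };
  name: string;
}

// Indentation level per document: roots get 0, each parent/child hop adds 1.
// Assumes parent links are acyclic.
function computeDepths(docs: TraceDoc[]): Map<string, number> {
  const byId = new Map<string, TraceDoc>();
  for (const doc of docs) byId.set(doc.id, doc);

  const depths = new Map<string, number>();

  const depthOf = (doc: TraceDoc): number => {
    const cached = depths.get(doc.id);
    if (cached !== undefined) return cached;
    // A parent.id that is not present in the trace is treated as "no parent".
    const parent = doc.parent ? byId.get(doc.parent.id) : undefined;
    const depth = parent ? depthOf(parent) + 1 : 0;
    depths.set(doc.id, depth);
    return depth;
  };

  for (const doc of docs) depthOf(doc);
  return depths;
}

const docs: TraceDoc[] = [
  { id: "t1", name: "GET /api/orders" },
  { id: "s1", parent: { id: "t1" }, name: "mysqldb:select" },
  { id: "s2", parent: { id: "t1" }, name: "http:GET /users" },
  { id: "s3", parent: { id: "s2" }, name: "redis:GET" },
];

// Print each row indented by its nesting level.
const depths = computeDepths(docs);
for (const doc of docs) {
  console.log("  ".repeat(depths.get(doc.id) ?? 0) + doc.name);
}
```

Memoizing in the depths map keeps the computation linear in the number of documents, even for deep traces.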

Toggle

[Animation: toggle expand/collapse demo (Kapture 2019-02-28 at 14.48.23.gif)]

The toggle will expand or collapse all spans nested below it, depending on its state. The state is indicated by an arrow: pointing down when expanded, pointing right when collapsed.
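One way to model the toggle is a set of collapsed span ids: when the rows are rendered depth-first, the subtree below a collapsed span is skipped while the collapsed span itself stays visible. The Span shape, visibleRows and arrowFor are hypothetical names for illustration, and "down"/"right" stand in for whatever icons the UI actually uses.

```typescript
// Hypothetical span shape for this sketch.
interface Span {
  id: string;
  parentId?: string;
  name: string;
}

// Rows to render, depth-first, skipping all descendants of collapsed spans.
function visibleRows(spans: Span[], collapsed: Set<string>): string[] {
  const ids = new Set(spans.map((s) => s.id));
  const children = new Map<string | undefined, Span[]>();
  for (const span of spans) {
    // Spans whose parent is not in the trace are rendered as roots.
    const key =
      span.parentId !== undefined && ids.has(span.parentId) ? span.parentId : undefined;
    const list = children.get(key) ?? [];
    list.push(span);
    children.set(key, list);
  }

  const rows: string[] = [];
  const walk = (parentId: string | undefined): void => {
    for (const span of children.get(parentId) ?? []) {
      rows.push(span.name);
      // Only descend into expanded spans.
      if (!collapsed.has(span.id)) walk(span.id);
    }
  };
  walk(undefined);
  return rows;
}

// Arrow direction for the toggle: down when expanded, right when collapsed.
const arrowFor = (id: string, collapsed: Set<string>): "down" | "right" =>
  collapsed.has(id) ? "right" : "down";

const spans: Span[] = [
  { id: "t", name: "root" },
  { id: "a", parentId: "t", name: "a" },
  { id: "b", parentId: "a", name: "b" },
  { id: "c", parentId: "t", name: "c" },
];

console.log(visibleRows(spans, new Set<string>())); // root, a, b, c
console.log(visibleRows(spans, new Set<string>(["a"]))); // root, a, c
```

Storing only the collapsed ids means every span defaults to expanded, and re-expanding a span restores its subtree exactly as it was.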

The number of direct child spans will be displayed next to the arrow icon. Direct child spans are only the spans exactly one level below the parent; children of child spans are not included.
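The difference between the direct-child number shown next to the arrow and the full subtree size can be illustrated with a small sketch. The function names and the Span shape are hypothetical; descendantCount exists only for comparison and is not part of the design.

```typescript
// Hypothetical span shape for this sketch.
interface Span {
  id: string;
  parentId?: string;
}

// The number shown next to the toggle: spans exactly one level below.
function directChildCounts(spans: Span[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const span of spans) {
    if (span.parentId === undefined) continue;
    counts.set(span.parentId, (counts.get(span.parentId) ?? 0) + 1);
  }
  return counts;
}

// For comparison only: every descendant, children of children included.
function descendantCount(spans: Span[], id: string): number {
  const direct = spans.filter((s) => s.parentId === id);
  return direct.reduce((sum, child) => sum + 1 + descendantCount(spans, child.id), 0);
}

const trace: Span[] = [
  { id: "t" },
  { id: "a", parentId: "t" },
  { id: "b", parentId: "a" },
  { id: "c", parentId: "t" },
];

console.log(directChildCounts(trace).get("t")); // 2 (a and c; b is a grandchild)
console.log(descendantCount(trace, "t")); // 3
```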

Because clicking the row itself will open the span's detail flyout, the click area for toggling nested spans will be limited to the toggle, which shows a blue highlight on hover.

Collapsed spans

Since we're aiming to condense the entire visualization as much as possible, we're taking steps to collapse consecutive spans that are faster than 95% of the entire trace duration https://github.com/elastic/kibana/issues/20659

Descoped for initial implementation.

Relation lines

The lines that run vertically down from each span are called relation lines. Each span carries a line whose position depends on its nesting level indentation.

[Image: 00 Timeline enhancements.png]

When a single span is hovered, its relation line turns darker and highlights all related spans on the same indentation level (in the example, mysqldb:select (7) is the hovered span).
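A possible way to pick the rows to highlight on hover is sketched below. It assumes that "related spans on the same indentation level" means the hovered span's siblings, i.e. every span sharing its parent.id, since those are the rows joined by one relation line; the issue does not spell this out, and relatedOnSameLevel is a hypothetical name.

```typescript
// Hypothetical span shape for this sketch.
interface Span {
  id: string;
  parentId?: string;
  name: string;
}

// Assumption: related spans on the same level = spans sharing the hovered
// span's parent (its siblings, including itself).
function relatedOnSameLevel(spans: Span[], hoveredId: string): string[] {
  const hovered = spans.find((s) => s.id === hoveredId);
  if (hovered === undefined) return [];
  return spans.filter((s) => s.parentId === hovered.parentId).map((s) => s.name);
}

const rows: Span[] = [
  { id: "t", name: "GET /orders" },
  { id: "s1", parentId: "t", name: "mysqldb:select" },
  { id: "s2", parentId: "t", name: "redis:GET" },
  { id: "s3", parentId: "s2", name: "redis:connect" },
];

// Hovering s1 highlights s1 and its sibling s2, but not the nested s3.
console.log(relatedOnSameLevel(rows, "s1"));
```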

Labels: apm test-plan-7.10.0 timeline enhancement v7.10.0


All 16 comments

Pinging @elastic/apm-ui

In order for the UI to still perform well when there are lots of spans (10k+), it should make heavy use of lazy loading. We can totally do this after the initial implementation, but we should at least keep it in mind so that the next iteration is not a total rewrite.

  • Initially, load only the spans and transactions that took longer than 5% of the root transaction's duration.
  • Then load their parents recursively, until the current parent_id has already been loaded. Make sure to load all parent_ids in a batch for each iteration.
  • Now the tree can be rendered.
  • Whenever a user expands a node, load only its direct children.
  • The child span count only counts the direct children and can be queried like SELECT count(*) FROM span, transaction where parent_id = $id
  • Load the span and transaction context lazily, only when the user clicks on a span to see its detail information.
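The loading steps above can be sketched against a hypothetical data-access interface. In Kibana these would be asynchronous Elasticsearch queries; here an in-memory array stands in for them so the control flow is easy to follow. Doc, DocStore, inMemoryStore, initialLoad and expand are all illustrative names, not existing APIs. Each loop iteration issues one batched lookup for all missing parents, as the list suggests.

```typescript
// Hypothetical document shape; the root has no parentId.
interface Doc {
  id: string;
  parentId?: string;
  durationUs: number;
}

// Hypothetical data-access layer (would be async ES queries in practice).
interface DocStore {
  findLongerThan(minDurationUs: number): Doc[];
  findByIds(ids: string[]): Doc[];
  findChildren(parentId: string): Doc[];
  countChildren(parentId: string): number;
}

function inMemoryStore(all: Doc[]): DocStore {
  return {
    findLongerThan: (min) => all.filter((d) => d.durationUs > min),
    findByIds: (ids) => all.filter((d) => ids.indexOf(d.id) !== -1),
    findChildren: (parentId) => all.filter((d) => d.parentId === parentId),
    countChildren: (parentId) => all.filter((d) => d.parentId === parentId).length,
  };
}

// Load everything above 5% of the root duration, then fetch missing
// parents in one batch per iteration until every parentId is loaded.
function initialLoad(store: DocStore, rootDurationUs: number): Map<string, Doc> {
  const loaded = new Map<string, Doc>();
  store.findLongerThan(0.05 * rootDurationUs).forEach((d) => loaded.set(d.id, d));

  for (;;) {
    const missing: string[] = [];
    loaded.forEach((d) => {
      if (d.parentId !== undefined && !loaded.has(d.parentId) && missing.indexOf(d.parentId) === -1) {
        missing.push(d.parentId);
      }
    });
    if (missing.length === 0) break;
    const batch = store.findByIds(missing); // one lookup per iteration
    if (batch.length === 0) break; // parents not stored; stop instead of looping
    batch.forEach((d) => loaded.set(d.id, d));
  }
  return loaded;
}

// On expand, load only the direct children of the expanded node.
function expand(store: DocStore, loaded: Map<string, Doc>, id: string): void {
  store.findChildren(id).forEach((d) => loaded.set(d.id, d));
}

const store = inMemoryStore([
  { id: "t", durationUs: 1000 },
  { id: "a", parentId: "t", durationUs: 10 }, // short parent of a long child
  { id: "b", parentId: "a", durationUs: 200 },
  { id: "c", parentId: "t", durationUs: 20 },
]);

const loaded = initialLoad(store, 1000);
console.log(Array.from(loaded.keys()).sort()); // a, b, t (c is below 5%)
expand(store, loaded, "t"); // now c is loaded as well
console.log(store.countChildren("t")); // 2
```

Every non-empty batch adds documents that were not loaded before, so the loop always terminates. It usually runs only once or twice, because long spans tend to have long parents.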

This both improves performance and shows the user which spans contributed most to the latency.

@felixbarny Thanks for writing up the spec for this 👍

I hope that helps. If it's too confusing I can elaborate or we can set up a meeting.

Quoting: "Then load their parent recursively, until the current parent_id has already been loaded. Make sure to load all parent_ids in batch for each iteration."

Couldn't this end up being a lot of ES queries?

I was thinking about just getting the top level transactions/spans by doing a bucket aggregation on parent_id, and sorting by @timestamp or offset/start.

That way I end up with just the top level item for each. I haven't tried it out in practice, but it sounds like less overhead than recursively calling ES.
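One reading of the aggregation idea is sketched below as a function that builds an Elasticsearch search body: a terms aggregation buckets the trace's documents by parent.id, and a top_hits sub-aggregation keeps only the earliest document per bucket. The function name, the bucket size of 500 and the choice of @timestamp over an offset field are assumptions. Documents without a parent.id (the root) do not appear in a terms bucket unless the aggregation's missing option is set.

```typescript
// Hypothetical helper: builds a search body that groups a trace's
// documents by parent.id and keeps the earliest hit per group.
function topLevelByParentQuery(traceId: string, maxParents = 500) {
  return {
    size: 0, // only the aggregation results are needed
    query: { term: { "trace.id": traceId } },
    aggs: {
      by_parent: {
        terms: { field: "parent.id", size: maxParents },
        aggs: {
          first_child: {
            top_hits: { size: 1, sort: [{ "@timestamp": { order: "asc" } }] },
          },
        },
      },
    },
  };
}

console.log(JSON.stringify(topLevelByParentQuery("abc123"), null, 2));
```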

Btw, a meeting sounds like a good idea. Let's set something up for next week.

I guess you would very rarely have to make recursive calls, because if a child span takes longer than 5% of the root transaction duration, its parents most likely do as well. So you have already loaded them and don't need to fetch them again.

Updated description with new screenshots from the latest designs

Deferred for 6.6

Notes from UI weekly: I'll task myself with updating the screens with examples that don't assume the single-line span row design is implemented, so it's clearer how this should look in the current design.

Does that mean we have discarded the idea of single-line span rows? If so, why? I think it would greatly help to analyze bigger traces.

@felixbarny We haven't discarded it, but the above task is not dependent on the single span row design being implemented.

I've updated the visuals to feature the current span row size and format, and added a GIF that demos the toggle effect. I've also descoped "collapse spans faster than 95% of entire trace duration" from this issue (I'm not sure it was meant to be part of this implementation, but I've now made it clear that it isn't).

++ for this one: currently looking at Transactions with 100+ spans. It will be very useful to be able to collapse levels to get a high level overview of top level spans before I decide to drill down into details.

++ for this one. Most important feature for large stack/span levels.

+1 absolutely important feature for microservices
