Kibana: [APM] Language specific stack frame formatting

Created on 28 Oct 2019  路  40Comments  路  Source: elastic/kibana

Blocked until opbeans apps send classname property for applicable agents

Currently stack frames are displayed identically across all languages/agents. The current format is

<filename> in <function> at line <lineno>

In https://github.com/elastic/kibana/issues/48360 a user has made us aware that the stacktraces for .NET doesn't look like they would expect. I suspect this is also the case for other languages.

@elastic/apm-agent-devs Please chime in if you have inputs/suggestions to how the stacktrace for your language should be presented in the UI.

Examples of how a stack frame looks across different agents

Node.js
image.png

Python
image.png

Java

Go

.NET
image

Solution

This can be fixed fairly easily by making FrameHeading take language into account and then return a language-specific stackframe component
https://github.com/elastic/kibana/blob/d9d8398fb125ade50ac29541aed6af432e843d4b/x-pack/legacy/plugins/apm/public/components/shared/Stacktrace/FrameHeading.tsx#L39-L48

Java

<classname>.<function>(<filename>:<lineno>)

.NET

<classname> in <function> in <filename> at line <lineno> 

Ruby

<filename>:<lineno>:in `<function>'

Go
_keep as-is_

Node.js

    at <classname>.<function> (<filename>:<lineno>:<columnno>)
    at <function> (<filename>:<lineno>:<columnno>)
    at <filename>:<lineno>:<columnno>

RUM
_same as Node.js_

Python
_keep as-is_

apm enhancement good first issue v7.10.0

Most helpful comment

Sounds great, S酶ren! Really getting into details here, but notice that the char before the function is a

`

and the one after is a straight '.

(Really struggled with the formatting in this comment)

All 40 comments

.NET

image

What we do here is that we put the fully qualified class name into the filename - which I know is not good, so we should do something about that :)

2 reasons we did this:

  • We don't always have a file name, which is currently a required field, but we always have a class name.
  • For .NET it'd be very strange to not show the classname, every tool shows that.

For .NET something like this would feel natural (that's what in #48360 would be expected with some additional things)

<fully qualified classname> in <function> in <filename> at line <lineno> 

Currently there is no field to store the fully qualified class name. Some suggested to use module - I'd suggest defining a new field - maybe call it class, because module has a different meaning in every language. E.g. a .NET or a Java class is different from a module.

I'm afraid this change would also mean that having only the filename as required should be changed - maybe something like either classname (or whatever we decide on) or filename could be required?

+1 on that from the Java side. Also +1 on creating a new class field

The most native way of rendering Java stack frames would be

<full class name>.<function>(<filename>:<lineno>)

Example:

org.example.MyClass.method(MyClass.java:42)

But what Greg proposed would also work in Java.

But what Greg proposed would also work in Java.

I'm not a TypeScript expert, but by looking at the code in Kibana and doing what @sqren suggested:

making FrameHeading take language into account and then return a language-specific stackframe component

I think it'd be easy to implement plus best for our users to return something that feels really natural for every language.

If that's true I think we should try to return <full class name>.<function>(<filename>:<lineno>) for Java.

@felixbarny @gregkalapos Thanks both! I think having separate formatting per language should be do-able. So doing something different for java than for .NET is fine.

Sounds like all we need is to agree whether to add a class/classname field. @elastic/apm-server WDYT?

Btw. I updated the issue description to reflect the proposed format for each agent

Ruby is fine as is. Example from actual errror:

/System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/set.rb:338:in `each_key'
/System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/set.rb:338:in `each'
/usr/local/Homebrew/Library/Homebrew/cmd/upgrade.rb:263:in `block in broken_dependents'
/usr/local/Homebrew/Library/Homebrew/cache_store.rb:17:in `use'
/usr/local/Homebrew/Library/Homebrew/cmd/upgrade.rb:262:in `broken_dependents'

Note use of _in_, but swapped order of function and line number.

@mikker Would you prefer this format for Ruby?

<filename>:<lineno>:in `<function>`

@sqren That matches the Ruby stacktrace format exactly so that'd be great, if @mikker also agrees.

Sounds great, S酶ren! Really getting into details here, but notice that the char before the function is a

`

and the one after is a straight '.

(Really struggled with the formatting in this comment)

Pinging @elastic/apm-ui (Team:apm)

I don't see an issue with introducing an additional field classname and require to either send filename or classname.

For RUM we offer to set an exclude_from_grouping and a library_pattern regex, that are matched against the filename. Would the data that are sent by the RUM agent also change, and should we therefore try to match one of these or both patterns also against classname if available or would the current logic still apply to match against filename?

but notice that the char before the function
@mikker

馃槷That's a funky syntax!

Would the data that are sent by the RUM agent also change
@simitt

I don't think so. I think only java and .NET agents will start using the classname field. But @jahtalab or @vigneshshanmugam can answer this better than I.

Not for RUM, We will stick to the existing format since it matches the error format.

Node.js stack traces look like one of:

    at <classname>.<function> (<filename>:<lineno>:<columnno>)
    at <function> (<filename>:<lineno>:<columnno>)
    at <filename>:<lineno>:<columnno>

@Qard Cool, so if the stackframe has a classname property it should use the first format. Else if it has a function property it should use the second one. And if none of the above it should default to the last one?

@simitt @felixbarny @gregkalapos I'm not seeing the classname in Elasticsearch yet. Is this because agent have not started to send it, or because the APM Server has not added support for it?

@gregkalapos In general there's no errors from opbeans-dotnet when running apm-integration-testing. As I recall you were looking into this at some point?

Oh sorry, I didn't even create an issue in the APM Server, completely lost track of this.

Created https://github.com/elastic/apm-server/issues/3006 now.
Should this go into 7.6? The issue doesn't have any milestone or version label.

@simitt No worries. Thanks for creating that issue!

This came up as a user request but has not been prioritized for a particular version. If we can fit it into 7.6 that's great else 7.7 is also fine.

In general there's no errors from opbeans-dotnet when running apm-integration-testing. As I recall you were looking into this at some point?

Oh, sorry, doing it now! https://github.com/elastic/opbeans-dotnet/issues/22

This came up as a user request but has not been prioritized for a particular version. If we can fit it into 7.6 that's great else 7.7 is also fine.

I +1 this, we have 1 user request for .NET, but if this is too late for 7.6 I'd say we should not worry about it and 7.7 is also fine.

APM-Server PR if up, targeting 7.6 https://github.com/elastic/apm-server/pull/3096

Awesome, thanks @simitt. I'll leave the "blocked" label until we have at least one agent implementation of classname. Since FF is 14th of January the UI changes might be pushed to 7.7.

Awesome, thanks @simitt. I'll leave the "blocked" label until we have at least one agent implementation of classname. Since FF is 14th of January the UI changes might be pushed to 7.7.

Opened https://github.com/elastic/apm-agent-dotnet/issues/677 for .NET. I feel it'd be too late to add this to 7.6 - we already have too many not-finished issues there -, so realistically I'd tackle this in the 7.7 timeframe.

I have implemented it for the Java agent: https://github.com/elastic/apm-agent-java/pull/984

When connected to an APM Server >= 7..6.0, it will send classname instead of filename

@felixbarny AFAICT, this will change the grouping key of already existing error groups, something that might be considered a breaking change.

You'd still get the same key if you send both the filename and the class name:

https://github.com/elastic/apm-server/blob/5f7b1f44d8799c4d33cce32a4b7faa0ce455645d/model/error/event.go#L391

(not sure if the precedence here is by design or by coincidence, probably worth tying it down in a test if that hasn't happened already)

not sure if the precedence here is by design or by coincidence

It is by design, to avoid creating a new grouping key when the same information is uploaded as before the changes.

@felixbarny AFAICT, this will change the grouping key of already existing error groups, something that might be considered a breaking change.

You'd still get the same key if you send both the filename and the class name:

Oh, that's good to know. However, I'm a bit worried about the increase in storage as adding things to a stack trace element gets multiplied by the number of frames. When can we get rid of the filename? Only in a major version? One option would be to send both the filename and the classname for errors but for only the classname for spans. That would preserve the grouping key but not increase storage for spans. It would introduce some inconsistency, however.

When can we get rid of the filename?

That is not planned. For some languages filename perfectly makes sense. The requirement was to _add_ a new field classname not to generally replace filename with it.

When sending the exact same information in classname as you sent before in filename and leaving filename empty, the grouping key does not change.

I have to take back my last comment, Did some quick checking across browsers and the format would be quite similar to how Node.js proposal with some differences.

    at <function> (<filename>:<lineno>:<columnno>)
    at <classname>.<function> (<filename>:<lineno>:<columnno>)
    at <classname> (<filename>:<lineno>:<columnno>)
    at <classname> (<filename>:<lineno>) // columno & lineno is not always present

/cc @simitt @sqren Will it be a breaking change for RUM?

@vigneshshanmugam the main question related to RUM agent sending classname is about sourcemapping. Would the classname be minified and needed to be taken into account when applying sourcemaps? (I don't see anything related to classname in the sourcemap library we use, but also didn't investigate any further, as it looked like RUM is not adapting to use classname).

And generally, as pointed out in the discussions above, sending classname not only additionally to what has been sent so far, but instead of a filename, or changing the value of the filename would lead to a different grouping key. Agents should probably align if this is considered a breaking change or not.

Ok, I'm convinced it's better to send both. Due to compression, the size difference shouldn't be too big and the proposed Java stack trace format actually does contain both the filename and the classname:

    at <classname>.<function>(<filename>:<lineno>)

@simitt Thanks for the detailed explanation, Agree with you on the classname part, We can ignore the classname as it is not a property we would need to look for in the sourcemaps and it wont be available as well.

@vigneshshanmugam then using classname for RUM events would be no different than using it for any other agent.

@sqren can we schedule this to 7.8? Java already implemented this and I also opened the PR for .NET.

@gregkalapos 7.8 is unfortunately not possible since feature freeze was last week. We've recently settled on the roadmap for 7.9 but I'll take this back to the team and see if we can squeeze it in.

Hi @gregkalapos I am looking at this issue since it's planned for 7.10, I still see that the PR (https://github.com/elastic/apm-agent-dotnet/pull/848) hasn't been merged do you have plans to work on that for this version?

@cauemarcondes yes! The only missing part in that PR was the version - I needed to know which version supports this feature; I'll add the missing version check with version 7.10 and merge.

This means if the agent sees an APM Server with version 7.10 (or newer), it'll send all fields needed, if it's older, then it won't.

@mikker

Sounds great, S酶ren! Really getting into details here, but notice that the char before the function is a ` and the one after is a straight '.

I think that's just a thing Ruby does.

Correct. The purpose is to look more like curly quotes, for better or for worse.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

cafuego picture cafuego  路  3Comments

MaartenUreel picture MaartenUreel  路  3Comments

celesteking picture celesteking  路  3Comments

stacey-gammon picture stacey-gammon  路  3Comments

bhavyarm picture bhavyarm  路  3Comments