-XX:(+/-)PreserveFramePointer:
The x86-specific PreserveFramePointer flag selects between using the RBP register as a general purpose register (-XX:-PreserveFramePointer) and using the RBP register to hold the frame pointer of the currently executing method (-XX:+PreserveFramePointer). If the frame pointer is available, external profiling tools (e.g., Linux perf) can construct more accurate stack traces.
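The following is a toy model, in Python, of what a preserved frame pointer gives a profiler. It is a sketch, not real process memory: "memory" is a dict from addresses to 64-bit words. It assumes the usual x86-64 frame record produced by a `push rbp; mov rbp, rsp` prologue, where the caller's saved RBP sits at `[rbp]` and the return address at `[rbp + 8]`. A saved RBP of 0 marks the outermost frame. The function name `walk_frame_pointers` is hypothetical.

```python
from typing import Dict, List

WORD = 8  # x86-64 word size in bytes


def walk_frame_pointers(memory: Dict[int, int], rip: int, rbp: int,
                        max_depth: int = 64) -> List[int]:
    """Return the sampled RIP followed by one return address per frame.

    Toy model: a frame record at address fp holds the caller's saved RBP
    at fp and the return address into the caller at fp + 8.
    """
    trace = [rip]
    fp = rbp
    while fp != 0 and len(trace) < max_depth:
        saved_fp = memory.get(fp)
        ret = memory.get(fp + WORD)
        if saved_fp is None or ret is None:
            break  # unreadable frame record: stop instead of guessing
        trace.append(ret)
        fp = saved_fp  # follow the chain to the caller's record
    return trace


# Sampled while executing in bar(), called as main() -> foo() -> bar().
stack = {
    0x1000: 0x1010, 0x1008: 0x401234,  # bar's record: foo's RBP, return into foo
    0x1010: 0x1020, 0x1018: 0x401100,  # foo's record: main's RBP, return into main
    0x1020: 0x0,    0x1028: 0x400500,  # main's record: end of the chain
}
print([hex(a) for a in walk_frame_pointers(stack, rip=0x401300, rbp=0x1000)])
# ['0x401300', '0x401234', '0x401100', '0x400500']
```

With -XX:-PreserveFramePointer, RBP may hold an arbitrary value at the moment of the sample. For example, `walk_frame_pointers(stack, 0x401300, 0x2a)` yields only the sampled RIP, because the "frame record" it points at is not readable in this model.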
Are JVM options -Xjit:perfTool in OpenJ9 and -XX:+PreserveFramePointer in Hotspot similar?
fyi - @zl-wang @mstoodle @DanHeidinga
I don't think so. @ymanton recently answered this on Slack:
https://openj9.slack.com/archives/C8312LCV9/p1540929332005700
As @fjeremic mentioned, the answer is:
It's not documented, but it causes the JIT to produce a `/tmp/perf-<pid>.map` file for JIT-compiled code, which the perf command uses to associate samples with symbols. It's not documented in the perf man page either, but there are plenty of pages on the web that describe that stuff.
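The perf map file is plain text. Per the Linux perf JIT-interface notes, each line is `START SIZE symbolname`, where START and SIZE are hex numbers without a `0x` prefix and the symbol is the rest of the line. Below is a minimal Python sketch of writing and reading such entries. The helper names and the Java-style symbol strings are hypothetical; they are not what OpenJ9's `-Xjit:perfTool` actually emits.

```python
import os
from typing import Iterable, List, Optional, Tuple

# (start address, code size in bytes, symbol name)
Entry = Tuple[int, int, str]


def perf_map_line(start: int, size: int, symbol: str) -> str:
    """Format one entry: START and SIZE in hex without 0x, then the symbol."""
    return f"{start:x} {size:x} {symbol}\n"


def parse_perf_map_line(line: str) -> Entry:
    # Split at most twice: the symbol is the rest of the line and may contain spaces.
    start, size, symbol = line.rstrip("\n").split(" ", 2)
    return int(start, 16), int(size, 16), symbol


def append_to_perf_map(entries: Iterable[Entry], pid: int,
                       directory: str = "/tmp") -> str:
    """Append entries to <directory>/perf-<pid>.map and return the file path."""
    path = os.path.join(directory, f"perf-{pid}.map")
    with open(path, "a", encoding="utf-8") as f:
        for start, size, symbol in entries:
            f.write(perf_map_line(start, size, symbol))
    return path


def symbolize(entries: List[Entry], pc: int) -> Optional[str]:
    """Return the first symbol whose [start, start + size) range contains pc."""
    for start, size, symbol in entries:
        if start <= pc < start + size:
            return symbol
    return None


# Hypothetical JIT-compiled method bodies laid out back to back.
entries = [
    (0x7F3A10000000, 0x1C0, "java/lang/String.hashCode()I"),
    (0x7F3A100001C0, 0x80, "Example.main([Ljava/lang/String;)V"),
]
print(perf_map_line(*entries[0]), end="")  # 7f3a10000000 1c0 java/lang/String.hashCode()I
print(symbolize(entries, 0x7F3A100001D0))  # Example.main([Ljava/lang/String;)V
```

A JIT would call something like `append_to_perf_map` as methods are compiled. The file only lets perf map a sampled PC to a method name. It does not make the JIT frames walkable, which is the distinction the rest of this thread turns on.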
(Reproduced here as I don't know how long Slack links are good for.)
@mstoodle / @ymanton had looked at the equivalent of -XX:+PreserveFramePointer a number of years back but I'm not sure what became of that work.
I came to the conclusion that it would be very difficult (on all platforms, not just x86). Preserving the frame pointer is a minor x86 problem, our bigger problems (on all platforms) are that:
We have two separate stacks (Java and native) and execution can weave between the two. Any general tool that wants to walk the stack to show you an accurate backtrace or to construct a call graph profile is going to be thwarted by that.
We also probably don't produce ABI-compliant stack frames so even if they were all on one stack these kinds of tools wouldn't know how to walk them.
On top of that, profiling tools sample threads at arbitrary points, whereas our stacks are really only safely walkable at GC points.
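Below is a toy Python illustration of how a second stack thwarts a frame-pointer walker. It assumes a hypothetical defensive walker that only follows frame records lying inside the current thread's native stack region and requires each caller's record to sit at a higher address, since x86-64 stacks grow downward. This is a common unwinder design, but whether perf applies exactly these checks is not established here. In the model, a native helper's saved frame pointer points into a separately allocated Java stack, so the trace is cut off at that boundary.

```python
from typing import Dict, List

WORD = 8  # x86-64 word size in bytes


def walk_native_stack(memory: Dict[int, int], rip: int, rbp: int,
                      stack_lo: int, stack_hi: int,
                      max_depth: int = 64) -> List[int]:
    """Frame-pointer walk confined to one stack region [stack_lo, stack_hi).

    Record layout assumed: saved RBP at fp, return address at fp + 8.
    """
    trace = [rip]
    fp = rbp
    while len(trace) < max_depth:
        if not (stack_lo <= fp and fp + 2 * WORD <= stack_hi):
            break  # record is not on this thread's native stack
        saved_fp = memory.get(fp)
        ret = memory.get(fp + WORD)
        if saved_fp is None or ret is None:
            break  # unreadable record
        trace.append(ret)
        if saved_fp <= fp:
            break  # 0 ends the chain; otherwise callers must be at higher addresses
        fp = saved_fp
    return trace


NATIVE_LO, NATIVE_HI = 0x1000, 0x2000  # hypothetical native stack bounds

# A native helper called from JIT code whose frames live on a separate Java stack.
mixed = {
    0x1100: 0x9000, 0x1108: 0x500000,  # helper's record: saved FP points into the Java stack
    0x9000: 0x9100, 0x9008: 0x500800,  # Java-stack data the walker never reaches
}
print([hex(a) for a in walk_native_stack(mixed, 0x400100, 0x1100, NATIVE_LO, NATIVE_HI)])
# ['0x400100', '0x500000']
```

The walker records the return address into JIT code and then stops, so everything above the stack switch is missing from the sample. Even if it did follow the pointer, the Java-stack frames would need ABI-style records for the walk to make sense, which is the second problem listed above.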
@ymanton Does the code to preserve the frame pointer on x86 still exist? I would expect keeping the frame pointer and saving it in the appropriate stack slot would be sufficient for perf to walk the jitted frames.
Providing only the jitted stack frames would give a great deal more info than is currently available and would cover the most interesting frames for users.
At least on x86, part of the problem was using rbp for the vmThread reg. I think @lmaisons looked into this issue at one point.
perf doesn't know anything about OpenJ9/OMR frame shapes at all. It cannot weave between java and native frames. It can possibly deal with a single-stack mixed-mode VM. On the other hand, is there a stack-walking JVM TI service available for perf? jprof SCS (Sampled Call Stack) seemed able to do it.
It cannot weave between java and native frames.
I don't think it matters. The user is likely to care about the currently executing frames and as much of the stack as we can give them. Something is better than the nothing we currently display.
For x86 at least, a valid instruction pointer along with being able to chain through the base pointer should be sufficient to get a good sample of the stack.
If we limit it initially to x86, is this something doable in a reasonable amount of time?
I don't think it is something that can be done in a reasonable amount of time for all the reasons listed by @ymanton: register conventions are not in place to match what perf expects, GC maps are not reliable outside of GC points, ABI compliance of frames is unknown and likely to need more work, and handling stack swap and C code on the Java stack for fast helpers is not trivial. From the point of view of supporting perf, the best we have managed in the past is to get perf method instruction range information for PC mapping. We have not made the frames walkable by perf itself, and the private linkage and other associated engineering mean it is not likely we ever will without major re-engineering.
Unfortunately most tools out there rely on being able to walk through C native frames on the particular platform. OpenJ9 having a separate Java stack and custom linkage conventions really hinders that. We have similar problems on z/OS and various tooling which is built around walking C native stack frames in OS linkage.
I agree this problem will likely never be solved with the separate stack and custom linkage conventions. My best guess is that HotSpot is able to do this as they likely use the system stack for Java frames and make them look like C frames with the same linkage, etc.
I think the original question raised by this issue has been answered - namely the two options are not similar and are not aliasable. Since the discussion seems to have resolved I am going to close this issue. If we want to continue discussing changes to the internal linkage etc to try and support perf that is probably better done in an issue dedicated to that task since we are diverging from the original question here.