Xamarin-macios: xamarin_get_frame_length causes serious performance issues

Created on 14 Feb 2018 · 19 Comments · Source: xamarin/xamarin-macios

Steps to Reproduce

When you have many base class calls with a stret (and possibly in other scenarios), it appears Xamarin.Mac uses xamarin_get_frame_length, which now takes up most of the processing time for something that would otherwise be very fast. This causes a very large performance issue when overriding things like NSView.Frame, NSView.SetFrameSize, etc., as they can be called a very large number of times by our code and also by framework code.

This is a huge performance degradation compared to MonoMac, which we just switched from.

We have a large native application with embedded mono, Xamarin.Mac, and libxammac.dylib.

I can try to provide a sample if necessary.

Expected Behavior

Calling the base Frame or SetFrameSize should not have such a large overhead. The frame size should probably be cached so it does not have to be computed every time.

Actual Behavior

Should be at least as fast as MonoMac.

Environment

Visual Studio Professional 2017 for Mac
Version 7.3.3 (build 12)
Installation UUID: d9050f0b-87f7-4e38-9d83-c01eaf2f82fc
Runtime:
    Mono 5.4.1.7 (2017-06/e66d9abbb27) (64-bit)
    GTK+ 2.24.23 (Raleigh theme)

    Package version: 504010007

NuGet
Version: 4.3.1.4445

.NET Core
Runtime: /usr/local/share/dotnet/dotnet
Runtime Versions:
    2.0.5
    2.0.0
    2.0.0-preview1-002111-00
    2.0.0-beta-001791-00
    1.1.1
    1.1.0
    1.0.4
SDK: /usr/local/share/dotnet/sdk/2.1.4/Sdks
SDK Versions:
    2.1.4
    2.0.0
    2.0.0-preview1-005977
    2.0.0-preview1-005645
    1.0.1
    1.0.0-preview2-1-003177
MSBuild SDKs: /Library/Frameworks/Mono.framework/Versions/5.4.1/lib/mono/msbuild/15.0/bin/Sdks

Xamarin.Profiler
Version: 1.6.0
Location: /Applications/Xamarin Profiler.app/Contents/MacOS/Xamarin Profiler

Apple Developer Tools
Xcode 9.2 (13772)
Build 9C40b

Xamarin.iOS
Version: 11.6.1.4 (Visual Studio Professional)
Hash: db807ec9
Branch: xcode9.2
Build date: 2018-01-10 16:45:48-0500

Xamarin.Android
Not Installed

Xamarin.Mac
Version: 4.0.0.216 (Visual Studio Professional)

Xamarin Inspector
Version: 1.4.0
Hash: b3f92f9
Branch: master
Build date: Fri, 19 Jan 2018 22:00:34 GMT
Client compatibility: 1

Build Information
Release ID: 703030012
Git revision: b07492f1e48be596bad92dc4b7a3bc2d128ed0f9
Build date: 2018-01-30 13:15:55-05
Xamarin addins: 7c8f967d67207118dd99a1d0cc9c228045b30c5f
Build lane: monodevelop-lion-d15-5

Operating System
Mac OS X 10.13.3
Darwin 17.4.0 Darwin Kernel Version 17.4.0
    Sun Dec 17 09:19:54 PST 2017
    root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64

Enabled user installed addins
Eto.Forms Support Addin 2.4.0.0
AddinMaker 1.4.1
RhinoCommon Plugin Support 7.3.3.0
Internet of Things (IoT) development (Preview) 7.1

Labels: enhancement, macOS

All 19 comments

Hi @cwensley, thanks for the bug report.

Yes a test case would really help us confirm the issue and then investigate quickly. Could you provide it?

Interesting, maybe we should use NSCache there - so we don't fill up memory with all variations. Still we need measurements (so a test case) first to see if (and how much) caching helps.
c.c. @rolfbjarne

Try enabling the static registrar (add --registrar:static to the additional mmp arguments in the project's Mac Build options) and see if that helps.

Hi @rolfbjarne, thanks for the suggestion but unfortunately it's not possible as we are embedding Xamarin.Mac into our own native application. We also need to support having plugins which wouldn't be able to be statically linked afaik. Unless there's a way to do that that I'm not aware of?

This is being done for Visual Studio for Mac, which also uses plugins and also has a special launcher. So it's not the _common_ way to build an XM application, but you should be able to reap some benefits from using the static registrar.

@spouliot oh wow, that sounds fantastic. I'll have to look into that.

As for this particular issue, I am quite embarrassed to say that it appears to be due to our custom launcher not actually calling xamarin_initialize_dynamic_runtime and xamarin_initialize properly as libxammac.dylib was moved. After getting those to be called correctly this issue has completely disappeared.

I really appreciate the help guys, and sorry for making noise!

Ok, I think I need to go home. It's been a long day. This issue is actually still happening, I was looking at the wrong thing. ); I'll try to get a repro together, but I can't reproduce it with a "standard" Xamarin.Mac application so it'll take me a bit to do that.

Ok, I've finally narrowed this down to a concrete example from our large codebase. The following code demonstrates the problem. Note that I know that 1000 items in an NSPopUpButton is unreasonable, but the performance problem spans various controls and callbacks which takes up about 80% of the launch time of our app.

```c#
var popup = new NSPopUpButton();

for (int i = 0; i < 1000; i++)
{
    var item = new NSMenuItem("Some string");
    popup.Menu.AddItem(item);
}
```

In my tests, Xamarin.Mac Classic takes 0.34 seconds for the above, whereas Xamarin.Mac unified takes 2.4 seconds. This is about a 7x slowdown. It also appears exponential; for example, changing 1000 to 2000 takes 8 seconds. I've done some analysis in Instruments and found that the performance problem lies mainly with the use of protocol_getMethodDescription and class_copyProtocolList, shown here:

[Screenshot from Instruments, taken 2018-02-19 at 2:23:17 pm]

Here is an example project that demonstrates this (look in Main.cs):
XamMacUnifiedPerformance.zip

This is slow because after every AddItem we fetch the entire array of items: https://github.com/xamarin/xamarin-macios/blob/2a964030a88efc35acfa987778da7950bb32d1f3/src/appkit.cs#L7651

and since the array increases in size after every AddItem, this turns out to be exponential.
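As a rough cost model (an assumption for illustration, not something measured in this thread), suppose each AddItem call re-fetches the whole items array and fetching $k$ items costs time proportional to $k$. The total work for $n$ items is then:

$$
T(n) \propto \sum_{k=1}^{n} k = \frac{n(n+1)}{2},
\qquad
\frac{T(2n)}{T(n)} = \frac{2(2n+1)}{n+1} \approx 4 \quad \text{for large } n.
$$

Under that model the growth is quadratic rather than exponential in the strict sense: doubling the item count should roughly quadruple the time. The reported jump from 2.4 seconds at 1000 items to 8 seconds at 2000 items (about 3.3x) is in line with this.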

That said, you can avoid the call to xamarin_dyn_objc_msgSend (and thus the call to xamarin_get_frame_length) by passing --marshal-objectivec-exceptions=disable as an additional mmp argument in the project's Mac Build options. The downside of this option is that if any Objective-C exceptions occur in your app, it may crash or otherwise misbehave unpredictably; if there are no Objective-C exceptions, nothing changes. This might solve your original problem (because I doubt you're really adding thousands of entries to an NSMenu, and the exponential problem should be limited to NSMenu).

@rolfbjarne, thanks for the information! That explains the exponential problem, thanks for pointing that out. However, this is still a problem for us, as that was just one example of xamarin_get_frame_length being called (it appears to be called from many places). Also, since XamMac classic also copies the array each time, I would imagine there is still a general issue, given the very big difference in performance between the two.

> you can avoid the call to xamarin_dyn_objc_msgSend (and thus the call to xamarin_get_frame_length) by passing --marshal-objectivec-exceptions=disable as an additional mmp argument

This sounds excellent, since we already trap Objective-C exceptions in our native app! We are not using mmp, as we have Xamarin.Mac embedded in our native app. Is there a way to set this option directly through the native API?

@cwensley note that XamMac (classic) != MonoMac.
The former has a lot (even most) of Xamarin.Mac's features/fixes but remains 32-bit only.
The latter is even older, and its development/updates stopped when XamMac (classic) started.

> note that XamMac (classic) != MonoMac.

Fair enough, which is why my performance tests (and the example I posted) compare XamMac (classic) and Xamarin.Mac (unified). We want to get off MonoMac since we understand it is ancient (and has many problems) in comparison.

ok, I thought your baseline was MonoMac :)
the array creation _cost_ should be nearly identical then.

As for --marshal-objectivec-exceptions=disable your best bet is to build a _regular_ XM app (using that argument) and see the generated ObjC code (.m files). You'll see something like

xamarin_marshal_managed_exception_mode = MarshalManagedExceptionModeDisabled;

among other things that could interest you. Your own embedding code needs to do the same.

> ok, I thought your baseline was MonoMac :)

Right, sorry I made that a little confusing. MonoMac is our baseline as that's what we're switching from, but I don't expect it to be identical in performance (though we are hoping for better).

I didn't realize that building a regular XM app generates .m files, I'll take a look at that next time, thanks!!

> xamarin_marshal_managed_exception_mode = MarshalManagedExceptionModeDisabled;

Awesome, this is exactly what I was looking for - I'll give it a go. I really appreciate your help!

@spouliot Just to confirm, using MarshalManagedExceptionModeDisabled has solved our problem for now. I still think it'd be worth investigating how to make get_method_description work faster with classes that have many (any?) protocols, as it'd be nice to be able to make use of the exception wrapping.

Thanks everyone for the help!

You're right, we should look into xamarin_get_frame_length to see if we can speed it up (maybe store results in a hash table instead of calculating them again and again?).
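To make the hash table idea concrete, here is a minimal C# sketch of memoizing a pure, repeatable computation. Everything in it is hypothetical: `FrameLengthCache`, `ComputeFrameLength`, the signature string used as the key, and the per-character sizes are stand-ins invented for illustration. It is not the real xamarin_get_frame_length, which lives in the native runtime.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of "store results in a hash table".
// This is NOT the runtime's xamarin_get_frame_length.
public static class FrameLengthCache
{
    // Maps a signature string to its previously computed length.
    static readonly Dictionary<string, int> cache = new Dictionary<string, int>();

    // Counts how often the slow path actually runs.
    public static int SlowPathCalls { get; private set; }

    // Stand-in for the expensive computation; the sizes are made up.
    static int ComputeFrameLength(string signature)
    {
        SlowPathCalls++;
        int length = 0;
        foreach (char c in signature)
            length += (c == 'd' || c == 'q' || c == '@') ? 8 : 4;
        return length;
    }

    // Returns the cached value if present; otherwise computes and stores it.
    public static int GetFrameLength(string signature)
    {
        if (!cache.TryGetValue(signature, out int length))
        {
            length = ComputeFrameLength(signature);
            cache[signature] = length;
        }
        return length;
    }
}
```

Calling `FrameLengthCache.GetFrameLength("v@:dd")` twice runs the slow path only once. The sketch is not thread-safe, and a lookup like this only pays off when the computation costs more than hashing the key and probing the table, which has to be measured rather than assumed.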

Taking off the need-info label, since we have enough information here.

@cwensley thanks for the confirmation! I'm glad it solved the issue in your application.

@rolfbjarne yep, that goes back to my original NSCache comment ;-)

With the latest enhancements this code path is not used anymore when adding items to NSMenu.

I do have a patch that adds caching but I'll need to create another test case to determine how much it helps (vs the memory it takes).

I tried to use NSCache (diff) but it actually slowed performance down, i.e. the computation is not slow enough to cover the caching cost.

Iterating code doing an alloc, init and release for 30 seconds (higher numbers are better) gave me:

Using the cache:
2540691
2519835
2564116
2559761
2585994

Existing code:
3806456
3787268
3872508
3945135
3830501

That's a best case scenario (for caching) since all 3 values are cached for all iterations.
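
For scale, averaging the five runs in each group (plain arithmetic on the numbers posted above, nothing re-measured) gives:

$$
\bar{x}_{\text{cache}} = \frac{12\,770\,397}{5} \approx 2\,554\,079,
\qquad
\bar{x}_{\text{existing}} = \frac{19\,241\,868}{5} \approx 3\,848\,374,
\qquad
\frac{\bar{x}_{\text{cache}}}{\bar{x}_{\text{existing}}} \approx 0.66
$$

So the cached variant completed roughly one third fewer iterations in the same 30 seconds, or equivalently the existing code was about 1.5x faster in this test.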

I'm closing this since the original test case is already fixed (in large part because of the Snippet removal) and the issue/fix is likely not caching, at least inside xamarin_get_frame_length.

If you have another test case that shows a significant slowdown we'll be happy to look at it and see what can be done.
