Xamarin-macios: Remove workaround for ThreadPool not working

Created on 23 Sep 2019  路  10Comments  路  Source: xamarin/xamarin-macios

Xamarin on iOS currently has a workaround for the Mono Thread Pool not working:

https://github.com/xamarin/xamarin-macios/pull/5463

That workaround is necessary because the Mono's ThreadPool stops working after the app is suspended for 10 seconds:

https://xamarin.github.io/bugzilla-archives/58/58633/bug.html#c7

The workaround effectively kills all background support from Xamarin.iOS.

There is a repro that shows this problem in action (but might require disabling the incorrect bandaid that was put in place):

https://github.com/xamarin/xamarin-macios/issues/5333

enhancement iOS

All 10 comments

I undid the band-aid and built against mono-2019-06, using the sample provided by https://github.com/xamarin/xamarin-macios/issues/5333. After ~15-20 different tries, I couldn't reproduce the problem. Each time after 10+ seconds, the threadpool threads would kick back up and resume normal execution (Http requests too).

Has anyone else been able to reproduce? If so, what device(s), versions, etc?

@steveisok did you put your app into background mode on the device?

The workaround effectively kills all background support from Xamarin.iOS.

@migueldeicaza could you be more specific on this^ ?

@marek-safar Yeah, I'd swipe up and wait at least 10 seconds (often longer). In addition, I opened other apps, did something, and then went back.

Yes, this means that the NSUrlSession which is an API that can perform downloads in the background, even if the application is stopped does not work. So the hack cancels the downloads.

@steveisok perhaps the heuristics for the OS have changed, and the app is not backgrounded, might be worth checking.

@mandel-macaque can provide more details on the side effects.

There's a similar issue on Android on at least one device (after a while, a background service can't schedule new Tasks to run). I've been looking at this piece of code:
https://github.com/mono/mono/blob/d371432a1030db86d1ed6afb9bc174fb1587191f/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs#L649-L654
My theory is that when the app is in the background we have requests backing up somehow and we end up with insufficient number of calls to ThreadPool.RequestWorkerThread - that icall ends up here:
https://github.com/mono/mono/blob/da02f1330f1dd872f1e723a1f6e35997979df27c/mono/metadata/threadpool-worker-default.c#L340-L349
and the call to work_item_push is critical because that's the code that signals a worker to unpark and to try and process a job from its queue.
If we somehow end up with numOutstandingThreadRequests >= ThreadPoolGlobals.processorCount new jobs wouldn't run.

This EnsureThreadRequested/MarkThreadRequestSatisfied code is pretty critical - CoreCLR had to revert some changes (so they're back to matching referencesource) when they tried to optimize it: https://github.com/dotnet/coreclr/commit/b8dda0cbf7eae770fc685378ad7c542e2468a209

The android device where I saw this has processorCount = 2. I'm trying to repro on desktop by hardcoding 2 for the count in a hacked up corlib.

To followup on my theory that ThreadPool.RequestWorkerThread isn't getting called enough.
I changed this loop to while (true) instead https://github.com/mono/mono/blob/b45d0a108474bafc56d853b74d89ac82800a38f8/mcs/class/referencesource/mscorlib/system/threading/threadpool.cs#L648-L658

and also added some logging to note whether we ever call it when count >= ThreadPoolGlobals.processorCount and:

  1. before the app is backgrounded, count < ThreadPoolGlobals.processorCount
  2. after the app is backgrounded:
    a. I get my warning that count >= ThreadPoolGlobals.processorCount
    b. the task starts running.

So on the particular Android device that I'm watching, it certainly seems like we get into a situation where the native threadpool machinery was flooded with requests (count >= processorCount) but it s workers didn't unpark or start up to process them, and so numOutstandingThreadRequests never got decremented and as a result queueing new Tasks doesn't actually cause the threadpool to wake up and pick up work anymore.

I'm planning to look into a couple of things:

  1. What the heck is going in the unmanaged threadpool code
  2. What exactly is count < processorCount protecting against?
  3. Does CoreCLR do something different with ThreadPool.RequestWorkerThread? In Mono it seems pretty critical to call this periodically to nudge the unmanaged code to actually try to pick up work items. Maybe it doesn't do that in CoreCLR and the threadpool doesn't idle the same way. (Or maybe they just didn't consider the OS suspending the application since desktop OSes probably don't do that the same way).

But maybe we should check in something like:

#if MONO && MOBILE
    while (true) {
#else
   while (count < ThreadPoolGlobals.processorCount) {
#endif

And just see what happens?

Just a brief update. My previous comment is not the direction we ended up taking for the possibly-related Android issue.

The fix we ultimately ended up going with is mono/mono#17927. Once backports of that PR are available for xamarin.ios, we could investigate removing the workaround in XI.

Although, given that we haven't been able to reproduce the issue recently, we will first need to identify a way to induce the bad threadpool behavior on an ios device.

@lambdageek - Git suggests that mono 2020-02 has this (via b6a351c3308). Could we look at removing the work around now?

@chamons the PR also landed in 2019-10 and 2019-12 so you should be able to already try it on your stable bits.

The Xamarin iOS and Mac team would like some customer testing with the this fix.

Here are the instructions:

  1. On latest stable (or preview) Xamarin.iOS / Mac modify your HttpClient creation to the following:
var handler = new NSUrlSessionHandler () { BypassBackgroundSessionCheck = true };
var client = new HttpClient (handler);
  1. Test your networking, in particular by back-grounding and for-grounding your application in various network conditions (wi-fi, cellular, etc).
  2. If you run into issues, change BypassBackgroundSessionCheck back to false (the default) and try to reproduce.
  3. Report any differences here in a comment. In theory, they should behave exactly the same now.

We'd like to change the default in future versions (https://github.com/xamarin/xamarin-macios/pull/8296) and hearing from some customers would be very helpful.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

whitneyschmidt picture whitneyschmidt  路  3Comments

ormaa picture ormaa  路  3Comments

michaelstonis picture michaelstonis  路  3Comments

chamons picture chamons  路  4Comments

orryverducci picture orryverducci  路  4Comments